Wednesday, August 29, 2007

Accessing Prosper's Performance Data Part 2

This is the second post in a series on accessing Prosper's Marketplace Performance engine. Disclaimer: If you're not at least a bit geeky, this will bore you to tears. That's the danger of reading a blog run by an engineer. This post, specifically, covers non-financial topics like programming. You've been warned.

In the last post, I outlined how to get specific performance information from Prosper and documented some parameters that can be accessed. Now to suggest a use for this. Imagine that you want to trace the default history for loans originated in January 2006 and see how they've aged over time. This can be done by fixing the origination start and end dates and varying the observation date. We set the origination start (osr) to 01%2f01%2f06, or 01/01/06, and the origination end (oer) to 01%2f31%2f06, or 01/31/06; the %2f is just a URL-encoded slash. To collect all the data, sweep the observation date (od) from 02/01/06 (02%2f01%2f06) to 08/01/07 (08%2f01%2f07). Remember, the observation date says "pretend it's this date when looking at the data". From here, read the data into Excel, OpenOffice, or your favorite spreadsheet, and process away. Sounds difficult? It's not, and here's how to do it.
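For the programmers in the audience, here is a rough Python sketch of that sweep. It only builds the list of URLs and output file names, it doesn't fetch anything. The helper names (encode_date, month_starts, sweep_urls), the one-month step, and the od_<month><year>.html file naming are my own assumptions for illustration; only the URL and the osr, oer, and od parameters come from Prosper's page.

```python
from datetime import date
from urllib.parse import quote

# URL of the performance page; osr, oer and od are the parameters discussed above.
BASE_URL = "http://www.prosper.com/lend/performance.aspx"

# Hard-coded month abbreviations so file names don't depend on the locale.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def encode_date(d):
    """Format a date as MM/DD/YY with the slashes percent-encoded as %2f."""
    return quote(d.strftime("%m/%d/%y"), safe="").lower()


def month_starts(first, last):
    """Yield the first day of every month from first to last, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def sweep_urls(osr, oer, od_first, od_last):
    """Return (filename, url) pairs, one per monthly observation date."""
    jobs = []
    for od in month_starts(od_first, od_last):
        url = (f"{BASE_URL}?osr={encode_date(osr)}"
               f"&oer={encode_date(oer)}&od={encode_date(od)}")
        # Assumed naming scheme, e.g. od_feb06.html
        filename = f"od_{MONTHS[od.month - 1]}{od.year % 100:02d}.html"
        jobs.append((filename, url))
    return jobs


jobs = sweep_urls(date(2006, 1, 1), date(2006, 1, 31),
                  date(2006, 2, 1), date(2007, 8, 1))
```

Each entry in jobs pairs a file name with the URL for one observation date, so the list can be handed to whatever download tool you prefer.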

To collect all the various sets of data from Prosper, use a tool that'll automate downloading web pages, like cURL. It lets you specify both the URL and the name of the file to save the page to. If you're running Linux (like we here at Prosperous Land do), it can be accomplished with a bash script looking something like this:

curl -o od_feb06.html 'http://www.prosper.com/lend/performance.aspx?osr=01%2f01%2f06&oer=01%2f31%2f06&od=02%2f01%2f06'
curl -o od_mar06.html 'http://www.prosper.com/lend/performance.aspx?osr=01%2f01%2f06&oer=01%2f31%2f06&od=03%2f01%2f06'
curl -o od_apr06.html 'http://www.prosper.com/lend/performance.aspx?osr=01%2f01%2f06&oer=01%2f31%2f06&od=04%2f01%2f06'

And so on until you've collected all the necessary dates. Note the quotes around each URL: without them, bash treats every & as "run this in the background" and chops the address short. If you're creative, a script could automate most of this. And, to convert the HTML files into comma-separated values (.csv) files that can be imported into any spreadsheet, a Python program will do the trick. It will extract the performance information from the HTML files. If we call the program table_extract.py, use it as 'table_extract.py od_mar06.html > od_mar06.csv'
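To give a flavor of what such a converter might look like, here is a minimal sketch using only Python's standard library. It is not the actual table_extract.py. It assumes the performance figures sit in ordinary, non-nested HTML table rows, and it dumps every row it finds, so expect to trim some page-layout rows in the spreadsheet afterwards. The class and function names are made up for this example.

```python
import csv
import sys
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def _close_cell(self):
        # Join the pieces and collapse runs of whitespace into single spaces.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            if self._row:
                self.rows.append(self._row)
            self._row = None


def extract_rows(html_text):
    """Return all table rows found in html_text as lists of strings."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Command-line use: python table_extract_sketch.py page.html > page.csv
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        csv.writer(sys.stdout).writerows(extract_rows(f.read()))
```

Writing through the csv module rather than joining on commas means cells that contain commas, such as dollar amounts, get quoted properly.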

From here, you can munge and process the data like a statistical fiend. Just remember to be kind to Prosper's server and not request searches too quickly.
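Being kind to the server mostly means pausing between requests. A small sketch of that idea in Python follows. The function name download_all and the five-second default pause are my own guesses, not anything Prosper has published, so adjust to taste. The fetch and sleep arguments exist only so the loop can be tried out without touching the network.

```python
import time
import urllib.request


def download_all(jobs, delay_seconds=5.0,
                 fetch=urllib.request.urlretrieve, sleep=time.sleep):
    """Fetch each (filename, url) pair in turn, pausing between requests.

    delay_seconds is an assumed, arbitrary pause; nothing official.
    """
    saved = []
    for index, (filename, url) in enumerate(jobs):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        fetch(url, filename)      # urlretrieve saves the page to filename
        saved.append(filename)
    return saved
```

Called with a list of (filename, url) pairs, it saves one page per pair and returns the names of the files it wrote.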

The next installment will wander away from the programming realm and cover even more parameters that can be searched. Stay tuned!
