Lending Club Investing Strategy Duel
In the last post of this series (Improving Investment Returns Using Default Prediction) we saw how a very simple strategy (investing in every high interest rate loan) combined with our default prediction algorithm yields pretty good returns. But how does this compare to a more sophisticated strategy guided by common sense and manual tuning? This post will investigate one specific example.
The Challenger
A quick search of P2P lending sites uncovered several reasonable-looking strategies to use as a benchmark. The site sociallending.net, a highly respected source of information on peer-to-peer lending, has some of the best. In the post How I Invest, two strategies are proposed for Lending Club investors, one conservative (B1-C5 grade loans) and one aggressive (D1-G5 grade loans). Both strategies are the result of a significant amount of historical analysis and guide the author's investment decisions. They make sense intuitively and are reportedly producing good results. This makes them a suitable benchmark against which to gauge the machine-learned default prediction algorithm.
For our purposes here, we'll use the aggressive strategy since we are interested in achieving the highest possible returns. The strategy is summarized below.
| Credit Grade | Inquiries Last 6 Months | Open Credit Lines | Total Credit Lines | Delinquencies Last 2 Years | Public Records On File | State | Loan Purpose | Employment Length |
| D1-G5 | Max 0 | Min 6 | Min 15 | Max 2 | Max 0 | exclude CA | credit card, debt consolidation, educational, home improvement, house, moving, renewable energy, vacation, wedding | 10+ years, 5 years, 6 years, 7 years, 8 years, 9 years, 4 years |
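For readers who prefer code to tables, the criteria above can be read as a single yes/no test on each loan. The following is a minimal sketch only: the `Loan` class and its field names are hypothetical and are not Lending Club's actual column names or the Loan Search Tool's implementation.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    # Hypothetical field names chosen for illustration only.
    sub_grade: str            # e.g. "D3"
    inquiries_6mo: int
    open_lines: int
    total_lines: int
    delinquencies_2yr: int
    public_records: int
    state: str                # two-letter code, e.g. "NY"
    purpose: str
    emp_length: str           # e.g. "10+ years"


ALLOWED_GRADES = {"D", "E", "F", "G"}
ALLOWED_PURPOSES = {
    "credit card", "debt consolidation", "educational", "home improvement",
    "house", "moving", "renewable energy", "vacation", "wedding",
}
ALLOWED_EMP_LENGTHS = {
    "4 years", "5 years", "6 years", "7 years", "8 years", "9 years",
    "10+ years",
}


def passes_aggressive_filter(loan: Loan) -> bool:
    """Return True if the loan meets every criterion in the table."""
    return (
        loan.sub_grade[:1] in ALLOWED_GRADES      # D1-G5
        and loan.inquiries_6mo <= 0               # Max 0
        and loan.open_lines >= 6                  # Min 6
        and loan.total_lines >= 15                # Min 15
        and loan.delinquencies_2yr <= 2           # Max 2
        and loan.public_records <= 0              # Max 0
        and loan.state != "CA"                    # exclude CA
        and loan.purpose in ALLOWED_PURPOSES
        and loan.emp_length in ALLOWED_EMP_LENGTHS
    )
```

Every condition must hold at once, which is why so few loans survive a filter like this.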
Performance Comparison
Replicating and testing this strategy using the tools here at Smart Peer Lending is relatively simple. Running a search with all of these parameters in our Loan Search Tool and setting Age Min = 75% yields a Projected ROI of 6.01%. To see the results yourself, first sign in and then click the following link: Social Lending Aggressive Loan Search.
By comparison, simply investing in all D1-G5 loans that passed our default prediction algorithm yields an ROI of 6.89%. Click the following link to run the search (Credit Grade D1-G5, w/Default Protection Search).
This is a clear improvement, and evidence that our machine learning approach has captured many of the attributes that predispose a loan to default better than the best manually created filters. It has the added benefit of learning automatically from past data, which removes the need for trial-and-error exploration of the parameters that achieve the best returns. We can also easily retrain the algorithm as often as we want, and it will adjust for changing conditions.
Combining Both Methods
Both filters produce relatively good returns. But can the combination do even better? Starting with all the loans that passed the SocialLending.net Aggressive filter and then removing those our algorithm predicted would default results in a whopping 9.93% ROI. Click the following link to run the search (Social Lending Aggressive w/Default Protection Search).
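The combination described here is plain set logic: keep a loan only if it passes the manual filter and is not flagged as a likely default. Here is a minimal sketch of that idea. Both predicates are placeholders, and the toy records, the `risk_score` field, and the 0.5 cutoff are invented for illustration; they say nothing about how the actual prediction model works.

```python
from typing import Any, Callable, Dict, Iterable, List

Loan = Dict[str, Any]  # hypothetical record type for illustration


def combine_filters(
    loans: Iterable[Loan],
    passes_manual_filter: Callable[[Loan], bool],
    predicted_to_default: Callable[[Loan], bool],
) -> List[Loan]:
    """Keep loans that pass the manual filter and are not predicted to default."""
    return [
        loan
        for loan in loans
        if passes_manual_filter(loan) and not predicted_to_default(loan)
    ]


# Toy data with made-up fields, only to show the logic.
toy_loans = [
    {"id": 1, "grade": "D", "risk_score": 0.2},
    {"id": 2, "grade": "D", "risk_score": 0.9},  # passes filter, flagged as risky
    {"id": 3, "grade": "A", "risk_score": 0.1},  # fails the grade filter
]

kept = combine_filters(
    toy_loans,
    passes_manual_filter=lambda loan: loan["grade"] in {"D", "E", "F", "G"},
    predicted_to_default=lambda loan: loan["risk_score"] > 0.5,
)
print([loan["id"] for loan in kept])  # prints [1]
```

Because the two conditions are joined with "and", the combined pool can never be larger than the smaller of the two original pools.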
That's a 65% relative improvement on the original filter (9.93% versus 6.01%). Clearly the two filters are picking up different signals in the data, and by combining them we are able to achieve a superior ROI. This is a pretty common effect among classifiers: combining multiple models often produces better results than any single model.
The table below summarizes our results.
| Strategy | ROI Age>=75% |
| SocialLending.net Aggressive | 6.01% |
| Naive Low Credit Grade w/Default Protection | 6.89% |
| SocialLending.net Aggressive w/Default Protection | 9.93% |
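Note that the 65% figure is a relative improvement in ROI, not a difference in percentage points. The short sketch below works through that arithmetic using the three ROI values from the table; the function and variable names are just for illustration.

```python
def relative_improvement(baseline_roi: float, new_roi: float) -> float:
    """Percentage change of new_roi relative to baseline_roi."""
    return (new_roi - baseline_roi) / baseline_roi * 100.0


# ROI values (in percent) taken from the table above.
manual = 6.01          # SocialLending.net Aggressive
naive_protected = 6.89  # Naive Low Credit Grade w/Default Protection
combined = 9.93        # SocialLending.net Aggressive w/Default Protection

print(f"combined vs manual: {relative_improvement(manual, combined):.1f}%")
print(f"naive+protection vs manual: {relative_improvement(manual, naive_protected):.1f}%")
print(f"combined vs naive+protection: {relative_improvement(naive_protected, combined):.1f}%")
```

In percentage points, the combined strategy is 3.92 points ahead of the manual filter alone.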
Used on its own, our default prediction algorithm performs better than one of the best publicly available strategies today. Used together, the two produce one of the best returns I've seen.
Your filter has a total result of 51 loans, which is not enough loans to say anything substantial about the results. To give another example, I'm currently working on a genetic algorithm filter to optimize Lending Club filters. In my first few trial runs, I was getting results of 11+% ROI per year, but <100 total loans out of the data set qualified for the filter. The algorithm was just optimizing on loans that happened to pay off by chance by very selectively filtering out single loans that defaulted. In order to counter this, you need to ensure that you have a minimum number of loans that meet the filter requirements. I try to have more than 500, personally.
Very interesting. So, now you have me even more intrigued. I agree that my criteria can certainly be improved upon, and it looks like you have done that. I am interested to find out the details.
It would be useful to see the number of loans, and how many the default protection removed from the original population of loans.
@Michael, the original Social Lending Aggressive filter returns 82 loans. Subtracting those predicted to default from that set results in 51 loans. By comparison the "Naive Low Credit Grade w/Default Protection" returns 1105 loans.
@Brett, I agree that 51 loans is a pretty small sample. Given that the original filter only returns 82 loans, we were starting with an already limited pool. I have run the necessary tests (chi-squared) to verify that the default prediction model results are statistically significant (95% confidence). My main goal in this post was to show that it was able to pull out the most damaging loans resulting in a much higher ROI.
Mike, if you are only looking at loans that are at least 75% complete, you are ignoring the vast majority of loans on the platform. Is there a reason why you chose that 75% number? I would never base an investment decision on a pool of just 82 loans.
@Peter, statements can be made about the performance of a strategy using any age threshold. Some people use every loan issued (age > 0%). But this is misleading because it mostly ignores the effect of default on huge numbers of newly issued loans. For this comparison, I selected 75% because most loans that are going to default do so before this point in their life. And the purpose of the exercise was to see the performance benefits of using an algorithm to filter out the defaults. Which it does. If I had run the same test on newer loans we would see the same benefits, but the difference in performance would be smaller because there would be fewer defaults to detect.