The U.S. Treasury Department published a white paper reviewing the online marketplace lending industry.
The white paper, titled Opportunities and Challenges in Online Marketplace Lending, contains an important point for formulative and causal research:
- Use of Data and Modeling Techniques for Underwriting is an Innovation and a Risk.
In this essay, we mainly explore the cause-and-effect relationship between the most important variables in online marketplace lending (the variables are defined below) using two approaches:
- Experimentation using the data published by the Lending Club platform.
- Simulation using a mathematical model we developed that represents the key characteristic elements of an online marketplace lending platform.
Introduction to Lending Club Datasets:
The data provided by the Lending Club platform is very useful for detailed descriptive statistics and for more elaborate machine learning analysis. Many authors have used these datasets to report specific points about online marketplace lending platforms; see for instance this recent work as a follow-up to this work. Most of these studies try to nail down the story by checking all the factors explaining loan default.
In fact, these datasets have some characteristic features that need to be clarified, explained and interpreted before the datasets can be considered suitable for machine learning.
In general, data-driven decision-making is considered the best way to go, but it can be costly when some elements that appear to be true are not totally true. Even with the best of intentions, data collection is shaped by the business model in use: if the data is biased, any statistical process may end up with skewed results. Inconsistent data analysis leads to biased conclusions and can harm business outcomes.
Characteristic Features in Lending Club Datasets:
Let's first explain the most important equation used to calculate the monthly installment payment:
$$\text{Ip} = \text{Principal} \cdot \frac{\text{Interest} \, (1 + \text{Interest})^{\text{Term}}}{(1 + \text{Interest})^{\text{Term}} - 1}$$
where
- Ip is represented in the data as installment and defined as the monthly payment owed by the borrower if the loan originates.
- Principal is represented in the data as loan_amnt and defined as the listed amount of the loan applied for by the borrower.
- Interest is represented in the data as int_rate and defined as the interest rate on the loan. Note that the data reports the annual interest rate as a percentage, so for any calculation int_rate should be divided by 1200 to obtain the monthly rate used in the equation above.
- Term is the number of payments on the loan. Values are in months and can be either 36 or 60.
We consider this equation the first fundamental principle in the world of lending.
For instance, in the dataset file LoanStats_2016Q3.csv, the borrower with id = 88804313, member_id = 95147116, loan_amnt = 3800 US$, term = 36 and int_rate = 12.79% has installment = 127.66 US$.
Comparing our calculation with this installment number is a straightforward process.
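As a minimal sketch of that comparison, the equation for Ip can be evaluated in a few lines of Python. The function name `monthly_installment` and its parameter names are illustrative choices, not identifiers from the Lending Club data; the only inputs taken from the text are loan_amnt, int_rate and term of the loan quoted above.

```python
def monthly_installment(loan_amnt: float, int_rate_pct: float, term: int) -> float:
    """Monthly payment Ip for a fully amortizing loan.

    loan_amnt     -- principal, as in the loan_amnt column
    int_rate_pct  -- annual interest rate in percent, as in the int_rate column
    term          -- number of monthly payments (36 or 60 in the data)
    """
    monthly_rate = int_rate_pct / 1200.0  # annual percent -> monthly fraction
    if monthly_rate == 0:
        # Degenerate case: without interest the principal is split evenly.
        return loan_amnt / term
    growth = (1.0 + monthly_rate) ** term
    return loan_amnt * monthly_rate * growth / (growth - 1.0)


# The loan quoted above: loan_amnt = 3800 US$, int_rate = 12.79 %, term = 36.
payment = monthly_installment(3800, 12.79, 36)
print(f"computed installment: {payment:.2f} US$ (reported: 127.66 US$)")
```

The computed value should agree with the reported installment to within about a cent; the exact rounding convention used by the platform is not stated in the data, so a one-cent difference is not necessarily an error.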
Magical Business Model: Pay off your loan in full at any time without prepayment fees!
Lending Club uses a very interesting principle to make things easy for borrowers. A borrower can make extra payments or pay off the loan in full at any time. Lending Club never charges fees or penalties for making additional payments or paying the loan off early.
In the feature loan_status, which is the current status of the loan, borrowers who paid off their loans early are designated "Fully Paid".
Moreover, Lending Club showed a "substantive fairness": the monthly installment payment was recalculated using only the term between issue_d, the month in which the loan was funded, and last_pymnt_d, the last month in which a payment was received. The borrower can save more money by paying off the loan early. This process is fair and acceptable. However, the lender (investor) is to some extent the main loser in this process.
More loans fully paid early means more money for Lending Club's business, since the organization charges a percentage on repaid principal as a service fee.
The most important questions to examine are:
- Q1: Who are the borrowers paying off their loans early?
- Q2: What is the cause (or reason)?
- Q3: How will these borrowers benefit from this process and the subsequent ones?
Let's first discuss questions Q2 and Q3.
If we assume that a borrower succeeds in paying the next monthly installment with probability p, the expected number of consecutive monthly installment payments is equal to:
$$\text{Pc} = \frac{1}{1 - p}$$
We consider this equation the second fundamental principle in the world of lending.
The derivations of the two equations Ip and Pc are full of "mathematical fun". The equation Ip is a well-established formula in the banking sector. However, the equation Pc is less known and less used. We derive it using simple geometric series rules.
Let's assume p = 0.5, a borrower who is not sure to pay the next monthly installment: the borrower will only pay two consecutive periods on average.
Let's assume p = 0.9, a borrower who is highly likely to pay the next monthly installment: the borrower will pay 10 consecutive periods on average.
Let's assume p = 1, the perfect borrower desired by any lender in the world of lending: Pc is unbounded and every installment is paid.
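The three cases above can be checked numerically. The sketch below assumes one particular reading of the geometric series behind Pc, namely 1 + p + p^2 + ..., in which the current installment counts as one period and each further period is reached with probability p. That reading is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def expected_consecutive_payments(p: float) -> float:
    """Closed form Pc = 1 / (1 - p); unbounded for the perfect borrower p = 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return math.inf if p == 1.0 else 1.0 / (1.0 - p)


def geometric_partial_sum(p: float, n_terms: int = 10_000) -> float:
    """Partial sum 1 + p + p**2 + ... + p**(n_terms - 1) of the series."""
    return sum(p ** k for k in range(n_terms))


for p in (0.5, 0.9, 1.0):
    print(p, expected_consecutive_payments(p))
```

For p < 1 the partial sum converges to the closed form, which is the geometric series rule the derivation relies on.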
Professionals designing this new generation of online marketplace lending are aware of these two equations, which are the backbone of their business models. The equations Ip and Pc can be considered the modern lending mechanics.
Especially for online marketplace lending, the formulated installment payment and the associated probability of paying during the next period are constrained by the financial resources that borrowers have at their disposal. Borrowers may not necessarily have large enough resources and are under non-negligible financial constraints. It is reasonable to assume that the borrower is relatively risk-neutral, as the loan is not backed by anything from the borrower except the information provided, whereas the investor may be more risk-averse even though she/he is often richer or better diversified than the borrowers.
The magical business model (pay off your loan in full at any time without prepayment fees) is an important condition for the short-term stability of the platform.
Let's consider the case where our borrower is Grade G. This kind of borrower is subject to higher interest rates; see Rate information.
The probability that this borrower also pays the next monthly installment may be very small, and the corresponding Pc may not exceed one year!
Our borrower will use the magical business model: pay off your loan in full at any time without prepayment fees.
After paying the monthly installment on time for a number of periods, let's say 6 months, our borrower now has an enhanced credit score. Our borrower will look for a second loan from the same platform at a lower interest rate to pay off the first loan, whose higher interest rate would be hard to carry for a long period of time. There is a good chance of obtaining this second loan, as our borrower is more appreciated or trusted by the platform after paying her/his debt on time for 6 months in a row.
Using Numbers, for illustrative purposes
The first loan is: loan_amnt = 5000 US$, int_rate = 30%, initial term = 60 months. The monthly installment payment is 161.77 US$; over 60 months, the borrower has to pay 9706.02 US$ for principal plus interest.
The borrower will consider the fully paid option after 6 months. The borrower now has to pay only 5446.49 US$ to exit the first loan.
Lending Club will collect a fee of 1% of 5446.49 US$, which is 54.46 US$, from the investors.
The borrower succeeds in obtaining a second loan to pay off the first one: loan_amnt = 5446.49 US$, int_rate = 15%, term = 60 months.
The borrower will try her/his best to pay it off fully in 54 months, which makes the comparison with the first loan more direct:
- Cost of the first loan with principal + interest = 9706.02 US$. Lending Club fee = 97.06 US$.
- Cost of the second loan with principal + interest = 7522.65 US$. Lending Club fee = 75.23 US$.
- Our borrower will save: 2183.37 US$.
Lending Club is making ~32.63 US$ more from investors (54.46 + 75.23 - 97.06). In fact, the Lending Club platform is mainly helping the borrower, not the investors, but note that the investors operating in online marketplace lending are better diversified than the borrowers. An investor can set up a profile in less than a day and invest as little as $25 per loan. Once invested in a loan, investors usually start receiving payments within 30 days. The investor forms a portfolio with hundreds of loans (borrowers), where each loan (borrower) is a fraction of the total portfolio. These diversified portfolios are intended to improve returns while reducing risk.
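The arithmetic of this illustration can be reproduced with a short script. It is a sketch that follows the essay's own accounting, in which the early payoff amount is the total cost of the first loan re-priced over the 6 months actually elapsed; this is not the standard outstanding-balance calculation of an amortization schedule. The function names and the constant `FEE_RATE` are illustrative, and the 1% fee is the figure assumed in the example above.

```python
def monthly_installment(principal: float, annual_rate_pct: float, term: int) -> float:
    """Installment Ip, with the annual percentage rate converted to a monthly rate."""
    r = annual_rate_pct / 1200.0
    growth = (1.0 + r) ** term
    return principal * r * growth / (growth - 1.0)


def total_cost(principal: float, annual_rate_pct: float, term: int) -> float:
    """Principal plus interest paid over the whole term."""
    return term * monthly_installment(principal, annual_rate_pct, term)


FEE_RATE = 0.01  # 1 % service fee on repaid amounts, as assumed in the example

# First loan held for the full 60 months.
first_full = total_cost(5000, 30, 60)
# Essay's accounting for an exit after 6 months: same loan priced over 6 months.
payoff = total_cost(5000, 30, 6)
# Second loan refinances the payoff amount and is repaid in 54 months.
second = total_cost(payoff, 15, 54)

saving = first_full - second
extra_fee = FEE_RATE * (payoff + second - first_full)

print(f"first loan, 60 months : {first_full:10.2f} US$")
print(f"payoff after 6 months : {payoff:10.2f} US$")
print(f"second loan, 54 months: {second:10.2f} US$")
print(f"borrower saving       : {saving:10.2f} US$")
print(f"extra platform fee    : {extra_fee:10.2f} US$")
```

Working with unrounded intermediate values explains why some figures differ by a cent from sums of the rounded amounts.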
The magical business model (pay off your loan in full at any time without prepayment fees) is a great asset that makes both the borrower and the platform richer. This business model also has another important effect: it often helps to prevent defaults, or at least to minimize default rates, which makes the platform more stable and keeps it running smoothly in the best interest of investors too: "paid less, not nothing at all".
Any borrower can follow this process in her/his best interest to minimize the financial burden of the loan.
This is why data collected in this way is not suitable for machine learning studies that try to nail down the story by checking all the factors explaining loan default: the reported default rates are "artificially" corrected by the business model in the best interest of all parties.
Now, to answer the first question, we analyze the datasets provided by Lending Club and compute simple statistics using the feature loan_status, which is the current status of the loan; borrowers who paid off their loans early are designated "Fully Paid". The proportion for each of the grades populating the feature grade, which is the LC assigned loan grade, provides the answer.
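"Fully Paid" loans per dataset (the seven columns are read here as grades A to G, in that order; the values appear to be percentages):
Dataset | A | B | C | D | E | F | G
LoanStats_2016Q1 | 7 | 8 | 9 | 10 | 10 | 10 | 11
LoanStats_2016Q2 | 5 | 5 | 6 | 7 | 7 | 9 | 7
LoanStats_2016Q3 | 3 | 3 | 3 | 4 | 5 | 5 | 6
LoanStats3c | 44 | 42 | 37 | 34 | 31 | 32 | 30
LoanStats3d | 18 | 18 | 18 | 18 | 18 | 19 | 22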
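A statistic of this kind can be computed with the Python standard library alone. The sketch below assumes the quantity of interest is, for each grade, the percentage of loans whose loan_status is "Fully Paid"; that interpretation, the function names, and the `skip_lines` parameter are assumptions made for illustration. Only the column names grade and loan_status come from the datasets.

```python
import csv
from collections import Counter


def fully_paid_share_by_grade(rows):
    """Percentage of loans marked "Fully Paid" within each grade.

    rows -- any iterable of dict-like records with "grade" and "loan_status" keys.
    """
    total = Counter()
    paid = Counter()
    for row in rows:
        grade = (row.get("grade") or "").strip()
        if not grade:
            continue  # skip blank or footer lines without a grade
        total[grade] += 1
        if (row.get("loan_status") or "").strip() == "Fully Paid":
            paid[grade] += 1
    return {g: 100.0 * paid[g] / total[g] for g in sorted(total)}


def load_rows(path, skip_lines=0):
    """Yield CSV records; skip_lines drops any preamble lines before the header."""
    with open(path, newline="", encoding="utf-8") as fh:
        for _ in range(skip_lines):
            next(fh)
        yield from csv.DictReader(fh)


# Tiny in-memory sample, only to show the shape of the result.
sample = [
    {"grade": "A", "loan_status": "Fully Paid"},
    {"grade": "A", "loan_status": "Current"},
    {"grade": "G", "loan_status": "Fully Paid"},
]
print(fully_paid_share_by_grade(sample))
```

With a downloaded file, a call such as `fully_paid_share_by_grade(load_rows("LoanStats_2016Q3.csv", skip_lines=1))` would apply; whether a preamble line has to be skipped depends on the export and should be checked against the file itself.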
Key questions now:
- Is this a sustainable business model process?
- Is any strategic improvement of this process (short-term stability of the platform) in place in the event of an economic downturn, for instance?
- What are the emergent characteristics or behaviors generated by this business model process on the long-term stability of the platform?
To answer these key questions, we need more mathematical methods and numerical tools to study the sustainability of this process. The continuation is in Modelling Online Marketplace Lending Platform: Applied Mathematical Modelling.