To measure the credit risk of Internet finance users, this paper defines it as the probability that a user fails to repay on time. Because the data may be continuous or discrete, the ensemble model GBDT (gradient boosting decision tree) is used for estimation. To cope with the growing scale of data, this paper proposes a GBDT model coupled with SVM (SVM-GBDT): an SVM is first trained to select the important training samples, and a GBDT model is then trained only on the samples that are support vectors of that SVM. To test the model, this paper analyzes the credit risk of user data from an Internet financial lending institution, provided by the "East Securities Futures Cup" Chinese University Statistical Model Contest. On the test set, the SVM-GBDT model achieves an accuracy (A) of 0.9427, an F1 score (the harmonic mean of precision and recall) of 0.970035, and a running time (t) of 4.5726 seconds. The SVM-GBDT model is then compared with single models, namely Logistic regression, SVM, CART, RF, and GBDT. The comparison shows that SVM-GBDT outperforms all five: its accuracy (A) and F1 are slightly higher, and it runs far faster. The model can help Internet finance companies make loan decisions in a big-data setting, and it also provides a new practical reference for data mining.

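The two-stage procedure described in the abstract (SVM selects support vectors, GBDT is fit on them) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes scikit-learn's `SVC` with an RBF kernel and default `C`, feature standardization before the SVM, and `GradientBoostingClassifier` with default hyperparameters. The helper names `fit_svm_gbdt` and `evaluate` and the synthetic data are hypothetical stand-ins, since the contest data set and the paper's settings are not given here. Accuracy is the share of correct predictions, and F1 is 2PR/(P+R) for precision P and recall R.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def fit_svm_gbdt(X_train, y_train, random_state=0):
    """Hypothetical SVM-GBDT sketch: fit an SVM, keep its support vectors, fit GBDT on them."""
    # Stage 1: RBF-kernel SVM on standardized features (kernel, C and scaling are assumptions).
    scaler = StandardScaler().fit(X_train)
    svm = SVC(kernel="rbf", C=1.0)
    svm.fit(scaler.transform(X_train), y_train)
    # svm.support_ holds the row indices of the support vectors within X_train.
    idx = svm.support_
    # Stage 2: GBDT trained only on the selected rows; trees do not need scaled inputs.
    gbdt = GradientBoostingClassifier(random_state=random_state)
    gbdt.fit(X_train[idx], y_train[idx])
    return gbdt, idx


def evaluate(model, X_test, y_test):
    """Return (accuracy A, F1) on held-out data."""
    pred = model.predict(X_test)
    return accuracy_score(y_test, pred), f1_score(y_test, pred)


if __name__ == "__main__":
    # Synthetic, imbalanced stand-in data (assumption); not the contest data.
    X, y = make_classification(n_samples=2000, n_features=20, weights=[0.1, 0.9], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y
    )
    # Time both stages together (an assumption about what t measures).
    start = time.perf_counter()
    model, idx = fit_svm_gbdt(X_train, y_train)
    elapsed = time.perf_counter() - start
    acc, f1 = evaluate(model, X_test, y_test)
    print(f"support vectors kept: {len(idx)} / {len(X_train)}")
    print(f"A={acc:.4f}  F1={f1:.4f}  t={elapsed:.2f}s")
```

Any speed-up in this sketch comes from fitting the GBDT on the support vectors only, which are usually a subset of the training rows. The kernel SVM fit itself is not free and its cost grows faster than linearly with sample size, so whether the total time drops depends on the data and on how many support vectors remain.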
Keywords: Internet finance; credit risk; SVM-GBDT model; model efficiency