Use Cases of Modeling Tests – Comparison between YModel and Manual Model Building
Case 1
Data set description: 2.9 million rows, 37 columns, a size of 477MB.
Target variable: IsDefaulted.
Test content:
1. Model performance indexes over the test data set: AUC, lift in top 10%, and model attenuation level.
2. Model building duration.
3. Skill requirements.
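For reference, the two ranking metrics named in item 1 can be sketched in a few lines of pure Python. This is only an illustration, not YModel's implementation: the function names `auc` and `lift_at` and the toy data are hypothetical, and the lift definition used here (positive rate among the top-scored 10% divided by the overall positive rate) is the common one, assumed rather than stated in the test description.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation."""
    # Sort indices by score, then give tied scores their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def lift_at(labels: Sequence[int], scores: Sequence[float],
            fraction: float = 0.10) -> float:
    """Positive rate in the top-scored fraction divided by the overall rate."""
    n_top = max(1, int(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Toy data (hypothetical): 1 = defaulted, 0 = did not default.
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(f"AUC: {auc(toy_labels, toy_scores):.3f}")
print(f"Lift in top 10%: {lift_at(toy_labels, toy_scores):.2f}")
```

With a 10% cut-off, the lift can never exceed 10, and it can only reach 10 when the positive rate is at most 10%, so values above 9 mean that almost every defaulter is ranked in the top decile.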
Test result:
1. Model performance
| Modeling type | Model | AUC for training data set | AUC for test data set | Model attenuation | Lift in top 10% for test data set |
|---|---|---|---|---|---|
| Manual model building | Model 1 | 1 | 0.973 | 0.027 | 9.22 |
| Manual model building | Model 2 | 0.999 | 0.971 | 0.028 | 9.18 |
| Manual model building | Model 3 | 0.999 | 0.968 | 0.031 | 9.09 |
| Manual model building | Model 4 | 0.998 | 0.922 | 0.076 | 7.9 |
| Manual model building | Model 5 | 0.996 | 0.965 | 0.031 | 8.63 |
| Manual model building | Model 6 | 0.995 | 0.959 | 0.036 | 8.77 |
| Manual model building | Model 7 | 0.993 | 0.927 | 0.066 | 7.99 |
| Manual model building | Model 8 | 0.988 | 0.956 | 0.032 | 8.63 |
| Manual model building | Model 9 | 0.982 | 0.928 | 0.054 | 7.99 |
| Manual model building | Model 10 | 0.976 | 0.914 | 0.062 | 7.76 |
| Manual model building | Model 11 | 0.969 | 0.919 | 0.05 | 7.85 |
| Manual model building | Model 12 | 0.961 | 0.924 | 0.037 | 7.95 |
| YModel | - | 0.918 | 0.911 | 0.007 | 8.0 |
Note: Manual model building produces a series of intermediate models (Models 1-12) during tuning, whereas YModel generates the final model directly.
Result explanation:
1) The first several manually built models have very high AUC on the training data set, which suggests they are overfitting. A more suitable model (Model 12) is obtained after multiple rounds of tuning.
2) Compared with YModel, Model 12 has a higher AUC on the test data set (0.924 versus 0.911) but a much higher model attenuation level (0.037 versus 0.007), so it is likely still overfitting. YModel's very small attenuation suggests it will hold up better when scoring unseen data.
3) YModel's lift in the top 10% on the test data set is slightly higher than Model 12's (8.0 versus 7.95).
Summary: The two approaches are close on the above indexes, but YModel shows better generalization ability.
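The attenuation column in the table above matches the training AUC minus the test AUC for every row. The source does not state the formula, so that definition is an assumption in the short sketch below, which reproduces three of the rows. The helper name `attenuation` is hypothetical.

```python
# (training AUC, test AUC) pairs copied from the Case 1 table.
auc_pairs = {
    "Model 1": (1.0, 0.973),
    "Model 12": (0.961, 0.924),
    "YModel": (0.918, 0.911),
}


def attenuation(train_auc: float, test_auc: float) -> float:
    # Assumed definition: drop in AUC from training data to test data,
    # rounded to the three decimals used in the table.
    return round(train_auc - test_auc, 3)


attenuations = {name: attenuation(*pair) for name, pair in auc_pairs.items()}
for name, value in attenuations.items():
    print(f"{name}: {value}")
```

A smaller gap means the model behaves on held-out data much as it did on the data it was fitted to, which is why the comparison treats it as a proxy for generalization.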
2. Model building duration
Manual model building: About three weeks for manual preprocessing and model tuning.
YModel: 13 minutes for automatic preprocessing and model building.
3. Skill requirements
Manual model building: Professional statistical knowledge.
YModel: General knowledge.
Case 2
Task: Using a bank's data on defaulted corporate loans, predict the probability of default (PD) for micro and small corporate customers.
Data set description: 36000 rows, 5500 columns, a size of 453MB; high dimensional and sparse.
Target variable: IsDefaulted.
Test content:
1. Model performance indexes over the test data set: AUC, lift in top 10%, and model attenuation level.
2. Model building duration.
Test result:
| Item | YModel | Manual model building |
|---|---|---|
| Model building duration | 17 minutes (data preprocessing & model building) | 2 weeks |
| Model number | 1 | 1 |
| AUC for training data set | 0.996 | 0.998 |
| AUC for test data set | 0.987 | 0.972 |
| Lift in top 10% for test data set | 9.8 | 9.6 |
1) YModel has a higher AUC and lift on the test data set and a lower attenuation level (0.009 versus 0.026, taking attenuation as training AUC minus test AUC, as in Case 1).
2) YModel stays fast on high-dimensional data (17 minutes), whereas manual model building is slow (2 weeks) and becomes especially laborious when the data is high-dimensional.
Case 3
Task: Predict claim settlement risk for an insurance company.
Data set description: 1.38 million rows, dozens of columns, a size of 4 GB; a high proportion of missing data and high-cardinality categorical variables.
Target variable: ClaimOccured
Test content:
1. Gini index on test data set.
2. Model building duration.
Test result:
| Item | YModel | Manual model building |
|---|---|---|
| Model building duration | 60 minutes (data preprocessing & model building) | 1 month |
| Model performance (Gini) | 0.683 | 0.608 |
| Key derived variables | 3 | - |
1) YModel has a higher Gini index on the test data set (0.683 versus 0.608).
2) YModel automatically handles missing data and high-cardinality categorical variables and generates derived variables on its own. It is also much faster (60 minutes versus 1 month).
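To relate the Gini figures in Case 3 to the AUC figures used in Cases 1 and 2, the sketch below applies the relation commonly used for binary scoring models, Gini = 2 × AUC − 1. The source does not say which Gini definition was used, so this relation is an assumption, and the function names are hypothetical.

```python
def gini_from_auc(auc: float) -> float:
    # Assumed relation for binary classifiers: Gini = 2 * AUC - 1.
    return 2.0 * auc - 1.0


def auc_from_gini(gini: float) -> float:
    # Inverse of the relation above.
    return (gini + 1.0) / 2.0


# Gini values reported for Case 3.
reported_gini = {"YModel": 0.683, "Manual model building": 0.608}
implied_auc = {name: auc_from_gini(g) for name, g in reported_gini.items()}
for name, value in implied_auc.items():
    print(f"{name}: implied AUC = {value:.4f}")
```

Under that assumption the reported Gini values correspond to AUCs of roughly 0.84 for YModel and 0.80 for the manual model.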