Comparison of the Automatic Modeling Effects of Ymodel, Weka, and Rapidminer
Objective: To compare the automatic modeling effects of Weka, Rapidminer, and Ymodel.
Data used: Five datasets in total, three for classification and two for regression. Two are classic Kaggle cases and three are real business datasets.
| Dataset | Task | Source |
|---|---|---|
| Titanic Data | Classification | Kaggle |
| House Price Prediction | Regression | Kaggle |
| Credit Company User Overdue Prediction | Classification | Real business data |
| Claims prediction of insurance company policies | Classification | Real business data |
| Second-hand car transaction price prediction | Regression | Real business data |
Because the free version of Rapidminer limits data size to 10,000 rows, the three real business datasets were sampled down to a few thousand rows each. Testing on large data volumes was therefore not possible.
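The sampling procedure itself is not described here. As a rough sketch only, assuming a stratified random sample (an assumption, not necessarily what was done for this test), a dataset could be cut down below the row limit while keeping its positive-to-negative ratio as follows. The function name `stratified_sample`, the field name `overdue`, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_of, max_rows, seed=0):
    """Return at most max_rows rows, keeping each class's share roughly unchanged."""
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    sample = []
    for members in by_class.values():
        # Integer floor division, so the total can never exceed max_rows.
        k = len(members) * max_rows // len(rows)
        sample.extend(rng.sample(members, k))
    rng.shuffle(sample)
    return sample


# Hypothetical data: 90,000 rows with a 1:8 positive-to-negative ratio.
rows = [{"id": i, "overdue": 1 if i % 9 == 0 else 0} for i in range(90_000)]
sample = stratified_sample(rows, lambda r: r["overdue"], max_rows=9_000)
positives = sum(r["overdue"] for r in sample)
print(len(sample), positives)
```

Sampling each class separately keeps a rare positive class from shrinking further by chance, which matters for the imbalanced business datasets.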
Product introduction: Weka is open source, and its automatic modeling function is an extension module that is free to use. Rapidminer is commercial software; although it has a free version, its Auto Model function is a paid feature.
Overall user experience: Ymodel has the fastest modeling speed. Rapidminer is also relatively fast, but its modeling time increases significantly when there are many variables. Weka requires the modeling time to be set beforehand, and its modeling speed is relatively slow. In Weka, some variable types sometimes have to be handled manually before automatic modeling can recognize them. In terms of automatic modeling functionality, the Weka experience is relatively poor.
Testing method: Each dataset is divided into a training set and a prediction set. The prediction results of each tool are exported and scored in the same way.
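As a minimal sketch of the uniform scoring step for the classification datasets, assuming the exported predictions have been reduced to two lists of 0/1 labels, the metrics reported in the result tables can be derived from the confusion matrix. The function name and the sample labels below are hypothetical.

```python
def classification_scores(y_true, y_pred):
    """Score 0/1 predictions against 0/1 ground-truth labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # None marks a metric that is undefined because its denominator is zero.
        return num / den if den else None

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": f1,
    }


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
scores = classification_scores(y_true, y_pred)
print(scores)
```

Scoring every tool's exported predictions with one function like this keeps the comparison independent of each tool's built-in reporting.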
Test results:
1. Titanic Survival Prediction - Classification
Training data: 802 items, 12 variables
The ratio of positive to negative samples is approximately 3:5.
| Metric | Weka | Rapidminer | Ymodel |
|---|---|---|---|
| Accuracy | 0.722 | 0.787 | 0.775 |
| Precision | 0.862 | 0.809 | 0.857 |
| Recall | 0.556 | 0.756 | 0.667 |
| Specificity | 0.909 | 0.818 | 0.886 |
| F1 | 0.676 | 0.782 | 0.75 |
| AUC | - | 0.793 | 0.847 |
| Ranking | 3 | 2 | 1 |
Weka could not output probability values (or the option for doing so was not found), so its AUC could not be calculated.
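To illustrate why probability output matters, here is a small sketch that computes AUC as the fraction of positive-negative pairs in which the positive sample receives the higher score, with ties counted as half. This is a simple pairwise formulation chosen for readability, not the scoring code used in this test, and the sample values are hypothetical.

```python
def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical predictions, for illustration only.
y_true = [1, 0, 1, 0, 0]
probabilities = [0.9, 0.4, 0.35, 0.2, 0.1]
hard_labels = [1 if p >= 0.5 else 0 for p in probabilities]

auc_from_probabilities = auc(y_true, probabilities)
auc_from_labels = auc(y_true, hard_labels)
print(auc_from_probabilities, auc_from_labels)
```

With only hard 0/1 labels, most pairs are tied and the ranking information is lost, so a value computed that way is not comparable with an AUC computed from probabilities. The pairwise loop is quadratic, which is acceptable for prediction sets of a few thousand rows.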
2. House Price Prediction - Regression
| Metric | Weka | Rapidminer | Ymodel |
|---|---|---|---|
| MSE | 4.17E8 | 1.41E9 | 9.85E8 |
| RMSE | 20430 | 37539 | 31385 |
| MAE | 14164 | 19459 | 16378 |
| MAPE (%) | 9.108 | 11.317 | 9.921 |
| R2 | 0.889 | 0.755 | 0.829 |
| Ranking | 1 | 3 | 2 |
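The regression metrics above can be reproduced from exported predictions with a few lines of code. The sketch below uses the standard definitions; returning MAPE as a percentage is an assumption that matches the magnitude of the reported values, and the function name and sample prices are hypothetical.

```python
import math


def regression_scores(y_true, y_pred):
    """Standard regression metrics; assumes no true value is zero (for MAPE)."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    squared_error_sum = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    total_sum_of_squares = sum((t - mean_true) ** 2 for t in y_true)
    mse = squared_error_sum / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),  # RMSE is the square root of MSE
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "r2": 1 - squared_error_sum / total_sum_of_squares,
    }


# Hypothetical prices, for illustration only.
y_true = [100, 200, 300, 400]
y_pred = [110, 190, 330, 380]
scores = regression_scores(y_true, y_pred)
print(scores)
```

Because RMSE is the square root of MSE, the two rows can be checked against each other: the square root of 4.17E8 is about 20420, which agrees with the reported 20430 up to rounding of the MSE.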
3. Credit Company User Overdue Prediction - Classification
Training data: 8938 items, 56 variables
The ratio of positive to negative samples is approximately 1:8.
| Metric | Weka | Rapidminer | Ymodel |
|---|---|---|---|
| Accuracy | 0.878 | 0.880 | 0.804 |
| Precision | - | 0.471 | 0.281 |
| Recall | 0 | 0.063 | 0.409 |
| Specificity | 1 | 0.99 | 0.858 |
| F1 | - | 0.111 | 0.333 |
| AUC | - | 0.729 | 0.742 |
| Ranking | 3 | 2 | 1 |
On this dataset, the Weka model failed: it did not identify a single positive sample.
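A model that never predicts the positive class can still post a high accuracy on imbalanced data, which is why accuracy alone is misleading here. The sketch below works this out for hypothetical counts chosen to match a 1:8 ratio; the function name and the counts are illustrative only.

```python
def all_negative_baseline(n_pos, n_neg):
    """Metrics for a model that labels every sample as negative."""
    if n_pos <= 0 or n_neg <= 0:
        raise ValueError("both classes must be present")
    return {
        "accuracy": n_neg / (n_pos + n_neg),  # only the negatives are right
        "recall": 0.0,        # no positive sample is ever found
        "specificity": 1.0,   # every negative is labeled correctly
        "precision": None,    # undefined: there are no positive predictions
    }


# Hypothetical counts with a 1:8 positive-to-negative ratio.
baseline = all_negative_baseline(n_pos=100, n_neg=800)
print(round(baseline["accuracy"], 3))  # 0.889
```

With recall 0 and specificity 1, accuracy is simply the share of negative samples, so the accuracy of 0.878 reported for Weka implies roughly 12% positive samples in the prediction set.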
4. Claims Prediction of Insurance Company Policies - Classification
Training data: 3470 items, 29 variables
The ratio of positive to negative samples is approximately 1:7.
| Metric | Weka | Rapidminer | Ymodel |
|---|---|---|---|
| Accuracy | 0.905 | 0.949 | 0.882 |
| Precision | 0.051 | 0.033 | 0.022 |
| Recall | 0.264 | 0.069 | 0.139 |
| Specificity | 0.916 | 0.965 | 0.895 |
| F1 | 0.086 | 0.045 | 0.038 |
| AUC | - | 0.642 | 0.638 |
| Ranking | 1 | 2 | 3 |
5. Second-hand Car Transaction Price Prediction - Regression
| Metric | Weka | Rapidminer | Ymodel |
|---|---|---|---|
| MSE | 2779927 | 8466716 | 9429967 |
| RMSE | 1667 | 2910 | 3070 |
| MAE | 835 | 1580 | 1537 |
| MAPE (%) | 27 | 75 | 54 |
| R2 | 0.941 | 0.821 | 0.801 |
| Ranking | 1 | 2 | 3 |
Overall evaluation: Across the five datasets used in this test, the rankings vary from dataset to dataset, but the differences in the metrics are not large, and the overall performance of Ymodel is quite good. By comparison, Weka performs well on the regression models, Ymodel performs well on the classification models, and Rapidminer falls in between.