"Objective:To compare the automatic modeling effects of Weka, Rapidminer, and Ymodel Data to be u .."

Lina RaqForum 43 No.

Comparison of Automatic Modeling Effects of Ymodel, Weka, Rapidminer

Objective: To compare the automatic modeling effects of Weka, Rapidminer, and Ymodel

Data used: 5 datasets in total, 3 for classification and 2 for regression

2 classic Kaggle cases and 3 real business datasets

Dataset	Task	Source
Titanic Data	Classification	Kaggle
House Price Prediction	Regression	Kaggle
Credit Company User Overdue Prediction	Classification	Real business data
Claims prediction of insurance company policies	Classification	Real business data
Second-hand car transaction price prediction	Regression	Real business data

Because the free version of Rapidminer limits data size to 10000 items, the three real business datasets were sampled, with each sample kept to a few thousand items. Testing on large data volumes was therefore not possible.
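The post does not say how the sampling was done. As a minimal sketch, assuming simple random sampling without replacement (the function name `sample_rows` and the fixed seed are illustrative, not from the original test), capping a dataset at a row limit could look like this:

```python
import random


def sample_rows(rows, limit, seed=0):
    """Return at most `limit` rows, drawn at random without replacement.

    Assumption: plain random sampling. The original post does not state
    whether the sampling was random, stratified, or something else.
    """
    if len(rows) <= limit:
        return list(rows)
    # A seeded generator makes the sample reproducible between runs.
    return random.Random(seed).sample(rows, limit)


# Example: cut a hypothetical 25000-row dataset down to 5000 rows.
rows = list(range(25000))
subset = sample_rows(rows, 5000)
print(len(subset))
```

With strongly imbalanced classes (such as the 1:8 and 1:7 ratios reported below), a stratified sample that preserves the class ratio may be a safer choice than plain random sampling.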

Product introduction: Weka is open source, and its automatic modeling function is an extension module that is free to use. Rapidminer is commercial software. Although it has a free version, its auto model function is a paid feature.

Overall user experience: Ymodel has the fastest modeling speed. Rapidminer is relatively fast at model building, but its modeling time increases significantly when there are many variables. Weka requires the modeling time to be set beforehand, and its modeling speed is also relatively slow. In Weka, some variable types sometimes have to be handled manually before automatic modeling can recognize them. In terms of automatic modeling functionality, Weka's experience is relatively poor.

Testing method: Each dataset is divided into a training set and a prediction set. The predictions of each tool on the prediction set are exported and scored with the same metrics.
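The scoring script itself is not shown in the post. A minimal sketch of how exported 0/1 predictions could be scored uniformly is given below; the function name `classification_metrics` and the coding of the positive class as 1 are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Score 0/1 predictions against 0/1 labels (1 = positive, an assumption).

    Ratios with a zero denominator are returned as None, which corresponds
    to the "-" entries in the result tables.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        return num / den if den else None

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": f1,
    }


# Tiny made-up example, not taken from the test data.
print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                             [1, 1, 0, 0, 0, 0, 0, 1]))
```

Running the same function over the exported predictions of every tool keeps the comparison independent of each tool's built-in evaluation report.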

Test results:

1. Titanic Survival Prediction - Classification

Training data: 802 items, 12 variables

The ratio of positive and negative samples is approximately 3:5

	Weka	Rapidminer	Ymodel
Accuracy	0.722	0.787	0.775
Precision	0.862	0.809	0.857
Recall	0.556	0.756	0.667
Specificity	0.909	0.818	0.886
F1	0.676	0.782	0.750
AUC	-	0.793	0.847
Ranking	3	2	1

Weka could not output probability values (or we could not find out how to export them), so AUC could not be calculated for Weka.
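AUC depends on how the model ranks samples, so it needs a probability or score per sample rather than only a 0/1 label. The following is a minimal sketch of one standard way to compute it, as the fraction of positive/negative pairs in which the positive sample receives the higher score (ties count as half). The function name `auc_from_scores` is illustrative, and this is not necessarily how the scores in this post were computed.

```python
def auc_from_scores(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    y_true holds 0/1 labels (1 = positive, an assumption); scores holds the
    predicted probabilities or any other ranking score.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Tiny made-up example, not taken from the test data.
print(auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is acceptable for test sets of a few thousand items; a rank-based formula would be preferable for larger data.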

2. House Price Prediction - Regression

	Weka	Rapidminer	Ymodel
MSE	4.17E8	1.41E9	9.85E8
RMSE	20430	37539	31385
MAE	14164	19459	16378
MAPE	9.108	11.317	9.921
R2	0.889	0.755	0.829
Ranking	1	3	2
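For reference, the regression metrics in the table can be computed from exported predictions as in the sketch below. The function name `regression_metrics` is illustrative, and reporting MAPE as a percentage is an assumption based on the magnitude of the values in the table.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, MAPE and R2 for paired true/predicted values.

    Assumptions: MAPE is expressed in percent, no true value is zero,
    and the true values are not all identical (otherwise R2 is undefined).
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {"mse": mse, "rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Tiny made-up example, not taken from the test data.
print(regression_metrics([100, 200, 300], [110, 190, 300]))
```

RMSE is simply the square root of MSE: the square root of Weka's MSE of 4.17E8 is about 20420, which agrees with the reported RMSE of 20430 once the rounding of the MSE is taken into account.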

3. Credit Company User Overdue Prediction - Classification

Training data: 8938 items, 56 variables

The ratio of positive and negative samples is approximately 1:8

	Weka	Rapidminer	Ymodel
Accuracy	0.878	0.880	0.804
Precision	-	0.471	0.281
Recall	0	0.063	0.409
Specificity	1	0.99	0.858
F1	-	0.111	0.333
AUC	-	0.729	0.742
Ranking	3	2	1

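On this data, the Weka model failed: it did not capture a single positive sample. It predicted every sample as negative, so Recall is 0, Precision and F1 are undefined (shown as -), and the Accuracy of 0.878 merely reflects the share of negative samples in the prediction set.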
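This is why accuracy alone is misleading on imbalanced data. The sketch below scores a model that predicts "negative" for every sample; the counts of 100 positives and 800 negatives are hypothetical, chosen only to match the approximate 1:8 ratio, and are not the actual prediction set.

```python
def all_negative_baseline(n_pos, n_neg):
    """Metrics of a classifier that labels every sample as negative."""
    tp, fp = 0, 0            # nothing is ever predicted positive
    tn, fn = n_neg, n_pos    # all negatives are right, all positives are missed

    return {
        "accuracy": (tp + tn) / (n_pos + n_neg),
        "recall": tp / (tp + fn) if n_pos else None,
        "specificity": tn / (tn + fp) if n_neg else None,
        # No positive predictions at all, so precision is undefined.
        "precision": tp / (tp + fp) if (tp + fp) else None,
    }


# Hypothetical counts with a 1:8 positive-to-negative ratio.
print(all_negative_baseline(100, 800))
```

With these counts the accuracy is 800/900, roughly 0.889, even though the model finds no overdue user at all, so Recall, F1 and AUC are the more informative metrics for this dataset.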

4. Claims prediction of insurance company policies - Classification

Training data: 3470 items, 29 variables

The ratio of positive and negative samples is approximately 1:7

	Weka	Rapidminer	Ymodel
Accuracy	0.905	0.949	0.882
Precision	0.051	0.033	0.022
Recall	0.264	0.069	0.139
Specificity	0.916	0.965	0.895
F1	0.086	0.045	0.038
AUC	-	0.642	0.638
Ranking	1	2	3

5. Second-hand car transaction price prediction - Regression

	Weka	Rapidminer	Ymodel
MSE	2779927	8466716	9429967
RMSE	1667	2910	3070
MAE	835	1580	1537
MAPE	27	75	54
R2	0.941	0.821	0.801
Ranking	1	2	3

Overall evaluation: Across the 5 datasets used in this test, the rankings vary with the data, but the differences in metrics are not large, and the overall performance of Ymodel is quite good. By comparison, Weka performs well on regression models, Ymodel performs well on classification models, and Rapidminer is in the middle.
