Select variables using correlation coefficients
A correlation coefficient is a statistic that measures the strength of the association between two variables. The Pearson and Spearman correlation coefficients are the most commonly used, and both take values in [-1, 1]. A value of 0 means the two variables have no linear (Pearson) or monotonic (Spearman) relationship; a value of 1 or -1 means they are perfectly positively or negatively correlated. The greater the absolute value, the stronger the correlation between the two variables.
For example, in a credit card data set, the correlation coefficient method can be used to select the variables whose Pearson or Spearman coefficient with the target variable has an absolute value greater than 0.5.
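For reference, the two coefficients can be written out as follows. These are the standard textbook definitions, given here as an aid to reading; the article does not state which tie-handling rule SPL's spearman() applies, so the average-rank convention mentioned below is an assumption. For samples $x_1,\dots,x_n$ and $y_1,\dots,y_n$ with means $\bar{x}$ and $\bar{y}$, the Pearson coefficient is

$$
r(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

and the Spearman coefficient is the Pearson coefficient of the ranks (tied values commonly share the average of their ranks):

$$
\rho(x, y) = r\big(\operatorname{rank}(x), \operatorname{rank}(y)\big)
$$

Because Spearman works on ranks, it reaches 1 or -1 for any strictly monotonic relationship, while Pearson does so only for a linear one.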
| | A | B | C |
|---|---|---|---|
| 1 | =file("D://test//creditcard_b.csv").import@tc() | | |
| 2 | =A1.fname() | | |
| 3 | =A2.delete(A2.pos("Class")) | | |
| 4 | for A2 | =pearson(A1.(${A4}),A1.(Class)) | |
| 5 | | =spearman(A1.(${A4}),A1.(Class)) | |
| 6 | | >B1=B1\|[A4\|B4\|B5] | |
| 7 | | =if(abs(B4)>0.5 \|\| abs(B5)>0.5,A4) | |
| 8 | | >C1=C1\|B7 | |
A2-A3 Get the field names, excluding the target variable Class.
A4-B8 Loop over all independent variables. For each one, compute the Pearson and Spearman correlation coefficients with the target variable and store them in B1. The names of variables whose Pearson or Spearman coefficient has an absolute value greater than 0.5 are stored in C1.
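For readers without esProc at hand, the same screening logic can be sketched in Python. This is an illustration only, not SPL and not the article's implementation: the toy data, the column names V1 and V2, and the helper names (pearson, ranks, spearman, select_by_correlation) are all hypothetical, and ties are ranked by the average-rank convention, which is an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.
    A constant column has zero variance and would raise ZeroDivisionError."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def select_by_correlation(table, target, threshold=0.5):
    """Mirror of the grid loop: collect (name, pearson, spearman) for every
    non-target column, and keep names where either |coefficient| > threshold."""
    scores, selected = [], []
    for name, column in table.items():
        if name == target:
            continue
        p = pearson(column, table[target])
        s = spearman(column, table[target])
        scores.append((name, p, s))
        if abs(p) > threshold or abs(s) > threshold:
            selected.append(name)
    return scores, selected

# Hypothetical toy data standing in for the credit card file.
data = {
    "V1": [0.1, 0.4, 0.35, 0.8, 0.9, 1.2],
    "V2": [5.0, 3.0, 4.0, 5.5, 3.5, 4.5],
    "Class": [0, 0, 0, 1, 1, 1],
}
scores, selected = select_by_correlation(data, "Class")
print(selected)  # prints ['V1']
```

Here scores plays the role of B1 and selected the role of C1. Unlike the grid version, the sketch appends a name only when the condition holds, so no empty entries are collected.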