Flag missing information for multiple variables
When a data set contains many variables with missing values, flagging each variable separately greatly increases the complexity of the model. In that case, a single new variable can record each sample's missing values across all variables. This approach cannot reflect the influence of any specific variable, but it adds very little to the model's complexity.
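The trade-off can be illustrated with a minimal Python sketch. The toy rows, the column names, and the choice of a per-row missing count as the single combined variable are all assumptions made for illustration; the source does not state how SPL encodes the combined variable internally.

```python
# Toy rows; None marks a missing value. Column names and values are illustrative only.
rows = [
    {"LotFrontage": 65.0, "Alley": None,   "GarageType": "Attchd"},
    {"LotFrontage": None, "Alley": None,   "GarageType": "Detchd"},
    {"LotFrontage": 80.0, "Alley": "Pave", "GarageType": "Attchd"},
]

def per_variable_flags(rows):
    """One 0/1 flag column per variable that has at least one missing value."""
    cols = [c for c in rows[0] if any(r[c] is None for r in rows)]
    return [{f"MI_{c}": int(r[c] is None) for c in cols} for r in rows]

def combined_flag(rows):
    """A single column (assumed encoding): number of missing variables per row."""
    return [sum(v is None for v in r.values()) for r in rows]

flags = per_variable_flags(rows)      # adds one column per incomplete variable
missing_count = combined_flag(rows)   # adds exactly one column
```

With per-variable flags the number of added columns grows with the number of incomplete variables, whereas the combined variable always adds one column, at the cost of no longer saying which variable was missing.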
In SPL, A.mvp(T) or P.mvp(cns, T) integrates multiple MI variables into a single mvp variable that represents the multivariate missing information, and then applies automatic preprocessing to the mvp variable.
For example, the house price data contains 81 variables, and the mvp() function is used to flag the missing information across multiple variables:
| | A |
|---|---|
| 1 | =file("D://house_prices_train.csv").import@qtc() |
| 2 | =A1.fname() |
| 3 | =A2.(A1.mi(~)) |
| 4 | =A3.group(!~) |
| 5 | =to(A4(1).len()).("A4(1)("/~/")(#).field(1):MI_"/~).concat@c() |
| 6 | =A1.derive(${A5}) |
| 7 | =to(A4(1).len()).("\""/"MI_"/~/"\"").concat@c() |
| 8 | =A6.mvp([${A7}],A1.(SalePrice)) |
| 9 | =A1.derive(A8(1)(#).field(1):mvp) |
A2 Get the field names
A3 Mark the missing values for each variable. Variables with missing values return MI indicators, and variables without missing values return null.
A4 Divide the results of A3 into two groups based on whether the MI indicator is null
A5-A6 Add the MI indicators to table A1
A7 Extract all MI indicator field names as input parameters to A8
A8 Generate an mvp variable that represents the missing information of multiple variables, and apply automatic preprocessing to the mvp variable. For example, Pow2 in the figure denotes a power transformation.
A9 Add the mvp variable to the modeling data A1
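The bookkeeping in steps A2 to A7 can be mirrored in plain Python as a rough sketch. The `mi` helper below is a hypothetical stand-in for SPL's `mi()` that only reproduces the behavior described in the notes (an indicator for fields with missing values, null otherwise), and the table contents are toy values. The `mvp()` step itself is not reproduced, since its internals are not described here.

```python
def mi(values):
    """Hypothetical stand-in for SPL's mi(): 0/1 flags, or None if nothing is missing."""
    flags = [int(v is None) for v in values]
    return flags if any(flags) else None

# Toy column-oriented table; None marks a missing value.
table = {
    "LotFrontage": [65.0, None, 80.0],
    "Alley": [None, None, "Pave"],
    "SalePrice": [208500, 181500, 223500],
}

names = list(table)                                       # A2: field names
indicators = [mi(table[n]) for n in names]                # A3: indicator or None per field
kept = [ind for ind in indicators if ind is not None]     # A4: keep the non-null group
for i, ind in enumerate(kept, start=1):                   # A5-A6: append as MI_1, MI_2, ...
    table[f"MI_{i}"] = ind
mi_names = [f"MI_{i}" for i in range(1, len(kept) + 1)]   # A7: names handed to the mvp step
```

As in the SPL script, the indicator fields are numbered by their position among the fields that have missing values, not by the original field names.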