Statistic filling
Mean filling
Fill the missing values of the variable "Age" in the Titanic data with the mean of that variable.
| | A |
|---|---|
| 1 | =file("D://titanic.csv").import@qtc() |
| 2 | =A1.avg(Age) |
| 3 | =A1.run(Age=if(!Age,A2,Age)) |
[Figure: the "Age" column before filling and after filling]
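For readers who want to check the logic outside SPL, the three steps above (load, average, replace) can be sketched in plain Python. This is only an illustration under assumptions: the sample rows are made up, `fill_with_mean` is a hypothetical helper name, and missing values are represented as `None`. The mean is computed over the non-missing values only, which is what the grid above appears to rely on.

```python
from statistics import mean

def fill_with_mean(rows, field):
    """Return copies of rows with missing (None) values of `field` set to the mean."""
    # Only the values that are actually present contribute to the mean.
    present = [r[field] for r in rows if r[field] is not None]
    if not present:
        # Nothing to average, so return unmodified copies.
        return [dict(r) for r in rows]
    avg = mean(present)
    return [{**r, field: avg if r[field] is None else r[field]} for r in rows]

# Hypothetical sample rows, not taken from the real Titanic file.
passengers = [
    {"Name": "A", "Age": 22.0},
    {"Name": "B", "Age": None},
    {"Name": "C", "Age": 38.0},
]

filled = fill_with_mean(passengers, "Age")
print(filled[1]["Age"])  # 30.0
```

Unlike the `run()` call in the grid, this sketch returns new rows and leaves the input untouched, which makes a before/after comparison easier.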
Automatic filling according to the data type of the variable
Mean filling requires quantitative data; variables of other types must be filled with a different statistic. For example, for integer variables, either the mean or the median can be used; for floating-point variables, only the mean can be used; for character variables, the mode is generally used. Conveniently, SPL provides the A.impute() and P.impute() functions, which automatically select the statistic used to fill the missing values according to the data type.
| | A |
|---|---|
| 1 | =file("D://titanic.csv").import@qtc() |
| 2 | =A1.impute@N("Age") |
| 3 | =A1.fname() |
| 4 | =A3.(A1.impute@c(~)) |
A2 Fill the variable "Age" and return the filled result together with the fill record Rec. @N indicates that the variable is numeric.
The impute() function performs a mode fill or a mean fill, or imputes the missing values into a new class, depending on the variable type specified. When no variable type is specified, impute() automatically detects the variable type and fills accordingly.
A3 Get the field names of A1.
A4 Automatically fill all the fields. @c indicates that the original data is modified in place with the filled values, so table A1 no longer contains missing values.
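The type-dependent choice of statistic described in this section can also be illustrated outside SPL. The plain-Python sketch below is a simplified stand-in, not a description of how impute() works internally: `impute_column` is a hypothetical helper, the sample values are made up, and the rule is reduced to "mean for numeric columns, mode for everything else".

```python
from statistics import mean, mode

def impute_column(values):
    """Fill None entries: mean for numeric columns, mode otherwise."""
    present = [v for v in values if v is not None]
    if not present:
        # A column with no observed values cannot be filled.
        return list(values)
    # bool is a subclass of int in Python, so exclude it explicitly.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    fill = mean(present) if numeric else mode(present)
    return [fill if v is None else v for v in values]

# Hypothetical sample columns.
ages = [22.0, None, 38.0]
embarked = ["S", "C", None, "S"]

print(impute_column(ages))      # [22.0, 30.0, 38.0]
print(impute_column(embarked))  # ['S', 'C', 'S', 'S']
```

Applying the helper to every column in turn mirrors the idea of cell A4, which loops over the field names and fills each field.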