Linear regression model filling
Linear regression model filling is a kind of model-based filling. The variable to be filled is taken as the dependent variable, and the other variables are taken as the independent variables. The model is trained on the samples in which the dependent variable is not missing, and the fitted model is then used to predict the missing values.
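As a language-neutral illustration of this idea, here is a minimal Python sketch with a single independent variable and an intercept. The function name `fill_with_simple_regression` and the `fare`/`age` values are hypothetical and are not taken from titanic_impute.csv. Missing values are assumed to be represented as `None`.

```python
def fill_with_simple_regression(xs, ys):
    """Fill None entries in ys with y = intercept + slope * x, fitted on the complete pairs."""
    # Training samples: only the rows where the dependent variable is present
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Ordinary least-squares slope and intercept
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Predict only where y is missing; observed values are kept unchanged
    return [intercept + slope * x if y is None else y for x, y in zip(xs, ys)]


fare = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical independent variable
age = [20.0, None, 30.0, None, 40.0]   # hypothetical variable to be filled
print(fill_with_simple_regression(fare, age))  # [20.0, 25.0, 30.0, 35.0, 40.0]
```

Each missing entry gets its own prediction from its own x value. This is what distinguishes model filling from filling every gap with a single constant such as the mean.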
For example, the Age variable in titanic_impute.csv can be filled using a linear regression model.
Here the linefit() function is used to build the linear regression model.
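Assuming linefit() performs an ordinary least-squares fit, as the step notes describe it, the model can be written as follows. X holds the independent variables of the samples whose Age is present. y holds their Age values. X_all holds the independent variables of all samples. The closed form requires X^T X to be invertible. No separate intercept appears because the prediction step multiplies the data matrix directly by the coefficients.

```latex
\hat{\beta} = \arg\min_{\beta} \lVert X\beta - y \rVert_2^2 = \left(X^{\top}X\right)^{-1} X^{\top} y,
\qquad
\hat{y} = X_{\mathrm{all}}\,\hat{\beta}
```

Only the entries of \hat{y} that belong to samples with a missing Age are written back. Observed Age values are kept.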
   | A
 1 | =file("D://titanic_impute.csv").import@qtc()
 2 | =A1.select(Age!=null)
 3 | =A2.array().to(2:)
 4 | =A3.(~(4))
 5 | =A3.(~.delete([4,8]))
 6 | =linefit(A5,A4)
 7 | =A1.array().to(2:)
 8 | =A7.(~.delete([4,8]))
 9 | =mul(A8,A6).conj()
10 | =A1.run(Age=if(Age==null,A9(#),Age))
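A1 Import titanic_impute.csv
A2 Select the samples in which Age is not missing
A3 Convert the data to vector form and remove the title row
A4 Take Age (the 4th column) as y
A5 Delete the 4th and 8th columns from each row and take the remaining columns as the independent variables x
A6 Fit x and y with the least-squares linear fitting function, which returns the fitting coefficients
A7 Convert all the data, including the rows where Age is missing, to vector form and remove the title row
A8 Delete the 4th and 8th columns from every row so that the rows have the same layout as x
A9 Make predictions by multiplying the rows of A8 by the coefficients, and merge the result into a single sequence
A10 Use the predicted values to fill in the missing values. As shown in the figure, each missing value is filled with its own predicted value rather than a single constant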
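The sketch below reproduces the same pipeline in plain Python so that the grid can be traced step by step. It rests on several assumptions. linefit() is treated as an ordinary least-squares fit, solved here through the normal equations. No intercept column is added, because A9 multiplies x directly by the coefficients. Missing values are represented as None. The helper names (`solve`, `least_squares`, `fill_by_linear_regression`) and the small data set are hypothetical. SPL's 1-based ~(4) and delete([4,8]) correspond to y_col=3 and drop_cols={3, 7} in this 0-based version. The demo uses a three-column table instead.

```python
def solve(a, b):
    """Solve the square system a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def least_squares(x, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


def fill_by_linear_regression(rows, y_col, drop_cols):
    """Return a copy of rows with None values in column y_col replaced by OLS predictions.

    drop_cols must include y_col, just as delete([4,8]) removes Age in the SPL grid.
    """
    def features(row):
        # Independent variables: every column except the dropped ones (A5 / A8)
        return [v for i, v in enumerate(row) if i not in drop_cols]

    train = [row for row in rows if row[y_col] is not None]  # A2: target present
    y = [row[y_col] for row in train]                         # A4: dependent variable
    x = [features(row) for row in train]                      # A5: independent variables
    beta = least_squares(x, y)                                # A6: fitting coefficients
    # A9: prediction for every row = features . beta (no intercept term)
    preds = [sum(b * v for b, v in zip(beta, features(row))) for row in rows]
    # A10: keep observed values, replace only the missing ones
    return [
        row[:y_col] + [p] + row[y_col + 1:] if row[y_col] is None else list(row)
        for row, p in zip(rows, preds)
    ]


# Hypothetical table with columns [x1, age, x2]
rows = [
    [1.0, 5.0, 1.0],
    [2.0, 7.0, 1.0],
    [1.0, 8.0, 2.0],
    [3.0, None, 2.0],
    [2.0, 13.0, 3.0],
]
filled = fill_by_linear_regression(rows, y_col=1, drop_cols={1})
print(filled[3])  # the missing age is predicted from x1 and x2
```

The observed rows in this made-up table satisfy age = 2*x1 + 3*x2 exactly, so the filled value comes out at 12.0 up to floating-point rounding. Solving the normal equations directly works for a handful of well-scaled columns. For larger or nearly collinear data, a QR- or SVD-based solver is numerically safer.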