Oversampling
Oversampling balances the classes by increasing the number of samples in the minority (smaller) class. The simplest approach is to copy minority-class samples directly until the class counts reach the desired balance.
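As a minimal sketch of the idea in Python (not SPL), assume each sample is a dict and the class is read from a caller-supplied `label` key. The function name `random_oversample` is hypothetical. It duplicates randomly chosen rows of each smaller class until every class matches the largest one:

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, label, seed=None):
    """Random oversampling: duplicate randomly chosen rows of the smaller
    classes until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[label]].append(row)
    target = max(len(g) for g in groups.values())
    result = list(rows)  # keep every original row
    for g in groups.values():
        # Draw copies with replacement from the existing rows of this class
        result.extend(rng.choice(g) for _ in range(target - len(g)))
    return result


if __name__ == "__main__":
    data = [{"id": i, "Survived": 0} for i in range(5)]
    data += [{"id": i, "Survived": 1} for i in range(5, 7)]
    balanced = random_oversample(data, "Survived", seed=0)
    print(Counter(r["Survived"] for r in balanced))  # Counter({0: 5, 1: 5})
```

Because copies are drawn with replacement, the same minority sample can appear several times. This is the main drawback of plain duplication: the model may overfit to repeated rows.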
The following script oversamples the Titanic sample data, balancing the two classes of the Survived field:
   | A
1  | =file("D://titanic.csv").import@qtc()
2  | 1
3  | =A1.group@p(Survived)
4  | =A3.sort(~.len())
5  | =A4(2).len()/A2-A4(1).len()
6  | =if(A5>0,A5,0)
7  | =A6.(A4(1)(rand(A4(1).len())+1))
8  | =(to(A1.len()))|A7.sort()
9  | =A1(A8)
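A1 Import the Titanic data from the CSV file: @t reads the first row as field names, @c uses comma as the separator, and @q strips quotation marks from the values
A2 Set the balance ratio, i.e. the desired ratio of majority-class to minority-class samples after sampling (1 means equal counts)
A3 Group the row positions (@p) by the value of Survived
A4 Sort the groups by size in ascending order, so A4(1) is the minority class and A4(2) the majority class
A5 Calculate, from the balance ratio, how many minority-class samples need to be copied
A6 Set that number to 0 if it is negative, so that no samples are removed
A7 Randomly select the positions of the samples to copy from the minority class; each draw is independent, so a sample may be copied more than once
A8 Append the sorted positions of the copied samples to the positions of all original samples
A9 Take the rows at these positions, which completes the sampling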
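For readers more familiar with Python, the following is a rough cell-by-cell translation of the script. It is a sketch under stated assumptions: the data has exactly two classes, rows are dicts (for example as produced by `csv.DictReader`), and the names `load_csv`, `oversample_positions` and `oversample` are hypothetical. Positions are 0-based here, whereas SPL positions are 1-based, which is why A7 adds 1 to `rand()`. How SPL treats a fractional copy count from A5 is not covered above, so this sketch simply rounds it down.

```python
import csv
import random


def load_csv(path):
    # A1: read the CSV with the first row as field names (values stay strings)
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def oversample_positions(labels, ratio=1.0, rng=None):
    """Return 0-based row positions of the oversampled table (cells A3-A8).
    Assumes exactly two classes."""
    if rng is None:
        rng = random.Random()
    # A3: group row positions by class label
    groups = {}
    for pos, lab in enumerate(labels):
        groups.setdefault(lab, []).append(pos)
    # A4: sort the groups by size, minority class first
    minority, majority = sorted(groups.values(), key=len)
    # A5: copies needed so that majority / minority == ratio afterwards
    need = len(majority) / ratio - len(minority)
    # A6: never negative; fractional counts are rounded down (an assumption)
    need = max(int(need), 0)
    # A7: pick minority positions at random, with replacement
    copies = [rng.choice(minority) for _ in range(need)]
    # A8: all original positions followed by the sorted copied positions
    return list(range(len(labels))) + sorted(copies)


def oversample(rows, label="Survived", ratio=1.0, seed=None):
    # A9: take the rows at the computed positions
    labels = [row[label] for row in rows]
    positions = oversample_positions(labels, ratio, random.Random(seed))
    return [rows[p] for p in positions]


if __name__ == "__main__":
    # ratio=1.0 plays the role of A2 = 1: balance the two classes exactly
    rows = [{"id": i, "Survived": "0"} for i in range(6)]
    rows += [{"id": i, "Survived": "1"} for i in range(6, 8)]
    out = oversample(rows, ratio=1.0, seed=42)
    print(sum(r["Survived"] == "1" for r in out), len(out))  # 6 12
```

With the real file this would be called as `oversample(load_csv(path))`, with `path` pointing to your copy of titanic.csv. Working with positions rather than rows, as the SPL script does with group@p, keeps the copying step cheap and leaves the original rows in their original order at the front of the result.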
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL