Equi-frequency binning
The variable values are sorted in ascending order and then divided into k parts that each contain the same number of samples. Each part is treated as a bin. For example, with 10 bins, each bin contains about 10% of the samples.
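As a rough illustration of the definition above, here is a minimal Python sketch (not SPL). The function name `equal_frequency_bins` and the sample values are hypothetical, chosen only for this example. It sorts the values, then maps each sorted position proportionally onto k bins.

```python
from collections import Counter


def equal_frequency_bins(values, k):
    """Return a bin index (0..k-1) for each value, with bins of near-equal size."""
    n = len(values)
    # Indices of the values, ordered from the smallest value to the largest.
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for position, i in enumerate(order):
        # Map the sorted position 0..n-1 proportionally onto the k bins.
        bins[i] = position * k // n
    return bins


# Hypothetical sample: the numbers 0..19 in a scrambled order.
values = [(7 * i) % 20 for i in range(20)]
bins = equal_frequency_bins(values, 10)
counts = Counter(bins)
print(sorted(counts.items()))
```

With 20 samples and 10 bins, every bin receives 2 samples, that is, 10% of the data. In this sketch tied values are split by sorted position, so equal values can fall into different bins.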
Example: equi-frequency binning of the “Fare” variable
| | A |
|---|---|
| 1 | =file("D://titanic.csv").import@qtc() |
| 2 | =A1.ranks(Fare) |
| 3 | 3 |
| 4 | =ceil(A1.len()/A3) |
| 5 | =A3.(~*A4) |
| 6 | =A1.derive(if(A2(#)<A5(1),"low",if(A2(#)>=A5(2),"high","middle")):Fare_equifre_binning) |
A1 Import the Titanic data from the CSV file
A2 Rank the “Fare” values and return the rank of each record
A3 Set the number of bins
A4 Calculate the number of samples in each bin
A5 Calculate the rank boundary of each bin
A6 Bin “Fare” by comparing each record's rank with the bin boundaries
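The steps in cells A2 to A6 can also be sketched in Python for readers who want to trace the logic outside SPL. This is an illustrative re-expression, not the SPL implementation: the helper names `competition_ranks` and `bin_fares` and the fare values are hypothetical, and it assumes ranks are ascending with tied values sharing the lowest rank.

```python
import math


def competition_ranks(values):
    """1-based ascending ranks; tied values share the smallest rank (assumption)."""
    first_rank = {}
    for pos, v in enumerate(sorted(values), start=1):
        # Keep only the first (smallest) position seen for each value.
        first_rank.setdefault(v, pos)
    return [first_rank[v] for v in values]


def bin_fares(fares, n_bins=3):
    """Label each fare low / middle / high using the same comparisons as cell A6."""
    ranks = competition_ranks(fares)                       # cell A2
    bin_size = math.ceil(len(fares) / n_bins)              # cell A4
    bounds = [i * bin_size for i in range(1, n_bins + 1)]  # cell A5
    labels = []
    for r in ranks:                                        # cell A6
        if r < bounds[0]:
            labels.append("low")
        elif r >= bounds[1]:
            labels.append("high")
        else:
            labels.append("middle")
    return labels


# Hypothetical fare values, not read from titanic.csv.
fares = [7.25, 71.28, 7.92, 53.10, 8.05, 8.46, 51.86, 21.08, 11.13]
labels = bin_fares(fares)
print(list(zip(fares, labels)))
```

With these 9 values the bin size is 3 and the boundaries are 3, 6 and 9. Because the comparisons are `<` and `>=`, a rank that equals a boundary goes to the upper bin, so this tiny sample splits into 2 low, 3 middle and 4 high values. On a larger dataset the difference is one or two records per bin.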