Preliminary variable filtering
Data often contains variables that are of poor quality or carry no information for the model. You can define a few simple rules and delete such variables up front to reduce the amount of later computation, for example:
(1) Variables with a high missing rate
(2) Unary variables (only one distinct value)
(3) Categorical variables with too many categories
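As a quick stand-alone illustration (a sketch only, separate from the script below; PoolQC and Neighborhood are assumed column names from the house-prices data), rules (1) and (3) can be checked for individual columns like this:

| | A |
|---|---|
| 1 | =file("D://house_prices_train.csv").import@qtc() |
| 2 | =round(A1.select(!PoolQC).len()/A1.len(),3) |
| 3 | =A1.(Neighborhood).icount() |

Here A2 gives the missing rate of PoolQC and A3 gives the number of categories of Neighborhood; if A2 were 0.9 or higher, or A3 exceeded 100, the corresponding rule would drop that column. The full script below applies the same checks to every column at once.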
| | A |
|---|---|
| 1 | =file("D://house_prices_train.csv").import@qtc() |
| 2 | =A1.fname() |
| 3 | =A2.((y=~,A1.align@a([true,false],!eval(y)))) |
| 4 | =A2.new(~:col,A3(#)(1).len():null_no,round(null_no/A1.len(),3):null_rate,A1.field(#).icount():icount,if(!sum(A1.field(#)),"string","num"):type) |
| 5 | =A4.select(null_rate<0.9 && icount>1 && !(type=="string" && icount>100)).(col).concat(",") |
| 6 | =A1.new(${A5}) |
A2 Get the field names.
A3 For each field, divide the records into two groups according to whether the field value is missing.
A4 For each field, count the number of missing values, the missing rate, the number of distinct values, and the data type.
A5 Select the columns to keep, i.e. drop variables with a missing rate of 0.9 or higher, unary variables, and string variables with more than 100 categories, then concatenate the remaining column names.
A6 Generate the table after variable filtering.
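To double-check what was filtered out, you can also select the complement of A5's condition from the profiling table in A4. The cell below is only an illustrative addition, not part of the original script; its thresholds simply mirror those in A5:

| | A |
|---|---|
| 7 | =A4.select(null_rate>=0.9 || icount<=1 || (type=="string" && icount>100)) |

A7 lists the dropped variables together with their missing rate, distinct-value count, and type, which makes it easy to verify that nothing useful was discarded by accident.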
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL