"Selecting data by segment refers to finding the data in a group according to the specified inter .."

pjf RaqForum 22 No.
1 Reply • 441 View • 3 Years ago

SPL: selecting data by segment

Selecting data by segment refers to finding the data in a group according to the specified interval. For example, return to assessments (excellent, good, medium, poor) according to the test results; return to the levels of assets (poor, well-off, middle-class, rich, etc.) according to the annual income of the family.

1. The segment number of a member in sequence

We first define an interval sequence for the segment. For example, we define the age sequence of [0,18,45,60], which represents minors, youth, middle-ages, and old-ages respectively. Next by calculating the sequence number of the value of the age field in the sequence of the age range, we can get what age range the person is in.

The A.pseg() function is provided in SPL, which returns the segment number of a member in sequence，and the default interval is left-open and right-closed. For example, the 17-year-old belongs to the age range of [0,18), so the sequence number returned by the A.pseg()function is 1; whereas the 18-year-old belongs to the age range of [18,45), then the sequence number returned by the A.pseg()function is 2. The A.pseg() function provides the @r option to use left-open and right-closed intervals when calculating the segment numbers. Take the same case for an example, 18-year-old is assigned to the interval of (0,18] by A.pseg() function,so the sequence number returned is 1.

In the case of employees, some of the data are as follows:

ID	NAME	ENTRYDATE	STATE	DEPT	SALARY
1	Rebecca	2005/03/11	California	R&D	7000
2	Ashley	2008/03/16	New York	Finance	11000
3	Rachel	2010/12/01	New Mexico	Sales	9000
4	Emily	2006/08/15	Texas	HR	7000
5	Ashley	2004/07/30	Texas	R&D	16000
…	…	…	…	…	…

[e.g. 1] Divide the employees into groups of less than 8,000, form 8,000 to 12,000 and more than 12,000 according to their salaries, and count the number of employees in each group.

The SPL script looks like this:

	A
1	=T("Employee.csv")
2	[0,8000,12000]
3	=A1.align@a(A2.len(),A2.pseg(SALARY))
4	=A3.new(A2(#):SALARY,~.count():COUNT)

A1: import employee table.

A2: define the intervals of salaries.

A3: use the A.pseg() function to get the segment numbers of the salaries, and then group them according to the numbers

A4: calculate the number of people in each group.

[e.g. 2] Divide the employees into three segments in terms of their working years, 1 to 10 years, 10 to 20 years and 20 years respectively, and count the average salary of each group.

The SPL script looks like this:

	A
1	=T("Employee.csv")
2	[0,10,20]
3	=A1.align@a(A2.len(),A2.pseg@r(interval@y(ENTRYDATE,now())))
4	=A3.new(A2(#):EntryYears,~.avg(SALARY):AvgSalary)

A1: import employee table.

A2: define the intervals of working years.

A3: use the A.pseg() function to get the segment numbers of the working years, where the @r option represents the left-open and right-closed interval. And then group them by the numbers.

A4: calculate the average salary of each group.

2. Returning the corresponding member of a sequence based on the segment number

Sometimes we need to get the corresponding member in the set by calculating the sequence number of the field value (or expression) in the segment.

The A.segp() function is provided in SPL, which returns the corresponding member in the sequence based on the segment number.

[e.g. 3] According to the score table, count the number of excellent, pass and fail students in the English subject. Some of the data are as follows:

Class	StudentID	Subject	Score
1	1	English	95
1	1	Math	90
1	1	PE	80
1	2	English	75
1	2	Math	84
…	…	…	…

The SPL script looks like this:

	A
1	=T("Scores.csv").select(SUBJECT:"English")
2	=create(Assessment,Score).record(["Fail",0,"Pass",60,"Excellent",90])
3	=A1.derive(A2.segp(Score,SCORE).Assessment:Assessment)
4	=A3.groups(Assessment;count(1):Count)

A1: import the score table and select the data of the English subject.

A2: create a check table of score and assessment:

Assessment	Score
Fail	0
Pass	60
Excellent	90

A3: use the A.segp() function to obtain the corresponding assessment according to the segment numbers of the score in the check table.

A4: group and calculate the number by assessment.

Employee.csv

Scores.csv

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/cFTcUNs7

Youtube 👉 https://www.youtube.com/@esProc_SPL

esProc

pjf • 441 View • 3 Years ago

SPL: selecting data by segment

1. The segment number of a member in sequence

2. Returning the corresponding member of a sequence based on the segment number

ToC