"Sometimes we group and sort the data in order of sequence number, grouping members with the same .."

pjf RaqForum 22 No.
1 Reply • 330 View • 3 Years ago

SPL: grouping and sorting aligned by sequence number

Sometimes we group and sort the data in order of sequence number, grouping members with the same sequence number into the same group. For example, calculate the total sales of each month last year in order from January to December, calculate the numbers of accessors to a website according to the order from Monday to Sunday, and so on.

This kind of grouping operation, which is aligned to a specified criterion, is collectively referred to as alignment grouping. Grouping by sequence alignment is a special case of alignment grouping where the base set is a sequence of integers starting from 1. Alignment groups may have empty groups or members that are not assigned to any of the groups.

1. Sorting by sequence number

Sort the data in the specified sequence order, and retain up to one matching member per group. This is suitable for situations where we want to query or use data in a specified order.

[e.g. 1] Query the sales records in order from Monday to Sunday according to the daily sales table. Some of the data are as follows:

Week	Day	Amount
5	Sunday	1101.2
5	Saturday	538.6
5	Friday	2142.4
5	Thursday	1456.0
5	Wednesday	48.0
5	Tuesday	1376.0
5	Monday	676.0
4	Sunday	448.0
4	Saturday	4031.0
4	Friday	364.8
…	…	…

The A.align(n,y) function in SPL is used to align the groups which are directly divided into n groups (from 1 to n), and calculate straight the group number of each member by grouping expression y. By default, each group retains up to one matching member.

The SPL script looks like this:

	A
1	=T("DailySales.csv")
2	["Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"]
3	=A1.group(Week; ~.align(7,A2.pos(Day)):WeekSales)
4	=A3.conj(WeekSales)
5	=A4.select(~)

A1: query the daily sales table.

A2: define the sequence from Monday to Sunday.

A3: when grouping the sales data by week, use the A.align(n,y) function to align sales records of each week in the order defined by A2. It is important to note here that aligned groups may have empty groups. For example, when no sales were recorded on Friday and Saturday of the second week, there are still seven groups per week:

Week

[2,Monday,1194.0]

[2,Tuesday,1622.4]

[2,Wednesday,319.2]

[2,Thursday,802.0]

(null)

[2,Sunday,2123.2]

A4: concatenate the sorted sales records of each week.

A5: select non-empty records from the result set.

The execution results of A5 are as follows:

Week	Day	Amount
1	Monday	3063.0
1	Tuesday	3868.6
1	Wednesday	2713.5
1	Thursday	1005.9
1	Friday	1675.0
1	Saturday	400.0
1	Sunday	2018.2
2	Monday	1194.0
2	Tuesday	1622.4
2	Wednesday	319.2
…	…	…

2. Retaining all matching members of each group

Group the data in a specified sequence order, each retain all matching members. This applies to situations where we care about the information of members in each group, or where we need to continue calculating with these member records.

[e.g. 2] According to the sales table, list the total sales of each month in 2014 in order. Some data of the sales table are as follows:

ID	CustomerID	OrderDate	Amount
10248	VINET	2013/07/04	2440
10249	TOMSP	2013/07/05	1863.4
10250	HANAR	2013/07/08	1813.0
10251	VICTE	2013/07/08	670.8
10252	SUPRD	2013/07/09	3730.0
…	…	…	…

option @a of the A.align(n,y) function in SPL is used to keep all matching members of each group while aligning the groups.

The SPL script looks like this:

	A
1	=T("Sales.csv")
2	=A1.select(year(ORDERDATE)==2014)
3	=A2.align@a(12,month(ORDERDATE))
4	=A3.new(#:Month,~.sum(AMOUNT):AMOUNT)

A1: query the sales table.

A2: select the records of 2014 from the sales table.

A3: use the A.align@a(n,y) function to sort the months of the order table into 12 groups in order from 1 to 12, with the option @a retaining all matching members of each group.

A4: calculate the total sales of each month.

The execution results of A4 are as follows:

Month	Amount
1	66692.8
2	52207.2
3	39979.9
4	60699.39
…	…

3. Overlapped grouping by sequence number

Sometimes the sequence number of groups calculated by each record is more than one, and we want the record to be repeatedly assigned to multiple groups according to the sequence numbers.

[e.g. 3] According to the posting record table, group the posts by label, and count the frequency of each label. Some of the data in the posting record table are as follows:

ID	Title	Author	Label
1	Easy analysis of Excel	2	Excel,ETL,Import,Export
2	Early commute: Easy to pivot excel	3	Excel,Pivot,Python
3	Initial experience of SPL	1	Basics,Introduction
4	Talking about set and reference	4	Set,Reference,Dispersed,SQL
5	Early commute: Better weapon than Python	4	Python,Contrast,Install
…	…	…	…

option @r of the A.align(n,y) function in SPL is used to create overlapped groups by the sequence numbers when aligning groups.

The SPL script looks like this:

	A
1	=T("PostRecord.txt")
2	=A1.conj(Label.split(",")).id()
3	=A1.align@ar(A2.len(),A2.pos(Label.split(",")))
4	=A3.new(A2(#):Label,~.count():Count).sort@z(Count)

A1: query the posting record table.

A2: separate the labels by comma and concatenate them into a sequence to get all the labels with no overlap values.

A3: use the A.align@r(n,y) function to group the posts by their labels’ position in all labels with the option @r.

A4: calculate the number of posts per label,and sort them in descending order.

The execution results of A4 are as follows:

Label	Count
SPL	7
Excel	6
Basics	5
…	…

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/2bkGwqTj

Youtube 👉 https://www.youtube.com/@esProc_SPL

esProc

pjf • 330 View • 3 Years ago

SPL: grouping and sorting aligned by sequence number

1. Sorting by sequence number

2. Retaining all matching members of each group

3. Overlapped grouping by sequence number

ToC