SPL: TopN and TopN in group
TopN query is to look up the first Ns / last Ns from the data. TopN query can not only directly get a value, but also inquire the detailed information of the record where TopN is located, and sometimes it also looks for the row number of TopN. Additionally, TopN query may also be used in groups to query the first Ns / last Ns within the group.
1. TopN
We can divide taking the first Ns / last Ns into three types of requirements to describe them in detail, respectively taking their values, taking their row numbers and taking their records.
In the case of the NASDAQ, some of the data are as follows:
Date |
Open |
Close |
Amount |
2019/01/02 |
6506.910156 |
6665.939941 |
2261800000 |
2019/01/03 |
6584.77002 |
6463.5 |
2607290000 |
2019/01/04 |
6567.140137 |
6738.859863 |
2579550000 |
2019/01/07 |
6757.529785 |
6823.470215 |
2507550000 |
2019/01/08 |
6893.439941 |
6897.0 |
2380290000 |
… |
… |
… |
… |
1.1 Taking the first Ns / last Ns values
Check out the top three volume values of the NASDAQ Index in 2019.
The SPL script looks like this:
A |
|
1 |
=T("IXIC.txt") |
2 |
=A1.select(year(Date)==2019) |
3 |
=A2.top(-3,Amount) |
A1: import NASDAQ index data
A2: select the data of 2019
A3: employ the A.top(n,x) function to get the three highest volume values. If n is a positive number, take the first Ns; if N is a negative number, take the last Ns. In particular cases, if n is ±1, return to the single value,which is similar to taking a maximum/minimum value.
We can also look up the four lowest volume values of the NASDAQ in 2019:
A |
|
3 |
=A2.top(4,Amount) |
A3: employ the A.top(n,x) function to get the four lowest volume values
1.2 Taking the row numbers of first Ns / last Ns
In an order-related set, we can operate inter-row calculations by taking the row numbers of the first N / last N members.
[e.g. 2] Inquire the increasing rate in trading volumes relative to the previous days of the top three highest closing prices of NASDAQ in 2019.
The SPL script looks like this:
A |
|
1 |
=T("IXIC.txt") |
2 |
=A1.select(year(Date)==2019).sort(Date) |
3 |
=A2.ptop(-3,Close) |
4 |
=A3.(A2(~).Amount/A2(~-1).Amount-1) |
A1: import NASDAQ data.
A2: select the data of 2019 and sort them by date.
A3: employ the A.top(n,x) function to take the row numbers of the top three closing prices.
A4: based on the selected row number, calculate the the increasing rate by comparing the trading volumes of current day with those of the previous day.
1.3 Taking the records of first Ns / last Ns
Sometimes, we don’t care about what the specific values of first Ns / last Ns are, we care more about the records where the values are. For example, search the names of the top three students in a math final exam, the top five sales customers’ names in 2020, and so on.
[e.g. 3] Check the trading volumes of the lowest closing prices of NASDAQ index in 2019.
The SPL script looks like this:
A |
|
1 |
=T("IXIC.txt") |
2 |
=A1.select(year(Date)==2019) |
3 |
=A2.top(5;Close) |
4 |
=A3.new(Date,Amount) |
A1: import NASDAQ data.
A2: select data of 2019
A3: employ the A.top(n,x) function to take records of the five trading days with the lowest closing prices.
A4: take the dates and volumes from the records of the five days.
2. TopN in group
The query of first Ns / last Ns in group is a very common requirement. For example, find out what the top two math scores are in each class, what the top three single sales customers are for each month, and so on. In this section, we’ll break down how to solve the problems of using TopN in group.
2.1 performing TopN after grouping
We can regard the TopN query as a kind of aggregation operation. First, the data is grouped according to certain conditions, and then the TopN query is performed on the result sets of each group. Let’s talk about it in terms of values and records respectively.
Take the student score table as an example. Some of the data are as follows:
Class |
StudentID |
Subject |
Score |
1 |
1 |
English |
95 |
1 |
1 |
Math |
90 |
1 |
1 |
PE |
80 |
1 |
2 |
English |
75 |
1 |
2 |
Math |
84 |
… |
… |
… |
… |
[e.g. 4] Check the scores of the top two students in each class.
The SPL script looks like this:
A |
|
1 |
=T("Score.txt") |
2 |
=A1.select(Subject:"Math") |
3 |
=A2.group(Class;~.top(-2,Score):TOP2) |
4 |
=A3.new(Class,TOP2(1):First,TOP2(2):Second) |
A1: import score table.
A2: select the math grades.
A3: group by class, and use the A.top() function to count the top two math scores in each class.
A4: create the result table. The first column is the class, the second column is the first student, and the third column is the second student.
[e.g. 5] Check the information about the top three students of each subject in every class.
The SPL script looks like this:
A |
|
1 |
=T("Score.txt") |
2 |
=A1.group(Class,Subject;~.top(-3;Score):TOP3) |
3 |
=A2.conj(TOP3) |
A1: import score table.
A2: divide them into groups by class and subject and take the records of the top three scores in each group.
A3: concatenate the records of the top two students of each subject in all classes.
2.2 TopN operation by accumulation
To perform TopN operation using accumulation does not produce the result set of a grouping, which is often used when the amount of data is big. Let’s nevertheless talk about it in terms of values and records.
Take the sales table for example. Some of the data are as follows:
OrderID |
Customer |
OrderDate |
SellerId |
Amount |
81182311 |
VINET |
2013/07/04 |
5 |
2440.0 |
98807954 |
TOMSP |
2013/07/05 |
6 |
1863.4 |
65550721 |
HANAR |
2013/07/08 |
4 |
1813.0 |
37311312 |
VICTE |
2013/07/08 |
3 |
670.8 |
80138612 |
SUPRD |
2013/07/09 |
4 |
3730.0 |
… |
… |
… |
… |
… |
[e.g. 6] Check the top two sales amounts of each month in 2014.
The SPL script looks like this:
A |
|
1 |
=file("Sales.txt").cursor@t().select(year(OrderDate)==2014) |
2 |
=A1.groups(month(OrderDate):Month;top(-2,Amount):TOP2) |
3 |
=A2.news(TOP2;Month,~:Amount) |
A1: generate cursors for the sales table and select the data of 2014.
A2: group them by month and get the top two sales amounts of each month.
A3: create the results table, the first column is the month and the second one is the amount.
[e.g. 7] calculate the sales records of the top three sales in each month of 2014.
The SPL script looks like this:
A |
|
1 |
=file("Sales.txt").cursor@t().select(year(OrderDate)==2014) |
2 |
=A1.groups(month(OrderDate);top(-3;Amount):TOP3) |
3 |
=A2.conj(TOP3) |
A1: generate cursors for the sales table and select the data of 2014.
A2: group them by month and get the records of the top three sales amounts per month.
A3: concatenate the records of the top three sales amounts of each month.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL