" Taking out the first N rows of data from each group is a common operation, such as: which time .."

pjf RaqForum 22 No.
1 Reply • 2,049 View • 5 Years ago

SPL Simplified SQL - Calculate the first N rows of each group

Taking out the first N rows of data from each group is a common operation, such as: which time is the highest salary increase per employee; which three times is the worst golf performance; and which five days is the highest sales of each product per month. In SQL, it is difficult to process the code indirectly with advanced techniques such as window function and keep/top/rownumber. Moreover, many databases (such as MySQL) do not have these advanced functions, so they can only be implemented with complex JOIN statements and nested sub-queries. If it is multi-level grouping and multi-level association, the calculation process will be more complex.

The top function of SPL can extract the first N lines in the grouping by line number, maximum and minimum. It will be easier to solve this kind of problem. Here is an example to illustrate.

The database table Golf stores golf scores of several members. Some of the data are as follows:

ID	USER_ID	SCORE	DATETIME
1	1	35	2014-07-01 11:00:00
2	1	17	2014-07-01 12:00:00
3	2	36	2014-07-01 11:00:00
4	2	27	2014-07-01 12:00:00
5	1	66	2014-07-02 11:00:00
6	1	77	2014-07-02 12:00:00
7	2	93	2014-07-02 12:00:00
8	1	27	2014-07-03 12:00:00
9	1	48	2014-07-03 18:00:00
10	1	36	2014-07-04 18:00:00
11	3	77	2014-07-01 12:00:00
12	3	68	2014-07-02 13:00:00
13	4	25	2014-07-02 13:00:00

Please take out the three best scores of each member.

SPL codes：

	A
1	=db.query("select * from golf")
2	=A1.group(USER_ID)
3	=A2.(~.top(-3;SCORE))
4	=A3.union()
5	>file("golf.csv").export@ct(A4)

A1：Take the data from the database. Click on the cell to see the results of the fetch:

undefined

A2：=A1.group(USER_ID) The results of A1 calculation are grouped according to USER_ID. The results are as follows:

undefined

Each row represents a group. Double-click on the light blue grid to see the members of the group, as follows:

undefined

A3：= A2.(~.top(-3;SCORE)) Calculate the first three records of SCORE field for each group of data. Here "~" denotes each group of data, and ~.top() denotes the application of function top to each group of data in turn. Function top can get the first N records of data set, such as top (3; SCORE) which is arranged in ascending order according to SCORE, the first three (i.e. minimum value); top (- 3; SCORE) which is arranged in descending order, the first three (i.e. maximum value). The results of this step are as follows:

undefined

A4：=A3.union() Merge groups of data, and the results are as follows:

undefined

A5：>file("golf.csv").export@ct(A4)

This code exports the calculation results to the "golf. csv" file, which can be viewed by Excel and other tools:

undefined

In addition, SPL can also be directly invoked by reporting tools or Java programs. The method of invoking SPL is similar to that of ordinary database. The JDBC interface provided by SPL can return the results of calculation in the form of ResultSet to the Java main program. Specific methods can refer to relevant documents. 【How to call an SPL script in Java】

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/cFTcUNs7

Youtube 👉 https://www.youtube.com/@esProc_SPL

esProc