Subgroup Aggregation and Getting Top N
【Question】
Source: SQL Query with Time Interval
I want to get best 3 day of users between “2014-07-01” and “2014-08-01”
Could someone help me? I’ve been stuck here for 3 days.
In real score table entries are from 10:00 to 22:00 and 1 entry for each hour.
A total of 12 entries each day and each player (Sometimes it could be less 1 or 2).
This is the output I’m trying to get:
ID | User_ID | Username | Sum(Score) | Date
\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
1| 1| Xxx | 52| 2014-07-01
2| 1| Xxx | 143| 2014-07-02
3| 2| Yyy | 63| 2014-07-01
...
Score table:
ID | User_ID | Score | Datetime
\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
1| 1| 35| 2014-07-0111:00:00
2| 1| 17| 2014-07-0112:00:00
3| 2| 36| 2014-07-0111:00:00
4| 2| 27| 2014-07-0112:00:00
5| 1| 66| 2014-07-0211:00:00
6| 1| 77| 2014-07-0212:00:00
7| 2| 93| 2014-07-0212:00:00
...
User table:
ID | Username
\-\-\-\-\-\-\-\-\-\-\-\-\-\-
1| Xxx
2| Yyy
3| Zzz
...
A solution:
I think you need to aggregate first by date, and then choose the first three usingrow_number(). To do the aggregation:
select s.user_id,sum(s.datetime,'day')as theday,sum(score)as score,
row_number()over(partitionby s.user_id orderby sum(score)desc)as seqnum
from scores s
groupby s.user_id;
To get the rest of the information, use this as a subquery or CTE:
select u.*,s.score
from(select s.user_id,sum(s.datetime,'day')as theday, sum(s.score)as score,
row_number()over(partitionby s.user_id orderby sum(s.score)desc)as seqnum
from scores s
groupby s.user_id
)s join
users u
on s.user_id =u.users_id
whereseqnum <=3
orderby u.user_id,s.score desc;
【Answer】
This is a typical intra-group computation. Here is the logic:
1. Join the two source tables and group records by User_ID. Each group includes all records of a user.
2. For each group, group records by Datetime and sum the scores each day.
3. Find the records containing top 3 scores in each subgroup.
4. Concatenate the selected records.
The logic is clear but SQL has difficulty in expressing the algorithm. Here I choose to get it done in SPL. The Structured Process Language can phrase an intra-group computation conveniently. Here’s the SPL script:
A |
|
1 |
=dataSource.query("select s.ID ID,s.User_ID User_ID,u.Username Username,s.Score Score,s.Datetime Datetime from Scoretable s join Usertable u on s.User_ID = u.ID") |
2 |
=A1.group(User_ID) |
3 |
=A2.(~.groups(User_ID,Username,date(Datetime):day; sum(Score):sumScore)) |
4 |
=A3.(~.top(-3;sumScore)) |
5 |
=A4.conj() |
1. The sign “~” represents each group of records.
2. SPL’s group() function groups data but won’t aggregate it (A2).
3. SPL’s groups() function groups data while performs aggregation (A3).
4. An SPL script is easily integrated with a Java program or a reporting tool.
More similar examples can be found in SPL Simplified SQL Case Details: Intra-group Computing.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL