"【Question】 Source: [url] I want to get best 3 day of users between “2014-07-01” and “2014-08-01” .."

blackduckie RaqForum 28 No.
489 View • 4 Years ago

Subgroup Aggregation and Getting Top N

SQL(179)

【Question】

Source: SQL Query with Time Interval

I want to get best 3 day of users between “2014-07-01” and “2014-08-01”

Could someone help me? I’ve been stuck here for 3 days.

In real score table entries are from 10:00 to 22:00 and 1 entry for each hour.

A total of 12 entries each day and each player (Sometimes it could be less 1 or 2).

This is the output I’m trying to get:

ID | User_ID | Username | Sum(Score) | Date

\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-

1| 1| Xxx | 52| 2014-07-01

2| 1| Xxx | 143| 2014-07-02

3| 2| Yyy | 63| 2014-07-01

...

Score table:

ID | User_ID | Score | Datetime

\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-

1| 1| 35| 2014-07-0111:00:00

2| 1| 17| 2014-07-0112:00:00

3| 2| 36| 2014-07-0111:00:00

4| 2| 27| 2014-07-0112:00:00

5| 1| 66| 2014-07-0211:00:00

6| 1| 77| 2014-07-0212:00:00

7| 2| 93| 2014-07-0212:00:00

...

User table:

ID | Username

\-\-\-\-\-\-\-\-\-\-\-\-\-\-

1| Xxx

2| Yyy

3| Zzz

...

A solution:

I think you need to aggregate first by date, and then choose the first three usingrow_number(). To do the aggregation:

select s.user_id,sum(s.datetime,'day')as theday,sum(score)as score,

row_number()over(partitionby s.user_id orderby sum(score)desc)as seqnum

from scores s

groupby s.user_id;

To get the rest of the information, use this as a subquery or CTE:

select u.*,s.score

from(select s.user_id,sum(s.datetime,'day')as theday, sum(s.score)as score,

row_number()over(partitionby s.user_id orderby sum(s.score)desc)as seqnum

from scores s

groupby s.user_id

)s join

users u

on s.user_id =u.users_id

whereseqnum <=3

orderby u.user_id,s.score desc;

【Answer】

This is a typical intra-group computation. Here is the logic:

1. Join the two source tables and group records by User_ID. Each group includes all records of a user.

2. For each group, group records by Datetime and sum the scores each day.

3. Find the records containing top 3 scores in each subgroup.

4. Concatenate the selected records.

The logic is clear but SQL has difficulty in expressing the algorithm. Here I choose to get it done in SPL. The Structured Process Language can phrase an intra-group computation conveniently. Here’s the SPL script:

	A
1	=dataSource.query("select s.ID ID,s.User_ID User_ID,u.Username Username,s.Score Score,s.Datetime Datetime from Scoretable s join Usertable u on s.User_ID = u.ID")
2	=A1.group(User_ID)
3	=A2.(~.groups(User_ID,Username,date(Datetime):day; sum(Score):sumScore))
4	=A3.(~.top(-3;sumScore))
5	=A4.conj()