How to find the same values that appear in each group
Key words:Group by week; Intersection of groups; Intergroup calculation
After grouping data, you need to find a field value that appears in each group.
Using SQL to solve this problem is not easy!
SQL is lack of pure grouping operation, and it can only return the statistical value of each group, but can't perform more detailed intersection, difference, union and other set operations on the members of each group. To solve this problem, we need multi-layer nested subquery to realize it in an indirect way.
For example:Check the employees who work overtime every week for recognition. The existing overtime data table overtime_work is as follows:
WORKDATE |
NAME |
2019-07-01 |
Tom |
2019-07-02 |
John |
2019-07-03 |
Tom |
2019-07-04 |
Cart |
2019-07-08 |
Tom |
2019-07-09 |
Jordan |
2019-07-10 |
Kate |
2019-07-10 |
John |
2019-07-15 |
Leon |
2019-07-16 |
John |
2019-07-17 |
Tom |
2019-07-18 |
John |
2019-07-22 |
Jim |
2019-07-23 |
Tom |
2019-07-24 |
John |
2019-07-25 |
Cart |
… |
… |
Take Oracle as an example. It is written in SQL as follows:
WITH A AS
(SELECT COUNT(*) NUM FROM
(SELECT DISTINCT TO_CHAR(WORKDATE,'IW') FROM OVERTIME_WORK )
),
B AS
(SELECT TO_CHAR(WORKDATE,'IW') W, NAME
FROM OVERTIME_WORK
GROUP BY TO_CHAR(WORKDATE,'IW'), NAME
),
C AS
(SELECT NAME, COUNT(*) NUM FROM B GROUP BY NAME )
SELECT NAME FROM C WHERE NUM=(SELECT NUM FROM A);
Here, A finds out how many weeks in total. B finds out different employees who work overtime each week. C finds out the number of times that each employee in B appears. Finally, select the employees whose number of appearing times is equal to the total number of weeks from C. The solution is rather convoluted. This kind of SQL is hard to write and understand.
esProc SPL supports set operations, and it would be much simpler to solve this problem. You only need to calculate the intersection of employees who worked overtime each week, and only one line is needed:
connect("mydb").query("SELECT * FROM OVERTIME_WORK").group((WORKDATE-date("2019-07-01"))\7).(~.(NAME)).isect()
[question extension]
*Query employees who work overtime at least twice each week.
*Query the customers whose consumption amount is within top 20 for every month.
SPL is good at solving such grouped subsets and ordered calculation within groups, please refer to Calculate the first N rows of each group, Intra-group Computing, Interline Computing and SPL Set
esProc SPL is a professional script language to solve difficult SQL problems. It has simple syntax, conforms to natural thinking, and is a natural step-by-step, procedure oriented computing language. It uses a unified syntax independent of database, and the algorithm can be seamlessly migrated between databases. A desktop computing tool, esProc is ready to use, with simple configuration and complete debugging functions: breakpoints and single step execution can be set, and the result of each step can be viewed. Please refer to: http://www.raqsoft.com/html/sql-enhancer.html
It's also easy to embed esProc into Java applications,please refer to How to Call an SPL Script in Java
For specific usage, please refer to Getting started with esProc
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL