Take the top three of each group from Hive in one line
Hive uses window functions and subqueries to implement this, which is not easy, as follows:
SELECT *
FROM (
SELECT g, ROW_NUMBER() OVER(PARTITION BY g
ORDER BY x) group_id RN FROM A
)
WHERE group_id <= 3
ORDER BY g ASC, group_id ASC
In this scenario, it would be easier to use esProc SPL, with only one line:
=connect(”HiveDB”).query(“select * from A”).group(g;~.top(-3;x)).conj(#2)
*How to retrieve rows larger than the median in a group?
*What is the largest increase in the group?
*Which groups have experienced continuous rise?
SPL is good at solving such grouped subsets and ordered calculation within groups, such as 《SPL Simplified SQL Case Details: Calculate the first N rows of each group》 ; please refer to《SPL Simplified SQL Case Details: Intra-group computing》 and 《SPL Simplified SQL Case Details: Interline Computing》
When the data is not in the database, it is still convenient for SPL to perform complex calculations:
=file(“d:/t.csv”).import(;,",").group...
It's also easy to embed esProc into Java applications,please refer to How to Call an SPL Script in Java
For specific usage, please refer to Getting started with esProc
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL