How to Calculate Median in Each Subgroup in SQL
The median is the middle value in sorted list of numbers. If the number count is odd, the median is the middle number; if it is even, the median is the average of the middle two numbers.
How to get the median in each subgroup after a data set is grouped? That’s one of the traditional SQL headaches.
SQL is based on unordered sets and thus doesn’t have the concept of position. To handle position-relevant calculations, you need to write program to number the records. But it’s hard to number records in every subgroup. SQL2003 introduces window functions to do the job but the median calculation is still complicated.
For instance, to get the median score for each subject based on SOCORES table:
COURSE |
SCORE |
History |
68.5 |
History |
79.0 |
History |
82.5 |
History |
88.0 |
History |
93.5 |
Maths |
75.5 |
Maths |
83.0 |
Maths |
85.0 |
Maths |
95.5 |
The expected query result:
COURSE |
SCORE |
History |
82.5 |
Maths |
84.0 |
The Oracle SQL solution:
WITH A AS
( SELECT COURSE, SCORE,
ROW_NUMBER()OVER ( PARTITION BY COURSE ORDER BY SCORE) AS RN,
COUNT(*) OVER (PARTITION BY COURSE) AS CNT
FROM SCORES ),
B AS
(SELECT * FROM A WHERE RN>(CNT-0.5)/2 AND RN<(CNT+2.5)/2 )
SELECT COURSE, AVG(SCORE) AS SCORE FROM B
GROUP BY COURSE
ORDER BY COURSE;
A assigns numbers to records in each subgroup; B gets the middle record in each subgroup and calculates average. The solution is roundabout and difficult to write. Solutions are more complicated if you don’t use window functions.
esProc SPL supports intra-group calculation and offers median function to make it easy. You just need a one-liner:
connect("mydb").query("select * from scores order by course, score").group(COURSE).new(~.COURSE,~.(SCORE).median():SCORE)
It’s convenient to handle subsets and order-based intra-group calculations in SPL. Find more examples in Get Top N Rows in Each Subgroup, Intra-group Calculations and Inter-row Calculations.
esProc SPL is a scripting language good at offering simple/ and easy solutions to SQL headaches. It has simple syntax and is process-oriented with inherent support of step-by-step programming that agrees with human way of thinking. Its syntax is database independent but universal, which enables algorithms that can be seamlessly migrated between different database products. It is desktop based, ready to use, simple to configure and convenient to debug with breakpoint setting and step-by-step execution during which you can view the result of each step. See SQL Enhancer to learn more.
SPL is integration-friendly with a Java program. Read How to Call an SPL Script in Java to learn details.
About how to work with esProc, read Getting Started with esProc.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL