Aggregation by Group and Value Completion
【Question】
I query data by type and year in C# and SQL respectively and store it in table1, as shown below:
id, type, year
1 33 2014 second half-year
2 33 2014 second half-year
3 33 2013 first half-year
4 34 2013 first half-year
The following is what I want. The data is sorted first by year and then by type, and each year value is summed (the sum is 0 if there is no corresponding year value).
year, no, type
2014 second half-year 2 33
2014 first half-year 0 33
2013 second half-year 0 33
2013 first half-year 1 33
2014 second half-year 0 34
2014 first half-year 0 34
2013 second half-year 0 34
2013 first half-year 1 34
【Answer】
This is a calculation where data is aligned and aggregated by a certain sequence. To do it, SQL will use a subquery to compose a base table and then perform a JOIN to align, as shown below:
select x._year, IFNULL(x._no,0), x._type from (
select a._year _year,b._no _no, a._type _type from (
with yt as (
select ‘2013 first half-year’ _year from dual union all
select ‘2013 second half-year’ _year from dual union all
select ‘2014 first half-year’ _year from dual union all
select ‘2014 second half-year’ _year from dual
),tt as (
select ‘33’ _type from dual union all
select ‘34’ _type from dual
) select yt._year, tt._type from yt join tt on 1 = 1
)a
left join (
select _year , count(_type) ‘_no’, ‘33’ as ‘_type’ from test5 where _type=33 group by _year
union all
select _year , count(_type) ‘_no’, ‘34’ as ‘_type’ from test5 where _type=34 group by _year
)b
on a._year = b._year and a._type = b._type) x
It’s much easier to do it in SPL. We can calculate the aggregates and generate the result set by loop:
A |
|
1 |
=connect(“db”) |
2 |
=query(“select _type,_year from test5”) |
3 |
=A2.id(left(_year,4)).conj([+“first half-year”,+“second half-year”]) |
4 |
=A2.group(_type) |
5 |
=A4.news(A3;:year,A4..count(year==~._year):no,A4._type:type) |
A1: Connect to database db;
A2: Query data from the original data table;
A3: Get the year and compose the strings of “first half-year” and “second half-year”. id() function gets distinct year values. left() function gets a string whose length is n from a year value beginning from the leftmost. conj() function performs concatenation over a sequence of sequences to generate a new sequence. Here expression A2.id(left(_year,4)) represents a sequence. Joining up its members and string “first half-year” or string “second half-year” is A3’s result;
A4: Group A2’s data by _type;
A5: Perform news() function over the table sequence in A4’s each group while calculating the field values according to A3’s year and concatenate them to generate a new table sequence. The first field value is represented by the sign ~, which corresponds to parameter A3 and which is the year value and named year. The second field value is calculated from counting records of the table sequence in A4’s each group according to the condition that the year value in a record is equivalent to the new year value; this second field value is named no. The last field is the grouping field, whose values are A4’s _type field values; the field name is type.
A2
A3
A4
A5
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL