Group and Summarize Data in a String
Problem description & analysis
There is a SURVERY table in the database. It has two fields – QID and ANSWERS, as shown below:
QID |
ANSWERS |
1 |
A|B|A|C|D|E|A|E|C|D |
2 |
A|C|E|D|A|E|D|B|E|D |
3 |
C|D|E|A|B|C|A|C|E|A |
4 |
E|E|B|A|D|B|C|A|B|D |
5 |
D|A|C|B|E|A|B|D|D|A |
6 |
A|C|A|C|C|E|D|B|E|C |
7 |
E|B|A|A|A|B|A|C|D|D |
8 |
B|C|D|C|C|E|C|A|C|A |
9 |
B|C|A|E|E|B|A|A|D|E |
We are trying to group each ANSWERS value and sum members in each group. Below is the expected result:
QID |
ANSWER |
NUM |
1 |
A |
3 |
1 |
B |
1 |
1 |
C |
2 |
1 |
D |
2 |
1 |
E |
2 |
2 |
A |
2 |
2 |
B |
1 |
… |
… |
… |
Solution
We write the following script p1.dfx in esProc:
A |
|
1 |
=connect("demo") |
2 |
=A1.query@x("SELECT * FROM SURVEY") |
3 |
=A2.news(ANSWERS.split("|");QID,~:ANSWER) |
4 |
=A3.group(QID,ANSWER;count(~):NUM) |
Explanation:
A1 Connect to demo database.
A2 Perform SQL, return query result as a table sequence, and automatically close database connection.
A3 Loop through each row of A2’s table sequence. Split each ANSWERS into a sequence by the vertical line "|" and generate a row for each member. The row consists of two columns. The first column is QID, and the second one, named ANSERE, is the current member (represented by ~) of the split sequence,
A4 Group A3’s table sequence by QID and ANSWERE and count rows in each group.
Read How to Call an SPL Script in BIRT to learn about the method of integrating the SPL script into BIRT.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL