Row-to-Column Transposition & Dynamic-column-based Enumerated Grouping
【Question】
I am trying to accomplish this in MongoDB. Didn’t think it could get this complicated. Thought the problem was interesting to solve.
I am trying to get a count of students by scores for various subjects.
The example below shows 2 subjects. In reality we could run this ad-hoc query for 1 or more subjects (So cannot “can” and should be real-time). E.g. run the query for a bunch of schools and a bunch of subjects.
The grades are always 1-5 without a decimal point.
use students
db.studentsummary.insert ({school:‘atl1’, sname : ‘Sean’ , sub1: 4, sub2 :5})
db.studentsummary.insert ({school:‘atl1’, sname : ‘chris’ , sub1: 4, sub2 :3})
db.studentsummary.insert ({school:‘atl1’, sname : ‘becky’ , sub1: 5, sub2 :4})
db.studentsummary.insert ({school:‘atl1’, sname : ‘sam’ , sub1: 5, sub2 :4})
db.studentsummary.insert ({school:‘atl2’, sname : ‘dustin’ , sub1: 2, sub2 :2})
db.studentsummary.insert ({school:‘atl2’, sname : ‘greg’ , sub1: 3, sub2 :4})
db.studentsummary.insert ({school:‘atl2’, sname : ‘peter’ , sub1: 5, sub2 :1})
db.studentsummary.insert ({school:‘atl2’, sname : ‘brad’ , sub1: 2, sub2 :2})
db.studentsummary.insert ({school:‘atl2’, sname : ‘liz’ , sub1: 3, sub2 :null})
Desired Output: (Would like to see how close we could get to the desired output below)
Show how many got a 5; how many got a 4 and so on…
I tried quite a bit - trying to group by each subject and run different pipelines based on the subjects chosen for query and let the front end manage the merge and pivot, but performance was unacceptable.
Any help will be highly appreciated.
【Answer】
MongoDB doesn’t offer a direct support for row-to-column transposition and it takes a very roundabout route to implement an enumerated grouping operation. It retrieves data out to handle it in Java or PHP or other programming languages. In this case a dynamic column is involved. It’s difficult to handle such a set operation. It would be much easier if you could implement them in Structured Process Language (SPL). The language has mongo_open() function to connect to the MongoDB database and uses mong_shell() function to get the source data structure.
A |
B |
C |
|
1 |
=mongo_open(“mongodb://localhost:27017/local?user=test&password=test”) |
||
2 |
=mongo_shell(A1,“student.find()”) |
||
3 |
=[5,4,3,2,1] |
>subs=[“sub1”,“sub2”] |
|
4 |
=A2.group(school) |
>newfields=[] |
|
5 |
for subs |
>newfields=newfields|A3.(A5+“_”+string(~)) |
|
6 |
>result=create(${“school,”+newfields.concat@c()}) |
||
7 |
for A4 |
>temp=[] |
|
8 |
for subs |
>temp=temp|A7.align@a(A3,${B8}).(~.len()) |
|
9 |
>temp=[A7.school]|temp |
||
10 |
>result.record(temp) |
||
11 |
>mongo_close(A1) |
A1,A2: Get the source data to return as a table sequence.
A3,B3: Define dynamic grade intervals and the column names.
A4: Group A2’s table sequence by schools.
A5,B5,A6: Generate the result table sequence made up of the following fields - school, sub1_5, sub1_4…sub1_1、sub2_5、sub2_4…sub2_1.
A7: Loop each of A4’s group to get ready to insert values to a record (B10).
B8,C8: For each of A4’s school, A7 loops each of B8’ subject to get the grades, group them by A3’s grade intervals, and join them up into a sequence in order.
B9: Add the school name before each sequence to get in line with the fields in the result table sequence.
B10: Populate values into the result table sequence.
If the columns are fixed and there are only several subjects, the code can be simpler but the logic is similar:
A |
|
1 |
=mongo_open(“mongodb://localhost:27017/local?user=test&password=test”) |
2 |
=mongo_shell(A1,“student.find()”) |
3 |
=A2.group(school) |
4 |
=[5,4,3,2,1] |
5 |
=A3.new(school,.align@a(A4,sub1).(.len()):sub1,.align@a(A4,sub2).(.len()):sub2) |
6 |
=A5.new(school,.sub1(1):sub1_5,.sub1(2):sub1_4,.sub1(3):sub1_3,.sub1(4):sub1_2,.sub1(5):sub1_1,.sub2(1):sub2_5,.sub2(2):sub2_4,.sub2(3):sub2_3,.sub2(4):sub2_2,.sub2(5):sub2_1) |
7 |
>mongo_close(A1) |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL