* How to Group & Summarize MongoDB Subdocuments
Sometimes we have the requirement of dividing MongoDB documents into multiple segments by specified intervals and performing an aggregation over each segment. Now we have collection scores. We need to group it by intervals of scores and count the students in each interval for both Chinese and English. Below is related information:
The score intervals are [60, 70, 80, 90], which are [0,60), [60,70), [70,80), [80,90), and [90,).
name |
age |
province |
subject |
zhou gao gao |
24 |
guang xi |
[Chinese,80],[Math,84][English,84],[Chemic,98] |
li chao |
16 |
shan dong |
[Chinese,94],[Math,88][English ,75],[Chemic,73] |
yang hao hao |
26 |
guang xi |
[Chinese,85],[Math,83][English ,64],[Chemic,71] |
… |
… |
… |
The expected result:
Segment |
Chinese |
English |
0 |
3 |
5 |
1 |
… |
|
2 |
… |
|
3 |
||
4 |
As MongoDB doesn’t provide related interface, we use MapReduce to traverse each document and count students of each score interval for the two subjects respectively. The solution is complicated.
It would be easy if we could use esProc SPL to group records by subjects and score interval and then concatenate result.
Download esProc installation package HERE.
Directions:
1. Write SPL script score.dfx in esProc:
A |
B |
|
1 |
[60, 70, 80, 90] |
/ Define score intervals |
2 |
=mongo_open("mongodb://127.0.0.1:27017/raqdb") |
/ Connect to MongoDB database |
3 |
=mongo_shell(A1,"scores.find(,{_id:0})") |
/ Get data of score collection |
4 |
=A3.groups(A1.pseg(subject.(score)(1)):Segment; count(1): Chinese ) |
/ Count students of each score interval for Chinese |
5 |
=A3.groups(A1.pseg(subject.(score)(3)):Segment; count(1): English) |
/ Count students of each score interval for English |
6 |
=A4.join(Segment, A5:Segment, English) |
/ A4Concatenate results of A4 and A5 |
7 |
>A1.close() |
/ Close database connection |
2. Execute the script to return the final result:
A6 |
Segment |
Chinese |
English |
0 |
1 |
3 |
|
1 |
2 |
3 |
|
2 |
8 |
5 |
|
3 |
6 |
8 |
|
4 |
3 |
1 |
Pseg() function returns the sequence number of interval a score falls in and accumulate the number of students score by score according to the corresponding sequence number of interval.
esProc provides JDBC interface to let you integrate the script into a Java program conveniently:
public static void doWork() {
Connection con = null;
java.sql.Statement st;
try{
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
// Call script score.dfx
st=con.createStatement();
ResultSet rst = st.executeQuery("call score");
System.out.println(rst);
} catch(Exception e){
System.out.println(e);
} finally{
// Close database connection
if (con!= null) {
try {
con.close();
} catch(Exception e) {
System.out.println(e);
}
}
}
}
Read How to Call an SPL Script in Java to learn more about integration of esProc SPL script into a Java program.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL