"Sometimes we have the requirement of dividing MongoDB documents into multiple segments by specif .."

blackduckie RaqForum 28 No.
368 View • 3 Years ago

* How to Group & Summarize MongoDB Subdocuments

mongodb(19) subdocument(2) segmentation(1) merge(3) group(18)

Sometimes we have the requirement of dividing MongoDB documents into multiple segments by specified intervals and performing an aggregation over each segment. Now we have collection scores. We need to group it by intervals of scores and count the students in each interval for both Chinese and English. Below is related information:
The score intervals are [60, 70, 80, 90], which are [0,60), [60,70), [70,80), [80,90), and [90,).

name	age	province	subject
zhou gao gao	24	guang xi	[Chinese,80],[Math,84][English,84],[Chemic,98]
li chao	16	shan dong	[Chinese,94],[Math,88][English ,75],[Chemic,73]
yang hao hao	26	guang xi	[Chinese,85],[Math,83][English ,64],[Chemic,71]
…	…	…

The expected result:

Segment	Chinese	English
0	3	5
1	…
2	…
3
4

As MongoDB doesn’t provide related interface, we use MapReduce to traverse each document and count students of each score interval for the two subjects respectively. The solution is complicated.

It would be easy if we could use esProc SPL to group records by subjects and score interval and then concatenate result.
Download esProc installation package HERE.

Directions:

1. Write SPL script score.dfx in esProc:

	A	B
1	[60, 70, 80, 90]	/ Define score intervals
2	=mongo_open("mongodb://127.0.0.1:27017/raqdb")	/ Connect to MongoDB database
3	=mongo_shell(A1,"scores.find(,{_id:0})")	/ Get data of score collection
4	=A3.groups(A1.pseg(subject.(score)(1)):Segment; count(1): Chinese )	/ Count students of each score interval for Chinese
5	=A3.groups(A1.pseg(subject.(score)(3)):Segment; count(1): English)	/ Count students of each score interval for English
6	=A4.join(Segment, A5:Segment, English)	/ A4Concatenate results of A4 and A5
7	>A1.close()	/ Close database connection

2. Execute the script to return the final result:

A6	Segment	Chinese	English
	0	1	3
	1	2	3
	2	8	5
	3	6	8
	4	3	1

Pseg() function returns the sequence number of interval a score falls in and accumulate the number of students score by score according to the corresponding sequence number of interval.

esProc provides JDBC interface to let you integrate the script into a Java program conveniently:

public static void doWork() {

    Connection con = null;
    java.sql.Statement st;

    try{
        Class.forName("com.esproc.jdbc.InternalDriver");
        con = DriverManager.getConnection("jdbc:esproc:local://");
        // Call script score.dfx
        st=con.createStatement();
        ResultSet rst = st.executeQuery("call score");
        System.out.println(rst);
    } catch(Exception e){
        System.out.println(e);
    } finally{
        // Close database connection
        if (con!= null) {
            try {
                con.close();
            } catch(Exception e) {
                System.out.println(e);
            }
        }
    }
}

Read How to Call an SPL Script in Java to learn more about integration of esProc SPL script into a Java program.

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/2bkGwqTj

Youtube 👉 https://www.youtube.com/@esProc_SPL

mongodb(19) subdocument(2) segmentation(1) merge(3) group(18)

Application

blackduckie • 368 View • 3 Years ago