* Perform Distinct Count over MongoDB Nested Subdocuments
Nested structures are common in MongoDB data. The database uses nested subdocuments to represent the one-to-many relationship. But, when you need to group nested subdocuments and perform distinct, you should first expand the nested structure to flat structure before using $group to perform aggregation.
Below is data in fund collection. The task is to group documents by id and get the number of unique fundcode under shares and trade fields respectively.
{ |
"agencyno" : "260", ] |
The MongoDB way of doing this: Use $group to group documents by id, extract fundcode under each document, use $unwind to split documents by shares and trade to convert to flat structure, use $addToSet to perform distinct during the second grouping, and then count fundcode under shares and trade respectively. The whole process is complicated.
All can be easy with esProc SPL. You can rearrange the trade subdocument into a table sequence, group records and perform distinct and count. It’s much easier.
Download esProc installation package HERE.
Implementation directions:
1. Write SPL script fund.dfx in esProc:
A |
B |
|
1 |
=mongo_open("mongodb://localhost:27017/local") |
/ Connect to MongoDB database |
2 |
=mongo_shell(A1,"fund.find(, {_id: 0})").fetch() |
/ Perform conditional filtering on fund collection |
3 |
=A2.trade.conj(if (#==1, t=~.fname(), t=t^~.fname())).id() |
/ Get intersection of fields under trade subdocument |
4 |
=A2.run( trade=trade.new( ${A3.(A3(#)).concat@c()})) |
/ Rearrange data of common fields into a table sequence |
5 |
=A2.group(id; ~.conj(shares.(fundcode)).id().count(): shares,~.conj(trade.(fundcode)).id().count():trade) |
/ Group records by id and perform distinct count on fundcode udner shares and trade fields |
6 |
>A1.close() |
/ Close database connection |
2. Debug and execute the script. Below is A3’s value:
A3 |
Member |
agencyno |
|
bk_tradetype |
|
…… |
3. Execute the script to return the final result:
A5 |
id |
shares |
trade |
1.0 |
4 |
6 |
|
2.0 |
3 |
3 |
|
… |
… |
… |
Since the fields under subdocument trade have different structures, we convert them into a table sequence for the convenience of subsequent processing.
esProc offers JDBC interface, so you can easily integrate the script into a Java program:
public static void doWork() {
Connection con = null;
java.sql.Statement st;
try{
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
// Call script fund.dfx
st=con.createStatement();
ResultSet rst = st.executeQuery("call fund");
System.out.println(rst);
}catch(Exception e){
System.out.println(e);
}finally{
// Close database connection
if (con!=null) {
try {
con.close();
}catch(Exception e) {
System.out.println(e);
}
}
}
}
When fields under a certain nested subdocument have different structures, we can dynamically get the desired fields using union, intersection or other operation as needed. An easier alternative is to use a user-defined field to format data into a table sequence. The latter method makes it more convenient to further process data. Read How to Call an SPL Script in Java to learn more about integration of esProc SPL script into a Java program.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL