Simplifying MongoDB Data Association
With $lookup
operator, MongoDB supports the most basic data association between multiple tables. When data association tasks are complicated, the Mongo shell script becomes complicated. With SPL’s discreteness and ease of use (SPL is the abbreviation of Structured Process Language developed by Raqsoft), esProc can help simplify data association in MongoDB. There are more related articles in Raqsoft Community (https://c.scudata.com/).
MongoDB is a document-oriented NoSQL database operated on a distributed file system. It stores documents in BSON format, which descripts entity attributes in a more natural and intuitive manner. Since Version 3.2, MongoDB has improved its query capability by supporting data association via $lookup. The real-world business scenarios, however, are complicated and the computing problems are knotty. This results in tortuous scripting process.
In this article, we’ll explain how to simplify association of MongoDB data in SPL.
1. Association with embedded structures (1)
2. Association with embedded structures (2)
3. Association with embedded structures (3)
4. Association with selected fields
5. Multi-table association
6. Associated query by array values
7. Calling esProc dfx script in Java
1. Association with embedded structures (1)
This involves two associated tables. One of table A’s field is associated with the sub-documents in table B, and the to-be-returned data is contained in the sub-documents. For example, the childs field of childsgroup table contains embedded arrays, in which the name item and mobile item need to be returned.
Test data:
history:
_id | id | History | child_id |
---|---|---|---|
1 | 001 | today worked | ch001 |
2 | 002 | Working | ch004 |
3 | 003 | now working | ch009 |
childsgroup:
_id | groupid | name | childs |
---|---|---|---|
1 | g001 | group1 | {"id":"ch001","info":{"name":"a",mobile:1111}}, {"id":"ch002","info":{"name":"b",mobile:2222}} |
2 | g002 | group1 | {"id":"ch004","info":{"name":"c",mobile:3333}}, {"id":"ch009","info":{"name":"d",mobile:4444}} |
The child_id field in history table associates with childs.id in childsgroup table. Below is the desired joining result:
{
“_id” : ObjectId(“5bab2ae8ab2f1bdb4f434bc3”),
“id” : “001”,
“history” : “today worked”,
“child_id” : “ch001”,
“childInfo” :
{
“name” : “a”,
“mobile” : 1111
}
………………
}
MongoDB script:
db.history.aggregate([ {$lookup: { from: "childsgroup", let: {child_id: "$child_id"}, pipeline: [ {$match: { $expr: { $in: [ "$$child_id", "$childs.id"] } } }, {$unwind: "$childs"}, {$match: { $expr: { $eq: [ "$childs.id", "$$child_id"] } } }, {$replaceRoot: { newRoot: "$childs.info"} } ], as: "childInfo" }}, {"$unwind": "$childInfo"} ]) |
The script uses five operators, including lookup, pipeline, match, unwind and replaceRoot, to handle the computation. It’s complicated and difficult to write.
SPL script (childsgroup.dfx)
A | |
---|---|
1 | =mongo_open("mongodb://127.0.0.1:27017/raqdb") |
2 | =mongo_shell(A1,"history.find()").fetch() |
3 | =mongo_shell(A1,"childsgroup.find()").fetch() |
4 | =A3.conj(childs) |
5 | =A2.join(child_id,A4:id,info) |
6 | >A1.close() |
Associated query result:
_id | id | history | child_id | info |
---|---|---|---|---|
1 | 001 | today worked | ch001 | [a,1111] |
2 | 002 | working | ch004 | [c,3333] |
3 | 003 | now working | ch009 | [d,4444] |
SPL script explanation:
A1: Connect to MongoDB database.
A2: Read in data from history table.
A3: Read in data from childsgroup table.
A4: Concatenate the array values in childsgroup table’s childs field into a table sequence.
A5: Perform an associated query between history table an A4’s childs table via the associated fields child_id and id and append an info field to generate a new table sequence.
A6: Close the database connection.
The SPL script is simple, clear and efficient.
2. Association with embedded structures (2)
There are two associated tables. Table A is associated with table B’s embedded documents. The task is to combine table A’s data under the embedded documents. For example, txtPost table’s Comment field contains array values; combine comment_content field of txtComment table under the arrays.
txtComment:
_ID | comment_no | comment_content |
---|---|---|
1 | 143 | test test |
2 | 140 | math |
txtPost:
_ID | post_no | Comment |
---|---|---|
1 | 48 | [{"comment_no" : 143, "comment_group" : 1} ] |
2 | 47 | [{"comment_no" : 140, "comment_group" : 2}, {"comment_no" : 143, "comment_group" : 3} ] |
Desired result:
_ID | post_no | Comment |
---|---|---|
1 | 48 | [{"comment_no" : 143, "comment_group" : 1,"comment_content" : "test test"} ] |
2 | 47 | [{"comment_no" : 140, "comment_group" : 2,"comment_content" : "math"}, {"comment_no" : 143, "comment_group" : 3,"comment_content" : "test test"} ] |
MongoDB script:
db.getCollection("txtPost").aggregate([ {"$unwind": "$comment"}, { "$lookup": { "from": "txtComment", "localField": "comment.comment_no", "foreignField": "comment_no", "as": "comment.comment_content" }}, {"$unwind": "$comment.comment_content"}, {"$addFields": { "comment.comment_content":"$comment.comment_content.comment_content"}}, { "$group": { "_id": "$_id", 'post_no':{"$first": "$post_no"}, "comment": {"$push": "$comment"} }}, ]).pretty() |
Split txtPost records according to members of Comment arrays into more records, perform an associated query between them with records of txtComment, store the result as an array and split the array into records, move comment_content values under Comment field, and group and concatenate the records.
SPL script:
A | |
---|---|
1 | =mongo_open("mongodb://127.0.0.1:27017/raqdb") |
2 | =mongo_shell(A1,"txtPost.find()").fetch() |
3 | =mongo_shell(A1,"txtComment.find()").fetch() |
4 | =A2.conj(comment.derive(A2.post_no:pno)) |
5 | =A4.join(comment_no,A3:comment_no,comment_content:Content) |
6 | =A5.group(pno;~:comment) |
7 | >A1.close() |
Associated query result:
pno | Comment |
---|---|
47 | [[140, 2,47, …],[143, 3,47, …] ] |
48 | [[143, 1,48, …]] |
SPL script explanation:
A1:Connect to MongoDB database.
A2:Read data from txtPost table.
A3:Read data from txComment table.
A4:Create a table sequence with A2’s Comment field and post_no field, which is renamed pno.
A5:Associate A4’s table sequence with A3’s table sequence via comment_no and append comment_content field, which is renamed Content.
A6:Group records by pno and return the current table; ~ represents the current record.
A7:Close the database connection.
Both scripts transform the embedded data into data stored in columns and rows, perform association, and then group the associated records. But SPL script is clearer and more concise.
3. Association with embedded structures (3)
This involves two associated tables. Table A is associated with the sub-documents in table B, and the to-be-returned information is contained in the sub-records. For example, product field in collection2 contains array values and the association result needs to include fields in isCompleted array in collection2.
Test data:
collection1:
{
_id: '5bc2e44a106342152cd83e97',
description:
{
status: 'Good',
machine: 'X'
},
order: 'A',
lot: '1'
};
collection2:
{
_id: '5bc2e44a106342152cd83e80',
isCompleted: false,
serialNo: '1',
batchNo: '2',
<!-- note the subdocuments here-->
product: [
{ order: 'A', lot: '1' },
{ order: 'A', lot: '2' }
]
}
Desired result:
{
_id: 5bc2e44a106342152cd83e97,
description:
{
status: 'Good',
machine: 'X',
},
order: 'A',
lot: '1' ,
isCompleted: false,
serialNo: '1',
batchNo: '2'
}
MongoDB script:
db.collection1.aggregate([{ $lookup: { from: "collection2", let: {order: "$order", lot: "$lot"}, pipeline: [{ $match: { $expr:{$in: [ { order: "$$order", lot: "$$lot"}, "$product"] } } }], as: "isCompleted" } }, { $addFields: { "isCompleted": {$arrayElemAt: [ "$isCompleted", 0] } } }, { $addFields: { // add the required fields to the top level structure "isCompleted": "$isCompleted.isCompleted", "serialNo": "$isCompleted.serialNo", "batchNo": "$isCompleted.batchNo" } }]) |
$lookup
performs an associated query between the two tables. The first $addFields
gets the first field from isCompleted array, and the second $addFields
adds the other required fields.
SPL script:
A | |
---|---|
1 | =mongo_open("mongodb://127.0.0.1:27017/raqdb") |
2 | =mongo_shell(A1,"collection1.find()").fetch() |
3 | =mongo_shell(A1,"collection2.find()").fetch() |
4 | =A3.conj(A2.select(order:A3.product.order,lot:A3.product.lot).derive(A3.serialNo:sno,A3.batchNo:bno)) |
5 | >A1.close() |
SPL script explanation:
A1:Connect to MongoDB database.
A2:Read data from collection1 table.
A3:Read data from collection2 table.
A4:Query A2’s table according to order field and lot field, add A3’s serialNo field and batchNo field, and return the associated table sequence.
A5:Close the database connection.
Both scripts get the desired result. SPL script is clear in selecting required data from the embedded documents and concatenating them into a new table sequence.
4. Association with selected fields
Create a new table through association and with selected fields.
collection1:
user1 | user2 | income |
---|---|---|
1 | 2 | 0.56 |
1 | 3 | 0.26 |
collection2:
user1 | user2 | output |
---|---|---|
1 | 2 | 0.3 |
1 | 3 | 0.4 |
2 | 3 | 0.5 |
Desired result:
user1 | user2 | income | output |
---|---|---|---|
1 | 2 | 0.56 | 0.3 |
1 | 3 | 0.26 | 0.4 |
MongoDB script:
db.c1.aggregate([ { "$lookup": { "from": "c2", "localField": "user1", "foreignField": "user1", "as": "collection2_doc" }}, {"$unwind": "$collection2_doc"}, { "$redact": { "$cond": [ {"$eq": [ "$user2", "$collection2_doc.user2"] }, "$$KEEP", "$$PRUNE" ] }}, { "$project": { "user1": 1, "user2": 1, "income": "$income", "output": "$collection2_doc. output" }} ]).pretty() |
$lookup
performs an associated query. $redact
traverses records to perform conditional handling. $project
selects the desired fields.
SPL script:
A | |
---|---|
1 | =mongo_open("mongodb://127.0.0.1:27017/raqdb") |
2 | =mongo_shell(A1,"c1.find()").fetch() |
3 | =mongo_shell(A1,"c2.find()").fetch() |
4 | =A2.join(user1:user2,A3:user1:user2,output) |
5 | >A1.close() |
SPL script explanation:
A1: Connect to MongoDB database.
A2: Read data from collection1 table.
A3: Read data from collection2 table.
A4: Join the two tables by user1 and user2, append A3’s output field and return a new table sequence.
A5: Close the database connection.
Both MongDB script and SPL script achieve the computing goal. The latter combines some of fields in two tables via a join into a new table, which is similar to relational database computation.
5. Multi-table association
This is the association between more than two tables to get a merged table.
Doc1:
_id | firstName | lastName |
---|---|---|
U001 | shubham | verma |
Doc2:
_id | userId | address | mob |
---|---|---|---|
2 | U001 | Gurgaon | 9876543200 |
Doc3:
_id | userId | fbURLs | twitterURLs |
---|---|---|---|
3 | U001 | http://www.facebook.com | http://www.twitter.com |
Merged result:
{
"_id" : ObjectId("5901a4c63541b7d5d3293766"),
"firstName" : "shubham",
"lastName" : "verma",
"address" : {
"address" : "Gurgaon"
},
"social" : {
"fbURLs" : "http://www.facebook.com",
"twitterURLs" : "http://www.twitter.com"
}
}
MongoDB script:
db.doc1.aggregate([ {$match: { _id: ObjectId("5901a4c63541b7d5d3293766") } }, { $lookup: { from: "doc2", localField: "_id", foreignField: "userId", as: "address" } }, { $unwind: "$address" }, { $project: { "address._id": 0, "address.userId": 0, "address.mob": 0 } }, { $lookup: { from: "doc3", localField: "_id", foreignField: "userId", as: "social" } }, { $unwind: "$social" }, { $project: { "social._id": 0, "social.userId": 0 } } ]).pretty(); |
There can be different MongoDB scripts to do this and various merged results in MongoDB thanks to the database’s unique data structure.
SPL script:
A | |
---|---|
1 | =mongo_open("mongodb://127.0.0.1:27017/raqdb") |
2 | =mongo_shell(A1,"doc1.find()").fetch() |
3 | =mongo_shell(A1,"doc2.find()").fetch() |
4 | =mongo_shell(A1,"doc3.find()").fetch() |
5 | =A2.join(_id,A3:userId,address,mob) |
6 | =A5.join(_id,A4:userId,fbURLs,twitterURLs) |
7 | >A1.close() |
This SPL script is similar to that in the previous example, with only one more joined table. Every join adds a new field to form a merged table.
Obviously, the SPL script is concise and coherent.
6. Associated query by array values
Get records from a table according to the array field in its associated table and generate a new table with certain fields.
Test data:
users:
_id | Name | workouts |
---|---|---|
1000 | xxx | [2,4,6] |
1002 | yyy | [1,3,5] |
workouts:
_id | Date | Book |
---|---|---|
1 | 1/1/2001 | Othello |
2 | 2/2/2001 | A Midsummer Night's Dream |
3 | 3/3/2001 | The Old Man and the Sea |
4 | 4/4/2001 | GULLIVER’S TRAVELS |
5 | 5/5/2001 | Pickwick Papers |
6 | 6/6/2001 | The Red and the Black |
Desired result:
Name | _id | Date | Book |
---|---|---|---|
xxx | 2 | 2/2/2001 | A Midsummer Night's Dream |
xxx | 4 | 4/4/2001 | GULLIVER’S TRAVELS |
xxx | 6 | 6/6/2001 | The Red and the Black |
yyy | 1 | 1/1/2001 | Othello |
yyy | 3 | 3/3/2001 | The Old Man and the Sea |
yyy | 5 | 5/5/2001 | Pickwick Papers |
MongoDB script:
db.users.aggregate([ { "$lookup": { "from" : "workouts", "localField" : "workouts", "foreignField" : "_id", "as" : "workoutDocumentsArray" }}, {$project: { _id:0,workouts:0} } , {"$unwind": "$workoutDocumentsArray"}, {"$replaceRoot": { "newRoot": { $mergeObjects: [ "$$ROOT", "$workoutDocumentsArray"] } } }, {$project: { workoutDocumentsArray: 0} } ]).pretty() |
Put the result of associated query between users and workouts into an array, split the array and move the embedded document to the top level, and remove the unwanted fields.
SPL script (users.dfx):
A | |
---|---|
1 | =mongo_open("mongodb://127.0.0.1:27017/raqdb") |
2 | =mongo_shell(A1,"users.find()").fetch() |
3 | =mongo_shell(A1,"workouts.find()").fetch() |
4 | =A2.conj(A3.select(A2.workouts^~.array(_id)!=[]).derive(A2.name)) |
5 | >A1.close() |
SPL script explanation:
A1: Connect to MongoDB database.
A2: Read data from users table.
A3: Read data from workouts table.
A4: Get records from workouts table where _id field is included in workouts field of A2’s table sequence, add name field to the records, and return the concatenated table sequence.
A5: Close the database connection.
The operation is considered normal only if the matching result isn’t null. So _id field is converted into a sequence to make the comparison convenient in A4.
The SPL script is more integration-friendly, more flexible and much simpler.
7. Calling esProc dfx script in Java
The result of executing an SPL script can be easily integrated by a Java application. esProc provides JDBC driver to be accessed withJava stored procedure. That is the same as calling a SQL stored procedure in Java. (For esProc JDBC deployment, refer to 1.9 esProc JDBC in esProc Tutorial)
Below is the process of calling an SPL script from Java:
public void testUsers(){
Connection con = null;
com.esproc.jdbc.InternalCStatement st;
try{
//Establish database connection
Class.forName("com.esproc.jdbc.InternalDriver");
con= DriverManager.getConnection("jdbc:esproc:local://");
// Call the stored procedure, in which user is the dfx file’s name
st =(com. esproc.jdbc.InternalCStatement)con.prepareCall("call users ()");
// Execute the stored procedure
st.execute();
//Get result set
ResultSet rs = st.getResultSet();
......
catch(Exception e){
System.out.println(e);
}
This is a standard JDBC invocation, which is easy to embed an SPL script into a Java application. esProc also supports the ODBC driver, which makes it easy to be integrated with languages that use ODBC.
Compared with relational databases, MongoDB stores data in a more complicated, more flexible way, has powerful and adaptable query syntax, and provides various functions and rich cooperative possibilities between them. It’s not easy to be proficient in MongoDB. esProc, however, features discreteness and ease of use to become the NoSQL database’s right-hand and can help it play to its full potential.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL