* How to Make a Crosstab in MongoDB
Crosstab is a commonly used grouping & aggregation table. It shows the aggregates on a specific field and groups them; groups rows by a specific variable, and transposes groups into rows and other variables or variable combinations into columns to form a data table for statistical analysis.
Take collection events for instance. We count events for each user, append the aggregate results by userID at the end of columns and those by EventID at the end of rows. Below is the source table:
UserID |
EventID |
Count |
Date |
1 |
1 |
10 |
2020-01-02 |
1 |
1 |
15 |
2020-01-02 |
1 |
2 |
12 |
2020-01-02 |
1 |
3 |
18 |
2020-01-03 |
2 |
1 |
5 |
2020-01-02 |
3 |
2 |
10 |
2020-01-02 |
Group rows by UserID and aggregate EvendID and Count respectively to get the following crosstab:
UserID |
Evt1 |
Evt2 |
Evt3 |
Events |
1 |
25 |
12 |
18 |
55 |
… |
… |
|||
… |
… |
|||
Total |
30.0 |
60.0 |
60.0 |
70.0 |
Here’s how MongoDB handles it. It aligns each document for each event and adds a field to record the current number of events. This is equivalent to data rearrangement. Then it groups documents by UserID and count the frequency of each event in each group. And then it sums the event column by column and appends the results at the end of documents. Finally it filters the data set to display the desired documents. The process is roundabout and complicated.
esProc SPL, however, does it in an easy and effortless way. It groups records by UserID and then performs aggregates.
Download esProc installation package HERE.
Directions:
1. Write SPL script events.dfx in esProc:
A |
B |
|
1 |
=mongo_open("mongodb://127.0.0.1:27017/raqdb") |
/ Connect to MongoDB database |
2 |
=mongo_shell(A1,"events.find(,{_id:0})") |
/ Get data of events collection |
3 |
=A2.pivot@s(UserID;EventID,sum(Count); 1:"Evt1",2:"Evt2", 3:"Evt3").derive(Evt1+Evt2+Evt3:Events) |
/ Group records by UserID, count frequency of each EventID in each group and get the total |
4 |
>A3.insert(0, "Total", A3.sum(Evt1), A3.sum(Evt2), A3.sum(Evt3), A3.sum(Events)) |
/ Sum each event by column and append results to the sequence |
5 |
>A1.close() |
/ Close database connection |
2. Execute the script to return the final result:
A3 |
UserID |
Evt1 |
Evt2 |
Evt3 |
Events |
1 |
25 |
12 |
18 |
55 |
|
2 |
5 |
(null) |
(null) |
5 |
|
3 |
(null) |
10 |
(null) |
10 |
|
Total |
30 |
22 |
18 |
70 |
A3 groups records by UserID, count EventIDs in each group and then gets the total.
esProc provides JDBC interface to let you integrate the script into a Java program conveniently:
public static void doWork() {
Connection con = null;
java.sql.Statement st;
try{
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
// Call script events.dfx
st=con.createStatement();
ResultSet rst = st.executeQuery("call events");
System.out.println(rst);
} catch(Exception e){
System.out.println(e);
} finally{
// Close database connection
if (con!= null) {
try {
con.close();
} catch(Exception e) {
System.out.println(e);
}
}
}
}
Read How to Call an SPL Script in Java to learn more about integration of esProc SPL script into a Java program.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL