How Reporting Tools Access MongoDB Data

MongoDB is a popular NoSQL database. Some reporting tools do not supply drivers for connecting to MongoDB, and users have to write hard code in a user-defined data set to access it. Others, such as Birt and JasperReport, supply built-in drivers to enable the use of official mongo Shell for computations. But the code is complicated and computing abilities are not enough. It is hard for them to handle complex reports.

A wise choice is to use esProc SPL, the open-source class library under JAVA ecosystem. SPL offers powerful computational capabilities for processing multilevel data, making it particularly suitable for handling bson data stored in MongoDB. It has built-in, simple-to-use MongoDB access functions and offers JDBC/ODBC driver for being invoked by reporting tools conveniently.

SPL has built-in MongoDB data retrieval functions. To retrieve data from a collection through mongo Shell, for instance:


A

1

=mongo_open("mongodb://127.0.0.1:27017/mongo")

2

=mongo_shell(A1,"keyValue.find()").fetch()

3

=mongo_close(A1)

      

If SPL just stops at the use of mongo Shell, it would be no better than reporting tools that connects to MongoDB directly. Its real advantage lies in the ability to simplify MongoDB-based computations.

SPL has a wealth of built-in functions that are simpler and more intuitive than Mongo Shell, getting rid of the complicated MongoDB query syntax. Suppose we are trying to split value field into multiple numbers according to commas and get records where the Nth number falls in a specified interval, SPL has the following code where pN, pStart and pEnd are query parameters:


A

1

=mongo_open("mongodb://127.0.0.1:27017/mongo")

2

=mongo_shell@x(A1,"keyValue.find()").fetch()

3

=A2.select((valueN=value.split@pc()(pN),   valueN>=pStart && valueN<pEnd))

SPL provides JDBC/ODBC driver to be conveniently integrated, as databases, by reporting tools. Take the above SPL code as an example, we can first save it as a script file (splitQuery.splx) and then invoke it in the reporting tool as we do with a RDB stored procedure through JDBC:

{call splitQuery(?,?,?) }

 

More examples:


A

B

2


3

=A2.select(Amount>1000 ||   like@c(Client,"*s*"))

//Fuzzy query

4

=A2.groups(year(OrderDate),Client;sum(Amount))

//Grouping & aggregation

5

=A2.sort(Dept,-Salary)

//Sort

6

=A2.id(State)

//Distinct

 

SPL offers simple and easy-to-use join functions that make up for the lack of ability to perform join operations in old MongoDB versions and that enhance $lookup function’s capability and simplify code for new versions.

To left join two collections according to their logical foreign keys key1 and key 2, for instance:


A

B

1

=mongo_open("mongodb://127.0.0.1:27017/mongo")

2

=mongo_shell@x(A1,"c1.find()").fetch()

=mongo_shell@x(A1,"c2.find()").fethc()

3

=A2.join(key1:key2,B2:key1:key2,output)


The code will be really complicated if we use mongo Shell to handle it:

db.c1.aggregate([
	{"\lookup":{
		"from":"c2",
		"localField":"key1",
		"foreignField":"key1",
		"as":"collection2_doc"
	}},
{"\$unwind":"\$collection2_doc"},
{"\$redact":{
"\$cond":[
	{"\$eq":["\$key2","\$collection2_doc.key2"]},
	"\$\$keep",
	"\$\$prune"
	]
}},
{"\$project":{
	"key1":1,
	"key2":1,
	"income":"\$income",
	"output":"\$collection2_doc.output"
}}
]}.pretty()

SPL provides multilevel data type for computing multilevel data. MongoDB supports handling multilevel data naturally, but it does not have corresponding computing ability. One instance. We are trying to sum income and output in each record of the following collection.

_id

Income

Output

1

"cpu":1000, "mem":500,  "mouse":"100"

"cpu":1000, "mem":600   ,"mouse":"120"

2

"cpu":2000,"mem":1000, "mouse":"50","mainboard":500

{"cpu":1500, "mem":300}

The mongo Shell code is lengthy and intelligible:

 

var fields = [  "income", "output"];
db.computer.aggregate([ 
   { 
      $project:{ 
         "values":{ 
            $filter:{ 
               input:{ 
                    "\$objectToArray":"$$ROOT"
               },
               cond:{ 
                  $in:[ 
                     "$$this.k",
                     fields
                  ]
               }
            }
         }
      }
   },
   { 
      \$unwind:"$values"
   },
   { 
      $project:{ 
         key:"$values.k",
         values:{ 
            "$sum":{ 
               "$let":{ 
                  "vars":{ 
                     "item":{ 
                          "\$objectToArray":"$values.v"
                     }
                  },
                    "in":"$$item.v"
               }
            }
         }
      }
   },
   {$sort: {"_id":-1}},
   { "$group": {
    "_id": "$_id",
    'income':{"\$first":   "$values"},
    "output":{"\$last":   "$values"}
    }},
]);

While SPL code is short and easy to understand:


A

2

3

=A2.new(_id:ID,income.array().sum():INCOME,output.array().sum():OUTPUT)

 

SPL syntax is powerfully expressive, making the language exceptionally good at handling computations with complex logic, including stepwise computations, inter-row computations and post-grouping computations. It can deal with many computations effortlessly that are hard to handle in both SQL and the stored procedure. Take an inter-row computation as example. To find the largest number of days when a stock rises consecutively, SPL core code is only a single line:


A

2

3

=a=0,A2.max(a=if(price>price[-1],a+1,0))

      

SPL provides professional IDE equipped with a complete set of debugging functionalities and lets users observe the result of each step with grid-style coding, making it particularly suited to implementing multi-step algorithms with complex logic.

undefined 

SPL also supports many more No SQL data sources, including CSV, XLS, WebService, XML, RESTful, JSON, Hadoop, Redis, ElasticSearch, SalesForce and Cassandra, as well as `mixed computations between any types of data sources or databases.