How to Write Simple & Powerful Script Data Sources for BIRT Reports
1. Preface: JVM-based SQL functions and stored procedures
Some databases, such as MySQL before version 8.0, don’t have analytic functions. Others, such as Vertica, don’t support stored procedures. Their users turn to external scripts in Python, R or other languages to handle complicated data computations. But these scripting languages are hard to integrate with Java, the mainstream application language. And the lengthy Java code written to replace SQL functions or stored procedures usually serves a single computing goal and cannot be reused.
It’s not easy to implement complicated logic even with analytic functions. Here’s a common computing task: find the first N customers whose sales account for half of the total and sort them by amount in descending order. Oracle implements it this way:
with A as (select CUSTOM,SALESAMOUNT,row_number() over (order by SALESAMOUNT) RANKING from SALES)
select CUSTOM,SALESAMOUNT
from (select CUSTOM,SALESAMOUNT,sum(SALESAMOUNT) over (order by RANKING) AccumulativeAmount from A)
where AccumulativeAmount>(select sum(SALESAMOUNT)/2 from SALES)
order by SALESAMOUNT desc
The Oracle script sorts the records by sales amount in ascending order and accumulates them in that order. It then keeps the rows whose accumulated amount is greater than half of the total, which are the largest customers, found by working from the opposite end. A window function ordered directly by SALESAMOUNT would treat rows with equal amounts as peers and give them the same accumulated value. To avoid that, the first subquery assigns each row a unique ranking with row_number(), and the accumulation is ordered by that ranking instead.
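The tie problem comes from the default window frame: when an analytic function has an ORDER BY but no windowing clause, Oracle uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows with equal sort keys share one running total. The following is a minimal plain-Java sketch of that difference, not Oracle or esProc code; the class name, method names and sample amounts are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Oracle or esProc.
class RunningSumDemo {

    // Row-by-row running total: each row adds only itself to the previous total.
    // This is what ordering by a unique ranking achieves.
    static long[] rowWise(long[] sortedAmounts) {
        long[] out = new long[sortedAmounts.length];
        long total = 0;
        for (int i = 0; i < sortedAmounts.length; i++) {
            total += sortedAmounts[i];
            out[i] = total;
        }
        return out;
    }

    // Peer-aware running total: rows with equal amounts all receive the total
    // reached at the last row of their tie group (RANGE-style frame).
    static long[] peerWise(long[] sortedAmounts) {
        long[] out = rowWise(sortedAmounts);
        for (int i = out.length - 2; i >= 0; i--) {
            if (sortedAmounts[i] == sortedAmounts[i + 1]) {
                out[i] = out[i + 1];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] amounts = {10, 20, 20, 50}; // made-up sample, already ascending
        System.out.println(Arrays.toString(rowWise(amounts)));  // [10, 30, 50, 100]
        System.out.println(Arrays.toString(peerWise(amounts))); // [10, 50, 50, 100]
    }
}
```

With the two tied rows, the peer-aware totals jump straight to 50 for both, so a filter on the accumulated value can no longer tell them apart.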
esProc script:
| | A | B |
---|---|---|
1 | =connect("verticaLink") | /Connect to Vertica database |
2 | =A1.query("select * from sales").sort(SALESAMOUNT:-1) | /Get the sales records and sort them by sales amount in descending order |
3 | =A2.cumulate(SALESAMOUNT) | /Calculate a sequence of accumulated values; the function is a replacement of database window function |
4 | =A3.m(-1)/2 | /Calculate half of total sales amount |
5 | =A3.pselect(~>=A4) | /Find the position in the accumulated value sequence where half of total sales amount falls |
6 | =A2(to(A5)) | /Get the record where half of total sales amount falls and records before it |
7 | >A1.close() | /Close database connection |
8 | return A6 | /Return A6’s result |
Instead of complicated nested SQL plus window functions, esProc implements the computing logic with concise syntax. Because the script works with any database (data source), the code is also more universal.
esProc is driven by a JVM-based scripting language designed for structured data. Like SQL functions and stored procedures, an esProc script can be integrated with a Java application, and it yields portable, general-purpose, database-independent computing logic. That logic runs in a middle layer, separate from the data logic that runs in the database (data source) layer. The separation makes the overall application more scalable, flexible and maintainable.
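For readers more at home in Java, the logic of the grid script (sort descending, accumulate, find where the running total first reaches half of the total, keep the rows up to that point) can be sketched roughly as follows. This is an illustrative plain-Java rendering under assumed names (HalfOfTotal, topHalf) with an in-memory map standing in for the sales table; it does not use the esProc API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java rendering of the script's logic; not esProc API.
class HalfOfTotal {

    // Returns customer names in descending order of amount, up to and including
    // the customer at which the running total first reaches half of the total.
    static List<String> topHalf(Map<String, Long> salesByCustomer) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(salesByCustomer.entrySet());
        // Counterpart of sort(SALESAMOUNT:-1): descending by amount.
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());

        long total = 0;
        for (Map.Entry<String, Long> e : sorted) {
            total += e.getValue();
        }
        double half = total / 2.0; // counterpart of A3.m(-1)/2

        List<String> result = new ArrayList<>();
        long running = 0;
        for (Map.Entry<String, Long> e : sorted) {
            running += e.getValue();   // counterpart of cumulate(SALESAMOUNT)
            result.add(e.getKey());
            if (running >= half) {     // counterpart of pselect(~>=A4)
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> sales = new LinkedHashMap<>(); // made-up sample data
        sales.put("D", 10L);
        sales.put("B", 30L);
        sales.put("A", 40L);
        sales.put("C", 20L);
        System.out.println(topHalf(sales)); // [A, B]
    }
}
```

In the sample, the total is 100, so the running total 40 + 30 = 70 is the first to reach 50 and the result stops at the second customer.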
2. Application scenario: Report data preparation
2.1 Reporting architecture
An esProc script embedded in the reporting layer works like a local logical database that needs no separately deployed server. It serves as a data preparation layer between the reporting tool and the data source, performing various complicated computations.
2.2 Integration
Let’s look at how to integrate esProc as the data preparation layer (take Vertica and BIRT as the example).
I. Integration of basic jars
esProc JDBC has three basic jars, which are situated in [installation directory]\esProc\lib :
esproc-bin-xxxx.jar: esProc computing engine and JDBC driver
jdom-1.1.3.jar: parses configuration files
icu4j-60.3.jar: handles internationalization
Other jars provide specific functionalities. To use a database as a data source through esProc JDBC, its driver jar is required. As Vertica is the data source here, its driver jar is needed (Vertica 9.1.0 is used as the example):
vertica-jdbc-9.1.0-0.jar: download it from the Vertica website
Those jars should be copied and placed under BIRT’s [installation directory]\plugins\org.eclipse.birt.report.data.oda.jdbc_4.6.0.v20160607212.
II. Deploy the configuration file
The configuration file, raqsoftConfig.xml, contains the script file path, data source connection information, and other settings.
It is located in [esProc installation directory]\esProc\config, and needs to be copied and placed under BIRT designer class path [installation directory]\plugins\org.eclipse.birt.report.data.oda.jdbc_4.6.0.v20160607212.
The file’s name must not be changed.
2.3 BIRT development environment
1. Copy all the required jars under BIRT’s WEB-INF\lib;
2. Copy raqsoftConfig.xml under BIRT’s WEB-INF\classes.
2.3.1 Example 1: Normal call
♦1. Below is the Sales table in the Vertica database, queried via vsql. The table contains data for the years 2013, 2014 and 2015.
♦2. Create an esProc script
(1) Put Vertica JDBC driver jars into esProc designer path
Download the JDBC driver jar (vertica-jdbc-9.1.0-0.jar, for instance) from the Vertica website, and put it under [esProc installation directory]\common\jdbc.
(2) Add Vertica data source
Open esProc designer, click Tool -> Datasource to add the Vertica data source in JDBC way.
Click OK to save the configuration and then Connect to connect to the data source.
The data source is successfully connected once the data source name turns pink.
(3) Create an algorithm script through File -> New and save it as VerticaExternalProcedures.dfx.
| | A | B |
---|---|---|
1 | =connect("verticaLink") | /Connect to Vertica database |
2 | =A1.query("select * from sales").sort(SALESAMOUNT:-1) | /Get the sales records and sort them by sales amount in descending order |
3 | =A2.cumulate(SALESAMOUNT) | /Calculate a sequence of accumulated values; the function is a replacement of database window function |
4 | =A3.m(-1)/2 | /Calculate half of total sales amount |
5 | =A3.pselect(~>=A4) | /Find the position in the accumulated value sequence where half of total sales amount falls |
6 | =A2(to(A5)) | /Get the record where half of total sales amount falls and records before it |
7 | >A1.close() | /Close database connection |
8 | return A6 | /Return A6’s result to BIRT as the report source data set |
♦3. Deploy the script
Put the script file under the script file main directory configured in raqsoftConfig.xml.
♦4. Configure data source connection: verticaLink, in raqsoftConfig.xml
<DB name="verticaLink">
<property name="url" value="jdbc:vertica://192.168.10.10:5433/ForEsprocTestDB"/>
<property name="driver" value="com.vertica.jdbc.Driver"/>
<property name="type" value="0"/>
<property name="user" value="dbadmin"/>
<property name="password" value="runqian"/>
<property name="batchSize" value="0"/>
<property name="autoConnect" value="false"/>
<property name="useSchema" value="false"/>
<property name="addTilde" value="false"/>
<property name="needTransContent" value="false"/>
<property name="needTransSentence" value="false"/>
<property name="caseSentence" value="false"/>
</DB>
♦5. Create a new report in the BIRT report designer and add an esProc data source: esProcConnection.
The Driver class is com.esproc.jdbc.InternalDriver(v1.0), which needs esproc-bin-xxxx.jar and other jars. Database URL is jdbc:esproc:local://
♦6. BIRT calls esProc data set (Vertica’s external stored procedure)
Create a new data set; select the esProc data source (esProcConnection); the data set type is SQL Stored Procedure Query.
Next, enter {call VerticaExternalProcedures()} under Query Text. VerticaExternalProcedures is esProc script file name.
Now we can preview the computing result with Preview Results.
That’s the whole process of using an esProc script as Vertica’s external stored procedure to prepare the data source for a report.
♦7. Web presentation
Take a grid report as an example. Below is the report design:
Publish preview:
2.3.2 Example 2: Parameter-based call
Let’s change the above computing task a bit: for a given year, find the first N customers whose sales account for half of the total and sort them by amount in descending order. The task requires filtering by a parameter.
♦1. Add a year parameter for filtering.
Open esProc designer, and click Program –> Parameter –> Add to add parameter qyear (the name can be different from a report parameter).
Modified script:
| | A | B |
---|---|---|
1 | =connect("verticaLink") | /Connect to Vertica database |
2 | =A1.query("select * from sales where year(subscriptiondate)=?",qyear).sort(SALESAMOUNT:-1) | /qyear is the parameter receiving a typed year to find the corresponding sales records and sort them by sales amount in descending order |
3 | =A2.cumulate(SALESAMOUNT) | /Calculate a sequence of accumulated values; the function is a replacement of database window function |
4 | =A3.m(-1)/2 | /Calculate half of total sales amount |
5 | =A3.pselect(~>=A4) | /Find the position in the accumulated value sequence where half of total sales amount falls |
6 | =A2(to(A5)) | /Get the record where half of total sales amount falls and records before it |
7 | >A1.close() | /Close database connection |
8 | return A6 | /Return A6’s result to BIRT as the report source data set |
A2 performs conditional filtering.
♦2. Define a year parameter for the report
Define an input parameter named qyear for the report.
Open the report, click Data Explorer –> Report Parameter –> New parameter to add the parameter.
The second red box is the default value of parameter qyear.
♦3. Add a data set parameter and link it with the report parameter
Create data set VerticaExternalProcedures.
The Query Text is slightly different: {call VerticaExternalProcedures(?)}. The question mark (?) is a placeholder for the input year parameter. Under Parameters, add data set parameter qyear and link it with report parameter qyear.
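BIRT binds the data set parameter to the placeholder for you, but it can help to see what the same call might look like from plain Java. The sketch below is a hedged illustration using only standard java.sql interfaces with the driver class and URL quoted earlier in this article. The class EsprocCallSketch and its methods are hypothetical, and whether the esProc driver accepts exactly this call shape outside BIRT should be checked against the esProc documentation.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

// Hypothetical helper; needs the esProc jars and raqsoftConfig.xml on the class path to run.
class EsprocCallSketch {

    // Builds call text such as {call VerticaExternalProcedures(?)},
    // with one ? placeholder per parameter.
    static String callText(String scriptName, int paramCount) {
        StringBuilder sb = new StringBuilder("{call ").append(scriptName).append('(');
        for (int i = 0; i < paramCount; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('?');
        }
        return sb.append(")}").toString();
    }

    // Calls the script with positional arguments and prints every returned row.
    static void run(String scriptName, Object... args)
            throws SQLException, ClassNotFoundException {
        Class.forName("com.esproc.jdbc.InternalDriver"); // driver class named in this article
        try (Connection con = DriverManager.getConnection("jdbc:esproc:local://");
             CallableStatement st = con.prepareCall(callText(scriptName, args.length))) {
            for (int i = 0; i < args.length; i++) {
                st.setObject(i + 1, args[i]); // JDBC parameter indexes start at 1
            }
            try (ResultSet rs = st.executeQuery()) {
                ResultSetMetaData md = rs.getMetaData();
                while (rs.next()) {
                    StringBuilder row = new StringBuilder();
                    for (int c = 1; c <= md.getColumnCount(); c++) {
                        if (c > 1) {
                            row.append('\t');
                        }
                        row.append(rs.getObject(c));
                    }
                    System.out.println(row);
                }
            }
        }
    }
}
```

Under these assumptions, `EsprocCallSketch.run("VerticaExternalProcedures", 2013)` would pass 2013 to the script's qyear parameter, in the same position the question mark occupies in the Query Text.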
Under Preview Results, query data of the year 2013 according to the default value of qyear.
After passing the value “2015” to the parameter:
♦4. Web presentation
Query data of the year 2015:
After modifying the URL or passing “2013” to qyear: