How to Parse and Generate XLS in Java?

POI provides all-around functionalities for parsing and generating xls file. The problem is that it has a too low-level API that makes users write a lot of code from scratch even for the simple read/write actions. And to be all-around means that every detail needs to be taken care of, leading to complicated, heavy, hard-coded details handling process. Another issue is that POI invests too much in configuring too many appearance properties, such as font, color, thickness and alignment, yet its data processing capabilities are not as professional as expected when data parsing and generation should have been the focus of such a tool. Hardcoding becomes the alternative again to reinforce the POI-based approach.

One idea is to encapsulate POI while offering a simplistic interface good at data processing. esProc SPL is just the desired open-source library.

With a rowwise xls file of regular format, SPL has T function to read data from it in the most concise way. Suppose we have an Excel file where each row is an order record and where the first row contains column headers. SPL only needs one single line of code to read the file:

=T("D:/Orders.xls")

With a rowwise xls file of irregular format, SPL offers xlsimport() function, which has a rich yet concise set of reading functionalities. Here are examples of some common scenarios:

\# To import a headerless file where the detail data begins directly from   the first row:

=file("D:/Orders.xlsx").xlsimport()

\# To import a file by skipping the first two rows containing the title:

=file("D:/Orders.xlsx").xlsimport@t(;,3)

\# To import rows from the 3rdto the 10th:

=file("D:/Orders.xlsx").xlsimport@t(;,3:10)

\# To import only three columns:

=file("D:/Orders.xlsx").xlsimport@t(OrderID,Amount,OrderDate)

\# To import data of the specific sheet named "sheet3":

=file("D:/Orders.xlsx").xlsimport@t(;"sheet3")

The xlsimport function has more abilities – importing N rows backwards, opening a file using password, and importing data from a big file to name a few.

SPL offers JDBC driver to let users call SPL code from Java conveniently. To parse and compute records in an xls file, for instance:

…
Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection =DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
String str="=T(\"D:/Orders.xls\").select(Amount>1000 && Amount<=3000 && like(Client,\"*S*\"))";
ResultSet result = statement.executeQuery(str);
…

Besides, query, grouping and joins are also SPL’s strong suit. But those are another topic. You can find relative discussions in SPL Excel Handling Examples.

 

With an xls file of unusually irregular format, SPL provides xlscell function to read data in a concise and flexible way. The function’s most basic use is to read and write data in a specified range of a given sheet. To read cell C2 in sheet 1, for instance, we have the following SPL statement:

=file("d:/orders.xlsx").xlsopen().xlscell("C2")

Based on the agile SPL syntax, it can parse free-style XLS files. To read the following file as a standard two-dimensional table (a table sequence), for instance:

The file has extremely irregular format. This means a huge amount of coding work if you use POI. But the SPL code below is short:


A

B

C

1

=create(ID,Name,Sex,Position,Birthday,Phone,Address,PostCode)      


2

=file(“e:/excel/employe.xlsx").xlsopen()    



3

[C,C,F,C,C,D,C,C]

[1,2,2,3,4,5,7,8]


4

For

=A3.(~/B3(#)).(A2.xlscell(~))


5


if len(B4(1))==0

Break

6


>A1.record(B4)


7


>B3=B3.(~+9)


The SPL code can be saved as a separate script file independent of the Java code, and does not need to be recompiled when there are any changes, significantly loose couplings. So, SPL is particularly suitable for handling parsing or computational scenarios that are complicated or that may undergo frequent modifications. Now we can invoke the above SPL script file from Java as we call a stored procedure:

…
Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection =DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
ResultSet result = statement.executeQuery("call getEmps()");
…

SPL’s professional IDE, equipped with a complete set of debugging functionalities and letting users observe the intermediate result of each step, is fit for programming parsing processes with complex logics.

Now let’s move on to talk about Excel data generation.

 

The SPL code for generating xls data is simple and easy to learn. The result of the above parsing operation is a table sequence stored in SPL’s cell A1. Now we are trying to write A1 into sheet 1 of an xls file with the first row containing column headers. To do this, a single line of code is sufficient:


A

B

C


… 



8

=file("e:/result.xlsx").xlsexport@t(A1)



SPL provides a wealth of generation functionalities to write a table sequence to the specified sheet, or to write certain data of the table sequence in. To write specified columns in a table sequence to an Excel file, for instance:

=file("e:/scores.xlsx").xlsexport@t(A1,No,Name,Class,Maths)

It is convenient to append data in SPL. For example, we have an non-empty xls file and trying to append table sequence A1 to the end of the file, keeping same format as the last row in the original file:

=file("e:/scores.xlsx").xlsexport@a(A1)

Apart from reading functionalities, the xlscell function has same flexible and concise writing functionalities, enabling users to populate data to the specified irregular, discontinuous cells. In an Excel file recording company information, the cells in blue contains headers of irregular formats, and we need to enter data to corresponding white cells. The desired result is as follows:

It takes a huge block of lengthy code to achieve the task with POI, while SPL code is short and simple (cells in the A1:F5 area contain to-be-populated data):


A

B

C

D

E

F

1

Mengniu Funds

2017

3

58.2

364

300

2

8.5

50

200

100

400

200

3

182.6

76.3

43.7

28.5

16.4


4

120

1.07

30

0.27

90

0.8

5

154

6

4




6

=file("e:/result.xlsx")

=A6.xlsopen()

7

=C6.xlscell("B2",1;A1)

=C6.xlscell("J2",1;B1)

=C6.xlscell("L2",1;C1)

8

=C6.xlscell("B3",1;D1)

=C6.xlscell("G3",1;E1)

=C6.xlscell("K3",1;F1)

9

=C6.xlscell("B6",1;[A2:F2].concat("\t"))

=C6.xlscell("H6",1;[A3:E3].concat("\t"))

10

=C6.xlscell("B9",1;[A4:F4].concat("\t"))

=C6.xlscell("B11",1;[A5:C5].concat("\t"))

11

=A6.xlswrite(B6)




Make a not that there are continuous cells in row 6, row 9 and row 11 in the Excel file. SPL can simplify code to do a batch input while POI can only feed data to them one by one.

In short, the sophisticatedly encapsulated SPL is more convenient and concise than POI in parsing and generating xls, especially when reading and writing files of irregular formats.