How to Load Excel Data into Database
For data analysis work, it’s not uncommon to need to first structuralize Excel data before computing it or importing it to a relational database. The Excel file often has a non-standard format and their type is unfixed for different occasions. This results in a large amount of coding work when trying to structuralize data. Moreover, the code is unreusable. One has to do the parsing job for each specific file format and type before they begin the development process.
Here’s an Excel file (sales.xlsx) of simple format:
ORDERID |
CLIENT |
SELLERID |
AMOUNT |
ORDERDATE |
1 |
LFRP |
12 |
9268 |
2020-12-07 |
2 |
YG |
10 |
3458 |
2020-12-07 |
3 |
OHY |
3 |
10298 |
2020-12-07 |
4 |
ANZSW |
12 |
7662 |
2020-12-07 |
5 |
QMTP |
6 |
7947 |
2020-12-07 |
There is an existing database table (sales) with the following structure:
Field name |
Type |
ORDERID |
int |
CLIENT |
varchar(10) |
SELLERID |
smallint |
AMOUNT |
double |
ORDERDATE |
date |
The task is to Write (update) the Excel data into the database table sales.
It’s convenient to do this with esProc SPL.
Download esProc installation package HERE.
1. Start esProc, configure database connection and name it. The parameters should be consistent with those in the JDBC connection configuration.
Do the configurations in the following window (Tool->Datasource Connection->Datasource):
2. Write SPL script sales2db.dfx in esProc:
A |
B |
|
1 |
=file("sales.xlsx").xlsimport@t() |
/ Import sales.xlsx |
2 |
=connect("mysql") |
/ Connect to database |
3 |
>A2.update(A1,sales,ORDERID,CLIENT,SELLERID,AMOUNT,ORDERDATE) |
/db.update function updates Excel data to database table sales |
4 |
>A2.close() |
/ Disconnect from database |
If we do the update only, just use update()function. If we already know that data in the Excel file is all new, we can use @i option with update()function (update@i) to generate INSERT statement only. This is much faster because the program doesn’t need to check if it needs to generate an UPDATE statement. Learn more about db.update() function.
3. Execute the esProc script to update the database table. Below is sales table after the update is finished:
If the Excel file contains a relatively large amount of data and unsuitable or impossible to be loaded into the memory during the runtime environment, we just need to use @c option in xlsimport()function to the loading work. In that case, A1 will be =file("sales.xlsx").xlsimport@tc().
When the Excel file has a complicated format, as shown below in the employee information emp.xlsx:
ID: |
1 |
||
Name: |
jack |
Sex: |
F |
location |
LA |
||
Birthday: |
1987/9/13 |
||
phone: |
36527183 |
||
ID: |
2 |
||
Name: |
tom |
Sex: |
M |
location |
CA |
||
Birthday: |
2000/1/1 |
||
phone: |
56253674 |
There is a database table emp that stores employee data in the following structure:
Field names |
Type |
id |
int |
name |
varchar(10) |
sex |
char(1) |
location |
char(2) |
birthday |
date |
phone |
char(8) |
1.Write script emp2db.dfx in esProc:
A |
B |
C |
|
1 |
=create(id,name,sex,location,birthday,phone) |
||
2 |
=file("emp.xlsx").xlsopen() |
||
3 |
[B,B,D,B,B,B] |
[1,2,2,3,4,5] |
|
4 |
for |
=A3.(~/B3(#)).(eval($[A2.xlscell()/~/")")) |
|
5 |
if len(B4(1))==0 |
break |
|
6 |
>A1.record(B4) |
||
7 |
>B3=B3.(~+6) |
||
8 |
=connect("mysql") |
||
9 |
=A8.update(A1,emp) |
||
10 |
>A8.close() |
A1 Create an empty table sequence consisting of columns "id,name,sex,location,birthday,phone".
A2 Open the Excel file to be loaded to the database.
A3 Define a sequence of column numbers commanding information of each employee.
B3 Define a sequence of row numbers leading information of each employee.
A4 Use for statement to read information of every employee.
B4 A3.(~/B3(#)) gets the sequence of cells holding data of the current employee and reads in the corresponding values to have a sequence of employee data. The first round of loop reads through [B1,B2,D2,B3,B4,B5], and by adding 6 to the row number, the second round reads through [B7,B8,D8,B9,B10,B11], and so on. $[A2.xlscell(] is equivalent to "A2.xlscell(". Both represents a string, but the former can automatically adjust the expression when the number of A2 changes. For instance, if a row is inserted before A2, the former will automatically become $[A3.xlscell(] while the latter will remain unchanged.
B5 Check whether the employee ID is null. Exit loop to terminate data retrieving if it is null.
B6 Write an employee record at the end of A1’s table sequence.
B7 Add 9 to each member of B3’s sequence of row numbers to get the next employee record.
A8-A10 Connect to database to store the employee information to database table emp, and then close the connection.
2. Execute the esProc script. Below is emp table after the loading is finished:
At times we need to load multiple Excel files of same structure to the database. Take files of simple format as an example. To load sales files of same structure (sales.xlsx, sales1.xlsx, sales2.xlsx) to the database, we have the following script:
A |
B |
|
1 |
=connect("mysql") |
|
2 |
=directory@p("sales*.xlsx") |
|
3 |
for A2 |
=file(A3).xlsimport@ct() |
4 |
>A1.update(B3,sales,ORDERID,CLIENT,SELLERID,AMOUNT,ORDERDATE) |
|
5 |
>A1.close() |
A1 Connect to database.
A2 Get all files matching the format of ales*.xlsx and return a sequence of file names.
A3 Loop through the sequence of file names.
B3 Read in the Excel file of corresponding to the current file name.
B4 Use db.update() function to update Excel data to database table sales.
A5 Close database connection after the loop is finished.
You can also use load Excel files of complicated format to the database using the same way.
Read How to Call an SPL Script in Java to learn about the integration of an esProc SPL script with a Java program.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_Desktop/
SPL Learning Material 👉 https://c.scudata.com
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProcDesktop
Linkedin Group 👉 https://www.linkedin.com/groups/14419406/