The skill of updating database with esProc
When the database is updated in batches, the performance of update function is better than execute function.
For example, source and target are two different data sources. Callrecorda is the physical table in source, with 10000 records. Callrecordb is the physical table in target, without records. Now, you need to synchronize the data in callrecorda to callrecordb.
When the execute function is used for batch update, it can be found that the performance is not ideal. The code is as follows:
A |
B |
|
1 |
=connect("source") |
/Connect source |
2 |
=A1.query@x("select * from callrecordA") |
/Take callrecordA |
3 |
||
4 |
=connect("target") |
/Connect target |
5 |
=A4.execute("truncate table callrecordB") |
/ To facilitate repeated testing, clear callrecordb first |
6 |
=now() |
/ Record test time |
7 |
=A2.(A4.execute("insert into callrecordB values(?,?,?,?,?,?)",~. SERIALNUMBER, ~.OUTID,~.INID,~.CALLTIME,~.DURATION,~.CHARGE)) |
/ Batch update |
8 |
=A4.close() |
|
9 |
=interval@ms(A6,now()) |
/SQL execute statement,8251ms |
In fact, the above algorithm executes insert statements in loop, and does not make full use of JDBC's batch update mechanism, so its performance is poor. Moreover, the above algorithm only inserts data. If there are both inserts and modifications, it will be a lot of trouble to process.
After using the update function, you can see that the performance has improved significantly. The code is as follows:
11 |
=connect("target") |
/Connect target |
12 |
=A11.execute("truncate table callrecordB") |
/ To facilitate repeated testing, clear callrecordb first |
13 |
=now() |
/ Record test time |
14 |
=A11.update@i(A2,callrecord;SERIALNUMBER) |
/ Batch update |
15 |
=A11.close() |
|
16 |
=interval@ms(A13,now()) |
/SPL update function,2075ms |
The function update uses the executebatch mechanism of JDBC for batch update, so the performance is better. The @i option means that only insert statements need to be generated, @u means that only update statements need to be generated. If there is no option, it means that there is both insert and update, that is, the target table has data, and the primary keys of the source table and the target table need to be compared. If a primary key exists in the source table and does not exist in the target table, an insert statement will be generated. If a primary key exists in both the source table and the target table, an update statement will be generated.
The database usually provides synchronization / import tools, which usually have better performance and also support command-line calls. SPL executes the command line, so you can call these tools. For example, to import data with Oracle's sqlldr, you can use the following code:
A |
B |
|
1 |
=connect("source") |
/Connect source |
2 |
=A1.query@x("select * from callrecordA") |
/Take callrecordA |
3 |
=file("d:\\temp\\callrecordB.txt").export(A2;"|") |
/Generate TXT/csv file,separator is | |
4 |
=system("cmd /C sqlldr system/runqian@orcl control=d:\\temp\\callrecordB.ctl data=d:\\temp\\callrecordB.txt log=log.log bad=bad.log errors=5000 rows=1000 bindsize=10485760") |
/ Execute command line |
A1-A3:Export the table in source to a text file with a separator of |. If the text file already exists, this step can be omitted.
A4:Execute the command line and call sqlldr. The command format should meet the official requirements of sqlldr. Note that callrecordb.ctl is the control file required by sqlldr. The format is as follows:
load data
CHARACTERSET UTF8
append
into table callrecordB
fields terminated by '|'
trailing nullcols
(
SERIALNUMBER INTEGER EXTERNAL,
OUTID INTEGER EXTERNAL,
INID INTEGER EXTERNAL,
CALLTIME date "yyyy-mm-dd hh24:mi:ss",
DURATION INTEGER EXTERNAL,
CHARGE INTEGER EXTERNAL
)
SPL supports parallel computing, including parallel execution of synchronization/import tools, so you can split single text into multiple texts and import multiple files at the same time to achieve better performance. Different synchronization/import tools have different parallel requirements. The common requirement is not to lock the table, and there is no unique index. For example, to execute sqlldr in parallel, you can use the following code:
A |
B |
C |
|
1 |
=file("d:\\temp\\callrecordB.txt") |
/ Open single file |
|
2 |
=2.(file(concat("d:\\temp\\callrecordB",~,".txt")). export(A1.cursor(;~:2,"|");"|")) |
/ Split into multiple files |
|
3 |
fork to(2) |
=system(concat("cmd /C sqlldr system/runqian@orcl control=d:\\temp\\callrecordB. ctl data=d:\\temp\\callrecordB", A3,".txt direct=y parallel=true log=log.log bad=bad.log errors=5000 bindsize=10485760")) |
/ Multi file parallel import |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version