How slow is the JDBC of Oracle and MySQL?
It is often said that database I/O performance is poor, but few of us have a concrete sense of just how poor. So let's actually measure the JDBC read performance of Oracle and MySQL.
We test JDBC because most applications are written in Java, and JDBC is essentially their only way to access the database. The test simply reads the data through JDBC and generates Java record objects (after all, the data can only be used by the application after this step), without performing any calculation.
1. Data source
We use data generated by TPC-H and pick the customer table for the test. The table has 30 million rows and 8 fields. The raw text file generated for it, customer.tbl, is 4.9 GB. The file is imported into Oracle and MySQL tables with each database's own data import tool.
2. Test environment
The test was run on an Intel server with two Intel 2670 CPUs (2.6 GHz, 16 cores in total) and 64 GB of memory. The database tables and the text files were stored on the same SSD. All tests were run on the local server, so no network transmission time is involved.
3. Database reading test
Data is read through the JDBC interface provided by Oracle, using SQL statements. The equivalent Java is cumbersome to write, so the test is carried out with an SPL script (a rough plain-JDBC sketch in Java is given below for reference):
| | A | |
|---|---|---|
| 1 | =now() | /Record the start time |
| 2 | =connect("oracle") | /Connect to the database |
| 3 | =A2.cursor("select * from customer") | /Create a cursor |
| 4 | for A3,10000 | /Fetch data in a loop, 10,000 rows per batch |
| 5 | =A2.close() | /Close the connection |
| 6 | =interval@s(A1,now()) | /Calculate the elapsed time |
MySQL's test code is similar, so we will not repeat it here.
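For readers who want to see what this looks like in plain Java, a JDBC sketch of the same test is given below. The class name, driver URL, credentials and fetch size are placeholders rather than the exact configuration used in this test; the point is only that every row is turned into a record object and nothing further is done with it.

```java
import java.sql.*;

public class JdbcReadTest {
    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // URL, user and password are placeholders, not the configuration used in this test
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@localhost:1521/orcl", "user", "password");
             Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(10000);                       // fetch 10,000 rows per round trip
            try (ResultSet rs = stmt.executeQuery("select * from customer")) {
                int cols = rs.getMetaData().getColumnCount();
                long rows = 0;
                while (rs.next()) {
                    Object[] record = new Object[cols];     // generate a record object ...
                    for (int i = 1; i <= cols; i++) {
                        record[i - 1] = rs.getObject(i);    // ... and do nothing else with it
                    }
                    rows++;
                }
                System.out.println(rows + " rows read in "
                        + (System.currentTimeMillis() - start) / 1000 + " s");
            }
        }
    }
}
```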
Test results (time unit: seconds)
| | First run | Second run | Rows per second |
|---|---|---|---|
| Oracle | 293 | 281 | 106,000 |
| MySQL | 518 | 381 | 79,000 |
The second run is faster because the operating system caches data read from the disk. Since our main purpose is to measure the JDBC read time, we take the second run as the benchmark to reduce the impact of the database's own disk reading; the rows-per-second figures are also calculated from the second run (for example, 30 million rows / 281 seconds ≈ 106,000 rows per second). In other words, Oracle reads a little over 100,000 rows per second and MySQL about 80,000. Of course, these figures depend on the number and types of fields in the table (customer has eight fields), so they are only a rough reference.
4. Comparison with a text file
The numbers above do not mean much on their own, so let's read the same data from the text file for comparison. The method is the same: read the data from the file and parse it into records, without any calculation.
The following SPL script runs the test (a rough Java equivalent is sketched at the end of this section):
| | A | |
|---|---|---|
| 1 | =now() | /Record the start time |
| 2 | =file("/home/sjr/tbl/customer.tbl") | /Create the file object |
| 3 | =A2.cursor(;,"\|") | /Create a cursor; the field separator is a vertical bar |
| 4 | for A3,10000 | /Fetch data in a loop, 10,000 rows per batch |
| 5 | =interval@s(A1,now()) | /Calculate the elapsed time |
The test result is 42 seconds!
This means that reading the text file is 281/42 = 6.69 times faster than reading from Oracle, and 381/42 = 9.07 times faster than reading from MySQL!
We know that parsing text is quite expensive, yet even so, reading data from a text file is still much faster than reading it from a database. The JDBC I/O of Oracle and MySQL is simply too slow!
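Again for reference, a rough Java sketch of the same text-reading test is shown below. The file path and the "|" separator come from the test above; the class name is ours, and keeping every field as a plain string is a simplification, since a full test would also convert numeric and date fields.

```java
import java.io.BufferedReader;
import java.io.FileReader;

public class TextReadTest {
    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        long rows = 0;
        try (BufferedReader in = new BufferedReader(
                new FileReader("/home/sjr/tbl/customer.tbl"), 1 << 20)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split("\\|");   // parse the "|"-separated fields
                Object[] record = new Object[fields.length];
                for (int i = 0; i < fields.length; i++) {
                    record[i] = fields[i];             // simplified: no per-field type conversion
                }
                rows++;
            }
        }
        System.out.println(rows + " rows parsed in "
                + (System.currentTimeMillis() - start) / 1000 + " s");
    }
}
```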
5. Binary format
Let's go a step further and look at the performance of a binary storage format compared with text.
To make the comparison more pronounced, this time we switch to a larger table: the orders table of TPC-H, which has 300 million rows and 9 fields.
The code for text reading is similar to that above, and the reading time is 438 seconds.
Then we convert the text file into an SPL group table and test it with the following code:
| | A | |
|---|---|---|
| 1 | =now() | /Record the start time |
| 2 | =file("/home/sjr/ctx/orders.ctx").create() | /Create the group table object |
| 3 | =A2.cursor() | /Create a cursor |
| 4 | for A3,10000 | /Fetch data in a loop, 10,000 rows per batch |
| 5 | =interval@s(A1,now()) | /Calculate the elapsed time |
The test result is 164 seconds, a little more than a third of the text reading time.
This is reasonable: binary data does not need to be parsed and can be turned into objects directly, so far less computation is involved and reading is faster.
Note that although the group table file uses columnar storage, all columns are read here, so no less data is read than from the text file and column storage brings no advantage. In fact, reading all columns makes columnar storage slightly disadvantageous; an SPL set file (a row-based binary format) would be even faster.
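To illustrate why binary beats text in general (this is not the group table format, which is columnar and SPL-specific), here is a hypothetical Java sketch that reads fixed-layout binary records: the field values are read directly, with no string splitting or number parsing. The record layout, class names and file path are all made up for illustration.

```java
import java.io.*;

public class BinaryReadSketch {
    // Hypothetical fixed-layout record; NOT the SPL group table format
    static class Order {
        long orderKey;
        long custKey;
        double totalPrice;
    }

    public static void main(String[] args) throws Exception {
        long rows = 0;
        // "/tmp/orders.bin" is a made-up file previously written in this fixed layout
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(
                new FileInputStream("/tmp/orders.bin"), 1 << 20))) {
            while (true) {
                Order o = new Order();
                try {
                    o.orderKey = in.readLong();      // values are read directly as binary,
                    o.custKey = in.readLong();       // no string splitting or number parsing
                    o.totalPrice = in.readDouble();
                } catch (EOFException e) {
                    break;                           // clean end of file
                }
                rows++;
            }
        }
        System.out.println(rows + " binary records read");
    }
}
```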
6. Speeding up with parallelism
It is also easy to fetch data from files in parallel, and parallel programs are simple to write for both text files and group tables. Let's test with the orders table above, fetching data with four threads.
Text data fetching code:
| | A | B | C |
|---|---|---|---|
| 1 | >n=4 | /n is the number of parallel threads | =now() |
| 2 | =file("/home/sjr/tpch_2_17_0/tbls/orders.tbl") | | |
| 3 | fork to(n) | =A2.cursor(;A3:n, "\|") | /Create cursors in parallel threads; each cursor fetches only one of the n segments |
| 4 | | for B3, 10000 | |
| 5 | =interval@s(C1,now()) | | |
Group table data fetching code:
| | A | B | C |
|---|---|---|---|
| 1 | >n=4 | /n is the number of parallel threads | =now() |
| 2 | =file("/home/sjr/ctx/orders.ctx").create() | | |
| 3 | fork to(n) | =A2.cursor(;;A3:n) | /Create cursors in parallel threads; each cursor fetches only one of the n segments |
| 4 | | for B3, 10000 | |
| 5 | =interval@s(C1,now()) | | |
It is easy to implement data segmentation and parallel computing with SPL.
The test results are as follows:
Text file: 119 seconds
Group table: 43 seconds
Compared with serial execution, the speedup is close to linear, making full use of the CPU's multiple cores.
Segmented parallel reading is much harder to implement against a database: the segments have to be stitched together with WHERE conditions, and it is then hard to tell whether a poor result comes from weak parallelism or from the cost of evaluating those WHERE clauses. Such a test would have little reference value, so we do not attempt it here.
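For readers curious how segmented parallel reading of a text file might look outside SPL, below is a hedged Java sketch (the class name and structure are ours, not SPL internals): it splits the file into n byte ranges, aligns each range to line boundaries, and has each thread parse only its own range. RandomAccessFile.readLine is unbuffered, so this only illustrates the segmentation logic rather than competitive performance.

```java
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelTextRead {
    public static void main(String[] args) throws Exception {
        final String path = "/home/sjr/tpch_2_17_0/tbls/orders.tbl";
        final int n = 4;                                  // number of parallel segments
        long size;
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            size = f.length();
        }
        ExecutorService pool = Executors.newFixedThreadPool(n);
        List<Future<Long>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final long begin = size * i / n;              // segment i covers bytes [begin, end)
            final long end = size * (i + 1) / n;
            Callable<Long> task = () -> readSegment(path, begin, end);
            parts.add(pool.submit(task));
        }
        long total = 0;
        for (Future<Long> part : parts) total += part.get();
        pool.shutdown();
        System.out.println(total + " rows parsed in parallel");
    }

    // Parse the lines that start inside [begin, end); boundaries are aligned to line breaks
    static long readSegment(String path, long begin, long end) throws Exception {
        long rows = 0;
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            if (begin > 0) {
                f.seek(begin - 1);
                f.readLine();                 // discard the line crossing the segment start;
            }                                 // it belongs to the previous segment
            String line;
            while (f.getFilePointer() < end && (line = f.readLine()) != null) {
                String[] fields = line.split("\\|");      // build the record; no calculation
                rows++;
            }
        }
        return rows;
    }
}
```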
7. Conclusion
The JDBC read performance of these databases (Oracle and MySQL) is very poor: more than five times slower than reading a text file. Switching to binary data improves reading performance by roughly another factor of three over text, which means a well-designed binary file holds a more than 15-fold advantage over the database. With parallelism on top of that, it can be tens or even hundreds of times faster than reading from the database.
When performance matters and the data volume is large, do not read the data out of the database just to compute on it!
If you really do need to read the data out and then compute (complex procedural calculations are sometimes hard to write in SQL), then don't store it in the database in the first place: big data is mostly historical data that rarely changes and can be exported in advance. Even a text file beats the database, and a binary format is better still (SPL group tables are recommended, haha). Don't waste time on non-computational work like reading data.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL