Updated: May 2023
When trying to assess how knowledgeable a developer is in general and in JDBC in particular, here’s a question I like to ask: how would you speed up inserts when using JDBC?
Here are some options to consider if you ever need to insert data quickly into an SQL database.
1. Don’t use JDBC
Most databases have a tool to handle bulk inserts natively and this is the fastest way to do it. The option isn’t always there, but if the data format is supported by the tool, you won’t find a faster option.
2. Use Prepared Statements
Prepared statements reduce insert time by compiling the SQL once, and also help prevent SQL injection attacks. Injection attacks are another good question to ask in interviews.
Once the prepared statement is parsed by the engine, the client only needs to send the data for each statement, instead of the entire SQL every time.
3. Use Batch Inserts
Prepared statements can help you when using JDBC, but batch statements are going to be your BFF when doing inserts and updates.
We have a demo in Data Pipeline that loads some hospital data into a MySQL database. Using batch statements of 100 records, insert time went from 3.5 minutes to under 1 second.
Here are two things you should know about the data:
1) The data isn’t perfect. It does have bad records and duplicate IDs that cause some records to fail.
2) Some work needs to be done to map the target and source formats.
Here’s how we did it in Java.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
DataWriter discardWriter = new StreamWriter(System.out); DataReader reader = new CSVReader(new File("data/hospital.csv")) .setFieldNamesInFirstRow(true); reader = new TransformingReader(reader, discardWriter, "error_message") .add(new RenameField("Provider Number", "hospital_id")) .add(new BasicFieldTransformer("hospital_id").stringToLong()) .add(new RenameField("Hospital Name", "hospital_name")) .add(new RenameField("Address 1", "address1")) .add(new RenameField("Address 2", "address2")) .add(new RenameField("Address 3", "address3")) .add(new RenameField("City", "city")) .add(new RenameField("State", "state")) .add(new RenameField("ZIP Code", "zip_code")) .add(new RenameField("County", "county")) .add(new RenameField("Phone Number", "phone_number")) .add(new RenameField("Hospital Type", "hospital_type")) .add(new RenameField("Hospital Ownership", "hospital_ownership")) .add(new RenameField("Emergency Services", "emergency_services")) .add(new SetCalculatedField("emergency_services", "decode(emergency_services, 'Yes', true, 'No', false)")) ; DataWriter writer = new JdbcWriter(getConnection(), "hospital") .setBatchSize(BATCH_SIZE); Job job = Job.run(reader, writer); System.out.println("Time: " + job.getRunningTimeAsString()); |
If you want to learn more about the code and Data Pipeline, please have a look at our Java ETL framework.
4. Use Multiple Connections
Depending on your configuration and database, speed could be improved by setting up more than one connection to do inserts.
In a test we run recently, we saw improvements with up to 5 connections with one of the major database providers. Going above 5 connections saw no noticeable benefit.
5. Temporarily Disable Integrity Constraints
With this one, caution is required. I don’t recommend it unless you are completely familiar with what you are doing, and most importantly, have a rollback plan.
Disabling integrity checks means your database engine can insert data faster since it has less work to do for each record. However, if your data isn’t perfectly clean, you may end up with corrupted tables and broken client apps. Also, don’t forget to re-enable constraints. You don’t want to be the guy who forgot to re-enable constraints on a production database.
What’s your trick for speeding up inserts with JDBC?
Drop me a comment if you have an insert trick to share.
Happy coding.
Pingback: Java Weekly 22/16: JDBC insert, L2 caching, eager optimization
Pretty good, short, nice and concise.
Thanks, I appreciate your feedback. Cheers.
Hi Dele,
Thanks for the good and very useful article .
i have some small doubt on the below
4. Use Multiple Connections
can you please help me to understand , how to do this .
Hi Ramesh,
The simplest way you can do this is to run your code that connects to the DB and inserts data in separate threads.
Our setup is a little less simple. Each thread has its own LinkedBlockingDeque it pulls data from to write to its own connection. We have a main thread to read data from a source (like a CSV file) and put one record in each thread’s LinkedBlockingDeque in turn.
Of course, we encapsulate all of this inside JdbcMultiWriter. You can see an example here: Write to a Database Using Multiple Connections.
Hope this helps. Cheers.
And to optimize CPU usage, use JDBC reactive (e.g. vertx jdbc implementation)
What worked wonders for us was using the JDBC drivers reWriteBatchedInserts option (this is the name for Postgres, for MySQL it’s rewriteBatchedInserts), which can be turned on by appending it to the connection string (?reWriteBatchedInserts=true).
Especially for short rows, this option, which rewrites batched single value insert statements into multi value insert statements, resulted in a big speedup of up to 50x.
This article describes how it works for hibernate: https://vladmihalcea.com/postgresql-multi-row-insert-rewritebatchedinserts-property/
Here is the link for JDBC+MySQL: https://stackoverflow.com/a/24026539