Updated: May 2023
When trying to assess how knowledgeable a developer is in general and in JDBC in particular, here’s a question I like to ask: how would you speed up inserts when using JDBC?
Here are some options to consider if you ever need to insert data quickly into an SQL database.
1. Don’t use JDBC
Most databases have a tool to handle bulk inserts natively and this is the fastest way to do it. The option isn’t always there, but if the data format is supported by the tool, you won’t find a faster option.
2. Use Prepared Statements
Prepared statements reduce insert time by compiling the SQL once, and also help prevent SQL injection attacks. Injection attacks are another good question to ask in interviews.
Once the prepared statement is parsed by the engine, the client only needs to send the data for each statement, instead of the entire SQL every time.
3. Use Batch Inserts
Prepared statements can help you when using JDBC, but batch statements are going to be your BFF when doing inserts and updates.
We have a demo in Data Pipeline that loads some hospital data into a MySQL database. Using batch statements of 100 records, insert time went from 3.5 minutes to under 1 second.
Here are two things you should know about the data:
1) The data isn’t perfect. It does have bad records and duplicate IDs that cause some records to fail.
2) Some work needs to be done to map the target and source formats.
Here’s how we did it in Java.
DataWriter discardWriter = new StreamWriter(System.out);
DataReader reader = new CSVReader(new File("data/hospital.csv"))
reader = new TransformingReader(reader, discardWriter, "error_message")
.add(new RenameField("Provider Number", "hospital_id"))
.add(new RenameField("Hospital Name", "hospital_name"))
.add(new RenameField("Address 1", "address1"))
.add(new RenameField("Address 2", "address2"))
.add(new RenameField("Address 3", "address3"))
.add(new RenameField("City", "city"))
.add(new RenameField("State", "state"))
.add(new RenameField("ZIP Code", "zip_code"))
.add(new RenameField("County", "county"))
.add(new RenameField("Phone Number", "phone_number"))
.add(new RenameField("Hospital Type", "hospital_type"))
.add(new RenameField("Hospital Ownership", "hospital_ownership"))
.add(new RenameField("Emergency Services", "emergency_services"))
"decode(emergency_services, 'Yes', true, 'No', false)"))
DataWriter writer = new JdbcWriter(getConnection(), "hospital")
Job job = Job.run(reader, writer);
System.out.println("Time: " + job.getRunningTimeAsString());
If you want to learn more about the code and Data Pipeline, please have a look at our Java ETL framework.
4. Use Multiple Connections
Depending on your configuration and database, speed could be improved by setting up more than one connection to do inserts.
In a test we run recently, we saw improvements with up to 5 connections with one of the major database providers. Going above 5 connections saw no noticeable benefit.
5. Temporarily Disable Integrity Constraints
With this one, caution is required. I don’t recommend it unless you are completely familiar with what you are doing, and most importantly, have a rollback plan.
Disabling integrity checks means your database engine can insert data faster since it has less work to do for each record. However, if your data isn’t perfectly clean, you may end up with corrupted tables and broken client apps. Also, don’t forget to re-enable constraints. You don’t want to be the guy who forgot to re-enable constraints on a production database.
What’s your trick for speeding up inserts with JDBC?
Drop me a comment if you have an insert trick to share.