Import CSV and Excel to Java Web Apps with Data Pipeline

This blog will demonstrate how to upload Excel and CSV files into a database while using Data Pipeline to handle the differences in format and structure of the individual files.

The code here builds on the previous Export CSV and Excel files blog, scan that first if you haven’t already.

Overview

Like the previous example, the demo web app displays a grid of events (conferences, meetings, appointments, etc.) from an in-memory database. The difference here is this app includes an upload form above the grid for adding events to the database.

The upload files can be in either CSV or Excel (XLS or XLSX) format and must contain the following columns:

  • EVENT_TYPE
  • TITLE
  • SUMMARY
  • LOCATION
  • START_TIME
  • DURATION_MINUTES

The columns can be in any order and the file may contain additional columns. Any additional, unused columns will be removed during the import process.

This example uses Apache Commons FileUpload, in addition to JBoss’ RESTEasy, HyperSQL, and Data Pipeline libraries.

The full source, web.xml configuration, and jars are included with the download for this blog.

1. Create the Upload Form

The first step is to add the upload form and fields above the grid.

The form’s action attribute is left empty since we’ll be POSTing back to the same URL that displays the form and grid. The enctype attribute is set multipart/form-data to support sending files from the browser to the server.

2. Handle Multipart File Uploads

Next we need to handle the form’s post data.

The uploadEvents method uses the Apache Commons FileUpload library to first determine if this is in fact a file upload (isMultipartContent). It then extracts each file from the stream and sends them to the saveEvents method for import to the database. Finally, it redirects the browser back to the same URL to display all events from the database.

By default, DiskFileItemFactory stores files 10 KB and under in-memory and those over to a temporary file on disk. You can change the byte threshold and location by calling DiskFileItemFactory.setSizeThreshold(int) and DiskFileItemFactory.setRepository(File).

3. Import the CSV and Excel Files

In this step we parse the uploaded files and save their record to the events database.

The first check saveEvents does is ensure the FileItem is an actual file and not just a form field.

Next, it determines the format of the uploaded file by looking at its extension and instantiates the appropriate parser. For .csv we use the CSVReader and for .xls and .xlsx we use ExcelReader.

Excel Formats

Excel files come in two flavours. Microsoft’s old XLS format, for Office 2003 and earlier, and the new compressed XLSX format for Office 2007 and beyond. (You can see the internal XML files by renaming an .xlsx file to .zip and then exploding it.)

Data pipeline allows you to target either Excel format by passing one of the ProviderType enums (POI_XSSF for XLSX and POI for XLS) to ExcelDocument.

Field Selection

After instantiating the right DataReader we wrap it in a TransformingReader to pull out only those columns we plan to store. The IncludeFields transformer works like a SQL select. It arranges the columns in the order we’ve specified and discards any unused columns.

Data Loading

We use JdbcWriter to perform the actual database updates. All it requires is a connection and the name of the target table.

JdbcWriter comes with several optional features you might find useful.

  1. High Performance Batch Updates: enable this by calling setBatchSize(int batchSize) with a value greater than 1.
  2. Connection Auto-closing: call setAutoCloseConnection(boolean closeConnection) with true.
  3. SQL Type Override: call setJdbcType(String fieldName, int jdbcType) with the field name and a constant from java.sql.Types .

Finally, we send the reader and writer to a JobTemplate to perform the actual data transfer.

Running the example

That’s it. To try the example app for yourself:

  1. Download the project.
  2. Run the ant build script.
  3. Deploy the generated war to your server of choice.
  4. Visit http://localhost:8080/events2/ (adjust the port as needed).

 

Happy coding!

Download

The web app download contains the entire source code (including Eclipse project and Ant build). The example’s source code is licensed under the terms of the Apache License, Version 2.0.

You can view additional examples by visiting http://northconcepts.com/data-pipeline/examples/.

Java Exception Tracking

If you enjoyed this, I’ve got an exception tracking tool for Java coming out. Sign-up at StackHunter.com to be notified when it does.

About Dele Taylor

Dele Taylor is the founder of NorthConcepts.com, creators of Data Pipeline -- a data migration library for Java. You can follow him on Twitter, G+, and LinkedIn.
CSV, Data Pipeline, Excel, Java, Web | permalink

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">