Import CSV and Excel to Java Web Apps with Data Pipeline

Import CSV and Excel to Java Web Apps with Data PipelineThis blog will demonstrate how to upload Excel and CSV files into a database while using Data Pipeline to handle the differences in format and structure of the individual files.

The code here builds on the previous Export CSV and Excel files blog, scan that first if you haven’t already.

Overview

Like the previous example, the demo web app displays a grid of events (conferences, meetings, appointments, etc.) from an in-memory database. The difference here is this app includes an upload form above the grid for adding events to the database.

The upload files can be in either CSV or Excel (XLS or XLSX) format and must contain the following columns:

  • EVENT_TYPE
  • TITLE
  • SUMMARY
  • LOCATION
  • START_TIME
  • DURATION_MINUTES

The columns can be in any order and the file may contain additional columns. Any additional, unused columns will be removed during the import process.

This example uses Apache Commons FileUpload, in addition to JBoss’ RESTEasy, HyperSQL, and Data Pipeline libraries.

The full source, web.xml configuration, and jars are included with the download for this blog.

1. Create the Upload Form

The first step is to add the upload form and fields above the grid.

The form’s action attribute is left empty since we’ll be POSTing back to the same URL that displays the form and grid. The enctype attribute is set multipart/form-data to support sending files from the browser to the server.

2. Handle Multipart File Uploads

Next we need to handle the form’s post data.

The uploadEvents method uses the Apache Commons FileUpload library to first determine if this is in fact a file upload (isMultipartContent). It then extracts each file from the stream and sends them to the saveEvents method for import to the database. Finally, it redirects the browser back to the same URL to display all events from the database.

By default, DiskFileItemFactory stores files 10 KB and under in-memory and those over to a temporary file on disk. You can change the byte threshold and location by calling DiskFileItemFactory.setSizeThreshold(int) and DiskFileItemFactory.setRepository(File).

3. Import the CSV and Excel Files

In this step we parse the uploaded files and save their record to the events database.

The first check saveEvents does is ensure the FileItem is an actual file and not just a form field.

Next, it determines the format of the uploaded file by looking at its extension and instantiates the appropriate parser. For .csv we use the CSVReader and for .xls and .xlsx we use ExcelReader.

Excel Formats

Excel files come in two flavours. Microsoft’s old XLS format, for Office 2003 and earlier, and the new compressed XLSX format for Office 2007 and beyond. (You can see the internal XML files by renaming an .xlsx file to .zip and then exploding it.)

Data pipeline allows you to target either Excel format by passing one of the ProviderType enums (POI_XSSF for XLSX and POI for XLS) to ExcelDocument.

Field Selection

After instantiating the right DataReader we wrap it in a TransformingReader to pull out only those columns we plan to store. The IncludeFields transformer works like a SQL select. It arranges the columns in the order we’ve specified and discards any unused columns.

Data Loading

We use JdbcWriter to perform the actual database updates. All it requires is a connection and the name of the target table.

JdbcWriter comes with several optional features you might find useful.

  1. High Performance Batch Updates: enable this by calling setBatchSize(int batchSize) with a value greater than 1.
  2. Connection Auto-closing: call setAutoCloseConnection(boolean closeConnection) with true.
  3. SQL Type Override: call setJdbcType(String fieldName, int jdbcType) with the field name and a constant from java.sql.Types .

Finally, we send the reader and writer to a JobTemplate to perform the actual data transfer.

Running the example

That’s it. To try the example app for yourself:

  1. Download the project.
  2. Download the latest Data Pipeline and unpack it into the project’s WEB-INF\lib folder.
  3. Run the ant build script.
  4. Deploy the generated war to your server of choice.
  5. Visit http://localhost:8080/events2/ (adjust the port as needed).

 

Happy coding!

You can view additional examples by visiting https://northconcepts.com/data-pipeline/examples/.

About Dele Taylor

We make Data Pipeline — a lightweight ETL framework for Java. Use it to filter, transform, and aggregate data on-the-fly in your web, mobile, and desktop apps. Learn more about it at northconcepts.com.

10 thoughts on “Import CSV and Excel to Java Web Apps with Data Pipeline

  • By Andrea - Reply

    Hi, I tried your example, calling the saveEvents method to import an Excel file to my database. But it’s only importing 999 records from 9300 that the file has.
    There is anyway to import all the records?
    Thank you!
    Andrea

    • By Dele Taylor - Reply

      Hi Andrea, I’ll follow-up with you over email.

  • By Jethro - Reply

    Hi Dele,

    I tried this tutorial, but then in a different approach. Only used your savedEvents method. I’m trying to import an excel file coming from my local pc to my hsqldb. Since your parameter is FileItem. I tried using these codes to convert the file into fileitem :

    InputStream inputStream = new FileInputStream(“C:/Users/jethro/Documents/filename.xlsx”);
    int availableBytes = inputStream.available();
    File outFile = new File(“c:\\tmp\\newfile.xlsx”);
    FileItem fileItem = new DiskFileItem(“fileUpload”, “application/vnd.ms-excel”, false, “filename.xlsx”, availableBytes, outFile);

    Sorry for the lousy formatting but then it came out like that when I pasted it. Anw, when running it it causes an NPE on this part excelDocument.open(file.getInputStream());

    Any ideas why? Thanks 🙂

  • By Jethro - Reply

    Hi,

    Me again. Please ignore the first comment. I already solved that one sorry for that. Panic comment. Anyway, I have the same problem as Andrea(The first commenter). Out of 2000+ data, It only recorded 1000.

    Thanks,
    Jet

    • By Dele Taylor - Reply

      Do you mind emailing me your new code (support at NorthConcepts.com)?

  • By pavan - Reply

    How can i create a multiple sheets in single excel file.

    DataWriter writer = new ExcelWriter(document)
    .setSheetName(“balance”);

  • Pingback: How to create multiple sheets in a single Excel file

  • By yogesh - Reply

    I need a code that imports excel sheets into sql database .. next step is after uploading we will be generating reports by writing sql queries can you guys helpme out ?

Leave a Reply

Your email address will not be published. Required fields are marked *
You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">