This blog will demonstrate how to upload Excel and CSV files into a database while using Data Pipeline to handle the differences in format and structure of the individual files.
The code here builds on the previous Export CSV and Excel files blog, so read that one first if you haven't already.
Overview
Like the previous example, the demo web app displays a grid of events (conferences, meetings, appointments, etc.) from an in-memory database. The difference here is this app includes an upload form above the grid for adding events to the database.
The upload files can be in either CSV or Excel (XLS or XLSX) format and must contain the following columns:
- EVENT_TYPE
- TITLE
- SUMMARY
- LOCATION
- START_TIME
- DURATION_MINUTES
The columns can be in any order and the file may contain additional columns. Any additional, unused columns will be removed during the import process.
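To make the expected layout concrete, a hypothetical upload file might look like the following (the values and the trailing EXTRA_NOTES column are invented for illustration; EXTRA_NOTES would simply be dropped during import):

```csv
EVENT_TYPE,TITLE,SUMMARY,LOCATION,START_TIME,DURATION_MINUTES,EXTRA_NOTES
MEETING,Quarterly Review,Review Q3 results,Boardroom A,2012-10-01 09:00:00,60,bring slides
CONFERENCE,JavaOne,Annual Java conference,San Francisco,2012-09-30 08:30:00,480,
```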
This example uses Apache Commons FileUpload, in addition to JBoss’ RESTEasy, HyperSQL, and Data Pipeline libraries.
The full source, web.xml configuration, and jars are included with the download for this blog.
1. Create the Upload Form
The first step is to add the upload form and fields above the grid.
<form action="" enctype="multipart/form-data" method="post">
  File: <input id="file1" name="file1" size="100" type="file">
  <input value="Upload" type="submit">
</form>
The form’s action attribute is left empty since we’ll be POSTing back to the same URL that displays the form and grid. The enctype attribute is set to multipart/form-data to support sending files from the browser to the server.
2. Handle Multipart File Uploads
Next we need to handle the form’s post data.
@POST
@Path("/events")
@SuppressWarnings("unchecked")
public Response uploadEvents(@Context HttpServletRequest request) throws Throwable {
    boolean isMultipart = ServletFileUpload.isMultipartContent(request);
    if (!isMultipart) {
        return Response.ok("Not multipart request").build();
    }

    FileItemFactory factory = new DiskFileItemFactory();
    ServletFileUpload upload = new ServletFileUpload(factory);
    List<FileItem> files = upload.parseRequest(request);

    for (FileItem file : files) {
        saveEvents(file);
    }

    return Response.seeOther(new URI("/events")).build();
}
The uploadEvents method uses the Apache Commons FileUpload library to first determine if this is in fact a file upload (isMultipartContent). It then extracts each file from the stream and sends them to the saveEvents method for import to the database. Finally, it redirects the browser back to the same URL to display all events from the database.
By default, DiskFileItemFactory stores files of 10 KB and under in memory and writes larger files to a temporary file on disk. You can change the byte threshold and the temporary location by calling DiskFileItemFactory.setSizeThreshold(int) and DiskFileItemFactory.setRepository(File).
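For example, a configuration that raises the threshold to 1 MB and points the repository at the JVM's temp directory might look like this (the threshold and location here are arbitrary examples, not recommended values):

```java
// Hypothetical configuration -- adjust the values to your own needs.
DiskFileItemFactory factory = new DiskFileItemFactory();
factory.setSizeThreshold(1024 * 1024);  // keep uploads up to 1 MB in memory
factory.setRepository(new File(System.getProperty("java.io.tmpdir")));  // spill larger uploads here
ServletFileUpload upload = new ServletFileUpload(factory);
```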
3. Import the CSV and Excel Files
In this step we parse the uploaded files and save their records to the events database.
private void saveEvents(FileItem file) throws Throwable {
    if (file.isFormField()) {
        return;
    }

    String fileName = file.getName();
    DataReader reader;

    if (fileName.endsWith(".csv")) {
        reader = new CSVReader(new InputStreamReader(file.getInputStream()))
            .setFieldNamesInFirstRow(true);
    } else if (fileName.endsWith(".xls") || fileName.endsWith(".xlsx")) {
        ExcelDocument excelDocument = new ExcelDocument(
            fileName.endsWith(".xlsx") ? ProviderType.POI_XSSF : ProviderType.POI);
        excelDocument.open(file.getInputStream());
        reader = new ExcelReader(excelDocument).setFieldNamesInFirstRow(true);
    } else {
        throw new DataException("unknown file type: expected .csv, .xls, or .xlsx file extension")
            .set("fileName", fileName);
    }

    reader = new TransformingReader(reader)
        .add(new IncludeFields("EVENT_TYPE", "TITLE", "SUMMARY", "LOCATION", "START_TIME", "DURATION_MINUTES"));

    DataWriter writer = new JdbcWriter(db.getConnection(), "EVENT");

    JobTemplate.DEFAULT.transfer(reader, writer);
}
The first check saveEvents does is ensure the FileItem is an actual file and not just a form field.
Next, it determines the format of the uploaded file by looking at its extension and instantiates the appropriate parser. For .csv we use the CSVReader and for .xls and .xlsx we use ExcelReader.
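The extension check itself is plain string matching; as a standalone sketch, it could be factored out like this (FormatDetector and detectFormat are hypothetical helpers for illustration, not part of Data Pipeline):

```java
public class FormatDetector {

    // Hypothetical helper: maps a file name to a format label by extension.
    public static String detectFormat(String fileName) {
        String name = fileName.toLowerCase();
        if (name.endsWith(".csv")) {
            return "CSV";
        } else if (name.endsWith(".xlsx")) {
            return "XLSX";  // Office 2007+ format
        } else if (name.endsWith(".xls")) {
            return "XLS";   // Office 2003 and earlier format
        }
        throw new IllegalArgumentException(
            "unknown file type: expected .csv, .xls, or .xlsx file extension: " + fileName);
    }
}
```

Lower-casing the name first makes the check tolerant of files named EVENTS.CSV or Events.Xlsx.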
Excel Formats
Excel files come in two flavours. Microsoft’s old XLS format, for Office 2003 and earlier, and the new compressed XLSX format for Office 2007 and beyond. (You can see the internal XML files by renaming an .xlsx file to .zip and then exploding it.)
Data Pipeline allows you to target either Excel format by passing one of the ProviderType enums (POI_XSSF for XLSX and POI for XLS) to ExcelDocument.
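You can see the ZIP structure programmatically as well. This standalone sketch builds a tiny ZIP in memory (standing in for a real .xlsx, which would contain entries like xl/workbook.xml among others) and lists its entries with the JDK's java.util.zip classes:

```java
import java.io.*;
import java.util.*;
import java.util.zip.*;

public class ZipPeek {

    // Returns the entry names found inside a ZIP archive's bytes.
    public static List<String> listEntries(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry entry; (entry = in.getNextEntry()) != null; ) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Builds a toy one-entry ZIP, standing in for a real .xlsx file.
    public static byte[] toyZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("xl/workbook.xml"));
            out.write("<workbook/>".getBytes(java.nio.charset.StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }
}
```

Running listEntries against a genuine .xlsx file's bytes would show its real internal XML entries.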
Field Selection
After instantiating the right DataReader we wrap it in a TransformingReader to pull out only those columns we plan to store. The IncludeFields transformer works like a SQL select. It arranges the columns in the order we’ve specified and discards any unused columns.
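Conceptually, this is a projection of each record onto a fixed column list. A standalone stdlib sketch of the idea (Projection and project are hypothetical names; this is not Data Pipeline's actual implementation) might look like:

```java
import java.util.*;

public class Projection {

    // Keeps only the named fields, in the given order -- roughly what a
    // SQL select list (or the IncludeFields transformer) does per record.
    public static Map<String, Object> project(Map<String, Object> row, String... fields) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String field : fields) {
            if (row.containsKey(field)) {
                out.put(field, row.get(field));
            }
        }
        return out;
    }
}
```

Any columns not named in the field list never make it into the output, and the output order follows the field list rather than the input row.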
Data Loading
We use JdbcWriter to perform the actual database updates. All it requires is a connection and the name of the target table.
JdbcWriter comes with several optional features you might find useful.
- High Performance Batch Updates: enable this by calling setBatchSize(int batchSize) with a value greater than 1.
- Connection Auto-closing: call setAutoCloseConnection(boolean closeConnection) with true.
- SQL Type Override: call setJdbcType(String fieldName, int jdbcType) with the field name and a constant from java.sql.Types.
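For context, batch updates at the plain JDBC level amount to addBatch/executeBatch on a PreparedStatement. A rough hand-rolled sketch (connection and rows are assumed to already exist; this is not JdbcWriter's actual implementation):

```java
// Sketch only: `connection` is an existing java.sql.Connection and
// `rows` is an Iterable<Object[]> of parsed records in column order.
try (PreparedStatement stmt = connection.prepareStatement(
        "INSERT INTO EVENT (EVENT_TYPE, TITLE, SUMMARY, LOCATION, START_TIME, DURATION_MINUTES) "
        + "VALUES (?, ?, ?, ?, ?, ?)")) {
    int count = 0;
    for (Object[] row : rows) {
        for (int i = 0; i < row.length; i++) {
            stmt.setObject(i + 1, row[i]);
        }
        stmt.addBatch();
        if (++count % 100 == 0) {
            stmt.executeBatch();  // flush every 100 rows
        }
    }
    stmt.executeBatch();  // flush the remainder
}
```

Sending rows in batches avoids a network round trip per insert, which is where the performance gain comes from.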
Finally, we send the reader and writer to a JobTemplate to perform the actual data transfer.
Running the example
That’s it. To try the example app for yourself:
- Download the project.
- Download the latest Data Pipeline and unpack it into the project’s WEB-INF\lib folder.
- Run the ant build script.
- Deploy the generated war to your server of choice.
- Visit http://localhost:8080/events2/ (adjust the port as needed).
Happy coding!
You can view additional examples by visiting https://northconcepts.com/data-pipeline/examples/.
Hi, I tried your example, calling the saveEvents method to import an Excel file to my database. But it’s only importing 999 records from 9300 that the file has.
Is there any way to import all the records?
Thank you!
Andrea
Hi Andrea, I’ll follow-up with you over email.
Hi Dele,
I tried this tutorial, but with a different approach. I only used your saveEvents method. I’m trying to import an Excel file from my local PC to my HSQLDB. Since your parameter is FileItem, I tried using this code to convert the file into a FileItem:
InputStream inputStream = new FileInputStream("C:/Users/jethro/Documents/filename.xlsx");
int availableBytes = inputStream.available();
File outFile = new File("c:\\tmp\\newfile.xlsx");
FileItem fileItem = new DiskFileItem("fileUpload", "application/vnd.ms-excel", false, "filename.xlsx", availableBytes, outFile);
Sorry for the lousy formatting, it came out like that when I pasted it. Anyway, when I run it, it causes an NPE on this part: excelDocument.open(file.getInputStream());
Any ideas why? Thanks 🙂
Hi,
Me again. Please ignore my first comment, I already solved that one, sorry for the panic comment. Anyway, I have the same problem as Andrea (the first commenter): out of 2000+ records, it only imported 1000.
Thanks,
Jet
Do you mind emailing me your new code (support at NorthConcepts.com)?
How can I create multiple sheets in a single Excel file?
DataWriter writer = new ExcelWriter(document)
    .setSheetName("balance");
Hi Pavan, check out this how-to for the details: https://northconcepts.com/blog/2015/05/30/how-to-create-multiple-sheets-in-a-single-excel-file/. Cheers.
I need code that imports Excel sheets into a SQL database. The next step, after uploading, is generating reports by writing SQL queries. Can you guys help me out?
Hi Yogesh, you should already have the import part from reading this blog. For the export/reports part, see https://northconcepts.com/blog/2012/12/10/export-csv-and-excel-from-java-web-apps-with-data-pipeline/ and change the “SELECT * FROM EVENT” query to your own. Cheers.