Use Streaming Excel for Writing
This example shows you how to take records and write them into a Microsoft Excel document using a streaming approach.
The usage of Apache POI's streaming API (SXSSF) in this example provides an alternative approach to writing Excel files compared to the default method. It leverages the compressed nature of modern Microsoft Office file formats, such as XLSX, and writes data directly to the file in a streaming fashion, reducing memory consumption. This approach is especially useful when working with large datasets that may require a significant amount of RAM if using the default method. However, it's important to note that certain Excel features that rely on the entire dataset may be disabled when using the streaming approach.
This can be used to export data from databases, data warehouses, or other data sources into Excel. This enables you to perform in-depth analysis, create custom reports, and generate visualizations using Excel's powerful features. It is particularly useful for financial analysis, sales reporting, business intelligence, and data-driven decision-making.
This example can easily be modified to show how to use streaming Excel for reading.
Java Code Listing
package com.northconcepts.datapipeline.examples.cookbook; import java.io.File; import com.northconcepts.datapipeline.core.DataReader; import com.northconcepts.datapipeline.core.DataWriter; import com.northconcepts.datapipeline.csv.CSVReader; import com.northconcepts.datapipeline.excel.ExcelDocument; import com.northconcepts.datapipeline.excel.ExcelDocument.ProviderType; import com.northconcepts.datapipeline.excel.ExcelWriter; import com.northconcepts.datapipeline.job.Job; public class UseStreamingExcelWriting { public static void main(String[] args) { DataReader reader = new CSVReader(new File("example/data/input/credit-balance-01.csv")) .setFieldNamesInFirstRow(true); ExcelDocument document = new ExcelDocument(ProviderType.POI_SXSSF); DataWriter writer = new ExcelWriter(document).setSheetName("balance"); Job.run(reader, writer); document.save(new File("example/data/output/credit-balance-06.xlsx")); } }
Code Walkthrough
- First, CSVReader is created to read comma-separated records from an input
credit-balance-01.csv
file. - ExcelDocument.ProviderType indicates an Excel provider which is used to open the input Excel file. For this example, POI_SXSSF is selected as an Excel provider for the purpose of reducing memory usage.
- ExcelWriter is created using
ExcelDocument
object to write records to the specified Excel file. A method ExcelWriter.setSheetName() changes the default sheet name to whatever value passed in the parameter which isbalance
in this example. - Data is transferred from CSVReader to ExcelWriter via Job.run() method. See how to compile and run data pipeline jobs
- Output Excel file is saved using ExcelDocument.save() method.
CSVReader
Obtains records from a Comma Separated Value (CSV) or delimited stream. It can be created using File or Reader object. A method CSVReader.setFieldNamesInFirstRow(true)
in this class causes the CSVWriter
to use the names specified in the first row of the input data as field names. If this method is not invoked, the fields would be named A1, A2. If those fields' names need to be changed, a rename transformation can be added on top of CSVReader
or any other type (Refer Rename a field for example).
ExcelDocument
The in-memory abstraction for an Excel workbook. It is not thread-safe and will throw an exception if used in multiple ExcelReaders and/or ExcelWriter concurrently. Since its data is stored in memory, reading and re-reading from it multiple times is very cheap (think millions of reads per second). A method ExcelDocument.save() can be used to save a specified file document.
ExcelDocument.ProviderType
ExcelDocument.ProviderType is an enum class that contains a list of Excel providers.
POI_SXSSF
POI_SXSSF writes .xlsx files (Excel 2007+) using Apache POI's streaming API (SXSSF) and temporary files to reduce memory usage. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window. Older rows that are no longer in the window become inaccessible, as they are written to the disk.
ExcelWriter
Writes records to a Microsoft Excel document. ExcelWriter.setSheetName() method in this class can be used to assign sheet names for an Excel file. If you remove this method a default sheet name ie. sheet1, sheet2, and so on will be automatically assigned.
Output
The output will be written to the Excel file and stored in the specified location i.e example/data/output
.