Use Streaming Excel for Writing

Updated: Aug 24, 2023

This example shows you how to take records and write them into a Microsoft Excel document using a streaming approach.

The usage of Apache POI's streaming API (SXSSF) in this example provides an alternative approach to writing Excel files compared to the default method. It leverages the compressed nature of modern Microsoft Office file formats, such as XLSX, and writes data directly to the file in a streaming fashion, reducing memory consumption. This approach is especially useful when working with large datasets that may require a significant amount of RAM if using the default method. However, it's important to note that certain Excel features that rely on the entire dataset may be disabled when using the streaming approach. 

This can be used to export data from databases, data warehouses, or other data sources into Excel. This enables you to perform in-depth analysis, create custom reports, and generate visualizations using Excel's powerful features. It is particularly useful for financial analysis, sales reporting, business intelligence, and data-driven decision-making.

This example can easily be modified to show how to use streaming Excel for reading.

 

Java Code Listing

package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.excel.ExcelDocument;
import com.northconcepts.datapipeline.excel.ExcelDocument.ProviderType;
import com.northconcepts.datapipeline.excel.ExcelWriter;
import com.northconcepts.datapipeline.job.Job;

public class UseStreamingExcelWriting {
    
    public static void main(String[] args) {
        DataReader reader = new CSVReader(new File("example/data/input/credit-balance-01.csv"))
                .setFieldNamesInFirstRow(true);

        ExcelDocument document = new ExcelDocument(ProviderType.POI_SXSSF);
        DataWriter writer = new ExcelWriter(document).setSheetName("balance");

        Job.run(reader, writer);
        
        document.save(new File("example/data/output/credit-balance-06.xlsx"));
    }

}

 

Code Walkthrough

  1. First, CSVReader is created to read comma-separated records from an input credit-balance-01.csv file.
  2. ExcelDocument.ProviderType indicates an Excel provider which is used to open the input Excel file. For this example, POI_SXSSF is selected as an Excel provider for the purpose of reducing memory usage.
  3. ExcelWriter is created using ExcelDocument object to write records to the specified Excel file. A method ExcelWriter.setSheetName() changes the default sheet name to whatever value passed in the parameter which is balance in this example.
  4. Data is transferred from CSVReader to ExcelWriter via Job.run() method. See how to compile and run data pipeline jobs
  5. Output Excel file is saved using ExcelDocument.save() method.

 

CSVReader

Obtains records from a Comma Separated Value (CSV) or delimited stream. It can be created using File or Reader object. A method CSVReader.setFieldNamesInFirstRow(true) in this class causes the CSVWriter to use the names specified in the first row of the input data as field names. If this method is not invoked, the fields would be named A1, A2. If those fields' names need to be changed, a rename transformation can be added on top of CSVReader or any other type (Refer Rename a field for example).

 

ExcelDocument

The in-memory abstraction for an Excel workbook. It is not thread-safe and will throw an exception if used in multiple ExcelReaders and/or ExcelWriter concurrently. Since its data is stored in memory, reading and re-reading from it multiple times is very cheap (think millions of reads per second). A method ExcelDocument.save() can be used to save a specified file document.

 

ExcelDocument.ProviderType

ExcelDocument.ProviderType is an enum class that contains a list of Excel providers.

 

POI_SXSSF

POI_SXSSF writes .xlsx files (Excel 2007+) using Apache POI's streaming API (SXSSF) and temporary files to reduce memory usage. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window. Older rows that are no longer in the window become inaccessible, as they are written to the disk.

 

ExcelWriter

Writes records to a Microsoft Excel document. ExcelWriter.setSheetName() method in this class can be used to assign sheet names for an Excel file. If you remove this method a default sheet name ie. sheet1, sheet2, and so on will be automatically assigned.

 

Output

The output will be written to the Excel file and stored in the specified location i.e example/data/output.

Mobile Analytics