Read from CSV And Writer to Excel

This library is designed to facilitate the transformation of data from CSV files through a data pipeline, with the ultimate goal of generating output in XLSX (Excel) format. It provides an efficient and streamlined way to process and convert data, allowing users to leverage the power of the pipeline for data manipulation and then extract the transformed data into an Excel-compatible format.

Real-life use cases for this library can involve data preprocessing, data cleansing, and data analysis tasks. Users can utilize the pipeline to perform various transformations, such as filtering, aggregating, or enriching the data from CSV files. The ability to generate the output in XLSX format makes it convenient for further analysis, reporting, or sharing the transformed data with stakeholders who rely on Excel as their primary tool for data consumption.


Input CSV File



Java Code Listing

package com.northconcepts.datapipeline.foundations.examples.pipeline;

import com.northconcepts.datapipeline.foundations.file.LocalFileSink;
import com.northconcepts.datapipeline.foundations.file.LocalFileSource;
import com.northconcepts.datapipeline.foundations.pipeline.Pipeline;
import com.northconcepts.datapipeline.foundations.pipeline.action.convert.ConvertStringToNumberAction;
import com.northconcepts.datapipeline.foundations.pipeline.action.transform.AddFieldsAction;
import com.northconcepts.datapipeline.foundations.pipeline.action.transform.RenameFieldsAction;
import com.northconcepts.datapipeline.foundations.pipeline.input.CsvPipelineInput;
import com.northconcepts.datapipeline.foundations.pipeline.output.ExcelPipelineOutput;

public class ReadFromCsvWriteToExcel {

    public static void main(String[] args) {

        CsvPipelineInput pipelineInput = new CsvPipelineInput()
                .setFileSource(new LocalFileSource().setPath("data/input/Listing.csv"))

        ExcelPipelineOutput pipelineOutput = new ExcelPipelineOutput()
                .setFileSink(new LocalFileSink().setPath("data/output/output.xlsx"))

        Pipeline pipeline = new Pipeline()
            .addAction(new RenameFieldsAction().add("Taxes", "Taxes_Renamed"))
            .addAction(new ConvertStringToNumberAction()
                .add("Sell", "List")
            .addAction(new AddFieldsAction().add("new_column", AddFieldsAction.FieldType.EXPRESSION, "List - Sell"));;


Code Walkthrough

  1. CsvPipelineInput instance is created to read data from the input file Listing.csv.
  2. The setFieldNamesInFirstRow(true) method is invoked to specify that the names specified in the first row should be used as field names.
  3. ExcelPipelineOutput instance is created to specify the output file output.xlsx.
  4. A Pipeline object is created. Input and output instances in the previous steps are declared in the pipeline instance. 
  5. In order to process data, different actions/rules are also specified for the pipeline.
  6. First, the "Taxes" field is renamed to "Taxes_Renamed".
  7. Next, two fields are converted from String to Double with a pattern.
  8. Then, the "new_column" field is added with the calculation expression. Its value is a difference between the "List" and "Sell" column values. 
  9. The pipeline is executed.


Output File

Sell	List	Living Rooms	Beds	Baths	Age	Acres	Taxes_Renamed	Rating	Garage	new_column
142.00	160.00  28	10	5		60	0.28	3167	        A		18.00
1.00	193.00	28	11		3	60	0.28	3168	        B		192.00
10.00	110.00	28		5	3	60	0.28	3169	        B		100.00
3.00	132.00		12	5	3	60	0.28	3170	        C		129.00
5.00		180	13	5	3	60	0.28	3171	        C		


Mobile Analytics