Read from CSV And Writer to Excel
This library is designed to facilitate the transformation of data from CSV files through a data pipeline, with the ultimate goal of generating output in XLSX (Excel) format. It provides an efficient and streamlined way to process and convert data, allowing users to leverage the power of the pipeline for data manipulation and then extract the transformed data into an Excel-compatible format.
Real-life use cases for this library can involve data preprocessing, data cleansing, and data analysis tasks. Users can utilize the pipeline to perform various transformations, such as filtering, aggregating, or enriching the data from CSV files. The ability to generate the output in XLSX format makes it convenient for further analysis, reporting, or sharing the transformed data with stakeholders who rely on Excel as their primary tool for data consumption.
Input CSV File
Sell,List,Living,Rooms,Beds,Baths,Age,Acres,Taxes,Rating,Garage 142,160,28,10,5,,60,0.28,3167,A, 1,193,28,11,,3,60,0.28,3168,B, 10,110,28,,5,3,60,0.28,3169,B, 3,132,,12,5,3,60,0.28,3170,C, 5,,180,13,5,3,60,0.28,3171,C,
Java Code Listing
package com.northconcepts.datapipeline.foundations.examples.pipeline; import com.northconcepts.datapipeline.foundations.file.LocalFileSink; import com.northconcepts.datapipeline.foundations.file.LocalFileSource; import com.northconcepts.datapipeline.foundations.pipeline.Pipeline; import com.northconcepts.datapipeline.foundations.pipeline.action.convert.ConvertStringToNumberAction; import com.northconcepts.datapipeline.foundations.pipeline.action.transform.AddFieldsAction; import com.northconcepts.datapipeline.foundations.pipeline.action.transform.RenameFieldsAction; import com.northconcepts.datapipeline.foundations.pipeline.input.CsvPipelineInput; import com.northconcepts.datapipeline.foundations.pipeline.output.ExcelPipelineOutput; public class ReadFromCsvWriteToExcel { public static void main(String[] args) { CsvPipelineInput pipelineInput = new CsvPipelineInput() .setFileSource(new LocalFileSource().setPath("data/input/Listing.csv")) .setFieldNamesInFirstRow(true); ExcelPipelineOutput pipelineOutput = new ExcelPipelineOutput() .setFileSink(new LocalFileSink().setPath("data/output/output.xlsx")) .setFieldNamesInFirstRow(true); Pipeline pipeline = new Pipeline() .setInput(pipelineInput) .setOutput(pipelineOutput) .addAction(new RenameFieldsAction().add("Taxes", "Taxes_Renamed")) .addAction(new ConvertStringToNumberAction() .add("Sell", "List") .setType(ConvertStringToNumberAction.FieldType.DOUBLE) .setPattern("0.00")) .addAction(new AddFieldsAction().add("new_column", AddFieldsAction.FieldType.EXPRESSION, "List - Sell")); pipeline.run(); } }
Code Walkthrough
- CsvPipelineInput instance is created to read data from the input file
Listing.csv
. - The
setFieldNamesInFirstRow(true)
method is invoked to specify that the names specified in the first row should be used as field names. - ExcelPipelineOutput instance is created to specify the output file
output.xlsx
. - A Pipeline object is created. Input and output instances in the previous steps are declared in the pipeline instance.
- In order to process data, different actions/rules are also specified for the pipeline.
- First, the "Taxes" field is renamed to "Taxes_Renamed".
- Next, two fields are converted from String to Double with a pattern.
- Then, the "new_column" field is added with the calculation expression. Its value is a difference between the "List" and "Sell" column values.
- The pipeline is executed.
Output File
Sell List Living Rooms Beds Baths Age Acres Taxes_Renamed Rating Garage new_column 142.00 160.00 28 10 5 60 0.28 3167 A 18.00 1.00 193.00 28 11 3 60 0.28 3168 B 192.00 10.00 110.00 28 5 3 60 0.28 3169 B 100.00 3.00 132.00 12 5 3 60 0.28 3170 C 129.00 5.00 180 13 5 3 60 0.28 3171 C