Read a Set of Files
This example shows how to read multiple CSV files sequentially in a single job. While this example uses a set of CSV files, the inputs can be of any type supported by DataPipeline, and they can even be a mix of different types.
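The sequential behavior described above can be pictured with a short plain-Java sketch. It does not use DataPipeline at all: iterators stand in for readers, and the class SequenceSketch and its readAll helper are hypothetical names chosen for illustration. Each source is drained completely before the next one starts, which is the ordering this example relies on.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration, not DataPipeline's SequenceReader:
// it drains each source iterator completely before moving on to the next one.
final class SequenceSketch<T> implements Iterator<T> {
    private final Iterator<Iterator<T>> sources;
    private Iterator<T> current;

    SequenceSketch(List<Iterator<T>> sources) {
        this.sources = sources.iterator();
    }

    @Override
    public boolean hasNext() {
        // Skip exhausted (or empty) sources until one has data or none remain.
        while ((current == null || !current.hasNext()) && sources.hasNext()) {
            current = sources.next();
        }
        return current != null && current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    // Convenience helper: collects every element in the order it is read.
    static <E> List<E> readAll(List<Iterator<E>> sources) {
        List<E> out = new ArrayList<>();
        SequenceSketch<E> seq = new SequenceSketch<>(sources);
        while (seq.hasNext()) {
            out.add(seq.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Iterator<String>> sources = List.of(
                List.of("American Samoa", "British Indian Ocean Territory").iterator(),
                List.<String>of().iterator(),
                List.of("Afghanistan", "Egypt").iterator());
        // prints [American Samoa, British Indian Ocean Territory, Afghanistan, Egypt]
        System.out.println(readAll(sources));
    }
}
```

Because each source is only required to be an Iterator, the sources can be backed by different implementations, which mirrors the point that the inputs do not need to share a type.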
Input CSV files
Countries-1.csv (15 total records)
Country (en),Country (de),Country (local),Country code,Continent,Capital,Population,Area,Coastline,Government form,Currency,Currency code,Dialing prefix,Birthrate,Deathrate,Life expectancy,Url
American Samoa,Amerikanisch Samoa,Amerika Samoa,AS,Oceania,,54194,199,116,Presidential democracy (self-governing territory of the US),Dollar,USD,1-684,22.9,4.8,75.4,https://www.laenderdaten.info/Ozeanien/Amerikanisch-Samoa/index.php
British Indian Ocean Territory,Britisches Territorium im Indischen Ozean,British Indian Ocean Territory,IO,Africa,,0,54400,698,British overseas territory,Dollar,USD,246,0,0,0,https://www.laenderdaten.info/Afrika/Britisches-Territorium-im-Indischen-Ozean/index.php
...
Countries-2.csv (233 total records)
Country (en),Country (de),Country (local),Country code,Continent,Capital,Population,Area,Coastline,Government form,Currency,Currency code,Dialing prefix,Birthrate,Deathrate,Life expectancy,Url
Afghanistan,Afghanistan,Afganistan/Afqanestan,AF,Asia,,33332025,652230,0,Presidential islamic republic,Afghani,AFN,93,38.3,13.7,51.3,https://www.laenderdaten.info/Asien/Afghanistan/index.php
Egypt,Ägypten,Misr,EG,Africa,,94666993,1001450,2450,Presidential republic,Pfund,EGP,20,30.3,4.7,72.7,https://www.laenderdaten.info/Afrika/Aegypten/index.php
Åland Islands,Ålandinseln,Åland,AX,Europe,,29013,1580,0,Autonomous region of Finland,Euro,EUR,358,0,0,0,https://www.laenderdaten.info/Europa/Aland/index.php
...
Java code listing
package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.SequenceReader;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.job.Job;

public class ReadASetOfFiles {

    private static final String INPUT = "example/data/input";
    private static final String[] FILES = {"countries-1.csv", "countries-2.csv"};

    public static void main(String[] args) {
        // Combine one CSVReader per input file into a single stream, read in order.
        SequenceReader sequenceReader = new SequenceReader();
        for (int i = 0; i < FILES.length; i++) {
            // Each file's first row supplies the field names for that file.
            sequenceReader.add(new CSVReader(new File(INPUT, FILES[i]))
                    .setFieldNamesInFirstRow(true));
        }

        DataReader reader = sequenceReader;
        DataWriter writer = StreamWriter.newSystemOutWriter();

        // Transfer every record from the combined reader to the console writer.
        Job.run(reader, writer);
    }
}
Code walkthrough
- An instance of the SequenceReader class is created. It combines all the input readers into a single stream, reading them one after another.
- In the for loop, a CSVReader is created from the file path of each input file and added to sequenceReader.
- CSVReader.setFieldNamesInFirstRow(true) is invoked to specify that the names in the first row of each file should be used as field names.
- The SequenceReader object is assigned to a DataReader variable called reader.
- StreamWriter.newSystemOutWriter() creates the writer, which prints the results to the console.
- Job.run() transfers the data from reader to the StreamWriter.
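To see the same steps without the library, here is a minimal plain-Java sketch of the flow walked through above: open each file in order, take field names from its first row, and collect the records into one combined list. It is a stand-in under stated assumptions, not DataPipeline's implementation: the class ReadFilesSketch is hypothetical, and its CSV parsing simply splits on commas, so quoted fields that contain commas are not supported.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java stand-in for this example's job; it does not use DataPipeline.
// Assumption: naive parsing that splits on commas and ignores CSV quoting rules.
final class ReadFilesSketch {

    // Reads the files in the given order; the first line of each file supplies its field names.
    static List<Map<String, String>> readAll(Path dir, String... files) throws IOException {
        List<Map<String, String>> records = new ArrayList<>();
        for (String file : files) {
            List<String> lines = Files.readAllLines(dir.resolve(file), StandardCharsets.UTF_8);
            if (lines.isEmpty()) {
                continue; // an empty file contributes neither a header nor records
            }
            String[] header = lines.get(0).split(",", -1);
            for (String line : lines.subList(1, lines.size())) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] values = line.split(",", -1);
                Map<String, String> record = new LinkedHashMap<>();
                for (int i = 0; i < header.length; i++) {
                    // Missing or empty cells are stored as null; extra cells are ignored.
                    String value = i < values.length ? values[i] : "";
                    record.put(header[i], value.isEmpty() ? null : value);
                }
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        List<Map<String, String>> records =
                readAll(Path.of("example/data/input"), "countries-1.csv", "countries-2.csv");
        for (int i = 0; i < records.size(); i++) {
            System.out.println(i + " - " + records.get(i));
        }
    }
}
```

A production CSV reader also has to handle quoting, escaped quotes, and line breaks inside fields, which is a good reason to rely on CSVReader rather than hand-rolled parsing like this.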
CSVReader
CSVReader is an input reader that can be used to read CSV files. It is a subclass of TextReader and inherits open(), close(), and other methods from it. Calling CSVReader.setFieldNamesInFirstRow(true) makes the CSVReader use the names in the first row of the input data as field names. If this method is not invoked, the fields are named A1, A2, etc., similar to MS Excel. If those field names need to be changed, a rename transformation can be added on top of CSVReader or any other reader (refer to Rename a field for an example).
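The rename step mentioned above amounts to mapping old field names to new ones while keeping the field order. The following plain-Java sketch illustrates that idea on a record held in a LinkedHashMap; RenameFieldsSketch is a hypothetical name, and this is not the API of DataPipeline's rename transformation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java illustration of renaming fields on a record;
// it is not DataPipeline's rename transformation.
final class RenameFieldsSketch {

    // Returns a copy of the record with keys renamed according to newNames.
    // Field order is preserved, fields without a mapping keep their names, and if two
    // fields end up with the same name, the later value overwrites the earlier one.
    static Map<String, String> rename(Map<String, String> record, Map<String, String> newNames) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : record.entrySet()) {
            out.put(newNames.getOrDefault(field.getKey(), field.getKey()), field.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("Country (en)", "Egypt");
        record.put("Country code", "EG");
        // prints {Name=Egypt, Country code=EG}
        System.out.println(rename(record, Map.of("Country (en)", "Name")));
    }
}
```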
Console output
-----------------------------------------------
0 - Record {
    0:[Country (en)]:STRING=[American Samoa]:String
    1:[Country (de)]:STRING=[Amerikanisch Samoa]:String
    2:[Country (local)]:STRING=[Amerika Samoa]:String
    3:[Country code]:STRING=[AS]:String
    4:[Continent]:STRING=[Oceania]:String
    5:[Capital]:STRING=[null]
    6:[Population]:STRING=[54194]:String
    7:[Area]:STRING=[199]:String
    8:[Coastline]:STRING=[116]:String
    9:[Government form]:STRING=[Presidential democracy (self-governing territory of the US)]:String
    10:[Currency]:STRING=[Dollar]:String
    11:[Currency code]:STRING=[USD]:String
    12:[Dialing prefix]:STRING=[1-684]:String
    13:[Birthrate]:STRING=[22.9]:String
    14:[Deathrate]:STRING=[4.8]:String
    15:[Life expectancy]:STRING=[75.4]:String
    16:[Url]:STRING=[https://www.laenderdaten.info/Ozeanien/Amerikanisch-Samoa/index.php]:String
}

-----------------------------------------------
1 - Record {
    0:[Country (en)]:STRING=[British Indian Ocean Territory]:String
    1:[Country (de)]:STRING=[Britisches Territorium im Indischen Ozean]:String
    2:[Country (local)]:STRING=[British Indian Ocean Territory]:String
    3:[Country code]:STRING=[IO]:String
    4:[Continent]:STRING=[Africa]:String
    5:[Capital]:STRING=[null]
    6:[Population]:STRING=[0]:String
    7:[Area]:STRING=[54400]:String
    8:[Coastline]:STRING=[698]:String
    9:[Government form]:STRING=[British overseas territory]:String
    10:[Currency]:STRING=[Dollar]:String
    11:[Currency code]:STRING=[USD]:String
    12:[Dialing prefix]:STRING=[246]:String
    13:[Birthrate]:STRING=[0]:String
    14:[Deathrate]:STRING=[0]:String
    15:[Life expectancy]:STRING=[0]:String
    16:[Url]:STRING=[https://www.laenderdaten.info/Afrika/Britisches-Territorium-im-Indischen-Ozean/index.php]:String
}
...