Read a Set of Files

Updated: Jul 3, 2023

This example shows how to read multiple CSV files sequentially in a single job. Although it uses a set of CSV files, the inputs can be of any type supported by DataPipeline, and a single job can even mix different types.

Input CSV files

countries-1.csv (15 total records)

Country (en),Country (de),Country (local),Country code,Continent,Capital,Population,Area,Coastline,Government form,Currency,Currency code,Dialing prefix,Birthrate,Deathrate,Life expectancy,Url
American Samoa,Amerikanisch Samoa,Amerika Samoa,AS,Oceania,,54194,199,116,Presidential democracy (self-governing territory of the US),Dollar,USD,1-684,22.9,4.8,75.4,https://www.laenderdaten.info/Ozeanien/Amerikanisch-Samoa/index.php
British Indian Ocean Territory,Britisches Territorium im Indischen Ozean,British Indian Ocean Territory,IO,Africa,,0,54400,698,British overseas territory,Dollar,USD,246,0,0,0,https://www.laenderdaten.info/Afrika/Britisches-Territorium-im-Indischen-Ozean/index.php
...

countries-2.csv (233 total records)

Country (en),Country (de),Country (local),Country code,Continent,Capital,Population,Area,Coastline,Government form,Currency,Currency code,Dialing prefix,Birthrate,Deathrate,Life expectancy,Url
Afghanistan,Afghanistan,Afganistan/Afqanestan,AF,Asia,,33332025,652230,0,Presidential islamic republic,Afghani,AFN,93,38.3,13.7,51.3,https://www.laenderdaten.info/Asien/Afghanistan/index.php
Egypt,Ägypten,Misr,EG,Africa,,94666993,1001450,2450,Presidential republic,Pfund,EGP,20,30.3,4.7,72.7,https://www.laenderdaten.info/Afrika/Aegypten/index.php
Åland Islands,Ålandinseln,Åland,AX,Europe,,29013,1580,0,Autonomous region of Finland,Euro,EUR,358,0,0,0,https://www.laenderdaten.info/Europa/Aland/index.php
...

Java code listing

package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.SequenceReader;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.job.Job;

public class ReadASetOfFiles {
    
    private static final String INPUT = "example/data/input";
    private static final String[] FILES = {"countries-1.csv", "countries-2.csv"};

    public static void main(String[] args) {
        
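        // Combines several readers into one, reading them one after another
        SequenceReader sequenceReader = new SequenceReader();

        for (String file : FILES) {
            // Each file gets its own CSVReader; its first row supplies the field names
            sequenceReader.add(new CSVReader(new File(INPUT, file))
                    .setFieldNamesInFirstRow(true));
        }

        DataReader reader = sequenceReader;
        // Prints each record to the console
        DataWriter writer = StreamWriter.newSystemOutWriter();

        // Transfers all records from the reader to the writer
        Job.run(reader, writer);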
    }

}

Code walkthrough

  1. An instance of the SequenceReader class is created. It combines all of the inputs into a single stream.
  2. In the for loop, a CSVReader is created from the path of each input file and added to sequenceReader.
  3. CSVReader.setFieldNamesInFirstRow(true) is invoked on each reader so that the values in the first row are used as field names.
  4. The SequenceReader object is assigned to a DataReader variable called reader.
  5. Data is transferred from the reader to the StreamWriter via the Job.run() method.
  6. The StreamWriter prints the records to the console.
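
The idea behind steps 1 and 2 can be sketched without the library, using only the JDK. This is not DataPipeline's implementation: the class ChainedIterator and its add and toList methods are hypothetical names chosen for illustration. The sketch assumes that "sequential" means each source is drained completely before the next one is started.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper, not part of DataPipeline: exposes several sources
// as one stream by draining each source fully before starting the next.
final class ChainedIterator<T> implements Iterator<T> {

    private final Deque<Iterator<T>> sources = new ArrayDeque<>();

    // Queues another source; returns this so calls can be chained.
    ChainedIterator<T> add(Iterator<T> source) {
        sources.addLast(source);
        return this;
    }

    @Override
    public boolean hasNext() {
        // Discard exhausted sources until one with data remains.
        while (!sources.isEmpty() && !sources.peekFirst().hasNext()) {
            sources.removeFirst();
        }
        return !sources.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return sources.peekFirst().next();
    }

    // Convenience for demos: collects everything that remains.
    List<T> toList() {
        List<T> all = new ArrayList<>();
        while (hasNext()) {
            all.add(next());
        }
        return all;
    }

    public static void main(String[] args) {
        // Two in-memory "files" stand in for the two CSV inputs.
        ChainedIterator<String> records = new ChainedIterator<String>()
                .add(Arrays.asList("American Samoa", "British Indian Ocean Territory").iterator())
                .add(Arrays.asList("Afghanistan", "Egypt").iterator());
        while (records.hasNext()) {
            System.out.println(records.next());
        }
    }
}
```

The consumer sees a single iterator and does not need to know how many sources sit behind it, which is the same role reader plays when it is passed to Job.run().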

CSVReader

CSVReader is an input reader for CSV files. It is a subclass of TextReader and inherits its open and close methods, among others. The CSVReader.setFieldNamesInFirstRow(true) method causes the CSVReader to use the values in the first row of the input data as field names. If this method is not invoked, the fields are named A1, A2, etc., similar to MS Excel. If those field names need to be changed, a rename transformation can be added on top of the CSVReader or any other reader type (see Rename a field for an example).
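
To illustrate what using the first row as field names means, here is a minimal, library-free sketch. It is not how CSVReader is implemented: HeaderRowSketch, splitLine, and parse are hypothetical names, the comma splitting is deliberately naive (no quoted fields), and turning empty values into null is an assumption made only to mirror the [Capital] field in the console output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not DataPipeline code: treats the first row
// of simple CSV text as field names and maps each later row onto them.
final class HeaderRowSketch {

    // Naive split on commas; the -1 limit keeps trailing empty values.
    // Quoted fields and embedded commas are deliberately not handled.
    static List<String> splitLine(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> records = new ArrayList<>();
        if (lines.isEmpty()) {
            return records;
        }
        // The first row is consumed as field names, not as data.
        List<String> names = splitLine(lines.get(0));
        for (String line : lines.subList(1, lines.size())) {
            List<String> values = splitLine(line);
            // LinkedHashMap preserves the column order of the header row.
            Map<String, String> record = new LinkedHashMap<>();
            for (int i = 0; i < names.size(); i++) {
                String value = i < values.size() ? values.get(i) : "";
                // Assumption: missing or empty values are stored as null.
                record.put(names.get(i), value.isEmpty() ? null : value);
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Country (en),Country code,Capital,Population",
                "American Samoa,AS,,54194");
        System.out.println(parse(lines));
    }
}
```

Because every input file in this example starts with its own header row, each file needs its own reader with this behavior enabled; simply concatenating the raw files would turn the second header row into a data record.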

Console output

-----------------------------------------------
0 - Record {
    0:[Country (en)]:STRING=[American Samoa]:String
    1:[Country (de)]:STRING=[Amerikanisch Samoa]:String
    2:[Country (local)]:STRING=[Amerika Samoa]:String
    3:[Country code]:STRING=[AS]:String
    4:[Continent]:STRING=[Oceania]:String
    5:[Capital]:STRING=[null]
    6:[Population]:STRING=[54194]:String
    7:[Area]:STRING=[199]:String
    8:[Coastline]:STRING=[116]:String
    9:[Government form]:STRING=[Presidential democracy (self-governing territory of the US)]:String
    10:[Currency]:STRING=[Dollar]:String
    11:[Currency code]:STRING=[USD]:String
    12:[Dialing prefix]:STRING=[1-684]:String
    13:[Birthrate]:STRING=[22.9]:String
    14:[Deathrate]:STRING=[4.8]:String
    15:[Life expectancy]:STRING=[75.4]:String
    16:[Url]:STRING=[https://www.laenderdaten.info/Ozeanien/Amerikanisch-Samoa/index.php]:String
}

-----------------------------------------------
1 - Record {
    0:[Country (en)]:STRING=[British Indian Ocean Territory]:String
    1:[Country (de)]:STRING=[Britisches Territorium im Indischen Ozean]:String
    2:[Country (local)]:STRING=[British Indian Ocean Territory]:String
    3:[Country code]:STRING=[IO]:String
    4:[Continent]:STRING=[Africa]:String
    5:[Capital]:STRING=[null]
    6:[Population]:STRING=[0]:String
    7:[Area]:STRING=[54400]:String
    8:[Coastline]:STRING=[698]:String
    9:[Government form]:STRING=[British overseas territory]:String
    10:[Currency]:STRING=[Dollar]:String
    11:[Currency code]:STRING=[USD]:String
    12:[Dialing prefix]:STRING=[246]:String
    13:[Birthrate]:STRING=[0]:String
    14:[Deathrate]:STRING=[0]:String
    15:[Life expectancy]:STRING=[0]:String
    16:[Url]:STRING=[https://www.laenderdaten.info/Afrika/Britisches-Territorium-im-Indischen-Ozean/index.php]:String
}

...