Lookup in a Flat File

Updated: Jun 8, 2023

In this example, you will learn how you can use DataPipeline to look up data in flat files and combine the result in a new output file.

Lookup mainly searches for data in another source to merge with each record passing through this lookup's transformer -- the streaming version of a join.

Lookup is similar to how SQL queries work where you have, for example, an id of a product and you would like to get all data that is associated with that id.

Input CSV files

dbip-country-lite-2018-10.csv

0.0.0.0,0.255.255.255,ZZ
1.0.0.0,1.0.0.255,AU
1.0.1.0,1.0.3.255,CN
1.0.4.0,1.0.7.255,AU
1.0.8.0,1.0.15.255,CN
...

countries.csv

"Country (en)";"Country (de)";"Country (local)";"Country code";"Continent";"Capital";"Population";"Area";"Coastline";"Government form";"Currency";"Currency code";"Dialing prefix";"Birthrate";"Deathrate";"Life expectancy";"Url"
"Afghanistan";"Afghanistan";"Afganistan/Afqanestan";"AF";"Asia";;33332025;652230;0;"Presidential islamic republic";"Afghani";"AFN";93;38.3;13.7;51.3;"https://www.laenderdaten.info/Asien/Afghanistan/index.php"
"Egypt";"Ägypten";"Misr";"EG";"Africa";;94666993;1001450;2450;"Presidential republic";"Pfund";"EGP";20;30.3;4.7;72.7;"https://www.laenderdaten.info/Afrika/Aegypten/index.php"
"Åland Islands";"Ålandinseln";"Åland";"AX";"Europe";;29013;1580;0;"Autonomous region of Finland";"Euro";"EUR";358;0;0;0;"https://www.laenderdaten.info/Europa/Aland/index.php"
"Albania";"Albanien";"Shqipëria";"AL";"Europe";;3038594;28748;362;"parliamentary republic";"Lek";"ALL";355;13.1;6.7;78.3;"https://www.laenderdaten.info/Europa/Albanien/index.php"
"Algeria";"Algerien";"Al-Jaza’ir/Algérie";"DZ";"Africa";;40263711;2381741;998;"Presidential republic";"Dinar";"DZD";213;23;4.3;76.8;"https://www.laenderdaten.info/Afrika/Algerien/index.php"
...

Java Code Listing

/*
 * Copyright (c) 2006-2022 North Concepts Inc.  All rights reserved.
 * Proprietary and Confidential.  Use is subject to license terms.
 * 
 * https://northconcepts.com/data-pipeline/licensing/
 */
package com.northconcepts.datapipeline.examples.lookups;

import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.FieldList;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.csv.CSVWriter;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.transform.TransformingReader;
import com.northconcepts.datapipeline.transform.lookup.DataReaderLookup;
import com.northconcepts.datapipeline.transform.lookup.Lookup;
import com.northconcepts.datapipeline.transform.lookup.LookupTransformer;

public class LookupDataInFlatFile {

    private static final String INPUT_FOLDER = "data/input";
    private static final String OUTPUT_FOLDER = "data/output";

    public static void main(String[] args) {
        
        DataReader reader = new CSVReader(new File(INPUT_FOLDER, "dbip-country-lite-2018-10.csv"))
                .setFieldNamesInFirstRow(false)
                .setFieldNames("ip_start", "ip_end", "country");
        
        Lookup lookupCountries = new DataReaderLookup(
                new CSVReader(new File(INPUT_FOLDER, "countries.csv")).setFieldSeparator(';').setFieldNamesInFirstRow(true),
                new FieldList("Country code"));        
        
        reader = new TransformingReader(reader)
                .add(new LookupTransformer(new FieldList("country"), lookupCountries).setAllowNoResults(false))
                .setExceptionOnFailure(false);
        
        DataWriter writer = new CSVWriter(new File(OUTPUT_FOLDER, "ip-countries.csv"))
                .setFieldNamesInFirstRow(true);
        
        Job.run(reader, writer);
    }

}

Code walkthrough

  1. CSVReader is created corresponding to the input file dbip-country-lite-2018-10.csv. The first row in the file includes 0.0.0.0, 0.255.255.255 and ZZ
  2. CSVReader.setFieldNamesInFirstRow(false) and setFieldNames("ip_start", "ip_end", "country") methods are used to specify the field names in the first row of the input dataset.
  3. DataReaderLookup object accepts two arguments:
    • DataReader object to read the input file. In the example, it is CSVReader class that gets data from countries.csv
    • parameterFields are the fields that need to be matched with another dataset. In other words, this field specifies the common field in two datasets. In the example for this argument, "Country code" is wrapped in the FieldList object. 
  4. TransformingReader is a proxy that applies transformations to records passing through. 
  5. The lookup matches "dbip-country-lite-2018-10.csv".[country] with "countries.csv".[Country code]. The reader contains the full data in two files merged with the common field.   
  6. Data is transferred from the reader to the CSVWriter via Job.run() method. 

Output

ip_start,ip_end,country,Country (en),Country (de),Country (local),Country code,Continent,Capital,Population,Area,Coastline,Government form,Currency,Currency code,Dialing prefix,Birthrate,Deathrate,Life expectancy,Url
1.0.0.0,1.0.0.255,AU,Australia,Australien,Australia,AU,Australia,,22992654,7741220,25760,Parliamentary democracy (under constitutional monarchy),Dollar,AUD,61,12.1,7.2,82.2,https://www.laenderdaten.info/Australien/Australien/index.php
1.0.1.0,1.0.3.255,CN,China,China,Zhongquo,CN,Asia,,1373541278,9596960,14500,People's republic (communist 	one-party system),Yuan,CNY,86,12.4,7.7,75.5,https://www.laenderdaten.info/Asien/China/index.php
1.0.4.0,1.0.7.255,AU,Australia,Australien,Australia,AU,Australia,,22992654,7741220,25760,Parliamentary democracy (under constitutional monarchy),Dollar,AUD,61,12.1,7.2,82.2,https://www.laenderdaten.info/Australien/Australien/index.php
1.0.8.0,1.0.15.255,CN,China,China,Zhongquo,CN,Asia,,1373541278,9596960,14500,People's republic (communist 	one-party system),Yuan,CNY,86,12.4,7.7,75.5,https://www.laenderdaten.info/Asien/China/index.php
1.0.16.0,1.0.31.255,JP,Japan,Japan,Nihon/Nippon,JP,Asia,,126702133,377915,29751,Parliamentary constitutional monarchy,Yen,JPY,81,7.8,9.6,85,https://www.laenderdaten.info/Asien/Japan/index.php
....
Mobile Analytics