Lookup in a Flat File
Updated: Jul 5, 2023
This example performs data lookups in flat files, allowing users to retrieve information from multiple files and combine them based on one or more common fields.
This is useful in cases where data is spread across different files or systems, enabling users to integrate and consolidate the data for further analysis or processing.
Input CSV files
dbip-country-lite-2018-10.csv
0.0.0.0,0.255.255.255,ZZ 1.0.0.0,1.0.0.255,AU 1.0.1.0,1.0.3.255,CN 1.0.4.0,1.0.7.255,AU 1.0.8.0,1.0.15.255,CN ...
countries.csv
"Country (en)";"Country (de)";"Country (local)";"Country code";"Continent";"Capital";"Population";"Area";"Coastline";"Government form";"Currency";"Currency code";"Dialing prefix";"Birthrate";"Deathrate";"Life expectancy";"Url" "Afghanistan";"Afghanistan";"Afganistan/Afqanestan";"AF";"Asia";;33332025;652230;0;"Presidential islamic republic";"Afghani";"AFN";93;38.3;13.7;51.3;"https://www.laenderdaten.info/Asien/Afghanistan/index.php" "Egypt";"Ägypten";"Misr";"EG";"Africa";;94666993;1001450;2450;"Presidential republic";"Pfund";"EGP";20;30.3;4.7;72.7;"https://www.laenderdaten.info/Afrika/Aegypten/index.php" "Åland Islands";"Ålandinseln";"Åland";"AX";"Europe";;29013;1580;0;"Autonomous region of Finland";"Euro";"EUR";358;0;0;0;"https://www.laenderdaten.info/Europa/Aland/index.php" "Albania";"Albanien";"Shqipëria";"AL";"Europe";;3038594;28748;362;"parliamentary republic";"Lek";"ALL";355;13.1;6.7;78.3;"https://www.laenderdaten.info/Europa/Albanien/index.php" "Algeria";"Algerien";"Al-Jaza’ir/Algérie";"DZ";"Africa";;40263711;2381741;998;"Presidential republic";"Dinar";"DZD";213;23;4.3;76.8;"https://www.laenderdaten.info/Afrika/Algerien/index.php"
...
Java Code Listing
package com.northconcepts.datapipeline.examples.lookups; import java.io.File; import com.northconcepts.datapipeline.core.DataReader; import com.northconcepts.datapipeline.core.DataWriter; import com.northconcepts.datapipeline.core.FieldList; import com.northconcepts.datapipeline.csv.CSVReader; import com.northconcepts.datapipeline.csv.CSVWriter; import com.northconcepts.datapipeline.job.Job; import com.northconcepts.datapipeline.transform.TransformingReader; import com.northconcepts.datapipeline.transform.lookup.DataReaderLookup; import com.northconcepts.datapipeline.transform.lookup.Lookup; import com.northconcepts.datapipeline.transform.lookup.LookupTransformer; public class LookupDataInFlatFile { private static final String INPUT_FOLDER = "data/input"; private static final String OUTPUT_FOLDER = "data/output"; public static void main(String[] args) { DataReader reader = new CSVReader(new File(INPUT_FOLDER, "dbip-country-lite-2018-10.csv")) .setFieldNamesInFirstRow(false) .setFieldNames("ip_start", "ip_end", "country"); Lookup lookupCountries = new DataReaderLookup( new CSVReader(new File(INPUT_FOLDER, "countries.csv")).setFieldSeparator(';').setFieldNamesInFirstRow(true), new FieldList("Country code")); reader = new TransformingReader(reader) .add(new LookupTransformer(new FieldList("country"), lookupCountries).setAllowNoResults(false)) .setExceptionOnFailure(false); DataWriter writer = new CSVWriter(new File(OUTPUT_FOLDER, "ip-countries.csv")) .setFieldNamesInFirstRow(true); Job.run(reader, writer); } }
Code walkthrough
- CSVReader is created corresponding to the input file
dbip-country-lite-2018-10.csv
. The first row in the file includes0.0.0.0
,0.255.255.255
andZZ
. CSVReader.setFieldNamesInFirstRow(false)
andsetFieldNames("ip_start", "ip_end", "country")
methods are used to specify the field names in the first row of the input dataset.- DataReaderLookup object accepts two arguments:
- DataReader object to read the input file. In the example, it is CSVReader class that gets data from
countries.csv
. parameterFields
are the fields that need to be matched with another dataset. In other words, this field specifies the common field in two datasets. In the example for this argument, "Country code" is wrapped in the FieldList object.
- DataReader object to read the input file. In the example, it is CSVReader class that gets data from
- TransformingReader is a proxy that applies transformations to records passing through.
- The lookup matches "dbip-country-lite-2018-10.csv".[country] with "countries.csv".[Country code]. The reader contains the full data in two files merged with the common field.
- Data is transferred from the
reader
to theCSVWriter
via Job.run() method.
Output
ip_start,ip_end,country,Country (en),Country (de),Country (local),Country code,Continent,Capital,Population,Area,Coastline,Government form,Currency,Currency code,Dialing prefix,Birthrate,Deathrate,Life expectancy,Url 1.0.0.0,1.0.0.255,AU,Australia,Australien,Australia,AU,Australia,,22992654,7741220,25760,Parliamentary democracy (under constitutional monarchy),Dollar,AUD,61,12.1,7.2,82.2,https://www.laenderdaten.info/Australien/Australien/index.php 1.0.1.0,1.0.3.255,CN,China,China,Zhongquo,CN,Asia,,1373541278,9596960,14500,People's republic (communist one-party system),Yuan,CNY,86,12.4,7.7,75.5,https://www.laenderdaten.info/Asien/China/index.php 1.0.4.0,1.0.7.255,AU,Australia,Australien,Australia,AU,Australia,,22992654,7741220,25760,Parliamentary democracy (under constitutional monarchy),Dollar,AUD,61,12.1,7.2,82.2,https://www.laenderdaten.info/Australien/Australien/index.php 1.0.8.0,1.0.15.255,CN,China,China,Zhongquo,CN,Asia,,1373541278,9596960,14500,People's republic (communist one-party system),Yuan,CNY,86,12.4,7.7,75.5,https://www.laenderdaten.info/Asien/China/index.php 1.0.16.0,1.0.31.255,JP,Japan,Japan,Nihon/Nippon,JP,Asia,,126702133,377915,29751,Parliamentary constitutional monarchy,Yen,JPY,81,7.8,9.6,85,https://www.laenderdaten.info/Asien/Japan/index.php
....