Lookup Data in any Data Reader

Updated: Jul 13, 2023

This example shows how to enrich a dataset by combining it with data from any DataReader using DataPipeline.  The code uses DataReaderLookup to read and find records using one or more common columns, serving as a streaming version of SQL join.

Users dealing with financial data can leverage this example to combine transaction data from multiple sources, such as bank statements or payment processors, using a shared transaction ID or account number. 

Input CSV files

credit-balance-01.csv

Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating
101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A
312,Butler,Gerard,90.00,1000.00,8/6/2003,B
868,Hewitt,Jennifer Love,0,17000.00,5/25/1985,B
761,Pinkett-Smith,Jada,49654.87,100000.00,12/5/2006,A
317,Murray,Bill,789.65,5000.00,2/5/2007,C

rating-table-01.csv

rating_code,rating_description
A,Class A
B,Class B
C,Class C

Java Code Listing

package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.FieldList;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.transform.TransformingReader;
import com.northconcepts.datapipeline.transform.lookup.DataReaderLookup;
import com.northconcepts.datapipeline.transform.lookup.Lookup;
import com.northconcepts.datapipeline.transform.lookup.LookupTransformer;

public class LookupDataInAnyDataReader {
    
    public static void main(String[] args) throws Throwable {
        DataReader reader = new CSVReader(new File("example/data/input/credit-balance-01.csv"))
            .setFieldNamesInFirstRow(true);
        
        /* This lookup matches 
         *    [credit-balance-01.csv].[Rating] to [rating-table-01.csv].[rating_code]
         * to return    
         *   [rating-table-01.csv].[rating_description]
         */ 
        Lookup lookup = new DataReaderLookup(
                new CSVReader(new File("example/data/input/rating-table-01.csv"))
                    .setFieldNamesInFirstRow(true),
                new FieldList("rating_code"), 
                new FieldList("rating_description")
                );
        
        reader = new TransformingReader(reader)
                .add(new LookupTransformer(new FieldList("Rating"),  lookup));
        
        Job.run(reader, new StreamWriter(System.out));
    }
    
}

Code walkthrough

  1. CSVReader is created corresponding to the input file credit-balance-01.csv.
  2. The CSVReader.setFieldNamesInFirstRow(true) method is invoked to specify that the names specified in the first row should be used as field names.
  3. DataReaderLookup object accepts three arguments:
    • DataReader object to read the input file. In the example, it is CSVReader class that gets data from rating-table-01.csv.
    • parameterFields are the fields that need to be matched with another dataset. In other words, this field specifies the common field in two datasets. In the example for this argument, "rating_code" is wrapped in the FieldList object. 
    • resultFields are the fields that are returned after the lookup operation.
  4. TransformingReader is a proxy that applies transformations to records passing through.
  5. The lookup matches "credit-balance-01.csv".[Rating] with "rating-table-01.csv".[rating_code]. The reader contains the full data from credit-balance-01.csv along with the rating_description field from the other file. 
  6. Data is transferred from the reader to the CSVWriter via Job.run() method. 

Output

-----------------------------------------------
0 - Record {
    0:[Account]:STRING=[101]:String
    1:[LastName]:STRING=[Reeves]:String
    2:[FirstName]:STRING=[Keanu]:String
    3:[Balance]:STRING=[9315.45]:String
    4:[CreditLimit]:STRING=[10000.00]:String
    5:[AccountCreated]:STRING=[1/17/1998]:String
    6:[Rating]:STRING=[A]:String
    7:[rating_description]:STRING=[Class A]:String
}

-----------------------------------------------
1 - Record {
    0:[Account]:STRING=[312]:String
    1:[LastName]:STRING=[Butler]:String
    2:[FirstName]:STRING=[Gerard]:String
    3:[Balance]:STRING=[90.00]:String
    4:[CreditLimit]:STRING=[1000.00]:String
    5:[AccountCreated]:STRING=[8/6/2003]:String
    6:[Rating]:STRING=[B]:String
    7:[rating_description]:STRING=[Class B]:String
}

-----------------------------------------------
2 - Record {
    0:[Account]:STRING=[868]:String
    1:[LastName]:STRING=[Hewitt]:String
    2:[FirstName]:STRING=[Jennifer Love]:String
    3:[Balance]:STRING=[0]:String
    4:[CreditLimit]:STRING=[17000.00]:String
    5:[AccountCreated]:STRING=[5/25/1985]:String
    6:[Rating]:STRING=[B]:String
    7:[rating_description]:STRING=[Class B]:String
}

-----------------------------------------------
3 - Record {
    0:[Account]:STRING=[761]:String
    1:[LastName]:STRING=[Pinkett-Smith]:String
    2:[FirstName]:STRING=[Jada]:String
    3:[Balance]:STRING=[49654.87]:String
    4:[CreditLimit]:STRING=[100000.00]:String
    5:[AccountCreated]:STRING=[12/5/2006]:String
    6:[Rating]:STRING=[A]:String
    7:[rating_description]:STRING=[Class A]:String
}

-----------------------------------------------
4 - Record {
    0:[Account]:STRING=[317]:String
    1:[LastName]:STRING=[Murray]:String
    2:[FirstName]:STRING=[Bill]:String
    3:[Balance]:STRING=[789.65]:String
    4:[CreditLimit]:STRING=[5000.00]:String
    5:[AccountCreated]:STRING=[2/5/2007]:String
    6:[Rating]:STRING=[C]:String
    7:[rating_description]:STRING=[Class C]:String
}

-----------------------------------------------
5 records

Mobile Analytics