Lookup Data in any Data Reader
Updated: Jul 13, 2023
This example shows how to enrich a dataset by combining it with data from any DataReader using DataPipeline. The code uses DataReaderLookup to read and find records using one or more common columns, serving as a streaming version of SQL join.
Users dealing with financial data can leverage this example to combine transaction data from multiple sources, such as bank statements or payment processors, using a shared transaction ID or account number.
Input CSV files
credit-balance-01.csv
Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating 101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A 312,Butler,Gerard,90.00,1000.00,8/6/2003,B 868,Hewitt,Jennifer Love,0,17000.00,5/25/1985,B 761,Pinkett-Smith,Jada,49654.87,100000.00,12/5/2006,A 317,Murray,Bill,789.65,5000.00,2/5/2007,C
rating-table-01.csv
rating_code,rating_description A,Class A B,Class B C,Class C
Java Code Listing
package com.northconcepts.datapipeline.examples.cookbook; import java.io.File; import com.northconcepts.datapipeline.core.DataReader; import com.northconcepts.datapipeline.core.FieldList; import com.northconcepts.datapipeline.core.StreamWriter; import com.northconcepts.datapipeline.csv.CSVReader; import com.northconcepts.datapipeline.job.Job; import com.northconcepts.datapipeline.transform.TransformingReader; import com.northconcepts.datapipeline.transform.lookup.DataReaderLookup; import com.northconcepts.datapipeline.transform.lookup.Lookup; import com.northconcepts.datapipeline.transform.lookup.LookupTransformer; public class LookupDataInAnyDataReader { public static void main(String[] args) throws Throwable { DataReader reader = new CSVReader(new File("example/data/input/credit-balance-01.csv")) .setFieldNamesInFirstRow(true); /* This lookup matches * [credit-balance-01.csv].[Rating] to [rating-table-01.csv].[rating_code] * to return * [rating-table-01.csv].[rating_description] */ Lookup lookup = new DataReaderLookup( new CSVReader(new File("example/data/input/rating-table-01.csv")) .setFieldNamesInFirstRow(true), new FieldList("rating_code"), new FieldList("rating_description") ); reader = new TransformingReader(reader) .add(new LookupTransformer(new FieldList("Rating"), lookup)); Job.run(reader, new StreamWriter(System.out)); } }
Code walkthrough
- CSVReader is created corresponding to the input file
credit-balance-01.csv
. - The
CSVReader.setFieldNamesInFirstRow(true)
method is invoked to specify that the names specified in the first row should be used as field names. - DataReaderLookup object accepts three arguments:
- DataReader object to read the input file. In the example, it is CSVReader class that gets data from
rating-table-01.csv
. parameterFields
are the fields that need to be matched with another dataset. In other words, this field specifies the common field in two datasets. In the example for this argument, "rating_code" is wrapped in the FieldList object.resultFields
are the fields that are returned after the lookup operation.
- DataReader object to read the input file. In the example, it is CSVReader class that gets data from
- TransformingReader is a proxy that applies transformations to records passing through.
- The lookup matches "credit-balance-01.csv".[Rating] with "rating-table-01.csv".[rating_code]. The reader contains the full data from
credit-balance-01.csv
along with therating_description
field from the other file. - Data is transferred from the
reader
to theCSVWriter
via Job.run() method.
Output
----------------------------------------------- 0 - Record { 0:[Account]:STRING=[101]:String 1:[LastName]:STRING=[Reeves]:String 2:[FirstName]:STRING=[Keanu]:String 3:[Balance]:STRING=[9315.45]:String 4:[CreditLimit]:STRING=[10000.00]:String 5:[AccountCreated]:STRING=[1/17/1998]:String 6:[Rating]:STRING=[A]:String 7:[rating_description]:STRING=[Class A]:String } ----------------------------------------------- 1 - Record { 0:[Account]:STRING=[312]:String 1:[LastName]:STRING=[Butler]:String 2:[FirstName]:STRING=[Gerard]:String 3:[Balance]:STRING=[90.00]:String 4:[CreditLimit]:STRING=[1000.00]:String 5:[AccountCreated]:STRING=[8/6/2003]:String 6:[Rating]:STRING=[B]:String 7:[rating_description]:STRING=[Class B]:String } ----------------------------------------------- 2 - Record { 0:[Account]:STRING=[868]:String 1:[LastName]:STRING=[Hewitt]:String 2:[FirstName]:STRING=[Jennifer Love]:String 3:[Balance]:STRING=[0]:String 4:[CreditLimit]:STRING=[17000.00]:String 5:[AccountCreated]:STRING=[5/25/1985]:String 6:[Rating]:STRING=[B]:String 7:[rating_description]:STRING=[Class B]:String } ----------------------------------------------- 3 - Record { 0:[Account]:STRING=[761]:String 1:[LastName]:STRING=[Pinkett-Smith]:String 2:[FirstName]:STRING=[Jada]:String 3:[Balance]:STRING=[49654.87]:String 4:[CreditLimit]:STRING=[100000.00]:String 5:[AccountCreated]:STRING=[12/5/2006]:String 6:[Rating]:STRING=[A]:String 7:[rating_description]:STRING=[Class A]:String } ----------------------------------------------- 4 - Record { 0:[Account]:STRING=[317]:String 1:[LastName]:STRING=[Murray]:String 2:[FirstName]:STRING=[Bill]:String 3:[Balance]:STRING=[789.65]:String 4:[CreditLimit]:STRING=[5000.00]:String 5:[AccountCreated]:STRING=[2/5/2007]:String 6:[Rating]:STRING=[C]:String 7:[rating_description]:STRING=[Class C]:String } ----------------------------------------------- 5 records