Lookup Data in any Data Reader
Updated: Jul 13, 2023
This example shows how to enrich a dataset by combining it with data from any DataReader using DataPipeline. The code uses DataReaderLookup to read and find records using one or more common columns, serving as a streaming version of SQL join.
Users dealing with financial data can leverage this example to combine transaction data from multiple sources, such as bank statements or payment processors, using a shared transaction ID or account number.
Input CSV files
credit-balance-01.csv
Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating 101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A 312,Butler,Gerard,90.00,1000.00,8/6/2003,B 868,Hewitt,Jennifer Love,0,17000.00,5/25/1985,B 761,Pinkett-Smith,Jada,49654.87,100000.00,12/5/2006,A 317,Murray,Bill,789.65,5000.00,2/5/2007,C
rating-table-01.csv
rating_code,rating_description A,Class A B,Class B C,Class C
Java Code Listing
package com.northconcepts.datapipeline.examples.cookbook;
import java.io.File;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.FieldList;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.transform.TransformingReader;
import com.northconcepts.datapipeline.transform.lookup.DataReaderLookup;
import com.northconcepts.datapipeline.transform.lookup.Lookup;
import com.northconcepts.datapipeline.transform.lookup.LookupTransformer;
public class LookupDataInAnyDataReader {
public static void main(String[] args) throws Throwable {
DataReader reader = new CSVReader(new File("example/data/input/credit-balance-01.csv"))
.setFieldNamesInFirstRow(true);
/* This lookup matches
* [credit-balance-01.csv].[Rating] to [rating-table-01.csv].[rating_code]
* to return
* [rating-table-01.csv].[rating_description]
*/
Lookup lookup = new DataReaderLookup(
new CSVReader(new File("example/data/input/rating-table-01.csv"))
.setFieldNamesInFirstRow(true),
new FieldList("rating_code"),
new FieldList("rating_description")
);
reader = new TransformingReader(reader)
.add(new LookupTransformer(new FieldList("Rating"), lookup));
Job.run(reader, new StreamWriter(System.out));
}
}
Code walkthrough
- CSVReader is created corresponding to the input file
credit-balance-01.csv. - The
CSVReader.setFieldNamesInFirstRow(true)method is invoked to specify that the names specified in the first row should be used as field names. - DataReaderLookup object accepts three arguments:
- DataReader object to read the input file. In the example, it is CSVReader class that gets data from
rating-table-01.csv. parameterFieldsare the fields that need to be matched with another dataset. In other words, this field specifies the common field in two datasets. In the example for this argument, "rating_code" is wrapped in the FieldList object.resultFieldsare the fields that are returned after the lookup operation.
- DataReader object to read the input file. In the example, it is CSVReader class that gets data from
- TransformingReader is a proxy that applies transformations to records passing through.
- The lookup matches "credit-balance-01.csv".[Rating] with "rating-table-01.csv".[rating_code]. The reader contains the full data from
credit-balance-01.csvalong with therating_descriptionfield from the other file. - Data is transferred from the
readerto theCSVWritervia Job.run() method.
Output
-----------------------------------------------
0 - Record {
0:[Account]:STRING=[101]:String
1:[LastName]:STRING=[Reeves]:String
2:[FirstName]:STRING=[Keanu]:String
3:[Balance]:STRING=[9315.45]:String
4:[CreditLimit]:STRING=[10000.00]:String
5:[AccountCreated]:STRING=[1/17/1998]:String
6:[Rating]:STRING=[A]:String
7:[rating_description]:STRING=[Class A]:String
}
-----------------------------------------------
1 - Record {
0:[Account]:STRING=[312]:String
1:[LastName]:STRING=[Butler]:String
2:[FirstName]:STRING=[Gerard]:String
3:[Balance]:STRING=[90.00]:String
4:[CreditLimit]:STRING=[1000.00]:String
5:[AccountCreated]:STRING=[8/6/2003]:String
6:[Rating]:STRING=[B]:String
7:[rating_description]:STRING=[Class B]:String
}
-----------------------------------------------
2 - Record {
0:[Account]:STRING=[868]:String
1:[LastName]:STRING=[Hewitt]:String
2:[FirstName]:STRING=[Jennifer Love]:String
3:[Balance]:STRING=[0]:String
4:[CreditLimit]:STRING=[17000.00]:String
5:[AccountCreated]:STRING=[5/25/1985]:String
6:[Rating]:STRING=[B]:String
7:[rating_description]:STRING=[Class B]:String
}
-----------------------------------------------
3 - Record {
0:[Account]:STRING=[761]:String
1:[LastName]:STRING=[Pinkett-Smith]:String
2:[FirstName]:STRING=[Jada]:String
3:[Balance]:STRING=[49654.87]:String
4:[CreditLimit]:STRING=[100000.00]:String
5:[AccountCreated]:STRING=[12/5/2006]:String
6:[Rating]:STRING=[A]:String
7:[rating_description]:STRING=[Class A]:String
}
-----------------------------------------------
4 - Record {
0:[Account]:STRING=[317]:String
1:[LastName]:STRING=[Murray]:String
2:[FirstName]:STRING=[Bill]:String
3:[Balance]:STRING=[789.65]:String
4:[CreditLimit]:STRING=[5000.00]:String
5:[AccountCreated]:STRING=[2/5/2007]:String
6:[Rating]:STRING=[C]:String
7:[rating_description]:STRING=[Class C]:String
}
-----------------------------------------------
5 records
