Filter Records
Data Pipeline can filter records either programmatically or using the built-in expression language.
This example uses FilteringReader to select records from a CSV file and writes them to the console. It can easily be modified to write to a database or other target. The actual record selection here is done using both the rule-based FieldFilters and the FilterExpressions.
FilteringReader can also be extended by creating a brand new filter or by creating a new rule for use with FieldFilter.
Filter Input
The following CSV file (credit-balance-01.csv) will be used as the input for this example
Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating 101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A 312,Butler,Gerard,90.00,1000.00,8/6/2003,B 868,Hewitt,Jennifer Love,0,17000.00,5/25/1985,B 761,Pinkett-Smith,Jada,49654.87,100000.00,12/5/2006,A 317,Murray,Bill,789.65,5000.00,2/5/2007,C
Java Code
The code adds 3 filters to FilteringReader.
The first is a rules-based filter (FieldFilter) on the "Ratings" field to match records where Ratings:
- is not null
- is a string, not a number or other type
- is one of "B" or "C"
The second filter matches records where "Account" :
- is not null
- is a string
- matches the, all numbers, regular expression pattern
[0-9]*
The last filter uses the built-in expression language to select records where "CreditLimit" is between 0 and 5000 and not exceeded by the balance.
package com.northconcepts.datapipeline.examples.cookbook; import java.io.File; import com.northconcepts.datapipeline.core.DataReader; import com.northconcepts.datapipeline.core.DataWriter; import com.northconcepts.datapipeline.core.StreamWriter; import com.northconcepts.datapipeline.csv.CSVReader; import com.northconcepts.datapipeline.filter.FieldFilter; import com.northconcepts.datapipeline.filter.FilterExpression; import com.northconcepts.datapipeline.filter.FilteringReader; import com.northconcepts.datapipeline.filter.rule.IsJavaType; import com.northconcepts.datapipeline.filter.rule.IsNotNull; import com.northconcepts.datapipeline.filter.rule.PatternMatch; import com.northconcepts.datapipeline.filter.rule.ValueMatch; import com.northconcepts.datapipeline.job.Job; public class FilterRecords { public static void main(String[] args) throws Throwable { DataReader reader = new CSVReader(new File("example/data/input/credit-balance-01.csv")) .setFieldNamesInFirstRow(true); FilteringReader filteringReader = new FilteringReader(reader); filteringReader.add(new FieldFilter("Rating") .addRule(new IsNotNull()) .addRule(new IsJavaType(String.class)) .addRule(new ValueMatch("B", "C"))); filteringReader.add(new FieldFilter("Account") .addRule(new IsNotNull()) .addRule(new IsJavaType(String.class)) .addRule(new PatternMatch("[0-9]*"))); filteringReader.add(new FilterExpression( "parseDouble(CreditLimit) >= 0 && parseDouble(CreditLimit) <= 5000 and parseDouble(Balance) <= parseDouble(CreditLimit)")); DataWriter writer = new StreamWriter(System.out); Job.run(filteringReader, writer); } }
Filter Output
The output records printed to the console would look like the following.
----------------------------------------------- 0 - Record { 0:[Account]:STRING=[312]:String 1:[LastName]:STRING=[Butler]:String 2:[FirstName]:STRING=[Gerard]:String 3:[Balance]:STRING=[90.00]:String 4:[CreditLimit]:STRING=[1000.00]:String 5:[AccountCreated]:STRING=[8/6/2003]:String 6:[Rating]:STRING=[B]:String } ----------------------------------------------- 1 - Record { 0:[Account]:STRING=[317]:String 1:[LastName]:STRING=[Murray]:String 2:[FirstName]:STRING=[Bill]:String 3:[Balance]:STRING=[789.65]:String 4:[CreditLimit]:STRING=[5000.00]:String 5:[AccountCreated]:STRING=[2/5/2007]:String 6:[Rating]:STRING=[C]:String } ----------------------------------------------- 2 records ]]>