Filter Records

Data Pipeline can filter records either programmatically or using the built-in expression language.

This example uses FilteringReader to select records from a CSV file and writes them to the console. It can easily be modified to write to a database or other target. The actual record selection here is done using both the rule-based FieldFilters and the FilterExpressions.

FilteringReader can also be extended by creating a brand new filter or by creating a new rule for use with FieldFilter.

Filter Input

The following CSV file (credit-balance-01.csv) will be used as the input for this example

Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating
101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A
312,Butler,Gerard,90.00,1000.00,8/6/2003,B
868,Hewitt,Jennifer Love,0,17000.00,5/25/1985,B
761,Pinkett-Smith,Jada,49654.87,100000.00,12/5/2006,A
317,Murray,Bill,789.65,5000.00,2/5/2007,C

Java Code

The code adds 3 filters to FilteringReader.

The first is a rules-based filter (FieldFilter) on the "Ratings" field to match records where Ratings:

  1. is not null
  2. is a string, not a number or other type
  3. is one of "B" or "C"

The second filter matches records where "Account" :

  1. is not null
  2. is a string
  3. matches the, all numbers, regular expression pattern [0-9]*

The last filter uses the built-in expression language to select records where "CreditLimit" is between 0 and 5000 and not exceeded by the balance.

package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.filter.FieldFilter;
import com.northconcepts.datapipeline.filter.FilterExpression;
import com.northconcepts.datapipeline.filter.FilteringReader;
import com.northconcepts.datapipeline.filter.rule.IsJavaType;
import com.northconcepts.datapipeline.filter.rule.IsNotNull;
import com.northconcepts.datapipeline.filter.rule.PatternMatch;
import com.northconcepts.datapipeline.filter.rule.ValueMatch;
import com.northconcepts.datapipeline.job.Job;

public class FilterRecords {
    
    public static void main(String[] args) throws Throwable {
        DataReader reader = new CSVReader(new File("example/data/input/credit-balance-01.csv"))
            .setFieldNamesInFirstRow(true);
        
        FilteringReader filteringReader = new FilteringReader(reader);
        
        filteringReader.add(new FieldFilter("Rating")
                .addRule(new IsNotNull())
                .addRule(new IsJavaType(String.class))
                .addRule(new ValueMatch("B", "C")));
        
        filteringReader.add(new FieldFilter("Account")
                .addRule(new IsNotNull())
                .addRule(new IsJavaType(String.class))
                .addRule(new PatternMatch("[0-9]*")));
        
        filteringReader.add(new FilterExpression(
                "parseDouble(CreditLimit) >= 0 && parseDouble(CreditLimit) <= 5000 and parseDouble(Balance) <= parseDouble(CreditLimit)"));

        DataWriter writer = new StreamWriter(System.out);

        Job.run(filteringReader, writer);
    }

}

Filter Output

The output records printed to the console would look like the following.

-----------------------------------------------
0 - Record {
    0:[Account]:STRING=[312]:String
    1:[LastName]:STRING=[Butler]:String
    2:[FirstName]:STRING=[Gerard]:String
    3:[Balance]:STRING=[90.00]:String
    4:[CreditLimit]:STRING=[1000.00]:String
    5:[AccountCreated]:STRING=[8/6/2003]:String
    6:[Rating]:STRING=[B]:String
}

-----------------------------------------------
1 - Record {
    0:[Account]:STRING=[317]:String
    1:[LastName]:STRING=[Murray]:String
    2:[FirstName]:STRING=[Bill]:String
    3:[Balance]:STRING=[789.65]:String
    4:[CreditLimit]:STRING=[5000.00]:String
    5:[AccountCreated]:STRING=[2/5/2007]:String
    6:[Rating]:STRING=[C]:String
}

-----------------------------------------------
2 records

]]>
Mobile Analytics