Validate Incoming Data

This example shows you how to read and validate data in a format independent way using DataPipeline.  The validation checks specific fields based on predefined rules.  The rules are used to filter and return only the data that meets their criteria.  Normally, any failing data would cause an exception, but this example shows how to disable that behavior.

This example uses CSV, but it can be updated to validate any type of data that can be read or loaded by DataPipeline.

Input CSV file

Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating
101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A
312,Butler,Gerard,90.00,1000.00,8/6/2003,B
868,Hewitt,Jennifer Love,0,17000.00,5/25/1985,B
761,Pinkett-Smith,Jada,49654.87,100000.00,12/5/2006,A
317,Murray,Bill,789.65,5000.00,2/5/2007,C

Java Code Listing

package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import org.apache.log4j.Logger;

import com.northconcepts.datapipeline.core.DataEndpoint;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.Messages;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.filter.FieldFilter;
import com.northconcepts.datapipeline.filter.FilterExpression;
import com.northconcepts.datapipeline.filter.rule.IsJavaType;
import com.northconcepts.datapipeline.filter.rule.IsNotNull;
import com.northconcepts.datapipeline.filter.rule.PatternMatch;
import com.northconcepts.datapipeline.filter.rule.ValueMatch;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.validate.ValidatingReader;

public class ValidateIncomingData {
    
    public static final Logger log = DataEndpoint.log; 

    public static void main(String[] args) throws Throwable {
        DataReader reader = new CSVReader(new File("example/data/input/credit-balance-01.csv"))
            .setFieldNamesInFirstRow(true);
        
        ValidatingReader validatingReader = new ValidatingReader(reader)
            .setExceptionOnFailure(false)
            .setRecordStackTraceInMessage(false);
        
        validatingReader.add(new FieldFilter("Rating")
                .addRule(new IsNotNull())
                .addRule(new IsJavaType(String.class))
                .addRule(new ValueMatch("A", "B", "Z")));  // should be A, B, C
        
        validatingReader.add(new FieldFilter("Account")
                .addRule(new IsNotNull())
                .addRule(new IsJavaType(String.class))
                .addRule(new PatternMatch("[0-7]*")));  // should be [0-9]*
        
        validatingReader.add(new FilterExpression(
                "parseDouble(CreditLimit) >= 0 && parseDouble(CreditLimit) <= 100000 and parseDouble(Balance) <= parseDouble(CreditLimit)"));

        DataWriter writer = new StreamWriter(System.out);

        Job.run(validatingReader, writer);
        
        log.info("messages: " + Messages.getCurrent());
    }

}

Code walkthrough

  1. CSVReader is created corresponding to the input file credit-balance-01.csv.
  2. The CSVReader.setFieldNamesInFirstRow(true) method is invoked to specify that the names specified in the first row should be used as field names.
  3. A ValidatingReader is created to apply one or more filters to the incoming data sequentially.
  4. setExceptionOnFailure(false) and setRecordStackTraceInMessage(false) methods are used to specify the behavior of the validatingReader when the data does match the validation/filter rules. In the given example, the program is customized not to throw exceptions and not to include stack traces in the message. 
  5. FieldFilter object is created to set validation rules. Its constructor accepts the name(s) of the fields to be validated. In the given example, the "Rating" field is validated first.
  6.  The validation rules are set with addRule() method. Checking the value for not null and the String type is implemented using IsNotNull() and IsJavaType(String.class) objects respectively. 
  7. In addition to the rules specified above, rules for checking the data with the predefined set of values or regex pattern matching are also available in FieldFilter. ValueMatch<String>("A", "B", "Z") checks if the value is either "A", "B" or "Z". PatternMatch("[0-7]*") checks that only numbers in the range between 0 and 7 are allowed in the string.
  8. FilterExpression is an alternative to FieldFilter to apply filtration rules. Unlike FieldFilter, however, FilterExpression is used with an expression where logic can be defined for the data from multiple columns. In the example, parseDouble(CreditLimit) >= 0 && parseDouble(CreditLimit) <= 100000 and parseDouble(Balance) <= parseDouble(CreditLimit) expression validates the CreditLimit field for the lower and upper limits, and compares it with Balance.
  9. Data is transferred from the validatingReader to the StreamWriter(System.out) via Job.run() method.
  10. Validation messages are displayed in logs.

Output

-----------------------------------------------
0 - Record {
    0:[Account]:STRING=[101]:String
    1:[LastName]:STRING=[Reeves]:String
    2:[FirstName]:STRING=[Keanu]:String
    3:[Balance]:STRING=[9315.45]:String
    4:[CreditLimit]:STRING=[10000.00]:String
    5:[AccountCreated]:STRING=[1/17/1998]:String
    6:[Rating]:STRING=[A]:String
}

-----------------------------------------------
1 - Record {
    0:[Account]:STRING=[312]:String
    1:[LastName]:STRING=[Butler]:String
    2:[FirstName]:STRING=[Gerard]:String
    3:[Balance]:STRING=[90.00]:String
    4:[CreditLimit]:STRING=[1000.00]:String
    5:[AccountCreated]:STRING=[8/6/2003]:String
    6:[Rating]:STRING=[B]:String
}

-----------------------------------------------
2 - Record {
    0:[Account]:STRING=[761]:String
    1:[LastName]:STRING=[Pinkett-Smith]:String
    2:[FirstName]:STRING=[Jada]:String
    3:[Balance]:STRING=[49654.87]:String
    4:[CreditLimit]:STRING=[100000.00]:String
    5:[AccountCreated]:STRING=[12/5/2006]:String
    6:[Rating]:STRING=[A]:String
}

-----------------------------------------------
3 records
05:35:56,619 DEBUG [main] datapipeline:661 - job::Success
05:35:56,621  INFO [main] datapipeline:57 - messages: [2023-06-09 05:35:56.618 WARNING main validation [Account field: (is not null) and (value is of class java.lang.String) and (value's pattern matches [0-7]*)] failed on record #2 Record {
    0:[Account]:STRING=[868]:String
    1:[LastName]:STRING=[Hewitt]:String
    2:[FirstName]:STRING=[Jennifer Love]:String
    3:[Balance]:STRING=[0]:String
    4:[CreditLimit]:STRING=[17000.00]:String
    5:[AccountCreated]:STRING=[5/25/1985]:String
    6:[Rating]:STRING=[B]:String
}
]
[2023-06-09 05:35:56.618 WARNING main validation [Rating field: (is not null) and (value is of class java.lang.String) and (value matches [A] or [B] or [Z])] failed on record #3 Record {
    0:[Account]:STRING=[317]:String
    1:[LastName]:STRING=[Murray]:String
    2:[FirstName]:STRING=[Bill]:String
    3:[Balance]:STRING=[789.65]:String
    4:[CreditLimit]:STRING=[5000.00]:String
    5:[AccountCreated]:STRING=[2/5/2007]:String
    6:[Rating]:STRING=[C]:String
}
]
Mobile Analytics