Validate Incoming Data
Updated: Jul 11, 2023
This example shows you how to read and validate data in a format independent way using DataPipeline. The validation checks specific fields based on predefined rules. The rules are used to filter and return only the data that meets their criteria. Normally, any failing data would cause an exception, but this example shows how to disable that behavior.
This example uses CSV, but it can be updated to validate any type of data that can be read or loaded by DataPipeline.
Input CSV file
Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating 101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A 312,Butler,Gerard,90.00,1000.00,8/6/2003,B 868,Hewitt,Jennifer Love,0,17000.00,5/25/1985,B 761,Pinkett-Smith,Jada,49654.87,100000.00,12/5/2006,A 317,Murray,Bill,789.65,5000.00,2/5/2007,C
Java Code Listing
package com.northconcepts.datapipeline.examples.cookbook; import java.io.File; import org.apache.log4j.Logger; import com.northconcepts.datapipeline.core.DataEndpoint; import com.northconcepts.datapipeline.core.DataReader; import com.northconcepts.datapipeline.core.DataWriter; import com.northconcepts.datapipeline.core.Messages; import com.northconcepts.datapipeline.core.StreamWriter; import com.northconcepts.datapipeline.csv.CSVReader; import com.northconcepts.datapipeline.filter.FieldFilter; import com.northconcepts.datapipeline.filter.FilterExpression; import com.northconcepts.datapipeline.filter.rule.IsJavaType; import com.northconcepts.datapipeline.filter.rule.IsNotNull; import com.northconcepts.datapipeline.filter.rule.PatternMatch; import com.northconcepts.datapipeline.filter.rule.ValueMatch; import com.northconcepts.datapipeline.job.Job; import com.northconcepts.datapipeline.validate.ValidatingReader; public class ValidateIncomingData { public static final Logger log = DataEndpoint.log; public static void main(String[] args) throws Throwable { DataReader reader = new CSVReader(new File("example/data/input/credit-balance-01.csv")) .setFieldNamesInFirstRow(true); ValidatingReader validatingReader = new ValidatingReader(reader) .setExceptionOnFailure(false) .setRecordStackTraceInMessage(false); validatingReader.add(new FieldFilter("Rating") .addRule(new IsNotNull()) .addRule(new IsJavaType(String.class)) .addRule(new ValueMatch("A", "B", "Z"))); // should be A, B, C validatingReader.add(new FieldFilter("Account") .addRule(new IsNotNull()) .addRule(new IsJavaType(String.class)) .addRule(new PatternMatch("[0-7]*"))); // should be [0-9]* validatingReader.add(new FilterExpression( "parseDouble(CreditLimit) >= 0 && parseDouble(CreditLimit) <= 100000 and parseDouble(Balance) <= parseDouble(CreditLimit)")); DataWriter writer = new StreamWriter(System.out); Job.run(validatingReader, writer); log.info("messages: " + Messages.getCurrent()); } }
Code walkthrough
- CSVReader is created corresponding to the input file
credit-balance-01.csv
. - The
CSVReader.setFieldNamesInFirstRow(true)
method is invoked to specify that the names specified in the first row should be used as field names. - A ValidatingReader is created to apply one or more filters to the incoming data sequentially.
setExceptionOnFailure(false)
andsetRecordStackTraceInMessage(false)
methods are used to specify the behavior of the validatingReader when the data does match the validation/filter rules. In the given example, the program is customized not to throw exceptions and not to include stack traces in the message.- FieldFilter object is created to set validation rules. Its constructor accepts the name(s) of the fields to be validated. In the given example, the "Rating" field is validated first.
- The validation rules are set with
addRule()
method. Checking the value for not null and the String type is implemented usingIsNotNull()
andIsJavaType(String.class)
objects respectively. - In addition to the rules specified above, rules for checking the data with the predefined set of values or regex pattern matching are also available in FieldFilter.
ValueMatch<String>("A", "B", "Z")
checks if the value is either "A", "B" or "Z".PatternMatch("[0-7]*")
checks that only numbers in the range between 0 and 7 are allowed in the string. - FilterExpression is an alternative to FieldFilter to apply filtration rules. Unlike FieldFilter, however, FilterExpression is used with an expression where logic can be defined for the data from multiple columns. In the example,
parseDouble(CreditLimit) >= 0 && parseDouble(CreditLimit) <= 100000 and parseDouble(Balance) <= parseDouble(CreditLimit)
expression validates the CreditLimit field for the lower and upper limits, and compares it with Balance. - Data is transferred from the
validatingReader
to theStreamWriter(System.out)
via Job.run() method. - Validation messages are displayed in logs.
Output
----------------------------------------------- 0 - Record { 0:[Account]:STRING=[101]:String 1:[LastName]:STRING=[Reeves]:String 2:[FirstName]:STRING=[Keanu]:String 3:[Balance]:STRING=[9315.45]:String 4:[CreditLimit]:STRING=[10000.00]:String 5:[AccountCreated]:STRING=[1/17/1998]:String 6:[Rating]:STRING=[A]:String } ----------------------------------------------- 1 - Record { 0:[Account]:STRING=[312]:String 1:[LastName]:STRING=[Butler]:String 2:[FirstName]:STRING=[Gerard]:String 3:[Balance]:STRING=[90.00]:String 4:[CreditLimit]:STRING=[1000.00]:String 5:[AccountCreated]:STRING=[8/6/2003]:String 6:[Rating]:STRING=[B]:String } ----------------------------------------------- 2 - Record { 0:[Account]:STRING=[761]:String 1:[LastName]:STRING=[Pinkett-Smith]:String 2:[FirstName]:STRING=[Jada]:String 3:[Balance]:STRING=[49654.87]:String 4:[CreditLimit]:STRING=[100000.00]:String 5:[AccountCreated]:STRING=[12/5/2006]:String 6:[Rating]:STRING=[A]:String } ----------------------------------------------- 3 records 05:35:56,619 DEBUG [main] datapipeline:661 - job::Success 05:35:56,621 INFO [main] datapipeline:57 - messages: [2023-06-09 05:35:56.618 WARNING main validation [Account field: (is not null) and (value is of class java.lang.String) and (value's pattern matches [0-7]*)] failed on record #2 Record { 0:[Account]:STRING=[868]:String 1:[LastName]:STRING=[Hewitt]:String 2:[FirstName]:STRING=[Jennifer Love]:String 3:[Balance]:STRING=[0]:String 4:[CreditLimit]:STRING=[17000.00]:String 5:[AccountCreated]:STRING=[5/25/1985]:String 6:[Rating]:STRING=[B]:String } ] [2023-06-09 05:35:56.618 WARNING main validation [Rating field: (is not null) and (value is of class java.lang.String) and (value matches [A] or [B] or [Z])] failed on record #3 Record { 0:[Account]:STRING=[317]:String 1:[LastName]:STRING=[Murray]:String 2:[FirstName]:STRING=[Bill]:String 3:[Balance]:STRING=[789.65]:String 4:[CreditLimit]:STRING=[5000.00]:String 5:[AccountCreated]:STRING=[2/5/2007]:String 6:[Rating]:STRING=[C]:String } ]