Capture Data that Failed Data Mapping Validation

This example is designed to capture records that fail custom validation rules during data processing. It provides users with the capability to define their own validation rules and easily identify records that do not meet the specified criteria, enabling effective data quality control and error handling.

Users can utilize the example to identify and flag records that contain anomalous or suspicious data. By defining custom validation rules that check for abnormal patterns, outliers, or inconsistencies, users can capture records that deviate from the expected norms, helping to detect data anomalies or potential data quality issues.

Input CSV File

Handle,Title,Body (HTML),Vendor,Type,Tags,Published,Option1 Name,Option1 Value,Option2 Name,Option2 Value,Option3 Name,Option3 Value,Variant SKU,Variant Grams,Variant Inventory Tracker,Variant Inventory Qty,Variant Inventory Policy,Variant Fulfillment Service,Variant Price,Variant Compare At Price,Variant Requires Shipping,Variant Taxable,Variant Barcode,Image Src,Image Position,Image Alt Text,Gift Card,SEO Title,SEO Description,Google Shopping / Google Product Category,Google Shopping / Gender,Google Shopping / Age Group,Google Shopping / MPN,Google Shopping / AdWords Grouping,Google Shopping / AdWords Labels,Google Shopping / Condition,Google Shopping / Custom Product,Google Shopping / Custom Label 0,Google Shopping / Custom Label 1,Google Shopping / Custom Label 2,Google Shopping / Custom Label 3,Google Shopping / Custom Label 4,Variant Image,Variant Weight Unit,Variant Tax Code
chain-bracelet,7 Shakra Bracelet,"7 chakra bracelet, in blue or black.",Company 123,Bracelet,Beads,true,Color,Blue,,,,,,0,,1,deny,manual,42.99,44.99,true,true,,,1,,false,,,,,,,,,,,,,,,,,kg,
leather-anchor,Anchor Bracelet Mens,Black leather bracelet with gold or silver anchor for men.,Company 123,Bracelet,"Anchor, Gold, Leather, Silver",true,Color,Gold,,,,,,0,,1,deny,manual,69.99,85,true,true,,,1,,false,,,,,,,,,,,,,,,,,kg,

Java Code Listing

package com.northconcepts.datapipeline.foundations.examples.datamapping;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.FieldType;
import com.northconcepts.datapipeline.core.NullWriter;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.foundations.datamapping.DataMapping;
import com.northconcepts.datapipeline.foundations.datamapping.DataMappingReader;
import com.northconcepts.datapipeline.foundations.datamapping.FieldMapping;
import com.northconcepts.datapipeline.foundations.schema.EntityDef;
import com.northconcepts.datapipeline.foundations.schema.NumericFieldDef;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.memory.MemoryReader;
import com.northconcepts.datapipeline.memory.MemoryWriter;

import java.math.BigDecimal;

public class CaptureDataThatFailedDataMappingValidation {

    public static void main(String... args) {
        EntityDef targetEntity = new EntityDef()
            .addField(new NumericFieldDef("Price", FieldType.DOUBLE).setRequired(true).setMinimum(50));

        DataMapping mapping = new DataMapping()
            .setValue("Markup", BigDecimal.valueOf(10.00))  // add a constant to be used in the mapping
            .addFieldMapping(new FieldMapping("Title", "coalesce(source.Title, source.Handle)"))
            .addFieldMapping(new FieldMapping("Cost", "${source.Variant Price}"))  // use ${} since field has space in name
            .addFieldMapping(new FieldMapping("Price", "toBigDecimal(target.Cost) + Markup"));

        DataReader reader = new CSVReader(new File("data/input/jewelry.csv"))

        MemoryWriter discardWriter = new MemoryWriter();

        reader = new DataMappingReader(reader, mapping, discardWriter, "ErrorMessage");, new NullWriter()); //run mapping MemoryReader(discardWriter.getRecordList()), StreamWriter.newSystemOutWriter()); //print discarded records

Code Walkthrough

  1. A new target entity is created with a required field called Price. Checking this field for a minimum value of 50 is also applied.
  2. DataMapping is created to enable the field mapping rules.
  3. A new variable Markup is set to the value of 10.00. 
  4. addFieldMapping() method accepts FieldMapping object that has two String arguments:
    1. targetFieldName - the name of the field in the target record.
    2. sourceExpression - a custom expression to hold the formatting logic for a target field.
  5. Fields in the source and target records can be accessed with the dot (.) operator: source.Titletarget.Cost.
  6. Different methods for formatting strings can be used in sourceExpression. In the given example, coalesce() and toBigDecimal() are called.
  7. coalesce() function evaluates the arguments in order and returns the first non-null value from the defined argument list. Usage in the example means that if the value of source.Title field is null, then Handle is returned.
  8. CSVReader instance is created corresponding to the input file jewelry.csv
  9. MemoryWriter object is created to hold the discarded records.
  10. DataMappingReader applies mapping to the reader. The records that do not match the target entity are written in discardWriter with the "ErrorMessage" field included.
  11. Data is transferred from the reader to the NullWriter via method.
  12. Discarded records are transferred to StreamWriter.newSystemOutWriter() via method.

Console output

0 - Record (MODIFIED) {
    2:[Body (HTML)]:STRING=[null]
    7:[Option1 Name]:STRING=[null]
    8:[Option1 Value]:STRING=[null]
    9:[Option2 Name]:STRING=[null]
    10:[Option2 Value]:STRING=[null]
    11:[Option3 Name]:STRING=[null]
    12:[Option3 Value]:STRING=[null]
    13:[Variant SKU]:STRING=[null]
    14:[Variant Grams]:STRING=[null]
    15:[Variant Inventory Tracker]:STRING=[null]
    16:[Variant Inventory Qty]:STRING=[null]
    17:[Variant Inventory Policy]:STRING=[null]
    18:[Variant Fulfillment Service]:STRING=[null]
    19:[Variant Price]:STRING=[null]
    20:[Variant Compare At Price]:STRING=[null]
    21:[Variant Requires Shipping]:STRING=[null]
    22:[Variant Taxable]:STRING=[null]
    23:[Variant Barcode]:STRING=[null]
    24:[Image Src]:STRING=[]:String
    25:[Image Position]:STRING=[3]:String
    26:[Image Alt Text]:STRING=[null]
    27:[Gift Card]:STRING=[null]
    28:[SEO Title]:STRING=[null]
    29:[SEO Description]:STRING=[null]
    30:[Google Shopping / Google Product Category]:STRING=[null]
    31:[Google Shopping / Gender]:STRING=[null]
    32:[Google Shopping / Age Group]:STRING=[null]
    33:[Google Shopping / MPN]:STRING=[null]
    34:[Google Shopping / AdWords Grouping]:STRING=[null]
    35:[Google Shopping / AdWords Labels]:STRING=[null]
    36:[Google Shopping / Condition]:STRING=[null]
    37:[Google Shopping / Custom Product]:STRING=[null]
    38:[Google Shopping / Custom Label 0]:STRING=[null]
    39:[Google Shopping / Custom Label 1]:STRING=[null]
    40:[Google Shopping / Custom Label 2]:STRING=[null]
    41:[Google Shopping / Custom Label 3]:STRING=[null]
    42:[Google Shopping / Custom Label 4]:STRING=[null]
    43:[Variant Image]:STRING=[null]
    44:[Variant Weight Unit]:STRING=[null]
    45:[Variant Tax Code]:STRING=[null]
    46:[ErrorMessage]:STRING=[Price is too small; expected minimum 50, found 10.0; failed on record 4]:String


Mobile Analytics