Compare Records Using Diff

This example shows how to use DataPipeline to compare two records and determine which fields have been added, modified, or removed. Because records can be nested or contain arrays, the entire data tree of each record is compared. This is useful for auditing and managing individual record transformations.
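To make the idea of comparing entire data trees concrete, here is a minimal sketch in plain Java that walks two nested structures in parallel. It is an illustration only and does not use or reproduce DataPipeline's implementation: the class `TreeDiffSketch`, its `compare` method, the `address` field, and the use of `Map` and `List` in place of records and arrays are all assumptions made for this example. In particular, it matches list elements by position, which may not be how RecordDiff treats arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration; not part of DataPipeline.
class TreeDiffSketch {

    // Walks both trees in parallel and records one entry per differing node.
    static void compare(String path, Object oldValue, Object newValue, List<String> changes) {
        if (oldValue instanceof Map && newValue instanceof Map) {
            Map<?, ?> oldMap = (Map<?, ?>) oldValue;
            Map<?, ?> newMap = (Map<?, ?>) newValue;
            // Visit every key that appears on either side, old keys first.
            Set<Object> keys = new LinkedHashSet<>(oldMap.keySet());
            keys.addAll(newMap.keySet());
            for (Object key : keys) {
                String childPath = path + "." + key;
                if (!oldMap.containsKey(key)) {
                    changes.add("ADDED " + childPath);
                } else if (!newMap.containsKey(key)) {
                    changes.add("REMOVED " + childPath);
                } else {
                    compare(childPath, oldMap.get(key), newMap.get(key), changes);
                }
            }
        } else if (oldValue instanceof List && newValue instanceof List) {
            List<?> oldList = (List<?>) oldValue;
            List<?> newList = (List<?>) newValue;
            // Simplification: elements are matched by index, not by value.
            int max = Math.max(oldList.size(), newList.size());
            for (int i = 0; i < max; i++) {
                String childPath = path + "[" + i + "]";
                if (i >= oldList.size()) {
                    changes.add("ADDED " + childPath);
                } else if (i >= newList.size()) {
                    changes.add("REMOVED " + childPath);
                } else {
                    compare(childPath, oldList.get(i), newList.get(i), changes);
                }
            }
        } else if (!Objects.equals(oldValue, newValue)) {
            // Leaf values (or mismatched node kinds) that differ.
            changes.add("CHANGED " + path);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> oldAddress = new LinkedHashMap<>();
        oldAddress.put("city", "Toronto"); // hypothetical nested field
        Map<String, Object> oldUser = new LinkedHashMap<>();
        oldUser.put("name", "John Doe");
        oldUser.put("languages", Arrays.asList("English", "French"));
        oldUser.put("address", oldAddress);

        Map<String, Object> newAddress = new LinkedHashMap<>();
        newAddress.put("city", "Ottawa");
        Map<String, Object> newUser = new LinkedHashMap<>();
        newUser.put("name", "John Doe");
        newUser.put("languages", Arrays.asList("English", "French", "Spanish"));
        newUser.put("address", newAddress);

        List<String> changes = new ArrayList<>();
        compare("user", oldUser, newUser, changes);
        changes.forEach(System.out::println);
    }
}
```

Unchanged nodes produce no entry in this sketch, whereas the RecordDiff output further down this page also lists unchanged children with the type NONE.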

 

Java Code


package com.northconcepts.datapipeline.foundations.examples.difference;

import com.northconcepts.datapipeline.core.Record;
import com.northconcepts.datapipeline.foundations.difference.RecordDiff;

import java.math.BigDecimal;
import java.time.LocalDate;

public class CompareRecords {

    public static void main(String[] args) {
        Record oldRecord = new Record()
                .setField("name", "John Doe")
                .setField("dob", LocalDate.parse("2000-01-01"))
                .setField("languages", new String[] {"English", "French"})
                .setField("height", 1.70)
                .setField("netIncome", new BigDecimal("100286.99"));

        Record newRecord = new Record()
                .setField("name", "John Doe")
                .setField("age", 24)
                .setField("languages", new String[] {"English", "French", "Spanish"})
                .setField("height", 1.73)
                .setField("netIncome", new BigDecimal("120286.99"));

        // Compare the two records, excluding height and netIncome.
        RecordDiff diff = RecordDiff.diff("userRecord", oldRecord, newRecord, "height", "netIncome");

        // Diff will report the following changes:
        //   name    - NONE
        //   dob     - REMOVED
        //   age     - ADDED
        //   English - NONE
        //   French  - NONE
        //   Spanish - ADDED
        // height and netIncome will not be reported as they are excluded from comparison.

        System.out.println("Record Diff: " + diff);
    }
}

Code Walkthrough

  1. An oldRecord is created and initialized with the following fields: name, dob, languages, height, and netIncome.
  2. A newRecord is created and initialized with the following fields: name, age, languages, height, and netIncome.
  3. A RecordDiff instance is created by calling RecordDiff.diff() with the following arguments: the name of the diff, the old record, the new record, and the names of the fields to exclude from the comparison.
  4. Each difference between oldRecord and newRecord is reported as one of the following types:
    1. CHANGED - if an existing property has been updated (e.g., the top-level userRecord diff, because some of its children changed).
    2. ADDED - if a new property is added (e.g., the age field exists only in newRecord).
    3. REMOVED - if an existing property is removed (e.g., dob exists in oldRecord but not in newRecord).
    4. NONE - if the property has not changed (e.g., the name field has the same value in both records).
  5. Changes to fields excluded from the comparison (i.e., height and netIncome) are not reported.
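The four change types and the exclusion list described above can be sketched in plain Java as follows. This is a simplified stand-in, not DataPipeline code: the class `FlatDiffSketch`, its `ChangeType` enum, and its `diff` method are hypothetical names chosen for this example, and it handles flat fields only (no nesting or arrays).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for illustration; not part of DataPipeline.
class FlatDiffSketch {

    enum ChangeType { NONE, ADDED, REMOVED, CHANGED }

    // Classifies every field name found in either map, skipping excluded names.
    static Map<String, ChangeType> diff(Map<String, Object> oldFields,
                                        Map<String, Object> newFields,
                                        String... excludedFields) {
        Set<String> excluded = new HashSet<>(Arrays.asList(excludedFields));
        Set<String> names = new LinkedHashSet<>(oldFields.keySet());
        names.addAll(newFields.keySet());

        Map<String, ChangeType> result = new LinkedHashMap<>();
        for (String name : names) {
            if (excluded.contains(name)) {
                continue; // excluded fields are never reported
            }
            if (!oldFields.containsKey(name)) {
                result.put(name, ChangeType.ADDED);
            } else if (!newFields.containsKey(name)) {
                result.put(name, ChangeType.REMOVED);
            } else if (Objects.equals(oldFields.get(name), newFields.get(name))) {
                result.put(name, ChangeType.NONE);
            } else {
                result.put(name, ChangeType.CHANGED);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> oldFields = new LinkedHashMap<>();
        oldFields.put("name", "John Doe");
        oldFields.put("dob", "2000-01-01");
        oldFields.put("height", 1.70);

        Map<String, Object> newFields = new LinkedHashMap<>();
        newFields.put("name", "John Doe");
        newFields.put("age", 24);
        newFields.put("height", 1.73);

        // Prints {name=NONE, dob=REMOVED, age=ADDED}; height is excluded.
        System.out.println(diff(oldFields, newFields, "height"));
    }
}
```

Calling the same method without the exclusion argument would report height as CHANGED, which is why excluding volatile or irrelevant fields keeps a diff focused on the changes that matter.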

Console Output

Record Diff: {
  "children" : [ {
    "name" : "name",
    "type" : "NONE"
  }, {
    "name" : "dob",
    "type" : "REMOVED"
  }, {
    "name" : "English",
    "type" : "NONE"
  }, {
    "name" : "French",
    "type" : "NONE"
  }, {
    "name" : "Spanish",
    "type" : "ADDED"
  }, {
    "name" : "age",
    "type" : "ADDED"
  } ],
  "name" : "userRecord",
  "type" : "CHANGED"
}