Add a Sequence Number Column when Values Change

In this example, you will learn how you can use DataPipeline to transform data on the fly by adding a new Sequence Number Column that tracks the changes in a specified field. It provides users with the capability to assign a unique sequence number to each record based on the changes in the specified field, facilitating data tracking and analysis.

The example can be used to create versions of the data based on changes in the specified field. By assigning sequence numbers to track changes, users can analyze the historical progression of the data and perform a comparative analysis between different versions, facilitating historical trend analysis and understanding data evolution.

Input CSV file

Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating
101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A
312,Butler,Gerard,90.00,1000.00,8/6/2003,B
868,Hewitt,Jennifer Love,0,17000.00,5/25/1985,B
761,Pinkett-Smith,Jada,49654.87,100000.00,12/5/2006,A
317,Murray,Bill,789.65,5000.00,2/5/2007,C

Java code listing

package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.transform.SetBatchSequenceNumberField;
import com.northconcepts.datapipeline.transform.TransformingReader;

public class AddASequenceNumberColumnWhenValuesChange {

    public static void main(String[] args) {
        DataReader reader = new CSVReader(new File("example/data/input/credit-balance-01.csv"))
                .setFieldNamesInFirstRow(true);

        TransformingReader transformingReader = new TransformingReader(reader);

        transformingReader.add(new SetBatchSequenceNumberField("BatchNumber", 10, 5, "Rating"));

        Job.run(transformingReader, new StreamWriter(System.out));
    }
}

Code walkthrough

  1. CSVReader is created corresponding to the input file credit-balance-01.csv.
  2. The CSVReader.setFieldNamesInFirstRow(true) method is invoked to specify that the names specified in the first row should be used as field names.
  3. A TransformingReader is created to sequentially apply one or more transformations to the incoming data.
  4. transformingReader.add(new SetBatchSequenceNumberField("BatchNumber", 10, 5, "Rating")) is used to add a new sequence number column to the data from the reader.
  5. SetBatchSequenceNumberField("BatchNumber", 10, 5, "Rating") accepts four arguments i.e. name("BatchNumber"), start(10), step(5) and watchFields("Rating"). This method uses var args, meaning that for the last watchFields field, a variable number of String arguments can be specified.
  6. SetBatchSequenceNumberField is used to include a sequential field (auto increment) with the specified increment (step) when the values in the specified watchFields differ from the previous record. For this particular example, the BatchNumber field initiates with a value of 10 for the first record and monitors changes in the Rating field. If the value is different from the previous record's Rating field, the BatchNumber is incremented by 5 i.e. 10, 15, 20 ...
  7. Data is transferred from the transformingReader to the StreamWriter(System.out) via Job.run() method.

Console Output

-----------------------------------------------
0 - Record (MODIFIED) {
    0:[Account]:STRING=[101]:String
    1:[LastName]:STRING=[Reeves]:String
    2:[FirstName]:STRING=[Keanu]:String
    3:[Balance]:STRING=[9315.45]:String
    4:[CreditLimit]:STRING=[10000.00]:String
    5:[AccountCreated]:STRING=[1/17/1998]:String
    6:[Rating]:STRING=[A]:String
    7:[BatchNumber]:LONG=[10]:Long
}

-----------------------------------------------
1 - Record (MODIFIED) {
    0:[Account]:STRING=[312]:String
    1:[LastName]:STRING=[Butler]:String
    2:[FirstName]:STRING=[Gerard]:String
    3:[Balance]:STRING=[90.00]:String
    4:[CreditLimit]:STRING=[1000.00]:String
    5:[AccountCreated]:STRING=[8/6/2003]:String
    6:[Rating]:STRING=[B]:String
    7:[BatchNumber]:LONG=[15]:Long
}

-----------------------------------------------
2 - Record (MODIFIED) {
    0:[Account]:STRING=[868]:String
    1:[LastName]:STRING=[Hewitt]:String
    2:[FirstName]:STRING=[Jennifer Love]:String
    3:[Balance]:STRING=[0]:String
    4:[CreditLimit]:STRING=[17000.00]:String
    5:[AccountCreated]:STRING=[5/25/1985]:String
    6:[Rating]:STRING=[B]:String
    7:[BatchNumber]:LONG=[15]:Long
}

-----------------------------------------------
3 - Record (MODIFIED) {
    0:[Account]:STRING=[761]:String
    1:[LastName]:STRING=[Pinkett-Smith]:String
    2:[FirstName]:STRING=[Jada]:String
    3:[Balance]:STRING=[49654.87]:String
    4:[CreditLimit]:STRING=[100000.00]:String
    5:[AccountCreated]:STRING=[12/5/2006]:String
    6:[Rating]:STRING=[A]:String
    7:[BatchNumber]:LONG=[20]:Long
}

-----------------------------------------------
4 - Record (MODIFIED) {
    0:[Account]:STRING=[317]:String
    1:[LastName]:STRING=[Murray]:String
    2:[FirstName]:STRING=[Bill]:String
    3:[Balance]:STRING=[789.65]:String
    4:[CreditLimit]:STRING=[5000.00]:String
    5:[AccountCreated]:STRING=[2/5/2007]:String
    6:[Rating]:STRING=[C]:String
    7:[BatchNumber]:LONG=[25]:Long
}

-----------------------------------------------
5 records
Mobile Analytics