Add a Sequence Number Column when Values Change
Updated: Jun 20, 2023
In this example, you will learn how you can use DataPipeline to transform data on the fly by adding a new Sequence Number Column that tracks the changes in a specified field. It provides users with the capability to assign a unique sequence number to each record based on the changes in the specified field, facilitating data tracking and analysis.
The example can be used to create versions of the data based on changes in the specified field. By assigning sequence numbers to track changes, users can analyze the historical progression of the data and perform a comparative analysis between different versions, facilitating historical trend analysis and understanding data evolution.
Input CSV file
Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating 101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A 312,Butler,Gerard,90.00,1000.00,8/6/2003,B 868,Hewitt,Jennifer Love,0,17000.00,5/25/1985,B 761,Pinkett-Smith,Jada,49654.87,100000.00,12/5/2006,A 317,Murray,Bill,789.65,5000.00,2/5/2007,C
Java code listing
package com.northconcepts.datapipeline.examples.cookbook; import java.io.File; import com.northconcepts.datapipeline.core.DataReader; import com.northconcepts.datapipeline.core.StreamWriter; import com.northconcepts.datapipeline.csv.CSVReader; import com.northconcepts.datapipeline.job.Job; import com.northconcepts.datapipeline.transform.SetBatchSequenceNumberField; import com.northconcepts.datapipeline.transform.TransformingReader; public class AddASequenceNumberColumnWhenValuesChange { public static void main(String[] args) { DataReader reader = new CSVReader(new File("example/data/input/credit-balance-01.csv")) .setFieldNamesInFirstRow(true); TransformingReader transformingReader = new TransformingReader(reader); transformingReader.add(new SetBatchSequenceNumberField("BatchNumber", 10, 5, "Rating")); Job.run(transformingReader, new StreamWriter(System.out)); } }
Code walkthrough
- CSVReader is created corresponding to the input file
credit-balance-01.csv
. - The
CSVReader.setFieldNamesInFirstRow(true)
method is invoked to specify that the names specified in the first row should be used as field names. - A TransformingReader is created to sequentially apply one or more transformations to the incoming data.
transformingReader.add(new SetBatchSequenceNumberField("BatchNumber", 10, 5, "Rating"))
is used to add a new sequence number column to the data from thereader
.SetBatchSequenceNumberField("BatchNumber", 10, 5, "Rating")
accepts four arguments i.e. name("BatchNumber"
), start(10
), step(5
) and watchFields("Rating"
). This method uses var args, meaning that for the last watchFields field, a variable number of String arguments can be specified.SetBatchSequenceNumberField
is used to include a sequential field (auto increment) with the specified increment (step) when the values in the specifiedwatchFields
differ from the previous record. For this particular example, theBatchNumber
field initiates with a value of10
for the first record and monitors changes in theRating
field. If the value is different from the previous record'sRating
field, theBatchNumber
is incremented by5
i.e. 10, 15, 20 ...- Data is transferred from the
transformingReader
to theStreamWriter(System.out)
via Job.run() method.
Console Output
----------------------------------------------- 0 - Record (MODIFIED) { 0:[Account]:STRING=[101]:String 1:[LastName]:STRING=[Reeves]:String 2:[FirstName]:STRING=[Keanu]:String 3:[Balance]:STRING=[9315.45]:String 4:[CreditLimit]:STRING=[10000.00]:String 5:[AccountCreated]:STRING=[1/17/1998]:String 6:[Rating]:STRING=[A]:String 7:[BatchNumber]:LONG=[10]:Long } ----------------------------------------------- 1 - Record (MODIFIED) { 0:[Account]:STRING=[312]:String 1:[LastName]:STRING=[Butler]:String 2:[FirstName]:STRING=[Gerard]:String 3:[Balance]:STRING=[90.00]:String 4:[CreditLimit]:STRING=[1000.00]:String 5:[AccountCreated]:STRING=[8/6/2003]:String 6:[Rating]:STRING=[B]:String 7:[BatchNumber]:LONG=[15]:Long } ----------------------------------------------- 2 - Record (MODIFIED) { 0:[Account]:STRING=[868]:String 1:[LastName]:STRING=[Hewitt]:String 2:[FirstName]:STRING=[Jennifer Love]:String 3:[Balance]:STRING=[0]:String 4:[CreditLimit]:STRING=[17000.00]:String 5:[AccountCreated]:STRING=[5/25/1985]:String 6:[Rating]:STRING=[B]:String 7:[BatchNumber]:LONG=[15]:Long } ----------------------------------------------- 3 - Record (MODIFIED) { 0:[Account]:STRING=[761]:String 1:[LastName]:STRING=[Pinkett-Smith]:String 2:[FirstName]:STRING=[Jada]:String 3:[Balance]:STRING=[49654.87]:String 4:[CreditLimit]:STRING=[100000.00]:String 5:[AccountCreated]:STRING=[12/5/2006]:String 6:[Rating]:STRING=[A]:String 7:[BatchNumber]:LONG=[20]:Long } ----------------------------------------------- 4 - Record (MODIFIED) { 0:[Account]:STRING=[317]:String 1:[LastName]:STRING=[Murray]:String 2:[FirstName]:STRING=[Bill]:String 3:[Balance]:STRING=[789.65]:String 4:[CreditLimit]:STRING=[5000.00]:String 5:[AccountCreated]:STRING=[2/5/2007]:String 6:[Rating]:STRING=[C]:String 7:[BatchNumber]:LONG=[25]:Long } ----------------------------------------------- 5 records