Add a Sequence Number Column for Repeat Values
Updated: Jul 7, 2023
This example adds a new sequence number column to a dataset, tracking changes in specific fields. It assigns increasing sequence numbers to consecutive records where the value in the tracked field remains the same while resetting the sequence number when a change occurs.
In scenarios where records represent orders or transactions, maintaining the order and tracking changes is crucial. By using this example, a sequence number column can be added to the dataset, allowing for easy tracking of changes and preserving the order of records based on the specified field.
Input CSV file
"Provider Number","Hospital Name","Address 1","Address 2","Address 3","City","State","ZIP Code","County","Phone Number","Hospital Type","Hospital Ownership","Emergency Services" "010001","SOUTHEAST ALABAMA MEDICAL CENTER","1108 ROSS CLARK CIRCLE","","","DOTHAN","AL","36301","HOUSTON","3347938701","Acute Care Hospitals","Government - Hospital District or Authority","Yes" "010005","MARSHALL MEDICAL CENTER SOUTH","2505 U S HIGHWAY 431 NORTH","","","BOAZ","AL","35957","MARSHALL","2565938310","Acute Care Hospitals","Government - Hospital District or Authority","Yes" "010006","ELIZA COFFEE MEMORIAL HOSPITAL","205 MARENGO STREET","","","FLORENCE","AL","35631","LAUDERDALE","2567688400","Acute Care Hospitals","Government - Hospital District or Authority","Yes" "010007","MIZELL MEMORIAL HOSPITAL","702 N MAIN ST","","","OPP","AL","36467","COVINGTON","3344933541","Acute Care Hospitals","Voluntary non-profit - Private","Yes"
...
Java code listing
package com.northconcepts.datapipeline.examples.cookbook; import java.io.File; import java.math.BigDecimal; import com.northconcepts.datapipeline.core.DataReader; import com.northconcepts.datapipeline.core.StreamWriter; import com.northconcepts.datapipeline.csv.CSVReader; import com.northconcepts.datapipeline.job.Job; import com.northconcepts.datapipeline.transform.SetGroupSequenceNumberField; import com.northconcepts.datapipeline.transform.TransformingReader; public class AddASequenceNumberColumnForRepeatedValues { public static void main(String[] main) { DataReader reader = new CSVReader(new File("example/data/input/hospital.csv")) .setFieldNamesInFirstRow(true); TransformingReader transformingReader = new TransformingReader(reader); transformingReader.add(new SetGroupSequenceNumberField("Ownership_SequenceNumber", new BigDecimal(1), new BigDecimal(1), "Hospital Ownership")); Job.run(transformingReader, new StreamWriter(System.out)); } }
Code walkthrough
- CSVReader is created corresponding to the input file
hospital.csv
. - The
CSVReader.setFieldNamesInFirstRow(true)
method is invoked to specify that the names specified in the first row should be used as field names. - A TransformingReader is created to sequentially apply one or more transformations to the incoming data.
SetGroupSequenceNumberField("Ownership_SequenceNumber", new BigDecimal(1), new BigDecimal(1), "Hospital Ownership")
accepts four arguments i.e. name("Ownership_SequenceNumber"
), start(1
), step(1
) and watchFields("Hospital Ownership"
). This method uses var args, meaning that for the last watchFields field, a variable number of String arguments can be specified.SetGroupSequenceNumberField
is used to include a sequence (auto increment) field that is incremented with the step value as long as the value(s) in the specifiedwatchFields
remain the same as the previous record. The sequence is reset to the start value when watch fields differ from the previous record. For this particular example, theOwnership_SequenceNumber
field initiates with a value of1
for the first record and monitors changes in theHospital Ownership
field. If the value is the same as the previous record'sHospital Ownership
field, theOwnership_SequenceNumber
is incremented by 1 i.e. 1, 2, 3 ...- Data is transferred from the
transformingReader
to theStreamWriter(System.out)
via Job.run() method. See how to compile and run data pipeline jobs.
Console Output
----------------------------------------------- 0 - Record (MODIFIED) { 0:[Provider Number]:STRING=[010001]:String 1:[Hospital Name]:STRING=[SOUTHEAST ALABAMA MEDICAL CENTER]:String 2:[Address 1]:STRING=[1108 ROSS CLARK CIRCLE]:String 3:[Address 2]:STRING=[]:String 4:[Address 3]:STRING=[]:String 5:[City]:STRING=[DOTHAN]:String 6:[State]:STRING=[AL]:String 7:[ZIP Code]:STRING=[36301]:String 8:[County]:STRING=[HOUSTON]:String 9:[Phone Number]:STRING=[3347938701]:String 10:[Hospital Type]:STRING=[Acute Care Hospitals]:String 11:[Hospital Ownership]:STRING=[Government - Hospital District or Authority]:String 12:[Emergency Services]:STRING=[Yes]:String 13:[Ownership_SequenceNumber]:BIG_DECIMAL=[1]:BigDecimal } ----------------------------------------------- 1 - Record (MODIFIED) { 0:[Provider Number]:STRING=[010005]:String 1:[Hospital Name]:STRING=[MARSHALL MEDICAL CENTER SOUTH]:String 2:[Address 1]:STRING=[2505 U S HIGHWAY 431 NORTH]:String 3:[Address 2]:STRING=[]:String 4:[Address 3]:STRING=[]:String 5:[City]:STRING=[BOAZ]:String 6:[State]:STRING=[AL]:String 7:[ZIP Code]:STRING=[35957]:String 8:[County]:STRING=[MARSHALL]:String 9:[Phone Number]:STRING=[2565938310]:String 10:[Hospital Type]:STRING=[Acute Care Hospitals]:String 11:[Hospital Ownership]:STRING=[Government - Hospital District or Authority]:String 12:[Emergency Services]:STRING=[Yes]:String 13:[Ownership_SequenceNumber]:BIG_DECIMAL=[2]:BigDecimal } ----------------------------------------------- 2 - Record (MODIFIED) { 0:[Provider Number]:STRING=[010006]:String 1:[Hospital Name]:STRING=[ELIZA COFFEE MEMORIAL HOSPITAL]:String 2:[Address 1]:STRING=[205 MARENGO STREET]:String 3:[Address 2]:STRING=[]:String 4:[Address 3]:STRING=[]:String 5:[City]:STRING=[FLORENCE]:String 6:[State]:STRING=[AL]:String 7:[ZIP Code]:STRING=[35631]:String 8:[County]:STRING=[LAUDERDALE]:String 9:[Phone Number]:STRING=[2567688400]:String 10:[Hospital Type]:STRING=[Acute Care Hospitals]:String 11:[Hospital Ownership]:STRING=[Government - Hospital District or Authority]:String 12:[Emergency Services]:STRING=[Yes]:String 13:[Ownership_SequenceNumber]:BIG_DECIMAL=[3]:BigDecimal } ----------------------------------------------- 3 - Record (MODIFIED) { 0:[Provider Number]:STRING=[010007]:String 1:[Hospital Name]:STRING=[MIZELL MEMORIAL HOSPITAL]:String 2:[Address 1]:STRING=[702 N MAIN ST]:String 3:[Address 2]:STRING=[]:String 4:[Address 3]:STRING=[]:String 5:[City]:STRING=[OPP]:String 6:[State]:STRING=[AL]:String 7:[ZIP Code]:STRING=[36467]:String 8:[County]:STRING=[COVINGTON]:String 9:[Phone Number]:STRING=[3344933541]:String 10:[Hospital Type]:STRING=[Acute Care Hospitals]:String 11:[Hospital Ownership]:STRING=[Voluntary non-profit - Private]:String 12:[Emergency Services]:STRING=[Yes]:String 13:[Ownership_SequenceNumber]:BIG_DECIMAL=[1]:BigDecimal }