Add a Sequence Number Column for Repeat Values

This example adds a new sequence number column to a dataset, tracking changes in specific fields. It assigns increasing sequence numbers to consecutive records where the value in the tracked field remains the same while resetting the sequence number when a change occurs.

In scenarios where records represent orders or transactions, maintaining the order and tracking changes is crucial. By using this example, a sequence number column can be added to the dataset, allowing for easy tracking of changes and preserving the order of records based on the specified field.

 

Input CSV file

"Provider Number","Hospital Name","Address 1","Address 2","Address 3","City","State","ZIP Code","County","Phone Number","Hospital Type","Hospital Ownership","Emergency Services"
"010001","SOUTHEAST ALABAMA MEDICAL CENTER","1108 ROSS CLARK CIRCLE","","","DOTHAN","AL","36301","HOUSTON","3347938701","Acute Care Hospitals","Government - Hospital District or Authority","Yes"
"010005","MARSHALL MEDICAL CENTER SOUTH","2505 U S HIGHWAY 431 NORTH","","","BOAZ","AL","35957","MARSHALL","2565938310","Acute Care Hospitals","Government - Hospital District or Authority","Yes"
"010006","ELIZA COFFEE MEMORIAL HOSPITAL","205 MARENGO STREET","","","FLORENCE","AL","35631","LAUDERDALE","2567688400","Acute Care Hospitals","Government - Hospital District or Authority","Yes"
"010007","MIZELL MEMORIAL HOSPITAL","702 N MAIN ST","","","OPP","AL","36467","COVINGTON","3344933541","Acute Care Hospitals","Voluntary non-profit - Private","Yes"
...

 

Java code listing

package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;
import java.math.BigDecimal;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.transform.SetGroupSequenceNumberField;
import com.northconcepts.datapipeline.transform.TransformingReader;

public class AddASequenceNumberColumnForRepeatedValues {

    public static void main(String[] main) {
        DataReader reader = new CSVReader(new File("example/data/input/hospital.csv"))
                .setFieldNamesInFirstRow(true);

        TransformingReader transformingReader = new TransformingReader(reader);

        transformingReader.add(new SetGroupSequenceNumberField("Ownership_SequenceNumber",
                new BigDecimal(1), new BigDecimal(1), "Hospital Ownership"));

        Job.run(transformingReader, new StreamWriter(System.out));
    }
}

 

Code walkthrough

  1. CSVReader is created corresponding to the input file hospital.csv.
  2. The CSVReader.setFieldNamesInFirstRow(true) method is invoked to specify that the names specified in the first row should be used as field names.
  3. A TransformingReader is created to sequentially apply one or more transformations to the incoming data.
  4. SetGroupSequenceNumberField("Ownership_SequenceNumber", new BigDecimal(1), new BigDecimal(1), "Hospital Ownership") accepts four arguments i.e. name("Ownership_SequenceNumber"), start(1), step(1) and watchFields("Hospital Ownership"). This method uses var args, meaning that for the last watchFields field, a variable number of String arguments can be specified.
  5. SetGroupSequenceNumberField is used to include a sequence (auto increment) field that is incremented with the step value as long as the value(s) in the specified watchFields remain the same as the previous record. The sequence is reset to the start value when watch fields differ from the previous record. For this particular example, the Ownership_SequenceNumber field initiates with a value of 1 for the first record and monitors changes in the Hospital Ownership field. If the value is the same as the previous record's Hospital Ownership field, the Ownership_SequenceNumber is incremented by 1 i.e. 1, 2, 3 ...
  6. Data is transferred from the transformingReader to the StreamWriter(System.out) via Job.run() method. See how to compile and run data pipeline jobs.

 

Console Output

-----------------------------------------------
0 - Record (MODIFIED) {
    0:[Provider Number]:STRING=[010001]:String
    1:[Hospital Name]:STRING=[SOUTHEAST ALABAMA MEDICAL CENTER]:String
    2:[Address 1]:STRING=[1108 ROSS CLARK CIRCLE]:String
    3:[Address 2]:STRING=[]:String
    4:[Address 3]:STRING=[]:String
    5:[City]:STRING=[DOTHAN]:String
    6:[State]:STRING=[AL]:String
    7:[ZIP Code]:STRING=[36301]:String
    8:[County]:STRING=[HOUSTON]:String
    9:[Phone Number]:STRING=[3347938701]:String
    10:[Hospital Type]:STRING=[Acute Care Hospitals]:String
    11:[Hospital Ownership]:STRING=[Government - Hospital District or Authority]:String
    12:[Emergency Services]:STRING=[Yes]:String
    13:[Ownership_SequenceNumber]:BIG_DECIMAL=[1]:BigDecimal
}

-----------------------------------------------
1 - Record (MODIFIED) {
    0:[Provider Number]:STRING=[010005]:String
    1:[Hospital Name]:STRING=[MARSHALL MEDICAL CENTER SOUTH]:String
    2:[Address 1]:STRING=[2505 U S HIGHWAY 431 NORTH]:String
    3:[Address 2]:STRING=[]:String
    4:[Address 3]:STRING=[]:String
    5:[City]:STRING=[BOAZ]:String
    6:[State]:STRING=[AL]:String
    7:[ZIP Code]:STRING=[35957]:String
    8:[County]:STRING=[MARSHALL]:String
    9:[Phone Number]:STRING=[2565938310]:String
    10:[Hospital Type]:STRING=[Acute Care Hospitals]:String
    11:[Hospital Ownership]:STRING=[Government - Hospital District or Authority]:String
    12:[Emergency Services]:STRING=[Yes]:String
    13:[Ownership_SequenceNumber]:BIG_DECIMAL=[2]:BigDecimal
}

-----------------------------------------------
2 - Record (MODIFIED) {
    0:[Provider Number]:STRING=[010006]:String
    1:[Hospital Name]:STRING=[ELIZA COFFEE MEMORIAL HOSPITAL]:String
    2:[Address 1]:STRING=[205 MARENGO STREET]:String
    3:[Address 2]:STRING=[]:String
    4:[Address 3]:STRING=[]:String
    5:[City]:STRING=[FLORENCE]:String
    6:[State]:STRING=[AL]:String
    7:[ZIP Code]:STRING=[35631]:String
    8:[County]:STRING=[LAUDERDALE]:String
    9:[Phone Number]:STRING=[2567688400]:String
    10:[Hospital Type]:STRING=[Acute Care Hospitals]:String
    11:[Hospital Ownership]:STRING=[Government - Hospital District or Authority]:String
    12:[Emergency Services]:STRING=[Yes]:String
    13:[Ownership_SequenceNumber]:BIG_DECIMAL=[3]:BigDecimal
}

-----------------------------------------------
3 - Record (MODIFIED) {
    0:[Provider Number]:STRING=[010007]:String
    1:[Hospital Name]:STRING=[MIZELL MEMORIAL HOSPITAL]:String
    2:[Address 1]:STRING=[702 N MAIN ST]:String
    3:[Address 2]:STRING=[]:String
    4:[Address 3]:STRING=[]:String
    5:[City]:STRING=[OPP]:String
    6:[State]:STRING=[AL]:String
    7:[ZIP Code]:STRING=[36467]:String
    8:[County]:STRING=[COVINGTON]:String
    9:[Phone Number]:STRING=[3344933541]:String
    10:[Hospital Type]:STRING=[Acute Care Hospitals]:String
    11:[Hospital Ownership]:STRING=[Voluntary non-profit - Private]:String
    12:[Emergency Services]:STRING=[Yes]:String
    13:[Ownership_SequenceNumber]:BIG_DECIMAL=[1]:BigDecimal
}

Mobile Analytics