Use Data Lineage with FixedWidthReader

This example shows how to use data lineage with FixedWidthReader.

Data lineage is metadata added to records and fields that indicates where they were loaded from. It can be useful for audits and reconciliation as well as troubleshooting.

Data lineage can also be used with other readers; see, for example, Data Lineage with Excel reader and Data Lineage with CSV reader.

Input

Account LastName        FirstName       Balance     CreditLimit   AccountCreated  Rating 
101     Reeves          Keanu           9315.45     10000.00      1/17/1998       A      
312     Butler          Gerard          90.00       1000.00       8/6/2003        B      
868     Hewitt          Jennifer Love   0           17000.00      5/25/1985       B      
761     Pinkett-Smith   Jada            49654.87    100000.00     12/5/2006       A      
317     Murray          Bill            789.65      5000.00       2/5/2007        C   

Java Code listing

package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.Field;
import com.northconcepts.datapipeline.core.Record;
import com.northconcepts.datapipeline.fixedwidth.FixedWidthReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.lineage.FieldLineage;
import com.northconcepts.datapipeline.lineage.RecordLineage;

public class UseDataLineageWithFixedWidthReader {

    public static void main(String[] args) {
        DataReader reader = new FixedWidthReader(new File("example/data/input/credit-balance-01.fw"))
                                .addFields(8)   // Account
                                .addFields(16)  // LastName
                                .addFields(16)  // FirstName
                                .addFields(12)  // Balance
                                .skipField(14)  // ignore CreditLimit field
                                .skipField(16)  // ignore AccountCreated field
                                .skipField(7)   // ignore Rating field
                                .setFieldNamesInFirstRow(true)  // read field names from the header row
                                .setSaveLineage(true);          // lineage is off by default
        
        Job.run(reader, new LineageWriter());
    }
    
    public final static class LineageWriter extends DataWriter {

        @Override
        protected void writeImpl(Record record) throws Throwable {
            System.out.println(record);
            
            RecordLineage recordLineage = new RecordLineage().setRecord(record);
            
            System.out.println("Record Lineage");
            System.out.println("    File: " + recordLineage.getFile());
            System.out.println("    File Line: " + recordLineage.getFileLineNumber());
            System.out.println("    File Column: " + recordLineage.getFileColumnNumber());
            System.out.println("    Record: " + recordLineage.getRecordNumber());
            
            System.out.println();
            
            FieldLineage fieldLineage = new FieldLineage();
            
            System.out.println("Field Lineage");
            for (int i=0; i < record.getFieldCount(); i++) {
                Field field = record.getField(i);
                fieldLineage.setField(field);
                System.out.println("    " + field.getName());
                System.out.println("        File: " + fieldLineage.getFile());
                System.out.println("        File Line: " + fieldLineage.getFileLineNumber());
                System.out.println("        File Column: " + fieldLineage.getFileColumnNumber());
                System.out.println("        Record: " + fieldLineage.getRecordNumber());
                System.out.println("        Field Index: " + fieldLineage.getOriginalFieldIndex());
                System.out.println("        Field Name: " + fieldLineage.getOriginalFieldName());
            }
            System.out.println("---------------------------------------------------------");
            System.out.println();
        }
        
    }

}

Code walkthrough

  1. FixedWidthReader reads records from a fixed-width stream.
  2. The addFields() method specifies a field to include; its argument is the maximum number of characters that field can hold. If a value is longer than that width, it is truncated. skipField() takes a width in the same way but leaves that field out of the record (here CreditLimit, AccountCreated, and Rating).
  3. setSaveLineage(true) enables lineage support, which is turned off by default.
  4. Job.run() transfers data from the reader to the LineageWriter.
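
The field widths in the listing also determine the File Column values that appear in the output (0, 8, 24, 40): each field starts where the previous one ends. The following is a minimal plain-Java sketch of that arithmetic. It does not use the Data Pipeline library; the class FixedWidthOffsets and its methods are hypothetical names introduced only for illustration, and the library's own parsing may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Data Pipeline: illustrates how field
// widths translate into zero-based starting columns and field values.
public class FixedWidthOffsets {

    // Returns the zero-based starting column of each field, given the widths in order.
    public static int[] startColumns(int[] widths) {
        int[] starts = new int[widths.length];
        int column = 0;
        for (int i = 0; i < widths.length; i++) {
            starts[i] = column;
            column += widths[i];
        }
        return starts;
    }

    // Cuts one line into trimmed field values; a field lying past the end
    // of the line yields an empty string.
    public static List<String> slice(String line, int[] widths) {
        int[] starts = startColumns(widths);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < widths.length; i++) {
            int begin = Math.min(starts[i], line.length());
            int end = Math.min(starts[i] + widths[i], line.length());
            values.add(line.substring(begin, end).trim());
        }
        return values;
    }

    public static void main(String[] args) {
        int[] widths = {8, 16, 16, 12}; // Account, LastName, FirstName, Balance
        // Build the first four fields of the first data row, padded to the widths above.
        String line = String.format("%-8s%-16s%-16s%-12s", "101", "Reeves", "Keanu", "9315.45");

        System.out.println(Arrays.toString(startColumns(widths))); // [0, 8, 24, 40]
        System.out.println(slice(line, widths));                   // [101, Reeves, Keanu, 9315.45]
    }
}
```

The starting columns computed here match the File Column values reported for Account, LastName, FirstName, and Balance in the output.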

RecordLineage

  1. RecordLineage reports the starting location from which the record was loaded.
  2. recordLineage.getFile() - The java.io.File, if one was used to create the DataReader.
  3. recordLineage.getFileLineNumber() - The line number in the input file, starting with 0. In this example the header row is line 0, so the first data record reports line 1.
  4. recordLineage.getFileColumnNumber() - The column number in the input file, starting with 0.
  5. recordLineage.getRecordNumber() - The sequential record number, starting with 0.
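
Because both counters are zero-based and the header row occupies line 0, the file line number in this example is always one more than the record number. The plain-Java sketch below reproduces that numbering under two assumptions: one record per line, and field names on the first line. LineNumbering is a hypothetical class written for illustration, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Data Pipeline: models the zero-based
// file line and record numbering seen in the example output.
public class LineNumbering {

    // Pairs each data row with {fileLine, recordNumber}, both zero-based.
    // Assumes one record per line and, optionally, field names on line 0.
    public static List<int[]> number(int lineCount, boolean fieldNamesInFirstRow) {
        List<int[]> result = new ArrayList<>();
        int firstDataLine = fieldNamesInFirstRow ? 1 : 0;
        for (int fileLine = firstDataLine; fileLine < lineCount; fileLine++) {
            result.add(new int[] {fileLine, fileLine - firstDataLine});
        }
        return result;
    }

    public static void main(String[] args) {
        // The sample input has 6 lines: 1 header row followed by 5 data rows.
        for (int[] pair : number(6, true)) {
            System.out.println("File Line: " + pair[0] + ", Record: " + pair[1]);
        }
    }
}
```

For the six-line sample input this prints file lines 1 through 5 paired with records 0 through 4, the same pairs shown under Record Lineage in the output.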

FieldLineage

  1. FieldLineage reports the starting location of each individual field.
  2. fieldLineage.getOriginalFieldIndex() - The index of a field set by the DataReader before any transformation or operation was performed.
  3. fieldLineage.getOriginalFieldName() - The name of a field set by the DataReader before any transformation or operation was performed.

Output

Record {
    0:[Account]:STRING=[101]:String
    1:[LastName]:STRING=[Reeves]:String
    2:[FirstName]:STRING=[Keanu]:String
    3:[Balance]:STRING=[9315.45]:String
}

Record Lineage
    File: example\data\input\credit-balance-01.fw
    File Line: 1
    File Column: 0
    Record: 0

Field Lineage
    Account
        File: example\data\input\credit-balance-01.fw
        File Line: 1
        File Column: 0
        Record: 0
        Field Index: 0
        Field Name: Account
    LastName
        File: example\data\input\credit-balance-01.fw
        File Line: 1
        File Column: 8
        Record: 0
        Field Index: 1
        Field Name: LastName
    FirstName
        File: example\data\input\credit-balance-01.fw
        File Line: 1
        File Column: 24
        Record: 0
        Field Index: 2
        Field Name: FirstName
    Balance
        File: example\data\input\credit-balance-01.fw
        File Line: 1
        File Column: 40
        Record: 0
        Field Index: 3
        Field Name: Balance
---------------------------------------------------------

Record {
    0:[Account]:STRING=[312]:String
    1:[LastName]:STRING=[Butler]:String
    2:[FirstName]:STRING=[Gerard]:String
    3:[Balance]:STRING=[90.00]:String
}

Record Lineage
    File: example\data\input\credit-balance-01.fw
    File Line: 2
    File Column: 0
    Record: 1

Field Lineage
    Account
        File: example\data\input\credit-balance-01.fw
        File Line: 2
        File Column: 0
        Record: 1
        Field Index: 0
        Field Name: Account
    LastName
        File: example\data\input\credit-balance-01.fw
        File Line: 2
        File Column: 8
        Record: 1
        Field Index: 1
        Field Name: LastName
    FirstName
        File: example\data\input\credit-balance-01.fw
        File Line: 2
        File Column: 24
        Record: 1
        Field Index: 2
        Field Name: FirstName
    Balance
        File: example\data\input\credit-balance-01.fw
        File Line: 2
        File Column: 40
        Record: 1
        Field Index: 3
        Field Name: Balance
---------------------------------------------------------

Record {
    0:[Account]:STRING=[868]:String
    1:[LastName]:STRING=[Hewitt]:String
    2:[FirstName]:STRING=[Jennifer Love]:String
    3:[Balance]:STRING=[0]:String
}

Record Lineage
    File: example\data\input\credit-balance-01.fw
    File Line: 3
    File Column: 0
    Record: 2

Field Lineage
    Account
        File: example\data\input\credit-balance-01.fw
        File Line: 3
        File Column: 0
        Record: 2
        Field Index: 0
        Field Name: Account
    LastName
        File: example\data\input\credit-balance-01.fw
        File Line: 3
        File Column: 8
        Record: 2
        Field Index: 1
        Field Name: LastName
    FirstName
        File: example\data\input\credit-balance-01.fw
        File Line: 3
        File Column: 24
        Record: 2
        Field Index: 2
        Field Name: FirstName
    Balance
        File: example\data\input\credit-balance-01.fw
        File Line: 3
        File Column: 40
        Record: 2
        Field Index: 3
        Field Name: Balance
---------------------------------------------------------

Record {
    0:[Account]:STRING=[761]:String
    1:[LastName]:STRING=[Pinkett-Smith]:String
    2:[FirstName]:STRING=[Jada]:String
    3:[Balance]:STRING=[49654.87]:String
}

Record Lineage
    File: example\data\input\credit-balance-01.fw
    File Line: 4
    File Column: 0
    Record: 3

Field Lineage
    Account
        File: example\data\input\credit-balance-01.fw
        File Line: 4
        File Column: 0
        Record: 3
        Field Index: 0
        Field Name: Account
    LastName
        File: example\data\input\credit-balance-01.fw
        File Line: 4
        File Column: 8
        Record: 3
        Field Index: 1
        Field Name: LastName
    FirstName
        File: example\data\input\credit-balance-01.fw
        File Line: 4
        File Column: 24
        Record: 3
        Field Index: 2
        Field Name: FirstName
    Balance
        File: example\data\input\credit-balance-01.fw
        File Line: 4
        File Column: 40
        Record: 3
        Field Index: 3
        Field Name: Balance
---------------------------------------------------------

Record {
    0:[Account]:STRING=[317]:String
    1:[LastName]:STRING=[Murray]:String
    2:[FirstName]:STRING=[Bill]:String
    3:[Balance]:STRING=[789.65]:String
}

Record Lineage
    File: example\data\input\credit-balance-01.fw
    File Line: 5
    File Column: 0
    Record: 4

Field Lineage
    Account
        File: example\data\input\credit-balance-01.fw
        File Line: 5
        File Column: 0
        Record: 4
        Field Index: 0
        Field Name: Account
    LastName
        File: example\data\input\credit-balance-01.fw
        File Line: 5
        File Column: 8
        Record: 4
        Field Index: 1
        Field Name: LastName
    FirstName
        File: example\data\input\credit-balance-01.fw
        File Line: 5
        File Column: 24
        Record: 4
        Field Index: 2
        Field Name: FirstName
    Balance
        File: example\data\input\credit-balance-01.fw
        File Line: 5
        File Column: 40
        Record: 4
        Field Index: 3
        Field Name: Balance
---------------------------------------------------------