Use Data Lineage with FixedWidthReader
Updated: May 30, 2022
This example shows how to use data lineage with FixedWidthReader.
Data lineage is metadata added to records and fields that indicates where they were loaded from. It can be useful for audits and reconciliation as well as troubleshooting.
Data lineage can also be used with other readers, for example Data Lineage with Excel reader and Data Lineage with CSV reader.
Input
Account LastName        FirstName       Balance     CreditLimit   AccountCreated  Rating
101     Reeves          Keanu           9315.45     10000.00      1/17/1998       A
312     Butler          Gerard          90.00       1000.00       8/6/2003        B
868     Hewitt          Jennifer Love   0           17000.00      5/25/1985       B
761     Pinkett-Smith   Jada            49654.87    100000.00     12/5/2006       A
317     Murray          Bill            789.65      5000.00       2/5/2007        C
Java Code listing
package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.Field;
import com.northconcepts.datapipeline.core.Record;
import com.northconcepts.datapipeline.fixedwidth.FixedWidthReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.lineage.FieldLineage;
import com.northconcepts.datapipeline.lineage.RecordLineage;

public class UseDataLineageWithFixedWidthReader {

    public static void main(String[] args) {
        DataReader reader = new FixedWidthReader(new File("example/data/input/credit-balance-01.fw"))
                .addFields(8)
                .addFields(16)
                .addFields(16)
                .addFields(12)
                .skipField(14) // ignore CreditLimit field
                .skipField(16) // ignore AccountCreated field
                .skipField(7) // ignore Rating field
                .setFieldNamesInFirstRow(true)
                .setSaveLineage(true);

        Job.run(reader, new LineageWriter());
    }

    public final static class LineageWriter extends DataWriter {

        @Override
        protected void writeImpl(Record record) throws Throwable {
            System.out.println(record);

            RecordLineage recordLineage = new RecordLineage().setRecord(record);

            System.out.println("Record Lineage");
            System.out.println("    File: " + recordLineage.getFile());
            System.out.println("    File Line: " + recordLineage.getFileLineNumber());
            System.out.println("    File Column: " + recordLineage.getFileColumnNumber());
            System.out.println("    Record: " + recordLineage.getRecordNumber());
            System.out.println();

            FieldLineage fieldLineage = new FieldLineage();

            System.out.println("Field Lineage");
            for (int i=0; i < record.getFieldCount(); i++) {
                Field field = record.getField(i);
                fieldLineage.setField(field);
                System.out.println("    " + field.getName());
                System.out.println("        File: " + fieldLineage.getFile());
                System.out.println("        File Line: " + fieldLineage.getFileLineNumber());
                System.out.println("        File Column: " + fieldLineage.getFileColumnNumber());
                System.out.println("        Record: " + fieldLineage.getRecordNumber());
                System.out.println("        Field Index: " + fieldLineage.getOriginalFieldIndex());
                System.out.println("        Field Name: " + fieldLineage.getOriginalFieldName());
            }
            System.out.println("---------------------------------------------------------");
            System.out.println();
        }

    }

}
Code walkthrough
- FixedWidthReader is used to obtain records from a fixed-width stream.
- The .addFields() method specifies which fields to include. The value it accepts is the maximum number of characters in that field; any value longer than that width is truncated.
- .setSaveLineage(true) enables lineage support, which is turned off by default.
- The Job.run() method transfers data from the reader to the LineageWriter.
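To make the width arithmetic concrete, here is a minimal plain-Java sketch of how a list of field widths maps to column offsets and values. It does not use the DataPipeline library and is not how FixedWidthReader is implemented; the names FixedWidthSliceSketch, Column, keptOffsets, slice, pad, and exampleLayout are hypothetical and exist only for this illustration. The layout mirrors the widths used in the listing above (8, 16, 16, 12 kept; 14, 16, 7 skipped).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FixedWidthSliceSketch {

    /** One column of the layout: its width and whether its value is kept. */
    static final class Column {
        final int width;
        final boolean keep;

        Column(int width, boolean keep) {
            this.width = width;
            this.keep = keep;
        }
    }

    /** Widths from the listing: four kept fields followed by three skipped ones. */
    static List<Column> exampleLayout() {
        return Arrays.asList(
                new Column(8, true), new Column(16, true),
                new Column(16, true), new Column(12, true),
                new Column(14, false), new Column(16, false), new Column(7, false));
    }

    /** Returns the zero-based starting column of every kept field. */
    static List<Integer> keptOffsets(List<Column> layout) {
        List<Integer> offsets = new ArrayList<>();
        int offset = 0;
        for (Column column : layout) {
            if (column.keep) {
                offsets.add(offset);
            }
            offset += column.width; // skipped columns still advance the offset
        }
        return offsets;
    }

    /** Cuts one line into trimmed values, one per kept column. */
    static List<String> slice(String line, List<Column> layout) {
        List<String> values = new ArrayList<>();
        int offset = 0;
        for (Column column : layout) {
            int start = Math.min(offset, line.length());
            int end = Math.min(offset + column.width, line.length());
            if (column.keep) {
                values.add(line.substring(start, end).trim());
            }
            offset += column.width;
        }
        return values;
    }

    /** Right-pads a value with spaces up to the given width. */
    static String pad(String value, int width) {
        return String.format("%-" + width + "s", value);
    }

    public static void main(String[] args) {
        List<Column> layout = exampleLayout();
        String line = pad("868", 8) + pad("Hewitt", 16) + pad("Jennifer Love", 16)
                + pad("0", 12) + pad("17000.00", 14) + pad("5/25/1985", 16) + pad("B", 7);

        System.out.println(keptOffsets(layout)); // [0, 8, 24, 40]
        System.out.println(slice(line, layout)); // [868, Hewitt, Jennifer Love, 0]

        // A width that is too small cuts the value off.
        List<Column> narrow = Collections.singletonList(new Column(8, true));
        System.out.println(slice("Pinkett-Smith", narrow)); // [Pinkett-]
    }
}
```

The offsets 0, 8, 24, and 40 are the running sums of the widths, which is why skipped fields still need a width: they move the starting column of whatever follows.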
RecordLineage
RecordLineage informs us of the starting location where the record was loaded.
- recordLineage.getFile() - The java.io.File, if one was used to create the DataReader.
- recordLineage.getFileLineNumber() - The line number in the input file, starting with 0.
- recordLineage.getFileColumnNumber() - The column number in the input file, starting with 0.
- recordLineage.getRecordNumber() - The sequential record number, starting with 0.
FieldLineage
FieldLineage informs us of the starting location for each individual field.
- fieldLineage.getOriginalFieldIndex() - The index of a field set by the DataReader before any transformation or operation was performed.
- fieldLineage.getOriginalFieldName() - The name of a field set by the DataReader before any transformation or operation was performed.
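Because the line and column numbers reported by lineage start at 0, while many text editors count lines and columns from 1, it can help to convert them when pointing someone at a problem value. The following is a small plain-Java sketch of that conversion; LineageLocationSketch and editorLocation are hypothetical names and not part of DataPipeline, and the file name, line, and column are passed in as plain values rather than read from a lineage object.

```java
public class LineageLocationSketch {

    /**
     * Formats a zero-based line/column pair as a one-based
     * "file:line:column" string, the form many editors accept.
     */
    static String editorLocation(String file, long zeroBasedLine, long zeroBasedColumn) {
        return file + ":" + (zeroBasedLine + 1) + ":" + (zeroBasedColumn + 1);
    }

    public static void main(String[] args) {
        // Values for the FirstName field of the third record in this example:
        // File Line 3, File Column 24.
        System.out.println(editorLocation("credit-balance-01.fw", 3, 24));
        // prints credit-balance-01.fw:4:25
    }
}
```

In this example the header row occupies line 0, so each record's file line is its record number plus one.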
Output
Record {
    0:[Account]:STRING=[101]:String
    1:[LastName]:STRING=[Reeves]:String
    2:[FirstName]:STRING=[Keanu]:String
    3:[Balance]:STRING=[9315.45]:String
}
Record Lineage
    File: example\data\input\credit-balance-01.fw
    File Line: 1
    File Column: 0
    Record: 0

Field Lineage
    Account
        File: example\data\input\credit-balance-01.fw
        File Line: 1
        File Column: 0
        Record: 0
        Field Index: 0
        Field Name: Account
    LastName
        File: example\data\input\credit-balance-01.fw
        File Line: 1
        File Column: 8
        Record: 0
        Field Index: 1
        Field Name: LastName
    FirstName
        File: example\data\input\credit-balance-01.fw
        File Line: 1
        File Column: 24
        Record: 0
        Field Index: 2
        Field Name: FirstName
    Balance
        File: example\data\input\credit-balance-01.fw
        File Line: 1
        File Column: 40
        Record: 0
        Field Index: 3
        Field Name: Balance
---------------------------------------------------------

Record {
    0:[Account]:STRING=[312]:String
    1:[LastName]:STRING=[Butler]:String
    2:[FirstName]:STRING=[Gerard]:String
    3:[Balance]:STRING=[90.00]:String
}
Record Lineage
    File: example\data\input\credit-balance-01.fw
    File Line: 2
    File Column: 0
    Record: 1

Field Lineage
    Account
        File: example\data\input\credit-balance-01.fw
        File Line: 2
        File Column: 0
        Record: 1
        Field Index: 0
        Field Name: Account
    LastName
        File: example\data\input\credit-balance-01.fw
        File Line: 2
        File Column: 8
        Record: 1
        Field Index: 1
        Field Name: LastName
    FirstName
        File: example\data\input\credit-balance-01.fw
        File Line: 2
        File Column: 24
        Record: 1
        Field Index: 2
        Field Name: FirstName
    Balance
        File: example\data\input\credit-balance-01.fw
        File Line: 2
        File Column: 40
        Record: 1
        Field Index: 3
        Field Name: Balance
---------------------------------------------------------

Record {
    0:[Account]:STRING=[868]:String
    1:[LastName]:STRING=[Hewitt]:String
    2:[FirstName]:STRING=[Jennifer Love]:String
    3:[Balance]:STRING=[0]:String
}
Record Lineage
    File: example\data\input\credit-balance-01.fw
    File Line: 3
    File Column: 0
    Record: 2

Field Lineage
    Account
        File: example\data\input\credit-balance-01.fw
        File Line: 3
        File Column: 0
        Record: 2
        Field Index: 0
        Field Name: Account
    LastName
        File: example\data\input\credit-balance-01.fw
        File Line: 3
        File Column: 8
        Record: 2
        Field Index: 1
        Field Name: LastName
    FirstName
        File: example\data\input\credit-balance-01.fw
        File Line: 3
        File Column: 24
        Record: 2
        Field Index: 2
        Field Name: FirstName
    Balance
        File: example\data\input\credit-balance-01.fw
        File Line: 3
        File Column: 40
        Record: 2
        Field Index: 3
        Field Name: Balance
---------------------------------------------------------

Record {
    0:[Account]:STRING=[761]:String
    1:[LastName]:STRING=[Pinkett-Smith]:String
    2:[FirstName]:STRING=[Jada]:String
    3:[Balance]:STRING=[49654.87]:String
}
Record Lineage
    File: example\data\input\credit-balance-01.fw
    File Line: 4
    File Column: 0
    Record: 3

Field Lineage
    Account
        File: example\data\input\credit-balance-01.fw
        File Line: 4
        File Column: 0
        Record: 3
        Field Index: 0
        Field Name: Account
    LastName
        File: example\data\input\credit-balance-01.fw
        File Line: 4
        File Column: 8
        Record: 3
        Field Index: 1
        Field Name: LastName
    FirstName
        File: example\data\input\credit-balance-01.fw
        File Line: 4
        File Column: 24
        Record: 3
        Field Index: 2
        Field Name: FirstName
    Balance
        File: example\data\input\credit-balance-01.fw
        File Line: 4
        File Column: 40
        Record: 3
        Field Index: 3
        Field Name: Balance
---------------------------------------------------------

Record {
    0:[Account]:STRING=[317]:String
    1:[LastName]:STRING=[Murray]:String
    2:[FirstName]:STRING=[Bill]:String
    3:[Balance]:STRING=[789.65]:String
}
Record Lineage
    File: example\data\input\credit-balance-01.fw
    File Line: 5
    File Column: 0
    Record: 4

Field Lineage
    Account
        File: example\data\input\credit-balance-01.fw
        File Line: 5
        File Column: 0
        Record: 4
        Field Index: 0
        Field Name: Account
    LastName
        File: example\data\input\credit-balance-01.fw
        File Line: 5
        File Column: 8
        Record: 4
        Field Index: 1
        Field Name: LastName
    FirstName
        File: example\data\input\credit-balance-01.fw
        File Line: 5
        File Column: 24
        Record: 4
        Field Index: 2
        Field Name: FirstName
    Balance
        File: example\data\input\credit-balance-01.fw
        File Line: 5
        File Column: 40
        Record: 4
        Field Index: 3
        Field Name: Balance
---------------------------------------------------------