Declaratively Set Default Values for Missing Data

This example shows how to read data from an input file while automatically applying default values to empty fields. It provides a convenient way to ensure that missing or empty data points are populated with predefined defaults, improving data completeness and consistency.

This approach fits any workflow where external data is ingested: the application receives valid, complete records because any missing or empty fields are automatically filled with their declared defaults.

Input Files

csv-with-null-values.csv

ID, BOOLEAN1, STRING1, BYTE1, CHAR1, DATE1, DATETIME1, TIME1, INT1, LONG1, DOUBLE1, FLOAT1, SHORT1,BIGDECIMAL1,BIGINTEGER1
1, true, "hello world", 7, T, 1999-08-21, 1999-08-21T15:31:50.000Z, 15:31:50, 789456, 987654321, 123.456, 963.258, 17,92233720368547758079223372036854775807.92233720368547758079223372036854775807,92233720368547758079223372036854775807
2, false, goodbye world, 127, z, 1786-12-30, 1786-12-30T01:05:17.000Z, 1:05:17, "2147483647", "9223372036854775807", 1.79769313486231570e+308d, 3.40282346638528860e+38, "32767",-9406564584124654494065645841246544.9406564584124654494065645841246544,-92233720368547758079223372036854775807
3, , , , , , , , -2147483648, "-9223372036854775808", -4.94065645841246544e-324d, 1.40129846432481707e-45, -32768,0,0
4, , , , , , , , , , , , , ,
5

schema-with-default-values.xml

<?xml version="1.0"?>
<schema name="Schema" description="This is test schema.">
  <entities>
    <entity name="Entity" allowExtraFieldsInValidation="true">
      <fields>
        <field name="ID" type="INT" defaultValueExpression="0"/>
        <field name="BOOLEAN1" type="BOOLEAN" defaultValueExpression="false"/>
        <field name="STRING1" type="STRING" defaultValueExpression="'DEFAULT STRING VALUE'"/>
        <field name="BYTE1" type="BYTE" defaultValueExpression="88"/>
        <field name="CHAR1" type="CHAR" defaultValueExpression="'z'"/>
        <field name="DATE1" type="DATE" defaultValueExpression="toDate('1999-08-21T15:31:50.000Z')"/>
        <field name="DATETIME1" type="DATETIME" defaultValueExpression="toDatetime('1999-08-21T15:31:50.000Z')"/>
        <field name="TIME1" type="TIME" defaultValueExpression="toDatetime('1999-08-21T15:31:50.000Z')"/>
        <field name="INT1" type="INT" defaultValueExpression="99"/>
        <field name="LONG1" type="LONG" defaultValueExpression="1623217558746"/>
        <field name="DOUBLE1" type="DOUBLE" defaultValueExpression="10.0"/>
        <field name="FLOAT1" type="FLOAT" defaultValueExpression="99"/>
        <field name="SHORT1" type="SHORT" defaultValueExpression="99"/>
        <field name="BIGDECIMAL1" type="BIG_DECIMAL" defaultValueExpression="'111111111111111111111111111111111111111.111111111111111111111111111111111111111'"/>
        <field name="BIGINTEGER1" type="BIG_INTEGER" defaultValueExpression="'99999999999999999999999999999999999999'"/>
      </fields>
    </entity>
  </entities>
</schema>


Java code listing

package com.northconcepts.datapipeline.foundations.examples.datamapping;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.foundations.datamapping.DataMapping;
import com.northconcepts.datapipeline.foundations.datamapping.DataMappingReader;
import com.northconcepts.datapipeline.foundations.schema.EntityDef;
import com.northconcepts.datapipeline.foundations.schema.SchemaDef;
import com.northconcepts.datapipeline.foundations.schema.SchemaTransformer;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.transform.TransformingReader;

import java.io.File;
import java.io.FileInputStream;

public class DeclarativelySetDefaultValuesFoMissingData {

    public static void main(String... args) throws Throwable {
        DataReader reader = new CSVReader(new File("example/data/input/schema/csv-with-null-values.csv"))
                .setFieldNamesInFirstRow(true);
        
        SchemaDef schema = new SchemaDef()
                .fromXml(new FileInputStream(new File("example/data/input/schema/schema-with-default-values.xml")));
        EntityDef sourceAccountEntity = schema.getEntity("Entity");
        reader = new TransformingReader(reader)
                .add(new SchemaTransformer(sourceAccountEntity));

//        DataWriter writer = new NullWriter();
        DataWriter writer = StreamWriter.newSystemOutWriter();
        
        Job job = Job.run(reader, writer);
        System.out.println("Records Transferred: " + job.getRecordsTransferred());
        System.out.println("Running Time: " + job.getRunningTimeAsString());
    }
}


Code Walkthrough

  1. A CSVReader instance is created for the input file csv-with-null-values.csv, with setFieldNamesInFirstRow(true) telling it to take field names from the first row.
  2. Schema and entity definitions are loaded from the input file schema-with-default-values.xml. Each field in the entity definition declares a name, a data type, and a defaultValueExpression; the expression supplies the field's value whenever it is missing or empty in the source.
  3. A TransformingReader wraps the reader from Step 1 and applies a SchemaTransformer built from the entity definition, filling in defaults as records stream through.
  4. Records are printed to the console using StreamWriter.newSystemOutWriter().
  5. Job.run() transfers the data from the reader to the console writer. See how to compile and run data pipeline jobs.
  6. The number of records transferred and the running time are then printed to the console.
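To make the defaulting behavior concrete, here is a minimal plain-Java sketch of the idea (this is an illustration only, not the library's SchemaTransformer implementation): a field keeps its parsed value unless that value is null or blank, in which case the declared default is substituted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual sketch of default-value substitution: if a parsed field is
// null or blank, replace it with the default declared for that field.
public class DefaultingSketch {

    static String applyDefault(String value, String defaultValue) {
        return (value == null || value.trim().isEmpty()) ? defaultValue : value;
    }

    public static void main(String[] args) {
        // Defaults mirroring two fields from schema-with-default-values.xml
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("STRING1", "DEFAULT STRING VALUE");
        defaults.put("INT1", "99");

        // Row 4 from the sample CSV: every field is empty
        Map<String, String> row = new LinkedHashMap<>();
        row.put("STRING1", "");
        row.put("INT1", null);

        for (Map.Entry<String, String> e : row.entrySet()) {
            e.setValue(applyDefault(e.getValue(), defaults.get(e.getKey())));
        }
        System.out.println(row); // {STRING1=DEFAULT STRING VALUE, INT1=99}
    }
}
```

This matches what the console output above shows for rows 4 and 5: fields that arrive empty are printed with the defaults from the schema rather than as nulls.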


CSVReader

Obtains records from a Comma-Separated Value (CSV) or delimited stream. It extends the TextReader class and can be created using a File or Reader object. Passing true to setFieldNamesInFirstRow() tells the CSVReader to use the names in the first row of the input data as field names.
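As a rough illustration of the quoting behavior visible in the sample data (quoted values such as "2147483647" arrive without their quotes), here is a simplified sketch of splitting one CSV line. This is not CSVReader's actual implementation; it is a minimal model that ignores escaped quotes and other edge cases.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of CSV line splitting: commas inside double quotes do
// not separate fields, and the quotes themselves are stripped from values.
public class CsvLineSketch {

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;              // toggle quoted section; quote chars not emitted
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString().trim());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("2, false, \"goodbye world\", \"2147483647\""));
        // → [2, false, goodbye world, 2147483647]
    }
}
```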


Console output

-----------------------------------------------
0 - Record (MODIFIED) {
    0:[ID]:INT=[1]:Integer
    1:[BOOLEAN1]:BOOLEAN=[true]:Boolean
    2:[STRING1]:STRING=[hello world]:String
    3:[BYTE1]:BYTE=[7]:Byte
    4:[CHAR1]:CHAR=[T]:Character
    5:[DATE1]:DATE=[1999-08-21]:Date
    6:[DATETIME1]:DATETIME=[Sat Aug 21 15:31:50 UZT 1999]:Date
    7:[TIME1]:TIME=[15:31:50]:Time
    8:[INT1]:INT=[789456]:Integer
    9:[LONG1]:LONG=[987654321]:Long
    10:[DOUBLE1]:DOUBLE=[123.456]:Double
    11:[FLOAT1]:FLOAT=[963.258]:Float
    12:[SHORT1]:SHORT=[17]:Short
    13:[BIGDECIMAL1]:BIG_DECIMAL=[92233720368547758079223372036854775807.92233720368547758079223372036854775807]:BigDecimal
    14:[BIGINTEGER1]:BIG_INTEGER=[92233720368547758079223372036854775807]:BigInteger
}

-----------------------------------------------
1 - Record (MODIFIED) {
    0:[ID]:INT=[2]:Integer
    1:[BOOLEAN1]:BOOLEAN=[false]:Boolean
    2:[STRING1]:STRING=[goodbye world]:String
    3:[BYTE1]:BYTE=[127]:Byte
    4:[CHAR1]:CHAR=[z]:Character
    5:[DATE1]:DATE=[1786-12-30]:Date
    6:[DATETIME1]:DATETIME=[Sat Dec 30 01:05:17 UZT 1786]:Date
    7:[TIME1]:TIME=[01:05:17]:Time
    8:[INT1]:INT=[2147483647]:Integer
    9:[LONG1]:LONG=[9223372036854775807]:Long
    10:[DOUBLE1]:DOUBLE=[1.7976931348623157E308]:Double
    11:[FLOAT1]:FLOAT=[3.4028235E38]:Float
    12:[SHORT1]:SHORT=[32767]:Short
    13:[BIGDECIMAL1]:BIG_DECIMAL=[-9406564584124654494065645841246544.9406564584124654494065645841246544]:BigDecimal
    14:[BIGINTEGER1]:BIG_INTEGER=[-92233720368547758079223372036854775807]:BigInteger
}

-----------------------------------------------
2 - Record (MODIFIED) {
    0:[ID]:INT=[3]:Integer
    1:[BOOLEAN1]:BOOLEAN=[false]:Boolean
    2:[STRING1]:STRING=[DEFAULT STRING VALUE]:String
    3:[BYTE1]:BYTE=[88]:Byte
    4:[CHAR1]:CHAR=[z]:Character
    5:[DATE1]:DATE=[1999-08-21]:Date
    6:[DATETIME1]:DATETIME=[Sat Aug 21 15:31:50 UZT 1999]:Date
    7:[TIME1]:TIME=[15:31:50]:Time
    8:[INT1]:INT=[-2147483648]:Integer
    9:[LONG1]:LONG=[-9223372036854775808]:Long
    10:[DOUBLE1]:DOUBLE=[-4.9E-324]:Double
    11:[FLOAT1]:FLOAT=[1.4E-45]:Float
    12:[SHORT1]:SHORT=[-32768]:Short
    13:[BIGDECIMAL1]:BIG_DECIMAL=[0]:BigDecimal
    14:[BIGINTEGER1]:BIG_INTEGER=[0]:BigInteger
}

-----------------------------------------------
3 - Record (MODIFIED) {
    0:[ID]:INT=[4]:Integer
    1:[BOOLEAN1]:BOOLEAN=[false]:Boolean
    2:[STRING1]:STRING=[DEFAULT STRING VALUE]:String
    3:[BYTE1]:BYTE=[88]:Byte
    4:[CHAR1]:CHAR=[z]:Character
    5:[DATE1]:DATE=[1999-08-21]:Date
    6:[DATETIME1]:DATETIME=[Sat Aug 21 15:31:50 UZT 1999]:Date
    7:[TIME1]:TIME=[15:31:50]:Time
    8:[INT1]:INT=[99]:Integer
    9:[LONG1]:LONG=[1623217558746]:Long
    10:[DOUBLE1]:DOUBLE=[10.0]:Double
    11:[FLOAT1]:FLOAT=[99.0]:Float
    12:[SHORT1]:SHORT=[99]:Short
    13:[BIGDECIMAL1]:BIG_DECIMAL=[111111111111111111111111111111111111111.111111111111111111111111111111111111111]:BigDecimal
    14:[BIGINTEGER1]:BIG_INTEGER=[99999999999999999999999999999999999999]:BigInteger
}

-----------------------------------------------
4 - Record (MODIFIED) {
    0:[ID]:INT=[5]:Integer
    1:[BOOLEAN1]:BOOLEAN=[false]:Boolean
    2:[STRING1]:STRING=[DEFAULT STRING VALUE]:String
    3:[BYTE1]:BYTE=[88]:Byte
    4:[CHAR1]:CHAR=[z]:Character
    5:[DATE1]:DATE=[1999-08-21]:Date
    6:[DATETIME1]:DATETIME=[Sat Aug 21 15:31:50 UZT 1999]:Date
    7:[TIME1]:TIME=[15:31:50]:Time
    8:[INT1]:INT=[99]:Integer
    9:[LONG1]:LONG=[1623217558746]:Long
    10:[DOUBLE1]:DOUBLE=[10.0]:Double
    11:[FLOAT1]:FLOAT=[99.0]:Float
    12:[SHORT1]:SHORT=[99]:Short
    13:[BIGDECIMAL1]:BIG_DECIMAL=[111111111111111111111111111111111111111.111111111111111111111111111111111111111]:BigDecimal
    14:[BIGINTEGER1]:BIG_INTEGER=[99999999999999999999999999999999999999]:BigInteger
}

-----------------------------------------------
5 records
Records Transferred: 5
Running Time: 24 Milliseconds