Infer Date Time from a CSV File

This example automates the detection of dates, times, and datetimes types from column values within a CSV file. It also provides functionality to detect the string patterns and the extract Java type for each column. You can use this approach to reduce the manual work when creating your data ingestion pipelines.

You can also view the Show Column Statistics example to learn how DataPipeline finds column statistics and insights for data analysis and exploration.

 

Input CSV file

DateField,TimeField,DateTimeField
2023/12/21,12:00:00,21-12-2023 12:00:00 PM
2023/12/14,06:30:00,14-12-2023 06:30:00 PM
2023/12/07,09:00:00,07-12-2023 09:00:00 AM

 

Java Code

package com.northconcepts.datapipeline.foundations.examples.pipeline;

import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.foundations.pipeline.Pipeline;
import com.northconcepts.datapipeline.foundations.pipeline.dataset.Column;
import com.northconcepts.datapipeline.foundations.pipeline.dataset.Dataset;
import com.northconcepts.datapipeline.foundations.pipeline.dataset.MemoryDataset;
import com.northconcepts.datapipeline.foundations.time.DateTimePattern;

import java.io.File;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

public class InferDateTimeFromCSVFile {
    public static void main(String[] args) {
        Pipeline pipeline = new Pipeline();
        pipeline.setInputAsDataReaderFactory(() -> new CSVReader(new File("example/data/input/DateTimeData.csv")).setFieldNamesInFirstRow(true));

        Dataset dataset = new MemoryDataset(pipeline);
        dataset.load().waitForColumnStatsToLoad();

        System.out.println("Column Count: " + dataset.getColumnCount());
        System.out.println("Record Count: " + dataset.getRecordCount());

        for (Column column : dataset.getColumns()) {
            System.out.println("===============================");
            System.out.println("Name: " + column.getName());
            System.out.println("Value Count: " + column.getValueCount());

            System.out.println("Is Temporal Column: " + column.getTemporal());
            System.out.println("    Inferred Temporal Value Count: " + column.getInferredTemporalValueCount());
            if (column.getTemporal()) {
                for (Map.Entry entry : column.getTemporalPatterns().entrySet()) {
                    System.out.println("    " + entry.getKey().getPattern() + "  --  " + entry.getValue().longValue());
                }
            }

            System.out.println("Inferred Type Name: " + column.getInferredTypeName());
            System.out.println("Inferred Field Type: " + column.getInferredFieldType());
        }
    }
}

 

Code Walkthrough

  1. After creating a Pipeline using the provided input file reader, it is loaded into a MemoryDataset to facilitate additional processing.
  2. Column and record count of the input file are first printed on the console.
  3. Based on column values, the following information, such as name, number of values, inferred data type, and patterns is detected and shown on the console. 

 

Console Output

Column Count: 3
Record Count: 3
===============================
Name: DateField
Value Count: 3
Is Temporal Column: true
    Inferred Temporal Value Count: 3
    yyyy/M/d  --  3
Inferred Type Name: Date
Inferred Field Type: DATE
===============================
Name: TimeField
Value Count: 3
Is Temporal Column: true
    Inferred Temporal Value Count: 3
    H:mm[:ss]  --  3
Inferred Type Name: Time
Inferred Field Type: TIME
===============================
Name: DateTimeField
Value Count: 3
Is Temporal Column: true
    Inferred Temporal Value Count: 3
    M-d-yyyy h:mm[:ss] a  --  1
    d-M-yyyy h:mm[:ss] a  --  3
Inferred Type Name: Datetime
Inferred Field Type: DATETIME
Mobile Analytics