Infer Date Time from a CSV File
Updated: Dec 26, 2023
This example automates the detection of dates, times, and datetimes types from column values within a CSV file. It also provides functionality to detect the string patterns and the extract Java type for each column. You can use this approach to reduce the manual work when creating your data ingestion pipelines.
You can also view the Show Column Statistics example to learn how DataPipeline finds column statistics and insights for data analysis and exploration.
Input CSV file
DateField,TimeField,DateTimeField 2023/12/21,12:00:00,21-12-2023 12:00:00 PM 2023/12/14,06:30:00,14-12-2023 06:30:00 PM 2023/12/07,09:00:00,07-12-2023 09:00:00 AM
Java Code
package com.northconcepts.datapipeline.foundations.examples.pipeline; import java.io.File; import java.util.Map; import java.util.concurrent.atomic.LongAdder; import com.northconcepts.datapipeline.csv.CSVReader; import com.northconcepts.datapipeline.foundations.pipeline.Pipeline; import com.northconcepts.datapipeline.foundations.pipeline.dataset.Column; import com.northconcepts.datapipeline.foundations.pipeline.dataset.Dataset; import com.northconcepts.datapipeline.foundations.pipeline.dataset.MemoryDataset; import com.northconcepts.datapipeline.foundations.time.DateTimePattern; public class InferDateTimeFromCSVFile { public static void main(String[] args) { Pipeline pipeline = new Pipeline(); pipeline.setInputAsDataReaderFactory(() -> new CSVReader(new File("example/data/input/DateTimeData.csv")).setFieldNamesInFirstRow(true)); Dataset dataset = new MemoryDataset(pipeline); dataset.load().waitForColumnStatsToLoad(); System.out.println("Column Count: " + dataset.getColumnCount()); System.out.println("Record Count: " + dataset.getRecordCount()); for (Column column : dataset.getColumns()) { System.out.println("==============================="); System.out.println("Name: " + column.getName()); System.out.println("Value Count: " + column.getValueCount()); System.out.println("Is Temporal Column: " + column.isTemporal()); System.out.println(" Inferred Temporal Value Count: " + column.getInferredTemporalValueCount()); if (column.isTemporal()) { for (Map.Entryentry : column.getTemporalPatterns().entrySet()) { System.out.println(" " + entry.getKey().getPattern() + " -- " + entry.getValue().longValue()); } } System.out.println("Inferred Type Name: " + column.getInferredTypeName()); System.out.println("Inferred Field Type: " + column.getInferredFieldType()); } } }
Code Walkthrough
- After creating a Pipeline using the provided input file reader, it is loaded into a MemoryDataset to facilitate additional processing.
- Column and record count of the input file are first printed on the console.
- Based on column values, the following information, such as name, number of values, inferred data type, and patterns is detected and shown on the console.
Console Output
Column Count: 3 Record Count: 3 =============================== Name: DateField Value Count: 3 Is Temporal Column: true Inferred Temporal Value Count: 3 yyyy/M/d -- 3 Inferred Type Name: Date Inferred Field Type: DATE =============================== Name: TimeField Value Count: 3 Is Temporal Column: true Inferred Temporal Value Count: 3 H:mm[:ss] -- 3 Inferred Type Name: Time Inferred Field Type: TIME =============================== Name: DateTimeField Value Count: 3 Is Temporal Column: true Inferred Temporal Value Count: 3 M-d-yyyy h:mm[:ss] a -- 1 d-M-yyyy h:mm[:ss] a -- 3 Inferred Type Name: Datetime Inferred Field Type: DATETIME