How to Convert Tabular Data to Trees Using Aggregation
Updated: Jun 1, 2023
This example demonstrates how to convert tabular data to trees using aggregation.
For a more detailed explanation of this topic check out our blog on How to Convert Tabular Data to Trees Using Aggregation.
Input CSV
Date,Account,Transaction Type,Amount 2017-01-12,123,Credit,6089.78 2017-01-12,123,Fee,9.99 2017-01-12,456,Debit,1997.00 2017-01-12,123,Debit,20996.12 2017-01-13,123,Debit,17.00 2017-01-13,123,Debit,914.36 2017-01-14,789,Credit,11314.00 2017-01-14,789,Fee,9.99 2017-01-14,456,Debit,15247.89 2017-01-14,123,Debit,671.28 2017-01-15,456,Credit,5072.10 2017-01-15,456,Fee,9.99 2017-01-16,456,Debit,5109.07 2017-01-19,123,Credit,482.01 2017-01-19,789,Debit,3588.14 2017-01-19,147,Credit,4891.44
Java Code
/* * Copyright (c) 2006-2022 North Concepts Inc. All rights reserved. * Proprietary and Confidential. Use is subject to license terms. * * https://northconcepts.com/data-pipeline/licensing/ */ package com.northconcepts.datapipeline.examples.cookbook.blog; import java.io.File; import com.northconcepts.datapipeline.core.DataReader; import com.northconcepts.datapipeline.core.DataWriter; import com.northconcepts.datapipeline.core.SortingReader; import com.northconcepts.datapipeline.core.StreamWriter; import com.northconcepts.datapipeline.csv.CSVReader; import com.northconcepts.datapipeline.filter.FieldFilter; import com.northconcepts.datapipeline.filter.FilteringReader; import com.northconcepts.datapipeline.group.GroupByReader; import com.northconcepts.datapipeline.job.Job; import com.northconcepts.datapipeline.transform.BasicFieldTransformer; import com.northconcepts.datapipeline.transform.SelectFields; import com.northconcepts.datapipeline.transform.TransformingReader; public class ConvertTableToTree { private static File INPUT = new File("example/data/input/financial-transactions.csv"); public static void main(String[] args) { DataReader reader = new CSVReader(INPUT).setFieldNamesInFirstRow(true).setSkipEmptyRows(true); reader = new FilteringReader(reader) .add(new FieldFilter("Transaction Type").valueMatches("Credit", "Debit")); reader = new TransformingReader(reader) .add(new SelectFields("Account", "Transaction Type", "Date", "Amount")); reader = new TransformingReader(reader) .add(new BasicFieldTransformer("Date").stringToDate("yyyy-MM-dd")) .add(new BasicFieldTransformer("Amount").stringToBigDecimal()); reader = new SortingReader(reader).asc("Account").asc("Transaction Type").asc("Date").asc("Amount"); reader = new GroupByReader(reader, "Account").add(new GroupTree("data")); // DataWriter writer = StreamWriter.newJsonSystemOutWriter(); DataWriter writer = StreamWriter.newSystemOutWriter(); Job.run(reader, writer); } }
Code walkthrough
- The financial-transactions.csv file is used to create a CSVReader.
- The method
.setFieldNamesInFirstRow(true)
is called to use the first row as field names. - The method
.setSkipEmptyRows(true)
is used to remove any empty rows. - A FilteringReader is utilized to only show credit and debit transactions that meet certain criteria.
- A TransformingReader is created to transform the data in the reader.
- The
.add(new SelectFields("Account", "Transaction Type", "Date", "Amount"))
method is used to select and arrange fields based on their names. - BasicFieldTransformer is utilized to convert fields from strings to their actual types.
- To ensure child branches are grouped together, SortingReader is used to sort the data.
- GroupByReader is applied with the GroupTree aggregate operator to group records by account as the root of each branch.
- The data is transferred from the reader to the StreamWriter via the Job.run() method.
Console Output
0 - Record (MODIFIED) (has child records) { 0:[Account]:STRING=[123]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Transaction Type]:STRING=[Credit]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-12]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[6089.78]:BigDecimal }]]:ArrayValue }, Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-19]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[482.01]:BigDecimal }]]:ArrayValue }]]:ArrayValue }, Record (MODIFIED) (is child record) (has child records) { 0:[Transaction Type]:STRING=[Debit]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-12]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[20996.12]:BigDecimal }]]:ArrayValue }, Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-13]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[17]:BigDecimal }, Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[914.36]:BigDecimal }]]:ArrayValue }, Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-14]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[671.28]:BigDecimal }]]:ArrayValue }]]:ArrayValue }]]:ArrayValue } ----------------------------------------------- 1 - Record (MODIFIED) (has child records) { 0:[Account]:STRING=[147]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Transaction Type]:STRING=[Credit]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-19]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[4891.44]:BigDecimal }]]:ArrayValue }]]:ArrayValue }]]:ArrayValue } ----------------------------------------------- 2 - Record (MODIFIED) (has child records) { 0:[Account]:STRING=[456]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Transaction Type]:STRING=[Credit]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-15]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[5072.1]:BigDecimal }]]:ArrayValue }]]:ArrayValue }, Record (MODIFIED) (is child record) (has child records) { 0:[Transaction Type]:STRING=[Debit]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-12]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[1997]:BigDecimal }]]:ArrayValue }, Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-14]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[15247.89]:BigDecimal }]]:ArrayValue }, Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-16]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[5109.07]:BigDecimal }]]:ArrayValue }]]:ArrayValue }]]:ArrayValue } ----------------------------------------------- 3 - Record (MODIFIED) (has child records) { 0:[Account]:STRING=[789]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Transaction Type]:STRING=[Credit]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-14]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[11314]:BigDecimal }]]:ArrayValue }]]:ArrayValue }, Record (MODIFIED) (is child record) (has child records) { 0:[Transaction Type]:STRING=[Debit]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-19]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[3588.14]:BigDecimal }]]:ArrayValue }]]:ArrayValue }]]:ArrayValue } ----------------------------------------------- 4 records