How to Convert Tabular Data to Trees Using Aggregation

Updated: Jun 1, 2023

This example demonstrates how to convert tabular data to trees using aggregation.

For a more detailed explanation of this topic check out our blog on How to Convert Tabular Data to Trees Using Aggregation.

Input CSV

Date,Account,Transaction Type,Amount
2017-01-12,123,Credit,6089.78
2017-01-12,123,Fee,9.99
2017-01-12,456,Debit,1997.00
2017-01-12,123,Debit,20996.12
2017-01-13,123,Debit,17.00
2017-01-13,123,Debit,914.36
2017-01-14,789,Credit,11314.00
2017-01-14,789,Fee,9.99
2017-01-14,456,Debit,15247.89
2017-01-14,123,Debit,671.28
2017-01-15,456,Credit,5072.10
2017-01-15,456,Fee,9.99
2017-01-16,456,Debit,5109.07
2017-01-19,123,Credit,482.01
2017-01-19,789,Debit,3588.14
2017-01-19,147,Credit,4891.44

Java Code

/*
 * Copyright (c) 2006-2022 North Concepts Inc.  All rights reserved.
 * Proprietary and Confidential.  Use is subject to license terms.
 * 
 * https://northconcepts.com/data-pipeline/licensing/
 */
package com.northconcepts.datapipeline.examples.cookbook.blog;

import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.SortingReader;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.filter.FieldFilter;
import com.northconcepts.datapipeline.filter.FilteringReader;
import com.northconcepts.datapipeline.group.GroupByReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.transform.BasicFieldTransformer;
import com.northconcepts.datapipeline.transform.SelectFields;
import com.northconcepts.datapipeline.transform.TransformingReader;

public class ConvertTableToTree {

    private static File INPUT = new File("example/data/input/financial-transactions.csv");

    public static void main(String[] args) {
        DataReader reader = new CSVReader(INPUT).setFieldNamesInFirstRow(true).setSkipEmptyRows(true);
        
        reader = new FilteringReader(reader)
                .add(new FieldFilter("Transaction Type").valueMatches("Credit", "Debit"));
        
        reader = new TransformingReader(reader)
                .add(new SelectFields("Account", "Transaction Type", "Date", "Amount"));
        
        reader = new TransformingReader(reader)
                .add(new BasicFieldTransformer("Date").stringToDate("yyyy-MM-dd"))
                .add(new BasicFieldTransformer("Amount").stringToBigDecimal());


        reader = new SortingReader(reader).asc("Account").asc("Transaction Type").asc("Date").asc("Amount");
        
        reader = new GroupByReader(reader, "Account").add(new GroupTree("data"));
        
//        DataWriter writer = StreamWriter.newJsonSystemOutWriter();
        DataWriter writer = StreamWriter.newSystemOutWriter();
        Job.run(reader, writer);
    }

}

Code walkthrough

  1. The financial-transactions.csv file is used to create a CSVReader.
  2. The method .setFieldNamesInFirstRow(true) is called to use the first row as field names.
  3. The method .setSkipEmptyRows(true) is used to remove any empty rows.
  4. FilteringReader is utilized to only show credit and debit transactions that meet certain criteria.
  5. TransformingReader is created to transform the data in the reader.
  6. The .add(new SelectFields("Account", "Transaction Type", "Date", "Amount")) method is used to select and arrange fields based on their names.
  7. BasicFieldTransformer is utilized to convert fields from strings to their actual types.
  8. To ensure child branches are grouped together, SortingReader is used to sort the data.
  9. GroupByReader is applied with the GroupTree aggregate operator to group records by account as the root of each branch.
  10. The data is transferred from the reader to the StreamWriter via the Job.run() method.

Console Output

0 - Record (MODIFIED) (has child records) {
    0:[Account]:STRING=[123]:String
    1:[data]:ARRAY of RECORD=[[
        Record (MODIFIED) (is child record) (has child records) {
            0:[Transaction Type]:STRING=[Credit]:String
            1:[data]:ARRAY of RECORD=[[
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-12]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[6089.78]:BigDecimal
                        }]]:ArrayValue
                }, 
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-19]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[482.01]:BigDecimal
                        }]]:ArrayValue
                }]]:ArrayValue
        }, 
        Record (MODIFIED) (is child record) (has child records) {
            0:[Transaction Type]:STRING=[Debit]:String
            1:[data]:ARRAY of RECORD=[[
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-12]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[20996.12]:BigDecimal
                        }]]:ArrayValue
                }, 
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-13]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[17]:BigDecimal
                        }, 
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[914.36]:BigDecimal
                        }]]:ArrayValue
                }, 
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-14]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[671.28]:BigDecimal
                        }]]:ArrayValue
                }]]:ArrayValue
        }]]:ArrayValue
}

-----------------------------------------------
1 - Record (MODIFIED) (has child records) {
    0:[Account]:STRING=[147]:String
    1:[data]:ARRAY of RECORD=[[
        Record (MODIFIED) (is child record) (has child records) {
            0:[Transaction Type]:STRING=[Credit]:String
            1:[data]:ARRAY of RECORD=[[
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-19]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[4891.44]:BigDecimal
                        }]]:ArrayValue
                }]]:ArrayValue
        }]]:ArrayValue
}

-----------------------------------------------
2 - Record (MODIFIED) (has child records) {
    0:[Account]:STRING=[456]:String
    1:[data]:ARRAY of RECORD=[[
        Record (MODIFIED) (is child record) (has child records) {
            0:[Transaction Type]:STRING=[Credit]:String
            1:[data]:ARRAY of RECORD=[[
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-15]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[5072.1]:BigDecimal
                        }]]:ArrayValue
                }]]:ArrayValue
        }, 
        Record (MODIFIED) (is child record) (has child records) {
            0:[Transaction Type]:STRING=[Debit]:String
            1:[data]:ARRAY of RECORD=[[
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-12]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[1997]:BigDecimal
                        }]]:ArrayValue
                }, 
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-14]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[15247.89]:BigDecimal
                        }]]:ArrayValue
                }, 
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-16]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[5109.07]:BigDecimal
                        }]]:ArrayValue
                }]]:ArrayValue
        }]]:ArrayValue
}

-----------------------------------------------
3 - Record (MODIFIED) (has child records) {
    0:[Account]:STRING=[789]:String
    1:[data]:ARRAY of RECORD=[[
        Record (MODIFIED) (is child record) (has child records) {
            0:[Transaction Type]:STRING=[Credit]:String
            1:[data]:ARRAY of RECORD=[[
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-14]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[11314]:BigDecimal
                        }]]:ArrayValue
                }]]:ArrayValue
        }, 
        Record (MODIFIED) (is child record) (has child records) {
            0:[Transaction Type]:STRING=[Debit]:String
            1:[data]:ARRAY of RECORD=[[
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-19]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[3588.14]:BigDecimal
                        }]]:ArrayValue
                }]]:ArrayValue
        }]]:ArrayValue
}

-----------------------------------------------
4 records
Mobile Analytics