How to Convert Tabular Data to Trees Using Aggregation

Updated: Aug 10, 2023

This example shows how to convert tabular data into trees using aggregation techniques. It allows you to transform structured data into hierarchical structures using data aggregation and grouping related records.

This can be used to analyze and visualize hierarchical relationships within tabular data, particularly in cases such as financial data analysis, sales and marketing analytics, customer segmentation, and supply chain management. By converting tabular data into tree structures, you can gain insights into the relationships and hierarchies present in the data, enabling more advanced analysis and decision-making.

For a more detailed explanation of this topic check out our blog on How to Convert Tabular Data to Trees Using Aggregation.

 

Input CSV

Date,Account,Transaction Type,Amount
2017-01-12,123,Credit,6089.78
2017-01-12,123,Fee,9.99
2017-01-12,456,Debit,1997.00
2017-01-12,123,Debit,20996.12
2017-01-13,123,Debit,17.00
2017-01-13,123,Debit,914.36
2017-01-14,789,Credit,11314.00
2017-01-14,789,Fee,9.99
2017-01-14,456,Debit,15247.89
2017-01-14,123,Debit,671.28
2017-01-15,456,Credit,5072.10
2017-01-15,456,Fee,9.99
2017-01-16,456,Debit,5109.07
2017-01-19,123,Credit,482.01
2017-01-19,789,Debit,3588.14
2017-01-19,147,Credit,4891.44

 

Java Code

package com.northconcepts.datapipeline.examples.cookbook.blog;

import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.SortingReader;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.filter.FieldFilter;
import com.northconcepts.datapipeline.filter.FilteringReader;
import com.northconcepts.datapipeline.group.GroupByReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.transform.BasicFieldTransformer;
import com.northconcepts.datapipeline.transform.SelectFields;
import com.northconcepts.datapipeline.transform.TransformingReader;

public class ConvertTableToTree {

    private static File INPUT = new File("example/data/input/financial-transactions.csv");

    public static void main(String[] args) {
        DataReader reader = new CSVReader(INPUT).setFieldNamesInFirstRow(true).setSkipEmptyRows(true);
        
        reader = new FilteringReader(reader)
                .add(new FieldFilter("Transaction Type").valueMatches("Credit", "Debit"));
        
        reader = new TransformingReader(reader)
                .add(new SelectFields("Account", "Transaction Type", "Date", "Amount"));
        
        reader = new TransformingReader(reader)
                .add(new BasicFieldTransformer("Date").stringToDate("yyyy-MM-dd"))
                .add(new BasicFieldTransformer("Amount").stringToBigDecimal());


        reader = new SortingReader(reader).asc("Account").asc("Transaction Type").asc("Date").asc("Amount");
        
        reader = new GroupByReader(reader, "Account").add(new GroupTree("data"));
        
//        DataWriter writer = StreamWriter.newJsonSystemOutWriter();
        DataWriter writer = StreamWriter.newSystemOutWriter();
        Job.run(reader, writer);
    }

}

 

Code Walkthrough

  1. An input file financial-transactions.csv is used to create a CSVReader.
  2. The method setFieldNamesInFirstRow(true) is called to declare the first row as field names.
  3. The method setSkipEmptyRows(true) is used to remove any empty rows.
  4. FilteringReader is utilized to only show credit and debit transactions that meet certain criteria.
  5. TransformingReader is created to transform the data in the reader.
  6. The .add(new SelectFields("Account", "Transaction Type", "Date", "Amount")) method is used to select and arrange fields based on their names.
  7. BasicFieldTransformer is utilized to change the data type of fields. In the given example, data types of "Date" and "Amount" are converted from strings to their actual types.
  8. SortingReader is used to sort the data.
  9. GroupByReader is applied with the GroupTree aggregate operator to group records by the "Account" field as the root of each branch.
  10. The data is transferred from the reader to the StreamWriter via the Job.run() method.

 

Console Output

0 - Record (MODIFIED) (has child records) {
    0:[Account]:STRING=[123]:String
    1:[data]:ARRAY of RECORD=[[
        Record (MODIFIED) (is child record) (has child records) {
            0:[Transaction Type]:STRING=[Credit]:String
            1:[data]:ARRAY of RECORD=[[
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-12]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[6089.78]:BigDecimal
                        }]]:ArrayValue
                }, 
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-19]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[482.01]:BigDecimal
                        }]]:ArrayValue
                }]]:ArrayValue
        }, 
        Record (MODIFIED) (is child record) (has child records) {
            0:[Transaction Type]:STRING=[Debit]:String
            1:[data]:ARRAY of RECORD=[[
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-12]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[20996.12]:BigDecimal
                        }]]:ArrayValue
                }, 
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-13]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[17]:BigDecimal
                        }, 
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[914.36]:BigDecimal
                        }]]:ArrayValue
                }, 
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-14]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[671.28]:BigDecimal
                        }]]:ArrayValue
                }]]:ArrayValue
        }]]:ArrayValue
}

-----------------------------------------------
1 - Record (MODIFIED) (has child records) {
    0:[Account]:STRING=[147]:String
    1:[data]:ARRAY of RECORD=[[
        Record (MODIFIED) (is child record) (has child records) {
            0:[Transaction Type]:STRING=[Credit]:String
            1:[data]:ARRAY of RECORD=[[
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-19]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[4891.44]:BigDecimal
                        }]]:ArrayValue
                }]]:ArrayValue
        }]]:ArrayValue
}

-----------------------------------------------
2 - Record (MODIFIED) (has child records) {
    0:[Account]:STRING=[456]:String
    1:[data]:ARRAY of RECORD=[[
        Record (MODIFIED) (is child record) (has child records) {
            0:[Transaction Type]:STRING=[Credit]:String
            1:[data]:ARRAY of RECORD=[[
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-15]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[5072.1]:BigDecimal
                        }]]:ArrayValue
                }]]:ArrayValue
        }, 
        Record (MODIFIED) (is child record) (has child records) {
            0:[Transaction Type]:STRING=[Debit]:String
            1:[data]:ARRAY of RECORD=[[
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-12]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[1997]:BigDecimal
                        }]]:ArrayValue
                }, 
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-14]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[15247.89]:BigDecimal
                        }]]:ArrayValue
                }, 
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-16]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[5109.07]:BigDecimal
                        }]]:ArrayValue
                }]]:ArrayValue
        }]]:ArrayValue
}

-----------------------------------------------
3 - Record (MODIFIED) (has child records) {
    0:[Account]:STRING=[789]:String
    1:[data]:ARRAY of RECORD=[[
        Record (MODIFIED) (is child record) (has child records) {
            0:[Transaction Type]:STRING=[Credit]:String
            1:[data]:ARRAY of RECORD=[[
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-14]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[11314]:BigDecimal
                        }]]:ArrayValue
                }]]:ArrayValue
        }, 
        Record (MODIFIED) (is child record) (has child records) {
            0:[Transaction Type]:STRING=[Debit]:String
            1:[data]:ARRAY of RECORD=[[
                Record (MODIFIED) (is child record) (has child records) {
                    0:[Date]:DATE=[2017-01-19]:Date
                    1:[data]:ARRAY of RECORD=[[
                        Record (MODIFIED) (is child record) {
                            0:[Amount]:BIG_DECIMAL=[3588.14]:BigDecimal
                        }]]:ArrayValue
                }]]:ArrayValue
        }]]:ArrayValue
}

-----------------------------------------------
4 records
Mobile Analytics