How to Convert Tabular Data to Trees Using Aggregation
Updated: Aug 10, 2023
This example shows how to convert tabular data into trees using aggregation techniques. It allows you to transform structured data into hierarchical structures using data aggregation and grouping related records.
This can be used to analyze and visualize hierarchical relationships within tabular data, particularly in cases such as financial data analysis, sales and marketing analytics, customer segmentation, and supply chain management. By converting tabular data into tree structures, you can gain insights into the relationships and hierarchies present in the data, enabling more advanced analysis and decision-making.
For a more detailed explanation of this topic check out our blog on How to Convert Tabular Data to Trees Using Aggregation.
Input CSV
Date,Account,Transaction Type,Amount 2017-01-12,123,Credit,6089.78 2017-01-12,123,Fee,9.99 2017-01-12,456,Debit,1997.00 2017-01-12,123,Debit,20996.12 2017-01-13,123,Debit,17.00 2017-01-13,123,Debit,914.36 2017-01-14,789,Credit,11314.00 2017-01-14,789,Fee,9.99 2017-01-14,456,Debit,15247.89 2017-01-14,123,Debit,671.28 2017-01-15,456,Credit,5072.10 2017-01-15,456,Fee,9.99 2017-01-16,456,Debit,5109.07 2017-01-19,123,Credit,482.01 2017-01-19,789,Debit,3588.14 2017-01-19,147,Credit,4891.44
Java Code
package com.northconcepts.datapipeline.examples.cookbook.blog; import java.io.File; import com.northconcepts.datapipeline.core.DataReader; import com.northconcepts.datapipeline.core.DataWriter; import com.northconcepts.datapipeline.core.SortingReader; import com.northconcepts.datapipeline.core.StreamWriter; import com.northconcepts.datapipeline.csv.CSVReader; import com.northconcepts.datapipeline.filter.FieldFilter; import com.northconcepts.datapipeline.filter.FilteringReader; import com.northconcepts.datapipeline.group.GroupByReader; import com.northconcepts.datapipeline.job.Job; import com.northconcepts.datapipeline.transform.BasicFieldTransformer; import com.northconcepts.datapipeline.transform.SelectFields; import com.northconcepts.datapipeline.transform.TransformingReader; public class ConvertTableToTree { private static File INPUT = new File("example/data/input/financial-transactions.csv"); public static void main(String[] args) { DataReader reader = new CSVReader(INPUT).setFieldNamesInFirstRow(true).setSkipEmptyRows(true); reader = new FilteringReader(reader) .add(new FieldFilter("Transaction Type").valueMatches("Credit", "Debit")); reader = new TransformingReader(reader) .add(new SelectFields("Account", "Transaction Type", "Date", "Amount")); reader = new TransformingReader(reader) .add(new BasicFieldTransformer("Date").stringToDate("yyyy-MM-dd")) .add(new BasicFieldTransformer("Amount").stringToBigDecimal()); reader = new SortingReader(reader).asc("Account").asc("Transaction Type").asc("Date").asc("Amount"); reader = new GroupByReader(reader, "Account").add(new GroupTree("data")); // DataWriter writer = StreamWriter.newJsonSystemOutWriter(); DataWriter writer = StreamWriter.newSystemOutWriter(); Job.run(reader, writer); } }
Code Walkthrough
- An input file
financial-transactions.csv
is used to create a CSVReader. - The method
setFieldNamesInFirstRow(true)
is called to declare the first row as field names. - The method
setSkipEmptyRows(true)
is used to remove any empty rows. - A FilteringReader is utilized to only show credit and debit transactions that meet certain criteria.
- A TransformingReader is created to transform the data in the reader.
- The
.add(new SelectFields("Account", "Transaction Type", "Date", "Amount"))
method is used to select and arrange fields based on their names. - BasicFieldTransformer is utilized to change the data type of fields. In the given example, data types of "Date" and "Amount" are converted from strings to their actual types.
- SortingReader is used to sort the data.
- GroupByReader is applied with the GroupTree aggregate operator to group records by the "Account" field as the root of each branch.
- The data is transferred from the reader to the StreamWriter via the Job.run() method.
Console Output
0 - Record (MODIFIED) (has child records) { 0:[Account]:STRING=[123]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Transaction Type]:STRING=[Credit]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-12]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[6089.78]:BigDecimal }]]:ArrayValue }, Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-19]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[482.01]:BigDecimal }]]:ArrayValue }]]:ArrayValue }, Record (MODIFIED) (is child record) (has child records) { 0:[Transaction Type]:STRING=[Debit]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-12]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[20996.12]:BigDecimal }]]:ArrayValue }, Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-13]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[17]:BigDecimal }, Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[914.36]:BigDecimal }]]:ArrayValue }, Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-14]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[671.28]:BigDecimal }]]:ArrayValue }]]:ArrayValue }]]:ArrayValue } ----------------------------------------------- 1 - Record (MODIFIED) (has child records) { 0:[Account]:STRING=[147]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Transaction Type]:STRING=[Credit]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-19]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[4891.44]:BigDecimal }]]:ArrayValue }]]:ArrayValue }]]:ArrayValue } ----------------------------------------------- 2 - Record (MODIFIED) (has child records) { 0:[Account]:STRING=[456]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Transaction Type]:STRING=[Credit]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-15]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[5072.1]:BigDecimal }]]:ArrayValue }]]:ArrayValue }, Record (MODIFIED) (is child record) (has child records) { 0:[Transaction Type]:STRING=[Debit]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-12]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[1997]:BigDecimal }]]:ArrayValue }, Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-14]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[15247.89]:BigDecimal }]]:ArrayValue }, Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-16]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[5109.07]:BigDecimal }]]:ArrayValue }]]:ArrayValue }]]:ArrayValue } ----------------------------------------------- 3 - Record (MODIFIED) (has child records) { 0:[Account]:STRING=[789]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Transaction Type]:STRING=[Credit]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-14]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[11314]:BigDecimal }]]:ArrayValue }]]:ArrayValue }, Record (MODIFIED) (is child record) (has child records) { 0:[Transaction Type]:STRING=[Debit]:String 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) (has child records) { 0:[Date]:DATE=[2017-01-19]:Date 1:[data]:ARRAY of RECORD=[[ Record (MODIFIED) (is child record) { 0:[Amount]:BIG_DECIMAL=[3588.14]:BigDecimal }]]:ArrayValue }]]:ArrayValue }]]:ArrayValue } ----------------------------------------------- 4 records