Find The Minimum Maximum Average Sum Count

This example is designed to provide statistical information about a dataset, including counts, maximum values, minimum values, averages, and sums of values. It allows users to gain valuable insights into their data by analyzing key statistical metrics and measures.

The library can assist in assessing the quality of data by providing statistical measures. Users can use metrics such as count to verify data completeness, maximum and minimum to identify outliers, and averages and sum to validate data accuracy. This helps in identifying and addressing data anomalies or inconsistencies.

Input CSV file

Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating
101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A
312,Butler,Gerard,90.00,1000.00,8/6/2003,B
868,Hewitt,Jennifer Love,0,17000.00,5/25/1985,B
761,Pinkett-Smith,Jada,49654.87,100000.00,12/5/2006,A
317,Murray,Bill,789.65,5000.00,2/5/2007,C

Java Code Listing

package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.group.GroupByReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.transform.BasicFieldTransformer;
import com.northconcepts.datapipeline.transform.TransformingReader;

public class FindTheMinimumMaximumAverageSumCount {

    public static void main(String[] args) throws Throwable {
        DataReader reader = new CSVReader(new File("example/data/input/credit-balance-01.csv"))
            .setFieldNamesInFirstRow(true);
        
        // CSV fields are strings, convert them to double
        reader = new TransformingReader(reader)
            .add(new BasicFieldTransformer("Balance", "CreditLimit").stringToDouble());
        
        GroupByReader groupByReader = new GroupByReader(reader, "Rating")
            .count("Account")
            .max("Balance", "MaxBalance")
            .min("Balance", "MinBalance")
            .avg("Balance", "AvgBalance")
            .sum("Balance", "SumBalance")
            .avg("CreditLimit", "AvgCreditLimit");
        
        Job.run(groupByReader, new StreamWriter(System.out));
        
/* input csv data
Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating
101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A
312,Butler,Gerard,90.00,1000.00,8/6/2003,B
868,Hewitt,Jennifer Love,0,17000.00,5/25/1985,B
761,Pinkett-Smith,Jada,49654.87,100000.00,12/5/2006,A
317,Murray,Bill,789.65,5000.00,2/5/2007,C
*/
/* output 
should display 3 records as the csv data are grouped according to "Rating" (A,B,C).
1st record, group "A" displays Account: 2 via count(), MaxBalance equals Jada's record,
MinBalance equals Keanu's record, AvgBalance: the average of the 2 Balances, SumBalance: sum of
the 2 Balances, AvgCreditLimit: average of Keanu and Jada's CreditLimit
*/
    }
}

Code walkthrough

  1. CSVReader is created corresponding to the input file credit-balance-01.csv.
  2. The CSVReader.setFieldNamesInFirstRow(true) method is invoked to specify that the names specified in the first row should be used as field names.
  3. A TransformingReader is created to apply one or more transformations to the incoming data sequentially.
  4. In order to do calculations for the given data, the type of values should be converted from String to Double. BasicFieldTransformer object and its stringToDouble() method are used for that purpose. In the given example, values of the "Balance" and "CreditLimit" fields are converted into double type. 
  5. The GroupByReader object is used to divide records into groups and apply summary operations to each group. It works similarly to "group by" in SQL but is applied to streaming data. Records are grouped by the "Rating" field in the example.
  6. The count() method returns the total number of records in the grouped dataset and stores it in the "Account" field of the output.
  7. The following operations are done in the dataset:
    1. max() for finding the maximum value
    2. min() for finding the minimum value
    3. avg() for calculating the average value
    4. sum() for calculating the sum of values
  8. These methods accept two arguments: sourceFieldName pointing to the field in which calculations are done, and targetFieldName pointing to the field to store the output value. 
  9. Finally, data is transferred from the groupByReader to the StreamWriter(System.out) via Job.run() method.

Output

-----------------------------------------------
0 - Record (MODIFIED) {
    0:[Rating]:STRING=[A]:String
    1:[Account]:LONG=[2]:Long
    2:[MaxBalance]:DOUBLE=[49654.87]:Double
    3:[MinBalance]:DOUBLE=[9315.45]:Double
    4:[AvgBalance]:BIG_DECIMAL=[29485.16]:BigDecimal
    5:[SumBalance]:BIG_DECIMAL=[58970.32]:BigDecimal
    6:[AvgCreditLimit]:BIG_DECIMAL=[55000]:BigDecimal
}

-----------------------------------------------
1 - Record (MODIFIED) {
    0:[Rating]:STRING=[B]:String
    1:[Account]:LONG=[2]:Long
    2:[MaxBalance]:DOUBLE=[90.0]:Double
    3:[MinBalance]:DOUBLE=[0.0]:Double
    4:[AvgBalance]:BIG_DECIMAL=[45]:BigDecimal
    5:[SumBalance]:BIG_DECIMAL=[90]:BigDecimal
    6:[AvgCreditLimit]:BIG_DECIMAL=[9000]:BigDecimal
}

-----------------------------------------------
2 - Record (MODIFIED) {
    0:[Rating]:STRING=[C]:String
    1:[Account]:LONG=[1]:Long
    2:[MaxBalance]:DOUBLE=[789.65]:Double
    3:[MinBalance]:DOUBLE=[789.65]:Double
    4:[AvgBalance]:BIG_DECIMAL=[789.65]:BigDecimal
    5:[SumBalance]:BIG_DECIMAL=[789.65]:BigDecimal
    6:[AvgCreditLimit]:BIG_DECIMAL=[5000]:BigDecimal
}

-----------------------------------------------
3 records
Mobile Analytics