Find The Minimum Maximum Average Sum Count
Updated: Jun 20, 2023
This example is designed to provide statistical information about a dataset, including counts, maximum values, minimum values, averages, and sums of values. It allows users to gain valuable insights into their data by analyzing key statistical metrics and measures.
The library can assist in assessing the quality of data by providing statistical measures. Users can use metrics such as count to verify data completeness, maximum and minimum to identify outliers, and averages and sum to validate data accuracy. This helps in identifying and addressing data anomalies or inconsistencies.
Input CSV file
Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating 101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A 312,Butler,Gerard,90.00,1000.00,8/6/2003,B 868,Hewitt,Jennifer Love,0,17000.00,5/25/1985,B 761,Pinkett-Smith,Jada,49654.87,100000.00,12/5/2006,A 317,Murray,Bill,789.65,5000.00,2/5/2007,C
Java Code Listing
package com.northconcepts.datapipeline.examples.cookbook; import java.io.File; import com.northconcepts.datapipeline.core.DataReader; import com.northconcepts.datapipeline.core.StreamWriter; import com.northconcepts.datapipeline.csv.CSVReader; import com.northconcepts.datapipeline.group.GroupByReader; import com.northconcepts.datapipeline.job.Job; import com.northconcepts.datapipeline.transform.BasicFieldTransformer; import com.northconcepts.datapipeline.transform.TransformingReader; public class FindTheMinimumMaximumAverageSumCount { public static void main(String[] args) throws Throwable { DataReader reader = new CSVReader(new File("example/data/input/credit-balance-01.csv")) .setFieldNamesInFirstRow(true); // CSV fields are strings, convert them to double reader = new TransformingReader(reader) .add(new BasicFieldTransformer("Balance", "CreditLimit").stringToDouble()); GroupByReader groupByReader = new GroupByReader(reader, "Rating") .count("Account") .max("Balance", "MaxBalance") .min("Balance", "MinBalance") .avg("Balance", "AvgBalance") .sum("Balance", "SumBalance") .avg("CreditLimit", "AvgCreditLimit"); Job.run(groupByReader, new StreamWriter(System.out)); /* input csv data Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating 101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A 312,Butler,Gerard,90.00,1000.00,8/6/2003,B 868,Hewitt,Jennifer Love,0,17000.00,5/25/1985,B 761,Pinkett-Smith,Jada,49654.87,100000.00,12/5/2006,A 317,Murray,Bill,789.65,5000.00,2/5/2007,C */ /* output should display 3 records as the csv data are grouped according to "Rating" (A,B,C). 1st record, group "A" displays Account: 2 via count(), MaxBalance equals Jada's record, MinBalance equals Keanu's record, AvgBalance: the average of the 2 Balances, SumBalance: sum of the 2 Balances, AvgCreditLimit: average of Keanu and Jada's CreditLimit */ } }
Code walkthrough
- CSVReader is created corresponding to the input file
credit-balance-01.csv
. - The
CSVReader.setFieldNamesInFirstRow(true)
method is invoked to specify that the names specified in the first row should be used as field names. - A TransformingReader is created to apply one or more transformations to the incoming data sequentially.
- In order to do calculations for the given data, the type of values should be converted from String to Double. BasicFieldTransformer object and its
stringToDouble()
method are used for that purpose. In the given example, values of the "Balance" and "CreditLimit" fields are converted into double type. - The GroupByReader object is used to divide records into groups and apply summary operations to each group. It works similarly to "group by" in SQL but is applied to streaming data. Records are grouped by the "Rating" field in the example.
- The
count()
method returns the total number of records in the grouped dataset and stores it in the "Account" field of the output. - The following operations are done in the dataset:
max()
for finding the maximum valuemin()
for finding the minimum valueavg()
for calculating the average valuesum()
for calculating the sum of values
- These methods accept two arguments:
sourceFieldName
pointing to the field in which calculations are done, andtargetFieldName
pointing to the field to store the output value. - Finally, data is transferred from the
groupByReader
to theStreamWriter(System.out)
via Job.run() method.
Output
----------------------------------------------- 0 - Record (MODIFIED) { 0:[Rating]:STRING=[A]:String 1:[Account]:LONG=[2]:Long 2:[MaxBalance]:DOUBLE=[49654.87]:Double 3:[MinBalance]:DOUBLE=[9315.45]:Double 4:[AvgBalance]:BIG_DECIMAL=[29485.16]:BigDecimal 5:[SumBalance]:BIG_DECIMAL=[58970.32]:BigDecimal 6:[AvgCreditLimit]:BIG_DECIMAL=[55000]:BigDecimal } ----------------------------------------------- 1 - Record (MODIFIED) { 0:[Rating]:STRING=[B]:String 1:[Account]:LONG=[2]:Long 2:[MaxBalance]:DOUBLE=[90.0]:Double 3:[MinBalance]:DOUBLE=[0.0]:Double 4:[AvgBalance]:BIG_DECIMAL=[45]:BigDecimal 5:[SumBalance]:BIG_DECIMAL=[90]:BigDecimal 6:[AvgCreditLimit]:BIG_DECIMAL=[9000]:BigDecimal } ----------------------------------------------- 2 - Record (MODIFIED) { 0:[Rating]:STRING=[C]:String 1:[Account]:LONG=[1]:Long 2:[MaxBalance]:DOUBLE=[789.65]:Double 3:[MinBalance]:DOUBLE=[789.65]:Double 4:[AvgBalance]:BIG_DECIMAL=[789.65]:BigDecimal 5:[SumBalance]:BIG_DECIMAL=[789.65]:BigDecimal 6:[AvgCreditLimit]:BIG_DECIMAL=[5000]:BigDecimal } ----------------------------------------------- 3 records