Obtain Statistics

Updated: Feb 21, 2022

This example shows how to obtain statistics about a data transfer, such as the total transfer time, the number of records transferred, and the number of bytes transferred per second, using the MeteredReader class.

This demo code transfers data from a CSV file to an Excel file via MeteredReader and then logs statistics regarding the data transfer. However, we can use MeteredReader to log statistics for other types of data transfer as well.

This example can easily be modified to show how to measure data being read and written.

Input CSV file

Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating
101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A
312,Butler,Gerard,90.00,1000.00,8/6/2003,B
868,Hewitt,Jennifer Love,0,17000.00,5/25/1985,B
761,Pinkett-Smith,Jada,49654.87,100000.00,12/5/2006,A
317,Murray,Bill,789.65,5000.00,2/5/2007,C

Java Code Listing

package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import org.apache.log4j.Logger;

import com.northconcepts.datapipeline.core.DataEndpoint;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.excel.ExcelDocument;
import com.northconcepts.datapipeline.excel.ExcelWriter;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.meter.Meter;
import com.northconcepts.datapipeline.meter.MeteredReader;

public class ObtainStatistics {
    
    public static final Logger log = DataEndpoint.log; 

    public static void main(String[] args) throws Throwable {
        MeteredReader reader = new MeteredReader(new CSVReader(new File("example/data/input/credit-balance-01.csv"))
            .setFieldNamesInFirstRow(true));
        
        ExcelDocument document = new ExcelDocument();
        DataWriter writer = new ExcelWriter(document)
            .setSheetName("balance");

        Job job = Job.run(reader, writer);
        document.save(new File("example/data/output/credit-balance-04.xls"));
        
        final long ended = System.currentTimeMillis();
        
        Meter meter = reader.getMeter();
        
        log.info("started: " + meter.getStarted());
        log.info("ended: " + ended);
        log.info("running time: " + ((ended - meter.getStarted()) / 1000.0) + " seconds");
        log.info("meter time: " + (meter.getElapsedTime() / 1000.0) + " seconds");
        log.info("records read: " + meter.getCount());
        log.info("rate: " + meter.getUnitsPerSecondAsString());
        log.info("records transferred: " + job.getRecordsTransferred());
        log.info("started on: " + job.getStartedOn());
        log.info("finished on: " + job.getFinishedOn());
        log.info("running time: " + job.getRunningTimeAsString());
    }

}

Code Walkthrough

  1. First, a MeteredReader is created using a CSVReader corresponding to the input file credit-balance-01.csv.
  2. ExcelDocument and ExcelWriter instances are created for the output file.
  3. Data is transferred from the CSV file to the Excel file via Job.run(reader, writer).
  4. The ExcelDocument is then saved.
  5. Statistics related to the data transfer are logged using the Meter instance returned by reader.getMeter() and the Job instance returned by Job.run.

Statistics logged

The following statistics are logged related to data transfer:

  1. Started time - The time at which reading started. This is displayed in milliseconds and can easily be converted to a proper date/time format. The meter.getStarted() method is used to retrieve this information.
  2. Ended - The time at which the data transfer ended. Again, this is displayed in milliseconds. This is obtained by using System.currentTimeMillis().
  3. Running time - Total time required for the data transfer. This is calculated as the ended time minus the started time and is converted to seconds for display.
  4. Meter time - This is the time required for reading or writing. It tracks the time the MeteredReader or MeteredWriter was open (the time between the open() and close() method calls). This is obtained via meter.getElapsedTime().
  5. Records read - This is the number of records read. This is obtained via meter.getCount().
  6. Rate - This specifies the rate of data transfer. This is obtained via meter.getUnitsPerSecondAsString().
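
The started and ended values above are epoch milliseconds. As a small sketch using only the JDK's java.time classes, they can be converted to a readable timestamp and a running time as follows. The class and method names here are made up for illustration and are not part of Data Pipeline; the two input values are taken from the sample console output.

```java
import java.time.Instant;

class EpochMillisDemo {

    // Converts epoch milliseconds (as returned by System.currentTimeMillis())
    // to an ISO-8601 timestamp in UTC.
    static String toTimestamp(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    // Running time in seconds, computed the same way as in the listing:
    // (ended - started) / 1000.0
    static double runningSeconds(long startedMillis, long endedMillis) {
        return (endedMillis - startedMillis) / 1000.0;
    }

    public static void main(String[] args) {
        long started = 1402047400031L; // "started" value from the sample console output
        long ended = 1402047401612L;   // "ended" value from the sample console output

        System.out.println("started: " + toTimestamp(started));
        System.out.println("ended: " + toTimestamp(ended));
        System.out.println("running time: " + runningSeconds(started, ended) + " seconds");
    }
}
```

For these inputs the running time comes out as 1.581 seconds, which matches the value logged in the sample console output.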

MeteredReader and MeteredWriter

The MeteredReader and MeteredWriter classes record statistics related to data transfer. This is done by wrapping them around the respective reader and writer classes. In this demo code, only MeteredReader is used to record statistics, but MeteredWriter can be used instead of, or in addition to, MeteredReader. In most cases, the number of records read is the same as the number written, so it is not necessary to use both. If a transfer removes, filters out, or adds records between the reader and the writer, the developer needs to decide whether to track reads, writes, or both.
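
The wrapping idea, and the way read and write counts can diverge when records are filtered, can be sketched in plain Java. The SimpleReader and CountingReader types below are hypothetical stand-ins written for this illustration. They are not Data Pipeline classes and make no claim about how MeteredReader is implemented.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class CountingReaderDemo {

    // Hypothetical stand-in for a record reader; not a Data Pipeline interface.
    interface SimpleReader {
        String read(); // returns null when there are no more records
    }

    // Hypothetical wrapper: delegates every read and counts the records that pass through.
    static final class CountingReader implements SimpleReader {
        private final SimpleReader delegate;
        private long count;

        CountingReader(SimpleReader delegate) {
            this.delegate = delegate;
        }

        @Override
        public String read() {
            String record = delegate.read();
            if (record != null) {
                count++;
            }
            return record;
        }

        long getCount() {
            return count;
        }
    }

    // Runs a small transfer that keeps only records with rating A
    // and returns {records read, records written}.
    static long[] run() {
        List<String> rows = Arrays.asList(
            "101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A",
            "312,Butler,Gerard,90.00,1000.00,8/6/2003,B",
            "868,Hewitt,Jennifer Love,0,17000.00,5/25/1985,B",
            "761,Pinkett-Smith,Jada,49654.87,100000.00,12/5/2006,A",
            "317,Murray,Bill,789.65,5000.00,2/5/2007,C");
        Iterator<String> it = rows.iterator();
        SimpleReader source = () -> it.hasNext() ? it.next() : null;
        CountingReader reader = new CountingReader(source);

        long written = 0;
        String record;
        while ((record = reader.read()) != null) {
            if (record.endsWith(",A")) { // filter step between reader and writer
                written++;               // a real job would hand the record to a writer here
            }
        }
        return new long[] { reader.getCount(), written };
    }

    public static void main(String[] args) {
        long[] counts = run();
        System.out.println("records read: " + counts[0]);
        System.out.println("records written: " + counts[1]);
    }
}
```

With the five input rows from this example, the wrapper counts five reads while only two records pass the filter, which is the situation where metering the writer as well would report a different number.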

Metered Interface and Meter

The MeteredReader and MeteredWriter classes implement the Metered interface. This interface defines a method called getMeter, which returns a Meter object. Meter is a generic counter capable of recording statistics related to data transfer. The Meter class defines constants such as K, M, and G (kilo, mega, giga, and so on), which are multipliers for the unit of measurement. K is the default multiplier.

RecordMeter

Both MeteredReader and MeteredWriter override the getMeter method to return an instance of RecordMeter which is a sub-class of Meter. RecordMeter provides additional functionality on top of Meter by defining a nested class called MeterUnit which defines 2 units of measurement: BYTES and RECORDS.

The default unit of measurement is BYTES, but this can be changed for a MeteredReader or MeteredWriter by calling RecordMeter.setMeasure prior to the transfer. The actual counting (or measurement) happens inside RecordMeter.add(Record), based on the unit selected. The meter.getCount() method, which is overridden by RecordMeter, returns the resulting count. Note that the sample console output reports 696 for a five-record input file, which suggests the count is expressed in the selected unit (bytes by default) rather than in records.

The meter.getUnitsPerSecondAsString method, which is overridden by RecordMeter, returns the data transfer rate. Since the default unit is BYTES and the default multiplier is K (kilo), the data transfer rate is displayed as kilo-bytes/s.
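
As a rough illustration of how a rate in kilo-bytes/s relates to a count and an elapsed time, the sketch below divides the count by the elapsed seconds and by a kilo multiplier. This is not the library's implementation: the class and method names are hypothetical, and the multiplier value of 1000 is an assumption (the library's K constant may be defined differently, for example as 1024).

```java
import java.util.Locale;

class RateDemo {

    // Assumption: kilo means 1000 here; the library's K constant may differ.
    static final double KILO = 1000.0;

    // Formats a transfer rate as kilo-bytes per second with one decimal place.
    static String kiloBytesPerSecond(long byteCount, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        double seconds = elapsedMillis / 1000.0;
        double rate = byteCount / seconds / KILO;
        return String.format(Locale.ROOT, "%.1f kilo-bytes/s", rate);
    }

    public static void main(String[] args) {
        // Count (696) and meter time (1.323 seconds) from the sample console output
        System.out.println(kiloBytesPerSecond(696, 1323));
    }
}
```

With the count and meter time from the sample console output, this arithmetic yields 0.5 kilo-bytes/s, the same figure the sample logs as its rate.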

Console output

15:06:39,982 DEBUG [main] datapipeline:37 - Data Pipeline v2.3.3 by North Concepts Inc.
15:06:41,074 DEBUG [main] datapipeline:60 - job::Start
15:06:41,354 DEBUG [main] datapipeline:72 - job::Success
15:06:41,612  INFO [main] datapipeline:42 - started: 1402047400031
15:06:41,612  INFO [main] datapipeline:43 - ended: 1402047401612
15:06:41,613  INFO [main] datapipeline:44 - running time: 1.581 seconds
15:06:41,613  INFO [main] datapipeline:45 - meter time: 1.323 seconds
15:06:41,614  INFO [main] datapipeline:46 - records read: 696
15:06:41,614  INFO [main] datapipeline:47 - rate: 0.5 kilo-bytes/s