Obtain Statistics
This example shows you how to obtain statistics about a data transfer, such as the total transfer time, the number of records transferred, and the number of bytes transferred per second, using the MeteredReader class.
This demo code transfers data from a CSV file to an Excel file via MeteredReader and then logs statistics regarding the data transfer. However, we can use MeteredReader to log statistics for other types of data transfer as well.
This example can easily be modified to show how to measure data being read and written.
Input CSV file
Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating
101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A
312,Butler,Gerard,90.00,1000.00,8/6/2003,B
868,Hewitt,Jennifer Love,0,17000.00,5/25/1985,B
761,Pinkett-Smith,Jada,49654.87,100000.00,12/5/2006,A
317,Murray,Bill,789.65,5000.00,2/5/2007,C
Java Code Listing
package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import org.apache.log4j.Logger;

import com.northconcepts.datapipeline.core.DataEndpoint;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.excel.ExcelDocument;
import com.northconcepts.datapipeline.excel.ExcelWriter;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.meter.Meter;
import com.northconcepts.datapipeline.meter.MeteredReader;

public class ObtainStatistics {

    public static final Logger log = DataEndpoint.log;

    public static void main(String[] args) throws Throwable {
        MeteredReader reader = new MeteredReader(new CSVReader(new File("example/data/input/credit-balance-01.csv"))
                .setFieldNamesInFirstRow(true));

        ExcelDocument document = new ExcelDocument();
        DataWriter writer = new ExcelWriter(document)
                .setSheetName("balance");

        Job job = Job.run(reader, writer);

        document.save(new File("example/data/output/credit-balance-04.xls"));

        final long ended = System.currentTimeMillis();

        Meter meter = reader.getMeter();

        log.info("started: " + meter.getStarted());
        log.info("ended: " + ended);
        log.info("running time: " + ((ended - meter.getStarted()) / 1000.0) + " seconds");
        log.info("meter time: " + (meter.getElapsedTime() / 1000.0) + " seconds");
        log.info("records read: " + meter.getCount());
        log.info("rate: " + meter.getUnitsPerSecondAsString());

        log.info("records transferred: " + job.getRecordsTransferred());
        log.info("started on: " + job.getStartedOn());
        log.info("finished on: " + job.getFinishedOn());
        log.info("running time: " + job.getRunningTimeAsString());
    }
}
Code Walkthrough
- First, a MeteredReader is created using a CSVReader corresponding to the input file credit-balance-01.csv.
- ExcelDocument and ExcelWriter instances are created corresponding to the output file.
- Data is transferred from the CSV file to the Excel file via Job.run(reader, writer).
- The ExcelDocument is then saved.
- Statistics related to the data transfer are logged via the Meter instance.
Statistics logged
The following statistics are logged related to data transfer:
- Started time - The time at which reading started. This is displayed in milliseconds and can be easily converted to a proper date/time format. The meter.getStarted() method is used to retrieve this information.
- Ended - The time at which the data transfer ended. Again, this is displayed in milliseconds. This is obtained by using System.currentTimeMillis().
- Running time - Total time required for the data transfer. This is obtained as ended - started and is converted to seconds for display purposes.
- Meter time - This is the time required for reading or writing. It tracks the time the MeteredReader or MeteredWriter was open (the time between the open() and close() method calls). This is obtained via meter.getElapsedTime().
- Records read - This is the number of records read. This is obtained via meter.getCount().
- Rate - This specifies the rate of data transfer. This is obtained via meter.getUnitsPerSecondAsString().
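The started and ended values are plain epoch milliseconds, so they can be turned into readable timestamps with the standard java.time API. The following is a minimal sketch that does not use the Data Pipeline library. The class name EpochMillisExample, its helper methods, and the choice of UTC are assumptions made for illustration; the two input values are the started and ended figures from this example's console output.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper class, not part of Data Pipeline.
class EpochMillisExample {

    // UTC is used only so the result does not depend on the machine's time zone.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS", Locale.ROOT)
                    .withZone(ZoneOffset.UTC);

    // Converts epoch milliseconds (as returned by System.currentTimeMillis()) to text.
    static String formatMillis(long epochMillis) {
        return FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    // Same arithmetic as the "running time" log line: (ended - started) / 1000.0.
    static double secondsBetween(long startedMillis, long endedMillis) {
        return (endedMillis - startedMillis) / 1000.0;
    }

    public static void main(String[] args) {
        long started = 1402047400031L; // "started" value from the console output
        long ended = 1402047401612L;   // "ended" value from the console output

        System.out.println("started: " + formatMillis(started));
        System.out.println("ended: " + formatMillis(ended));
        System.out.println("running time: " + secondsBetween(started, ended) + " seconds");
    }
}
```

To display local time instead, replace ZoneOffset.UTC with ZoneId.systemDefault().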
MeteredReader and MeteredWriter
The MeteredReader and MeteredWriter classes assist in recording statistics related to data transfer. This is done by wrapping them around the respective reader and writer classes.
In this demo code, only MeteredReader is used to record statistics, but we can use MeteredWriter instead of or in addition to MeteredReader. In most cases, the number of records read and the number written are the same, so it is not necessary to use both MeteredReader and MeteredWriter. If a transfer includes records being removed, filtered out, or added between the reader and the writer, then the developer needs to decide whether to track reads, writes, or both.
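The wrapping idea, and why read and write counts can diverge, can be illustrated without the library. The sketch below is not how MeteredReader is implemented; CountingIterator, transfer, and the filter threshold are hypothetical names and values chosen for illustration. It wraps a plain Iterator, counts what passes through it, and filters some items out before they are "written".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of metering by wrapping; not Data Pipeline code.
class MeteringSketch {

    // Wraps any Iterator and counts every item pulled through it.
    static final class CountingIterator<T> implements Iterator<T> {
        private final Iterator<T> source;
        private long count;

        CountingIterator(Iterator<T> source) {
            this.source = source;
        }

        @Override
        public boolean hasNext() {
            return source.hasNext();
        }

        @Override
        public T next() {
            T item = source.next();
            count++; // counted on the read side, before any filtering
            return item;
        }

        long getCount() {
            return count;
        }
    }

    // Copies balances at or above the minimum; returns {read count, written count}.
    static long[] transfer(List<Double> balances, double minimum) {
        CountingIterator<Double> reader = new CountingIterator<>(balances.iterator());
        List<Double> written = new ArrayList<>();
        while (reader.hasNext()) {
            double balance = reader.next();
            if (balance >= minimum) {
                written.add(balance);
            }
        }
        return new long[] { reader.getCount(), written.size() };
    }

    public static void main(String[] args) {
        // Balance column from the input CSV file shown earlier.
        List<Double> balances = Arrays.asList(9315.45, 90.00, 0.0, 49654.87, 789.65);
        long[] counts = transfer(balances, 100.0);
        System.out.println("records read: " + counts[0]);
        System.out.println("records written: " + counts[1]);
    }
}
```

Because the filter sits between the wrapped source and the destination, a meter on the read side and a meter on the write side would report different totals here, which is the case where the choice between the two matters.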
Metered Interface and Meter
The MeteredReader and MeteredWriter classes implement the Metered interface. This interface defines a method called getMeter, which returns a Meter object. Meter is a generic counter that can record statistics related to data transfer. The Meter class defines constants such as K, M, and G, corresponding to kilo, mega, giga, and so on, which are simply multipliers for the unit of measurement. K is the default multiplier.
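To make the role of a multiplier concrete, here is a small sketch of how a per-second rate can be scaled and formatted. It is not the Meter implementation: the class RateSketch, its methods, and in particular the value 1024 for K are assumptions (the library may define K as 1000). The inputs are the count (696) and meter time (1.323 seconds) from this example's console output.

```java
import java.util.Locale;

// Hypothetical rate calculation; names and constant values are assumptions.
class RateSketch {

    static final long K = 1024L; // assumed value; could equally be 1000
    static final long M = K * K;
    static final long G = K * M;

    // Units counted per second of elapsed time.
    static double unitsPerSecond(long count, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0; // avoid dividing by zero for an unopened or instant transfer
        }
        return count * 1000.0 / elapsedMillis;
    }

    // Scales the rate by the multiplier and formats it with one decimal place.
    static String rateAsString(long count, long elapsedMillis, long multiplier, String label) {
        double scaled = unitsPerSecond(count, elapsedMillis) / multiplier;
        return String.format(Locale.ROOT, "%.1f %s/s", scaled, label);
    }

    public static void main(String[] args) {
        // 696 units in 1323 ms is about 526 units per second.
        System.out.println(rateAsString(696, 1323, K, "kilo-bytes"));
    }
}
```

With these inputs the scaled rate is about 0.51, which rounds to the 0.5 shown in the console output; using 1000 for K gives about 0.53 and rounds the same way.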
RecordMeter
Both MeteredReader and MeteredWriter override the getMeter method to return an instance of RecordMeter, which is a subclass of Meter. RecordMeter provides additional functionality on top of Meter by defining a nested class called MeterUnit, which defines two units of measurement: BYTES and RECORDS.
The default unit of measurement is BYTES, but MeteredReader and MeteredWriter can override this by calling RecordMeter.setMeasure prior to the transfer. The actual counting (or measurement) happens inside RecordMeter.add(Record) based on the unit selected. The meter.getCount() method, which is overridden by RecordMeter, is used to return the number of records.
The meter.getUnitsPerSecondAsString() method, which is overridden by RecordMeter, returns the data transfer rate. Since the default unit is BYTES and the default multiplier is KILO, the data transfer rate is displayed as kilo-bytes/s.
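The effect of choosing a unit before the transfer can be sketched as follows. This is an illustration only, not RecordMeter itself: the class UnitMeterSketch, the Unit enum, SimpleMeter, and setUnit are hypothetical, and a record is modelled as a plain string whose size is its UTF-8 length, which is an assumption about how bytes might be measured.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical unit-aware counter; not the library's RecordMeter.
class UnitMeterSketch {

    enum Unit { BYTES, RECORDS }

    static final class SimpleMeter {
        private Unit unit = Unit.BYTES; // mirrors the default described above
        private long count;

        // Must be called before counting starts for the total to be meaningful.
        void setUnit(Unit unit) {
            this.unit = unit;
        }

        // Adds either 1 (RECORDS) or the record's size in bytes (BYTES).
        void add(String record) {
            if (unit == Unit.RECORDS) {
                count++;
            } else {
                count += record.getBytes(StandardCharsets.UTF_8).length;
            }
        }

        long getCount() {
            return count;
        }
    }

    // Feeds all records through a fresh meter using the given unit.
    static long measure(Unit unit, String... records) {
        SimpleMeter meter = new SimpleMeter();
        meter.setUnit(unit);
        for (String record : records) {
            meter.add(record);
        }
        return meter.getCount();
    }

    public static void main(String[] args) {
        String[] records = { "101,Reeves,Keanu", "312,Butler,Gerard" };
        System.out.println("bytes: " + measure(Unit.BYTES, records));
        System.out.println("records: " + measure(Unit.RECORDS, records));
    }
}
```

The same input produces very different totals depending on the unit, so the unit in effect should be kept in mind when reading a count or a rate.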
Console output
15:06:39,982 DEBUG [main] datapipeline:37 - Data Pipeline v2.3.3 by North Concepts Inc.
15:06:41,074 DEBUG [main] datapipeline:60 - job::Start
15:06:41,354 DEBUG [main] datapipeline:72 - job::Success
15:06:41,612 INFO [main] datapipeline:42 - started: 1402047400031
15:06:41,612 INFO [main] datapipeline:43 - ended: 1402047401612
15:06:41,613 INFO [main] datapipeline:44 - running time: 1.581 seconds
15:06:41,613 INFO [main] datapipeline:45 - meter time: 1.323 seconds
15:06:41,614 INFO [main] datapipeline:46 - records read: 696
15:06:41,614 INFO [main] datapipeline:47 - rate: 0.5 kilo-bytes/s