How To Transfer Columns From One CSV File Into Another Using Java

This post shows how to pull selected columns from a CSV file containing IP geolocation data and save them into a second CSV file using our Data Pipeline Java library. As part of the transformation, you can also rearrange the order of the resulting columns.

IP Geolocation CSV file

The code reads an input CSV file created from the free IP geolocation database at http://dev.maxmind.com/geoip/legacy/geolite/.

Column Extraction Code

The Java code below creates a new CSV file containing the last two columns from the input CSV file.

CSVReader

A new CSVReader is created to parse the GeoliteDataInput.csv input file. The setFieldNamesInFirstRow(false) method call causes CSVReader to create synthetic names for each column (“A”, “B”, “C”, and so on), similar to MS Excel’s column naming. If true had been passed in, the actual column names in the first row of the file would have been used instead.
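To make the naming scheme concrete, here is a minimal plain-Java sketch of Excel-style column naming. It is not Data Pipeline code: the class and method names are made up for illustration, and the assumption that names continue with “AA” after “Z” is borrowed from Excel rather than confirmed for CSVReader.

```java
/** Plain-Java sketch of Excel-style column naming; hypothetical, not part of Data Pipeline. */
final class SyntheticColumnNames {

    /** Converts a zero-based column index into a name: 0 -> A, 25 -> Z, 26 -> AA. */
    static String nameOf(int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be >= 0");
        }
        StringBuilder name = new StringBuilder();
        // Bijective base-26: there is no zero digit, so shift by one on each step.
        for (int n = index + 1; n > 0; n = (n - 1) / 26) {
            name.insert(0, (char) ('A' + (n - 1) % 26));
        }
        return name.toString();
    }

    public static void main(String[] args) {
        // The GeoLite country file has six columns, so these are the names in play.
        for (int i = 0; i < 6; i++) {
            System.out.println(i + " -> " + nameOf(i));
        }
    }
}
```

With six columns in the input file, the last two columns end up with the names “E” and “F”.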

Select Fields Data Transformation

A TransformingReader is wrapped around the initial CSVReader to allow the data to be manipulated on the fly. The actual modifications are specified by adding Transformer subclasses (IncludeFields in this case) to the TransformingReader.

The IncludeFields class specifies which columns from the input CSV file need to be included in the output CSV file.  All other columns are discarded from this point forward.  IncludeFields also arranges the columns in the specified order.  Passing in “F”, then “E” would have swapped the country name and country code columns.
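The select-and-reorder behaviour described above can be sketched in plain Java. This is only an illustration of the idea, not how IncludeFields is implemented: the class, the include method, and the sample row are all hypothetical, and a record is modelled as an ordered map from synthetic column name to value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java sketch of "include these columns, in this order"; not Data Pipeline code. */
final class IncludeColumnsSketch {

    /** Returns a new record holding only the named columns, in the order the names are given. */
    static Map<String, String> include(Map<String, String> record, String... names) {
        Map<String, String> selected = new LinkedHashMap<>();
        for (String name : names) {
            if (!record.containsKey(name)) {
                throw new IllegalArgumentException("unknown column: " + name);
            }
            selected.put(name, record.get(name));
        }
        return selected;
    }

    /** An illustrative row shaped like the GeoLite country file (values are examples only). */
    static Map<String, String> sampleRecord() {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("A", "1.0.0.0");
        record.put("B", "1.0.0.255");
        record.put("C", "16777216");
        record.put("D", "16777471");
        record.put("E", "AU");
        record.put("F", "Australia");
        return record;
    }

    public static void main(String[] args) {
        System.out.println(include(sampleRecord(), "E", "F")); // {E=AU, F=Australia}
        System.out.println(include(sampleRecord(), "F", "E")); // {F=Australia, E=AU}
    }
}
```

Because the output order follows the argument order, the same call both filters and rearranges the columns.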

ExcludeFields

An alternative to selecting the fields to retain with IncludeFields would be to specify the fields to drop using ExcludeFields.  Unlike IncludeFields, this transformer cannot rearrange columns.  Deciding which to use usually comes down to which is easier — whitelisting or blacklisting — and whether or not columns need to be rearranged.
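For contrast, here is the same kind of plain-Java sketch for the drop-these-columns approach. Again, the class, method, and sample row are hypothetical and do not reflect ExcludeFields internals. The point is that the surviving columns keep their original order no matter how the excluded names are listed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Plain-Java sketch of "drop these columns, keep the rest in place"; not Data Pipeline code. */
final class ExcludeColumnsSketch {

    /** Returns a new record without the named columns; remaining columns keep their input order. */
    static Map<String, String> exclude(Map<String, String> record, String... names) {
        Set<String> dropped = new HashSet<>(Arrays.asList(names));
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> column : record.entrySet()) {
            if (!dropped.contains(column.getKey())) {
                kept.put(column.getKey(), column.getValue());
            }
        }
        return kept;
    }

    /** An illustrative row shaped like the GeoLite country file (values are examples only). */
    static Map<String, String> sampleRecord() {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("A", "1.0.0.0");
        record.put("B", "1.0.0.255");
        record.put("C", "16777216");
        record.put("D", "16777471");
        record.put("E", "AU");
        record.put("F", "Australia");
        return record;
    }

    public static void main(String[] args) {
        // The order of the excluded names has no effect on the output order.
        System.out.println(exclude(sampleRecord(), "D", "A", "C", "B")); // {E=AU, F=Australia}
    }
}
```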

CSVWriter

A CSVWriter is created to generate the new GeoliteDataOutput.csv output file. The setFieldNamesInFirstRow(false) method is called in this case to prevent CSVWriter from saving the synthetic field names added during the read step.

Running the Transfer Job

The actual data is transferred from the reader to the writer using the default implementation of JobTemplate.
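For readers who want to see what such a transfer amounts to without the library, the following is a rough plain-Java sketch of the whole read, select, and write loop. It does not use or mirror JobTemplate. Every name in it is hypothetical, and it rests on a few assumptions: fields never span multiple lines, the file is UTF-8, and every output field is written in double quotes with no header row.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of a row-by-row column transfer; not Data Pipeline code. */
final class CsvColumnTransferSketch {

    /** Splits one CSV line into fields, honoring double quotes and "" escapes. */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quoted) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"'); // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    quoted = false;    // closing quote
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                quoted = true;         // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    /** Wraps a field in double quotes and doubles any embedded quotes. */
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Copies the chosen zero-based columns of every row, one row at a time; returns the row count. */
    static long transfer(Path input, Path output, int... columns) throws IOException {
        long rows = 0;
        // Assumption: both files are UTF-8; adjust the charset for other encodings.
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                List<String> fields = parseLine(line);
                StringBuilder out = new StringBuilder();
                for (int k = 0; k < columns.length; k++) {
                    if (k > 0) {
                        out.append(',');
                    }
                    out.append(quote(fields.get(columns[k])));
                }
                writer.write(out.toString());
                writer.newLine();
                rows++;
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Zero-based columns 4 and 5 correspond to the synthetic names "E" and "F".
        long rows = transfer(Paths.get(args[0]), Paths.get(args[1]), 4, 5);
        System.out.println(rows + " rows transferred");
    }
}
```

Processing one row at a time keeps memory use flat regardless of input size, which is the main reason to stream rather than load the whole file.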

Output CSV file

Download Data Pipeline

You can get started with Data Pipeline by downloading it and reviewing the getting started page.

You can also view additional examples by visiting https://northconcepts.com/data-pipeline/examples/.

About The DataPipeline Team

We make Data Pipeline — a lightweight ETL framework for Java. Use it to filter, transform, and aggregate data on-the-fly in your web, mobile, and desktop apps. Learn more about it at northconcepts.com.

5 thoughts on “How To Transfer Columns From One CSV File Into Another Using Java”

  • By haneef

    The code above generates a CSV file with a limited file size. For example, my input file is 21.4 MB,
    but the output file is only 224 KB. How can I resolve this?

    • By Dele Taylor

      Haneef, I’ll follow up with you over email to get more info on what you need.

  • By dia

    When I add this code in Eclipse, I get errors for all of the import statements,
    such as “import ********** cannot be resolved”. Can anyone help me resolve this issue?

    • By Dele Taylor

      Dia, you’ll need to download our Data Pipeline library to resolve the imports. Cheers.
