This blog will show you how to pull selected columns from a CSV file containing IP geolocation data and save them into a second CSV file using our Data Pipeline Java library. As part of the transformation, you’ll also have the option to rearrange the order of the resulting columns.
IP Geolocation CSV file
The code reads an input CSV file created from the free IP geolocation database at http://dev.maxmind.com/geoip/legacy/geolite/.
1 2 3 4 5 6 7 8 |
1.0.0.0,1.0.0.255,16777216,16777471,AU,Australia 1.0.1.0,1.0.3.255,16777472,16778239,CN,China 1.0.4.0,1.0.7.255,16778240,16779263,AU,Australia 1.0.8.0,1.0.15.255,16779264,16781311,CN,China 1.0.16.0,1.0.31.255,16781312,16785407,JP,Japan 1.0.32.0,1.0.63.255,16785408,16793599,CN,China 1.0.64.0,1.0.127.255,16793600,16809983,JP,Japan 1.0.128.0,1.0.255.255,16809984,16842751,TH,Thailand |
Column Extraction Code
The Java code below creates a new CSV file containing the last two columns from the input CSV file.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
package com.northconcepts.datapipeline.examples.cookbook; import java.io.File; import com.northconcepts.datapipeline.core.DataEndpoint; import com.northconcepts.datapipeline.core.DataReader; import com.northconcepts.datapipeline.core.DataWriter; import com.northconcepts.datapipeline.csv.CSVReader; import com.northconcepts.datapipeline.csv.CSVWriter; import com.northconcepts.datapipeline.job.JobTemplate; import com.northconcepts.datapipeline.transform.IncludeFields; import com.northconcepts.datapipeline.transform.TransformingReader; public class SelectivelyTransferCsvFile2 { public static void main(String[] args) { //Read the input CSV file DataReader reader = new CSVReader(new File( "example/data/input/GeoliteDataInput.csv")) .setFieldNamesInFirstRow(false); //Add a transforming reader to only include the necessary fields TransformingReader transformingReader = new TransformingReader(reader); transformingReader.add(new IncludeFields("E","F")); //Create output CSV file DataWriter writer = new CSVWriter(new File("example/data/output/GeoliteDataOutput.csv")) .setFieldNamesInFirstRow(false); //Transfer data from the input to output file JobTemplate.DEFAULT.transfer(transformingReader, writer); } } |
CSVReader
A new CSVReader is created to parse the GeoliteDataInput.csv
input file. The setFieldNamesInFirstRow(false) method call causes CSVReader
to create synthetic names for each column starting with “A”, “B”, “C”, etc. Similar to MS Excel’s column naming. If true
had been passed in, then the actual column names in the file would have been used.
1 2 |
DataReader reader = new CSVReader(new File("example/data/input/GeoliteDataInput.csv")) .setFieldNamesInFirstRow(false); |
Select Fields Data Transformation
A TransformingReader is wrapped around the initial CSVReader
to allow the data to be manipulated on-the-fly. The actual modifications are specified by adding Transformer subclasses — IncludeFields in this case — to the TransformingReader
.
1 2 |
TransformingReader transformingReader = new TransformingReader(reader); transformingReader.add(new IncludeFields("E","F")); |
The IncludeFields
class specifies which columns from the input CSV file need to be included in the output CSV file. All other columns are discarded from this point forward. IncludeFields
also arranges the columns in the specified order. Passing in “F”, then “E” would have swapped the country name and country code columns.
ExcludeFields
An alternative to selecting the fields to retain with IncludeFields
would be to specify the fields to drop using ExcludeFields. Unlike IncludeFields
, this transformer cannot rearrange columns. Deciding which to use usually comes down to which is easier — whitelisting or blacklisting — and whether or not columns need to be rearranged.
CSVWriter
A CSVWriter is created to generate the new GeoliteDataOutput.csv
output file. The setFieldNamesInFirstRow(false) method is called in this case to to prevent CSVWriter
from saving the synthetic field names added during the read step.
1 2 |
DataWriter writer = new CSVWriter(new File("example/data/output/GeoliteDataOutput.csv")) .setFieldNamesInFirstRow(false); |
Running the Transfer Job
The actual data is transfered from the reader to writer using the default implementation of JobTemplate.
1 |
JobTemplate.DEFAULT.transfer(transformingReader, writer); |
Output CSV file
1 2 3 4 5 6 7 8 |
AU,Australia CN,China AU,Australia CN,China JP,Japan CN,China JP,Japan TH,Thailand |
Download Data Pipeline
You can get started with Data Pipeline by downloading it and reviewing the getting started page.
You can also view additional examples by visiting https://northconcepts.com/data-pipeline/examples/.
This above code generate csv file with limit filesize .Ex:-i am giving input file size=21.4 MB
but it generates output file 224kb only.how to resolve this.
Haneef, I’ll follow-up with you over email to get more info on what you need.
Pingback: How To Transfer Columns From One CSV File Into Another Using Java | Dinesh Ram Kali.
when I am adding this code in eclipse I am getting errors for all import statements
like import ********** can not be resolved can any one help to resolve this issue plz ASAP
Dia — you’ll need to download our Data Pipeline library to resolve the imports. Cheers.