This blog post shows you how to pull selected columns from a CSV file containing IP geolocation data and save them into a second CSV file using our Data Pipeline Java library. As part of the transformation, you’ll also have the option to rearrange the order of the resulting columns.
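To make the idea concrete, here is a minimal sketch of column selection and reordering written in plain Java rather than with the Data Pipeline API. The class name ColumnSelector and its select method are hypothetical, and the sketch assumes the first row is a header and that no field contains quoted commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Data Pipeline: keeps only the requested
// columns, in the order they are requested.
class ColumnSelector {

    // Assumes lines.get(0) is the header row and fields contain no quoted commas.
    static List<String> select(List<String> lines, List<String> wanted) {
        List<String> header = Arrays.asList(lines.get(0).split(",", -1));

        // Map each requested column name to its position in the input.
        int[] positions = new int[wanted.size()];
        for (int i = 0; i < wanted.size(); i++) {
            positions[i] = header.indexOf(wanted.get(i));
            if (positions[i] < 0) {
                throw new IllegalArgumentException("Unknown column: " + wanted.get(i));
            }
        }

        // Rebuild every row (header included) from the selected positions.
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String[] cells = line.split(",", -1);
            StringBuilder row = new StringBuilder();
            for (int i = 0; i < positions.length; i++) {
                if (i > 0) {
                    row.append(',');
                }
                // Short rows yield an empty cell instead of failing.
                row.append(positions[i] < cells.length ? cells[positions[i]] : "");
            }
            out.add(row.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("ip,country,city", "1.2.3.4,CA,Toronto");
        // Select two of the three columns and swap their order.
        for (String row : select(input, Arrays.asList("city", "ip"))) {
            System.out.println(row);
        }
    }
}
```

A real CSV parser is needed as soon as fields can contain quotes, commas, or line breaks, which is the kind of detail a dedicated reader and writer take care of.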
All posts by The DataPipeline Team
Data Pipeline 2.3 Now Available
- added streaming JSON reading and writing (simple and template based)
- added SimpleXmlWriter
- improved handling of recursive XML-to-records
- added user-definable demux strategies
- DeMuxReader is no longer a public class since it should not be referenced directly
- improved exception handling in JdbcReader
- BUGFIX: JavaBeanReader now handles xpath for recursive text children
- updated Apache POI to v3.9
- IncludeFields & ExcludeFields now accept a collection of field names in their constructor and add method
- added JdbcReader.useColumnLabel property to allow fields to be named using the column labels (or aliases) instead of the underlying, real column names
- added Excel 2007 provider (POI_XSSF)
- Excel handling now defaults to the Apache POI_XSSF (Excel 2007) provider, instead of POI (Excel 2003)
- added FixedWidthField.align to allow left-filled (right aligned) fields
- added FixedWidthField.fillChar to allow fields to specify a different filler from their reader/writer
- reduced memory overhead for fields and records
- CSV performance improvements
- exception property values now truncated to 256 chars
- using StringBuilder (instead of StringBuffer) internally to improve performance
Data Pipeline 2.2.8 Now Available
- added TemplateWriter for writing text streams using FreeMarker templates
- added new examples for writing XML and HTML files using TemplateWriter
- BUGFIX: XmlWriter’s (XmlTemplate, File) constructor now calls setFieldNamesInFirstRow(false) by default
- BUGFIX: The JxlProvider now converts intervals and user-defined types to string when generating Excel files
- Intervals are no longer converted to strings when added to a field/record
- BasicFieldTransformer can now convert numbers to intervals (seconds, months, days, minutes, etc.)
- JdbcWriter now has public accessors for connection, tableName, batchMode, and jdbcTypes
- individual fields can now be removed from a FieldList
- FieldList can now accept collections of strings
- updated Apache POI to v3.8
Data Pipeline 2.2.7 Now Available
- added JdbcMultiWriter for multi-threaded writing to one or more database connections concurrently
- added multi-threaded AsyncWriter to complement AsyncReader
- data writers now have an available() method to indicate the number of records that can probably be written without blocking
- MultiWriter now supports configurable write strategies (ReplicateWriteStrategy, RoundRobinWriteStrategy, AvailableCapacityWriteStrategy, and user defined)
- added support for CLOB fields (see JdbcValueReader.DEFAULT)
- Field and Record’s toString() methods now limit displayed strings to the first 128 characters
- RecordMeter is now public and returned by MeteredReader and MeteredWriter’s getMeter() method
- BUGFIX: record count is no longer off by 1 in some cases
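The difference between the replicate and round-robin strategies named above can be illustrated with a small plain-Java sketch. It does not use the MultiWriter API; the class WriteStrategySketch and its methods are hypothetical, and in-memory lists stand in for the target writers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: lists stand in for target writers.
class WriteStrategySketch {

    // Round-robin: record i goes to target (i mod targetCount).
    static List<List<String>> roundRobin(List<String> records, int targetCount) {
        List<List<String>> targets = newTargets(targetCount);
        for (int i = 0; i < records.size(); i++) {
            targets.get(i % targetCount).add(records.get(i));
        }
        return targets;
    }

    // Replicate: every target receives every record.
    static List<List<String>> replicate(List<String> records, int targetCount) {
        List<List<String>> targets = newTargets(targetCount);
        for (List<String> target : targets) {
            target.addAll(records);
        }
        return targets;
    }

    private static List<List<String>> newTargets(int targetCount) {
        if (targetCount <= 0) {
            throw new IllegalArgumentException("targetCount must be positive");
        }
        List<List<String>> targets = new ArrayList<>();
        for (int i = 0; i < targetCount; i++) {
            targets.add(new ArrayList<>());
        }
        return targets;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("r1", "r2", "r3");
        System.out.println(roundRobin(records, 2)); // [[r1, r3], [r2]]
        System.out.println(replicate(records, 2));  // [[r1, r2, r3], [r1, r2, r3]]
    }
}
```

An available-capacity strategy would instead pick the target that reports the most room, which is where a writer-side available() count becomes useful.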
Data Pipeline 2.2.6 Now Available
- performance improvements in CSV and fixed width handling
- untyped expression evaluation is now based on the value’s type, instead of the field’s declared type
- BUGFIX: now handles untyped expressions between primitive and object values
- float expressions are now upgraded to doubles during evaluation
- all numbers other than doubles and floats are now upgraded to longs during evaluation
- expressions can now reference Java beans, not just primitive values
- method call expression now finds the most appropriate method based on the runtime argument types (http://en.wikipedia.org/wiki/Multiple_dispatch)
- improved handling for collections and arrays in DataException properties
- Apache PoiProvider can now distinguish between date, time, and datetimes fields in Excel
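The numeric upgrade rules above (floats to doubles, all other numbers to longs) can be sketched in plain Java as follows. This is not the library's evaluator: the class NumericPromotion is hypothetical, and the rule that a mixed long and double addition yields a double is an assumption borrowed from ordinary Java arithmetic.

```java
// Hypothetical sketch of value-based numeric promotion, not Data Pipeline code.
class NumericPromotion {

    // Floats and doubles become Double; every other Number becomes Long.
    // Note: types such as BigDecimal would lose precision here.
    static Number promote(Number value) {
        if (value instanceof Double || value instanceof Float) {
            return value.doubleValue();
        }
        return value.longValue();
    }

    // Adds two values after promotion. Assumption: if either side is a
    // Double, the result is a Double; otherwise it is a Long.
    static Number add(Number left, Number right) {
        Number a = promote(left);
        Number b = promote(right);
        if (a instanceof Double || b instanceof Double) {
            return a.doubleValue() + b.doubleValue();
        }
        return a.longValue() + b.longValue();
    }

    public static void main(String[] args) {
        System.out.println(add(1.5f, 2));       // 3.5
        System.out.println(add(2, (short) 3));  // 5
    }
}
```

The promotion looks at the runtime type of each value, which mirrors the note above that untyped evaluation is based on the value’s type rather than the field’s declared type.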
Data Pipeline 2.2.5 Now Available
- Added JavaBeanReader which uses XPath expressions to identify field values and break records (see the Read from Java beans example)
- AbstractReader’s setStartingRow and setLastRow now return this
- Filter rule IsInstanceOfJavaType now returns false for null values
- Added number-to-date methods to BasicFieldTransformer (numberToDate(), minutesToDate(), hoursToDate(), and daysToDate())
- BasicFieldTransformer.Operation and BasicFieldTransformer.StringOperation are now public classes
- BasicFieldTransformer.add(Operation … operation) is now public
- ConditionalTransformer is now private (use TransformingReader.filter instead)
- TransformingReader now contains an optional Filter, allowing any transformer to be conditionally applied
- Removed TransformingReader.add(Filter filter, Transformer … transformer) method
Data Pipeline 2.2.3 Now Available
Data Pipeline 2.2.3 is now available with the following enhancements:
- added JdbcValueReader to allow clients to override column reading strategy
- added the JdbcReader.valueReader property
Data Pipeline 2.2.2 Now Available
Data Pipeline 2.2.2 is now available with the following enhancements:
- added the XmlTemplate functionality
- XmlWriter now uses XmlTemplate to describe output patterns
Data Pipeline 2.2.1 Now Available
Data Pipeline 2.2.1 is now available with the following enhancements:
- added batch execution to JdbcWriter (see JdbcWriter.setBatchSize)
- added callback mechanism to track job progress (see JobTemplate.transfer(R reader, W writer, boolean async, JobCallback callback))
- early access to DeMuxReader
Data Pipeline 2.2 Released
We are happy to announce that Data Pipeline 2.2 is now released. We have added the enhancements below:
- added an XPath-based XmlReader
- updates so that Excel now defaults to the Apache POI instead of JXL
- updates so that the following classes use java.util.List instead of java.util.ArrayList in their public APIs: CompositeValue, FieldList, Lookup, LookupTransformer, Record, RecordList
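As a brief illustration of why a public API is easier to use when it declares java.util.List rather than java.util.ArrayList, consider the sketch below. The class ListApiSketch and its joinNames method are hypothetical and are not taken from the classes listed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Hypothetical example of an API that depends on the List interface.
class ListApiSketch {

    // Declaring List lets callers pass any List implementation.
    static String joinNames(List<String> names) {
        return String.join(",", names);
    }

    public static void main(String[] args) {
        // Each call would need a copy into an ArrayList if the parameter
        // type were ArrayList<String>.
        System.out.println(joinNames(new ArrayList<>(Arrays.asList("id", "name"))));
        System.out.println(joinNames(new LinkedList<>(Arrays.asList("id", "name"))));
        System.out.println(joinNames(Collections.unmodifiableList(Arrays.asList("id", "name"))));
    }
}
```

All three calls print the same line, since the method only relies on behavior defined by the interface.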