All posts by The DataPipeline Team

About The DataPipeline Team

We make Data Pipeline, a lightweight ETL framework for Java. Use it to filter, transform, and aggregate data on the fly in your web, mobile, and desktop apps. Learn more about it at northconcepts.com.

Data Pipeline 3.0 Now Available

We’re pleased to announce the release of version 3.0 of our Data Pipeline engine.

This release includes the new Sliding Window Aggregations feature to perform continuous SQL group-by operations on streaming data.
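
To give a feel for what a continuous group-by over a sliding window means, here is a minimal sketch in plain Java. It is not the Data Pipeline API: the class `SlidingWindowSum`, its `add` method, and the count-based window are all assumptions made for illustration. It keeps the last N events and reports a per-key sum after each new event, roughly what `SELECT key, SUM(value) ... GROUP BY key` would return if it were re-run over the current window.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Data Pipeline.
final class SlidingWindowSum {

    // One streamed event: a grouping key and a numeric value.
    private static final class Event {
        final String key;
        final long value;

        Event(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    private final int windowSize;
    private final Deque<Event> window = new ArrayDeque<>();
    private final Map<String, Long> sums = new TreeMap<>();
    private final Map<String, Integer> counts = new TreeMap<>();

    SlidingWindowSum(int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be >= 1");
        }
        this.windowSize = windowSize;
    }

    // Adds one event, evicts the oldest event if the window is already full,
    // and returns a snapshot of the per-key sums for the current window.
    Map<String, Long> add(String key, long value) {
        if (window.size() == windowSize) {
            Event oldest = window.removeFirst();
            int left = counts.merge(oldest.key, -1, Integer::sum);
            if (left == 0) {
                // No events for this key remain in the window, so drop the group.
                counts.remove(oldest.key);
                sums.remove(oldest.key);
            } else {
                sums.merge(oldest.key, -oldest.value, Long::sum);
            }
        }
        window.addLast(new Event(key, value));
        counts.merge(key, 1, Integer::sum);
        sums.merge(key, value, Long::sum);
        return new TreeMap<>(sums);
    }

    public static void main(String[] args) {
        SlidingWindowSum window = new SlidingWindowSum(3);
        System.out.println(window.add("a", 1));
        System.out.println(window.add("b", 2));
        System.out.println(window.add("a", 3));
        System.out.println(window.add("b", 4)); // the first "a" event has left the window
    }
}
```

Tracking a per-key event count alongside the sum lets a group disappear once its last event leaves the window, even when the remaining values happen to sum to zero.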

We’ve improved the performance of the XPath-based readers (JsonReader, XmlReader, and JavaBeanReader), included new conveniences to reduce your code size, and added several new transformers and filters.

We’re also now offering a free 30-day trial for you to take the premium and enterprise features out for a test drive.

Continue reading →

Data Pipeline 2.3.4 Now Available

A new release of Data Pipeline is now available for download: https://northconcepts.com/downloads/. This release includes a new Twitter search reader, custom aggregate operations, and much more.

Continue reading →

How To Transfer Columns From One CSV File Into Another Using Java

Posted On 24 May 2014
By The DataPipeline Team
In CSV, Data Pipeline, Geolocation, Java

This post shows you how to pull selected columns from a CSV file containing IP geolocation data and save them into a second CSV file using our Data Pipeline Java library. As part of the transformation, you can also rearrange the order of the resulting columns.
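
As a rough idea of the transformation itself, the sketch below selects and reorders columns using only the Java standard library. It does not use the Data Pipeline library, and the class `CsvColumnTransfer`, the method `selectColumns`, the column names, and the sample rows are all made up for illustration. It also assumes a simple CSV with a header row and no quoted or escaped commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not the Data Pipeline API.
final class CsvColumnTransfer {

    // Returns the given CSV lines reduced to the wanted columns, in the wanted order.
    // The first line is treated as the header that names the columns.
    static List<String> selectColumns(List<String> lines, String... wanted) {
        List<String> out = new ArrayList<>();
        if (lines.isEmpty()) {
            return out;
        }
        List<String> header = Arrays.asList(lines.get(0).split(",", -1));
        int[] indexes = new int[wanted.length];
        for (int i = 0; i < wanted.length; i++) {
            indexes[i] = header.indexOf(wanted[i]);
            if (indexes[i] < 0) {
                throw new IllegalArgumentException("Unknown column: " + wanted[i]);
            }
        }
        // The header goes through the same loop, so it is reordered with the data.
        for (String line : lines) {
            String[] cells = line.split(",", -1);
            StringBuilder row = new StringBuilder();
            for (int i = 0; i < indexes.length; i++) {
                if (i > 0) {
                    row.append(',');
                }
                // Cells missing from a short row are written as empty values.
                row.append(indexes[i] < cells.length ? cells[indexes[i]] : "");
            }
            out.add(row.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows, not real geolocation data.
        List<String> input = Arrays.asList(
                "ip_start,ip_end,country,city",
                "1.0.0.0,1.0.0.255,AU,Brisbane",
                "1.0.1.0,1.0.3.255,CN,Fuzhou");
        for (String row : selectColumns(input, "city", "country", "ip_start")) {
            System.out.println(row);
        }
    }
}
```

To work with real files, the input lines could come from `java.nio.file.Files.readAllLines` and the result could be saved with `java.nio.file.Files.write`. A production version would need a proper CSV parser to handle quoted fields.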

Continue reading →

Data Pipeline 2.3 Now Available

Posted On 2 Aug 2013
By The DataPipeline Team
In News

added streaming JSON reading and writing (simple and template based)
added SimpleXmlWriter
improved handling of recursive XML-to-records
added user-definable demux strategies
DeMuxReader is no longer a public class since it should not be referenced directly
improved exception handling in JdbcReader
BUGFIX: JavaBeanReader now handles xpath for recursive text children
updated Apache POI to v3.9
IncludeFields & ExcludeFields now accept a collection of field names in their constructor and add method
added JdbcReader.useColumnLabel property to allow fields to be named using the column labels (or aliases) instead of the underlying, real column names
added Excel 2007 provider (POI_XSSF)
Excel handling now defaults to the Apache POI_XSSF (Excel 2007) provider, instead of POI (Excel 2003)
added FixedWidthField.align to allow left-filled (right aligned) fields
added FixedWidthField.fillChar to allow fields to specify a different filler from their reader/writer
reduced memory overhead for fields and records
CSV performance improvements
exception property values now truncated to 256 chars
using StringBuilder (instead of StringBuffer) internally to improve performance
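
To illustrate what left-filled (right-aligned) fields and a per-field fill character mean in fixed-width output, here is a small standalone sketch. It is not the library's implementation: `FixedWidthPad`, `Align`, and `pad` are hypothetical names, and truncating values that are too long is an assumption of this example.

```java
// Hypothetical illustration only; not the FixedWidthField implementation.
final class FixedWidthPad {

    enum Align { LEFT, RIGHT }

    // Pads value to exactly width characters using fillChar.
    // RIGHT alignment puts the filler on the left; LEFT puts it on the right.
    static String pad(String value, int width, Align align, char fillChar) {
        if (value.length() >= width) {
            // Assumption: values longer than the field are truncated.
            return value.substring(0, width);
        }
        StringBuilder filler = new StringBuilder();
        for (int i = value.length(); i < width; i++) {
            filler.append(fillChar);
        }
        return align == Align.RIGHT ? filler + value : value + filler;
    }

    public static void main(String[] args) {
        System.out.println(pad("42", 6, Align.RIGHT, '0'));  // 000042
        System.out.println(pad("abc", 5, Align.LEFT, '.'));  // abc..
    }
}
```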

Data Pipeline 2.2.8 Now Available

Posted On 28 Nov 2012
By The DataPipeline Team
In News

added TemplateWriter for writing text streams using FreeMarker templates
added new examples for writing XML and HTML files using TemplateWriter
BUGFIX: XmlWriter’s (XmlTemplate, File) constructor now calls setFieldNamesInFirstRow(false) by default
BUGFIX: The JxlProvider now converts intervals and user-defined types to string when generating Excel files
Intervals are no longer converted to strings when added to a field/record
BasicFieldTransformer can now convert numbers to intervals (seconds, months, days, minutes, etc.)
JdbcWriter now has public accessors for connection, tableName, batchMode, and jdbcTypes
individual fields can now be removed from a FieldList
FieldList can now accept collections of strings
updated Apache POI to v3.8

Data Pipeline 2.2.7 Now Available

Posted On 14 Jul 2012
By The DataPipeline Team
In News

added JdbcMultiWriter for multi-threaded writing to one or more database connections concurrently
added multi-threaded AsyncWriter to complement AsyncReader
data writers now have an available() method to indicate the number of records that can probably be written without blocking
MultiWriter now supports configurable write strategies (ReplicateWriteStrategy, RoundRobinWriteStrategy, AvailableCapacityWriteStrategy, and user defined)
added support for CLOB fields (see JdbcValueReader.DEFAULT)
Field and Record’s toString() methods now limit displayed strings to the first 128 characters
RecordMeter is now public and returned by MeteredReader and MeteredWriter’s getMeter() method
BUGFIX: record count is no longer off by 1 in some cases
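
The difference between replicating and round-robin writing can be sketched in a few lines of plain Java. The names below (`WriteStrategySketch`, `Strategy`, `Replicate`, `RoundRobin`, `distribute`) are hypothetical stand-ins and do not reflect the actual MultiWriter or write strategy classes. In-memory lists play the role of the target writers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the MultiWriter API.
final class WriteStrategySketch {

    // Decides which targets receive each record.
    interface Strategy {
        void write(String record, List<List<String>> targets);
    }

    // Every target receives every record.
    static final class Replicate implements Strategy {
        @Override
        public void write(String record, List<List<String>> targets) {
            for (List<String> target : targets) {
                target.add(record);
            }
        }
    }

    // Each record goes to exactly one target, cycling through them in order.
    static final class RoundRobin implements Strategy {
        private int next;

        @Override
        public void write(String record, List<List<String>> targets) {
            targets.get(next).add(record);
            next = (next + 1) % targets.size();
        }
    }

    // Sends the records to targetCount in-memory targets using the given strategy.
    static List<List<String>> distribute(Strategy strategy, int targetCount, String... records) {
        List<List<String>> targets = new ArrayList<>();
        for (int i = 0; i < targetCount; i++) {
            targets.add(new ArrayList<>());
        }
        for (String record : records) {
            strategy.write(record, targets);
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(distribute(new Replicate(), 2, "r1", "r2", "r3"));
        System.out.println(distribute(new RoundRobin(), 2, "r1", "r2", "r3"));
    }
}
```

A capacity-based strategy would follow the same shape, choosing the target that reports the most room instead of the next one in sequence.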

Data Pipeline 2.2.6 Now Available

Posted On 22 Apr 2012
By The DataPipeline Team
In News

performance improvements in CSV and fixed width handling
untyped expression evaluation is now based on the value’s type, instead of the field’s declared type
BUGFIX: now handles untyped expressions between primitive and object values
float expressions are now upgraded to doubles during evaluation
all numbers other than doubles and floats are now upgraded to longs during evaluation
expressions can now reference Java beans, not just primitive values
method call expressions now find the most appropriate method based on the runtime argument types (http://en.wikipedia.org/wiki/Multiple_dispatch)
improved handling for collections and arrays in DataException properties
Apache PoiProvider can now distinguish between date, time, and datetime fields in Excel
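
The numeric upgrade rules above can be pictured with a short standalone sketch. This is not the expression engine's code: `NumericPromotion`, `promote`, and `add` are hypothetical names, and the sketch applies the two rules literally (floats and doubles become doubles, every other number becomes a long) without trying to guess how other numeric types are treated in practice.

```java
// Hypothetical illustration only; not the Data Pipeline expression engine.
final class NumericPromotion {

    // Floats and doubles are upgraded to Double; all other numbers to Long.
    static Number promote(Number n) {
        if (n instanceof Double || n instanceof Float) {
            return n.doubleValue();
        }
        return n.longValue();
    }

    // Adds two values after promotion. The result is a Double if either
    // operand is floating point; otherwise it is a Long.
    static Number add(Number a, Number b) {
        Number x = promote(a);
        Number y = promote(b);
        if (x instanceof Double || y instanceof Double) {
            return x.doubleValue() + y.doubleValue();
        }
        return x.longValue() + y.longValue();
    }

    public static void main(String[] args) {
        Number wholeSum = add((short) 1, 2);
        Number mixedSum = add(1.5f, 2);
        System.out.println(wholeSum + " " + wholeSum.getClass().getSimpleName());
        System.out.println(mixedSum + " " + mixedSum.getClass().getSimpleName());
    }
}
```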

Data Pipeline 2.2.5 Now Available

Posted On 8 Jan 2012
By The DataPipeline Team
In News

Added JavaBeanReader which uses XPath expressions to identify field values and break records (see the Read from Java beans example)
AbstractReader’s setStartingRow and setLastRow now return this
Filter rule IsInstanceOfJavaType now returns false for null values
Added number-to-date methods to BasicFieldTransformer (numberToDate(), minutesToDate(), hoursToDate(), and daysToDate())
BasicFieldTransformer.Operation and BasicFieldTransformer.StringOperation are now public classes
BasicFieldTransformer.add(Operation … operation) is now public
ConditionalTransformer is now private (use TransformingReader.filter instead)
TransformingReader now contains an optional Filter, allowing any transformer to be conditionally applied
Removed TransformingReader.add(Filter filter, Transformer … transformer) method
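
The idea of pairing a transformer with an optional filter can be sketched with standard Java functional interfaces. This is only an illustration of the concept, not the TransformingReader API: `ConditionalTransform` and `transform` are hypothetical names, and treating a missing filter as "apply to every record" is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; not the TransformingReader API.
final class ConditionalTransform {

    // Applies transformer to each record that passes filter and leaves the
    // others unchanged. A null filter means every record is transformed.
    static <T> List<T> transform(List<T> records, Predicate<T> filter, UnaryOperator<T> transformer) {
        List<T> out = new ArrayList<>();
        for (T record : records) {
            boolean matches = filter == null || filter.test(record);
            out.add(matches ? transformer.apply(record) : record);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "bb", "ccc");
        // Only records longer than one character are upper-cased.
        System.out.println(transform(records, s -> s.length() > 1, String::toUpperCase));
    }
}
```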

Data Pipeline 2.2.3 Now Available

Posted On 14 May 2011
By The DataPipeline Team
In News

Data Pipeline 2.2.3 is now available with the following enhancements:

added JdbcValueReader to allow clients to override column reading strategy
added the JdbcReader.valueReader property

Data Pipeline 2.2.2 Now Available

Posted On 11 May 2011
By The DataPipeline Team
In News

Data Pipeline 2.2.2 is now available with the following enhancements:

added the XmlTemplate functionality
XmlWriter now uses XmlTemplate to describe output patterns
