All posts by The DataPipeline Team

About The DataPipeline Team

We make Data Pipeline — a lightweight ETL framework for Java. Use it to filter, transform, and aggregate data on-the-fly in your web, mobile, and desktop apps. Learn more about it at northconcepts.com.

Data Pipeline 3.0 Now Available

We’re pleased to announce the release of version 3.0 of our Data Pipeline engine.

This release includes the new Sliding Window Aggregations feature to perform continuous SQL group-by operations on streaming data.
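
To illustrate the idea behind a sliding window aggregation, here is a minimal sketch in plain Java. It does not use the Data Pipeline API: the class `SlidingWindowSum` and its `add` method are hypothetical names chosen for this example, and the window is assumed to be count-based (the last N values per group). It keeps a running sum for each group key as values stream in, much like a continuous `GROUP BY` with `SUM`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Data Pipeline: a count-based sliding window
// that maintains a per-key sum over the last `windowSize` values.
class SlidingWindowSum {
    private final int windowSize;
    private final Map<String, Deque<Double>> windows = new HashMap<>();
    private final Map<String, Double> sums = new HashMap<>();

    SlidingWindowSum(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    // Adds one value for the given group key and returns the updated window sum.
    double add(String key, double value) {
        Deque<Double> window = windows.computeIfAbsent(key, k -> new ArrayDeque<>());
        double sum = sums.getOrDefault(key, 0.0) + value;
        window.addLast(value);
        if (window.size() > windowSize) {
            // The oldest value slides out of the window.
            sum -= window.removeFirst();
        }
        sums.put(key, sum);
        return sum;
    }

    public static void main(String[] args) {
        SlidingWindowSum totals = new SlidingWindowSum(2);
        System.out.println(totals.add("east", 10)); // 10.0
        System.out.println(totals.add("east", 5));  // 15.0
        System.out.println(totals.add("west", 7));  // 7.0
        System.out.println(totals.add("east", 1));  // 6.0 (10 has left the window)
    }
}
```

Keeping a running sum per key avoids re-scanning the window on every record; only the value that slides out has to be subtracted.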

We’ve improved the performance of the XPath-based readers (JsonReader, XmlReader, and JavaBeanReader), added new conveniences to reduce your code size, and added several new transformers and filters.

We’re also now offering a free 30-day trial for you to take the premium and enterprise features out for a test drive.


Data Pipeline 2.3 Now Available

  • added streaming JSON reading and writing (simple and template based)
  • added SimpleXmlWriter
  • improved handling of recursive XML-to-records
  • added user-definable demux strategies
  • DeMuxReader is no longer a public class since it should not be referenced directly
  • improved exception handling in JdbcReader
  • BUGFIX: JavaBeanReader now handles xpath for recursive text children
  • updated Apache POI to v3.9
  • IncludeFields & ExcludeFields now accept a collection of field names in their constructor and add method
  • added JdbcReader.useColumnLabel property to allow fields to be named using the column labels (or aliases) instead of the underlying, real column names
  • added Excel 2007 provider (POI_XSSF)
    Excel handling now defaults to the Apache POI_XSSF (Excel 2007) provider, instead of POI (Excel 2003)
  • added FixedWidthField.align to allow left-filled (right aligned) fields
  • added FixedWidthField.fillChar to allow fields to specify a different filler from their reader/writer
  • reduced memory overhead for fields and records
  • CSV performance improvements
  • exception property values now truncated to 256 chars
  • using StringBuilder (instead of StringBuffer) internally to improve performance
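
The alignment and fill-character options above can be pictured with a small plain-Java sketch. It is not the Data Pipeline implementation: `FixedWidthPadding.pad` is a hypothetical helper, and truncating values that are longer than the field width is an assumption made for this example.

```java
// Hypothetical helper, not part of Data Pipeline: pads a value to a fixed width.
class FixedWidthPadding {

    // rightAlign = true  -> filler goes on the left  (left-filled, right aligned)
    // rightAlign = false -> filler goes on the right (right-filled, left aligned)
    static String pad(String value, int width, char fillChar, boolean rightAlign) {
        if (value.length() >= width) {
            // Assumption for this sketch: over-long values are truncated to the width.
            return value.substring(0, width);
        }
        StringBuilder filler = new StringBuilder();
        for (int i = value.length(); i < width; i++) {
            filler.append(fillChar);
        }
        return rightAlign ? filler + value : value + filler;
    }

    public static void main(String[] args) {
        System.out.println(pad("42", 5, '0', true));   // 00042
        System.out.println(pad("ab", 5, '.', false));  // ab...
    }
}
```

A per-field fill character is useful when, for example, numeric columns are zero-filled while text columns are space-filled in the same file.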

Data Pipeline 2.2.8 Now Available

  • added TemplateWriter for writing text streams using FreeMarker templates
  • added new examples for writing XML and HTML files using TemplateWriter
  • BUGFIX: XmlWriter’s (XmlTemplate, File) constructor now calls setFieldNamesInFirstRow(false) by default
  • BUGFIX: The JxlProvider now converts intervals and user-defined types to string when generating Excel files
  • Intervals are no longer converted to strings when added to a field/record
  • BasicFieldTransformer can now convert numbers to intervals (seconds, months, days, minutes, etc.)
  • JdbcWriter now has public accessors for connection, tableName, batchMode, and jdbcTypes
  • individual fields can now be removed from a FieldList
  • FieldList can now accept collections of strings
  • updated Apache POI to v3.8

Data Pipeline 2.2.7 Now Available

  • added JdbcMultiWriter for multi-threaded writing to one or more database connections concurrently
  • added multi-threaded AsyncWriter to complement AsyncReader
  • data writers now have an available() method to indicate the number of records that can probably be written without blocking
  • MultiWriter now supports configurable write strategies (ReplicateWriteStrategy, RoundRobinWriteStrategy, AvailableCapacityWriteStrategy, and user defined)
  • added support for CLOB fields (see JdbcValueReader.DEFAULT)
  • Field and Record’s toString() methods now limit displayed strings to the first 128 characters
  • RecordMeter is now public and returned by MeteredReader and MeteredWriter’s getMeter() method
  • BUGFIX: record count is no longer off by 1 in some cases
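
As a rough picture of what a round-robin write strategy does, the following plain-Java sketch deals records out to a fixed number of targets in turn. It does not use MultiWriter or RoundRobinWriteStrategy: `RoundRobinDemo.distribute` is a hypothetical method, and plain lists stand in for the writers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not part of Data Pipeline: lists stand in for writers.
class RoundRobinDemo {

    // Distributes items across `targetCount` lists in round-robin order.
    static <T> List<List<T>> distribute(List<T> items, int targetCount) {
        if (targetCount <= 0) {
            throw new IllegalArgumentException("targetCount must be positive");
        }
        List<List<T>> targets = new ArrayList<>();
        for (int i = 0; i < targetCount; i++) {
            targets.add(new ArrayList<>());
        }
        int next = 0;
        for (T item : items) {
            targets.get(next).add(item);
            next = (next + 1) % targetCount; // wrap around to the first target
        }
        return targets;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(distribute(records, 2)); // [[a, c, e], [b, d]]
    }
}
```

A replicating strategy would instead add every item to every target, and a capacity-based one would pick the target that currently reports the most room.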

Data Pipeline 2.2.6 Now Available

  • performance improvements in CSV and fixed width handling
  • untyped expression evaluation is now based on the value’s type, instead of the field’s declared type
  • BUGFIX: now handles untyped expressions between primitive and object values
  • float expressions are now upgraded to doubles during evaluation
  • all numbers other than doubles and floats are now upgraded to longs during evaluation
  • expressions can now reference Java beans, not just primitive values
  • method call expression now finds the most appropriate method based on the runtime argument types (http://en.wikipedia.org/wiki/Multiple_dispatch)
  • improved handling for collections and arrays in DataException properties
  • Apache PoiProvider can now distinguish between date, time, and datetime fields in Excel
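
The numeric upgrade rules above (floats to doubles, other numbers to longs) can be sketched in plain Java as follows. This is not the engine's code: `NumericPromotion.promote` is a hypothetical method, and the sketch assumes that only the boxed integer types (Byte, Short, Integer, Long) are widened to Long, leaving any other `Number` subtype unchanged.

```java
// Hypothetical illustration, not part of Data Pipeline.
class NumericPromotion {

    // Widens a number before evaluation: Float/Double -> Double,
    // Byte/Short/Integer/Long -> Long. Other types are returned unchanged
    // (an assumption made for this sketch).
    static Number promote(Number value) {
        if (value instanceof Float || value instanceof Double) {
            return value.doubleValue();
        }
        if (value instanceof Byte || value instanceof Short
                || value instanceof Integer || value instanceof Long) {
            return value.longValue();
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(promote(1.5f).getClass().getSimpleName());      // Double
        System.out.println(promote((short) 3).getClass().getSimpleName()); // Long
    }
}
```

Promoting both operands to one of two types means an evaluator only needs a `long` and a `double` version of each arithmetic operator.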

Data Pipeline 2.2.5 Now Available

  • Added JavaBeanReader which uses XPath expressions to identify field values and break records (see the Read from Java beans example)
  • AbstractReader’s setStartingRow and setLastRow now return this
  • Filter rule IsInstanceOfJavaType now returns false for null values
  • Added number-to-date methods to BasicFieldTransformer (numberToDate(), minutesToDate(), hoursToDate(), and daysToDate())
  • BasicFieldTransformer.Operation and BasicFieldTransformer.StringOperation are now public classes
  • BasicFieldTransformer.add(Operation … operation) is now public
  • ConditionalTransformer is now private (use TransformingReader.filter instead)
  • TransformingReader now contains an optional Filter, allowing any transformer to be conditionally applied
  • Removed TransformingReader.add(Filter filter, Transformer … transformer) method
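
The idea of attaching an optional filter to a transforming step, so that a transformer only touches matching records, can be sketched in plain Java. This does not use TransformingReader: `ConditionalTransform.apply` is a hypothetical method, strings stand in for records, and treating a missing (null) filter as "apply to everything" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical illustration, not part of Data Pipeline.
class ConditionalTransform {

    // Applies `transformer` only to records accepted by `filter`.
    // A null filter means every record is transformed (assumption of this sketch).
    static <T> List<T> apply(List<T> records, Predicate<T> filter, UnaryOperator<T> transformer) {
        List<T> result = new ArrayList<>();
        for (T record : records) {
            boolean matches = filter == null || filter.test(record);
            result.add(matches ? transformer.apply(record) : record);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "bb", "ccc");
        // Upper-case only the records longer than one character.
        System.out.println(apply(records, s -> s.length() > 1, String::toUpperCase)); // [a, BB, CCC]
    }
}
```

Putting the filter on the transforming step itself removes the need for a separate wrapper type around each conditional transformer.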