All posts by The DataPipeline Team

About The DataPipeline Team

We make Data Pipeline — a lightweight ETL framework for Java. Use it to filter, transform, and aggregate data on-the-fly in your web, mobile, and desktop apps. Learn more about it at northconcepts.com.

Data Pipeline 2.3 Now Available

  • added streaming JSON reading and writing (simple and template based)
  • added SimpleXmlWriter
  • improved handling of recursive XML-to-records
  • added user-definable demux strategies
  • DeMuxReader is no longer a public class since it should not be referenced directly
  • improved exception handling in JdbcReader
  • BUGFIX: JavaBeanReader now handles xpath for recursive text children
  • updated Apache POI to v3.9
  • IncludeFields & ExcludeFields now accept a collection of field names in their constructor and add method
  • added JdbcReader.useColumnLabel property to allow fields to be named using the column labels (or aliases) instead of the underlying, real column names
  • added Excel 2007 provider (POI_XSSF)
    Excel handling now defaults to the Apache POI_XSSF (Excel 2007) provider, instead of POI (Excel 2003)
  • added FixedWidthField.align to allow left-filled (right aligned) fields
  • added FixedWidthField.fillChar to allow fields to specify a different filler from their reader/writer
  • reduced memory overhead for fields and records
  • CSV performance improvements
  • exception property values now truncated to 256 chars
  • using StringBuilder (instead of StringBuffer) internally to improve performance
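
The FixedWidthField.align and FixedWidthField.fillChar entries above describe padding behaviour that is easy to picture with a small example. The following is a minimal sketch in plain Java, not the Data Pipeline API: the class FixedWidthSketch and its pad method are hypothetical names, and truncating values that exceed the width is an assumption made only for this illustration.

```java
public class FixedWidthSketch {

    // Pads value to exactly `width` characters using fillChar.
    // rightAlign == true places the filler on the left ("left-filled").
    // Hypothetical helper; not part of the Data Pipeline API.
    static String pad(String value, int width, char fillChar, boolean rightAlign) {
        if (value.length() >= width) {
            // Assumption for this sketch: overflow is simply cut off.
            return value.substring(0, width);
        }
        StringBuilder filler = new StringBuilder();
        for (int i = value.length(); i < width; i++) {
            filler.append(fillChar);
        }
        return rightAlign ? filler + value : value + filler;
    }

    public static void main(String[] args) {
        System.out.println(pad("42", 6, '0', true));        // prints "000042"
        System.out.println(pad("abc", 6, ' ', false) + "|"); // prints "abc   |"
    }
}
```

A per-field fill character is useful when, for example, numeric columns are zero-filled while text columns in the same file are space-filled.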

Data Pipeline 2.2.8 Now Available

  • added TemplateWriter for writing text streams using FreeMarker templates
  • added new examples for writing XML and HTML files using TemplateWriter
  • BUGFIX: XmlWriter’s (XmlTemplate, File) constructor now calls setFieldNamesInFirstRow(false) by default
  • BUGFIX: The JxlProvider now converts intervals and user-defined types to string when generating Excel files
  • Intervals are no longer converted to strings when added to a field/record
  • BasicFieldTransformer can now convert numbers to intervals (seconds, months, days, minutes, etc.)
  • JdbcWriter now has public accessors for connection, tableName, batchMode, and jdbcTypes
  • individual fields can now be removed from a FieldList
  • FieldList can now accept collections of strings
  • updated Apache POI to v3.8

Data Pipeline 2.2.7 Now Available

  • added JdbcMultiWriter for multi-threaded writing to one or more database connections concurrently
  • added multi-threaded AsyncWriter to complement AsyncReader
  • data writers now have an available() method to indicate the number of records that can probably be written without blocking
  • MultiWriter now supports configurable write strategies (ReplicateWriteStrategy, RoundRobinWriteStrategy, AvailableCapacityWriteStrategy, and user defined)
  • added support for CLOB fields (see JdbcValueReader.DEFAULT)
  • Field and Record’s toString() methods now limit displayed strings to the first 128 characters
  • RecordMeter is now public and returned by MeteredReader and MeteredWriter’s getMeter() method
  • BUGFIX: record count is no longer off by 1 in some cases
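
To illustrate what a round-robin write strategy means in general, here is a minimal sketch in plain Java. It does not use MultiWriter or RoundRobinWriteStrategy; the Sink interface, the RoundRobin class, and the distribute helper are hypothetical stand-ins, and the library's own strategy may behave differently in its details.

```java
import java.util.ArrayList;
import java.util.List;

public class RoundRobinSketch {

    // Hypothetical stand-in for a record destination; not the library's writer type.
    interface Sink {
        void write(String record);
    }

    // Sends each record to the next sink in turn, wrapping around at the end.
    static class RoundRobin {
        private final List<Sink> sinks;
        private int next = 0;

        RoundRobin(List<Sink> sinks) {
            this.sinks = sinks;
        }

        void write(String record) {
            sinks.get(next).write(record);
            next = (next + 1) % sinks.size();
        }
    }

    // Distributes records over sinkCount in-memory lists and returns those lists.
    static List<List<String>> distribute(List<String> records, int sinkCount) {
        List<List<String>> buckets = new ArrayList<>();
        List<Sink> sinks = new ArrayList<>();
        for (int i = 0; i < sinkCount; i++) {
            List<String> bucket = new ArrayList<>();
            buckets.add(bucket);
            sinks.add(bucket::add);
        }
        RoundRobin strategy = new RoundRobin(sinks);
        for (String record : records) {
            strategy.write(record);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (String r : new String[] {"a", "b", "c", "d", "e"}) {
            records.add(r);
        }
        System.out.println(distribute(records, 2)); // prints [[a, c, e], [b, d]]
    }
}
```

A replicating strategy would instead send every record to every sink, and a capacity-based one would pick the sink most able to accept a record without blocking.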

Data Pipeline 2.2.6 Now Available

  • performance improvements in CSV and fixed width handling
  • untyped expression evaluation is now based on the value’s type, instead of the field’s declared type
  • BUGFIX: now handles untyped expressions between primitive and object values
  • float expressions are now upgraded to doubles during evaluation
  • all numbers other than doubles and floats are now upgraded to longs during evaluation
  • expressions can now reference Java beans, not just primitive values
  • method call expression now finds the most appropriate method based on the runtime argument types (http://en.wikipedia.org/wiki/Multiple_dispatch)
  • improved handling for collections and arrays in DataException properties
  • Apache PoiProvider can now distinguish between date, time, and datetime fields in Excel
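
The two numeric upgrade rules above (floats become doubles, all other numbers become longs) can be sketched in plain Java as follows. This is an illustration only, not the library's expression evaluator: PromotionSketch, promote, and add are hypothetical names, and how types such as BigDecimal are treated in the real implementation is not stated in the notes.

```java
public class PromotionSketch {

    // Widens a number as the notes describe: Float and Double become double,
    // every other Number becomes long. Hypothetical helper for illustration.
    static Number promote(Number n) {
        if (n instanceof Float || n instanceof Double) {
            return n.doubleValue();
        }
        return n.longValue();
    }

    // Adds two numbers after promotion; the result is a Double if either
    // promoted operand is a Double, otherwise a Long.
    static Number add(Number a, Number b) {
        Number x = promote(a);
        Number y = promote(b);
        if (x instanceof Double || y instanceof Double) {
            return x.doubleValue() + y.doubleValue();
        }
        return x.longValue() + y.longValue();
    }

    public static void main(String[] args) {
        System.out.println(add((short) 2, 3)); // prints 5 (a Long)
        System.out.println(add(1.5f, 2));      // prints 3.5 (a Double)
    }
}
```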

Data Pipeline 2.2.5 Now Available

  • Added JavaBeanReader which uses XPath expressions to identify field values and break records (see the Read from Java beans example)
  • AbstractReader’s setStartingRow and setLastRow now return this
  • Filter rule IsInstanceOfJavaType now returns false for null values
  • Added number-to-date methods to BasicFieldTransformer (numberToDate(), minutesToDate(), hoursToDate(), and daysToDate())
  • BasicFieldTransformer.Operation and BasicFieldTransformer.StringOperation are now public classes
  • BasicFieldTransformer.add(Operation … operation) is now public
  • ConditionalTransformer is now private (use TransformingReader.filter instead)
  • TransformingReader now contains an optional Filter, allowing any transformer to be conditionally applied
  • Removed TransformingReader.add(Filter filter, Transformer … transformer) method
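
The idea of attaching an optional filter to a transformer, so that it applies only to matching records, can be sketched in plain Java as below. This does not use TransformingReader, Filter, or Transformer; the class ConditionalTransformSketch and its transform method are hypothetical, records are modelled as plain strings, and passing non-matching records through unchanged is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class ConditionalTransformSketch {

    // Applies transformer only to records the filter accepts.
    // A null filter means "apply to every record".
    // Assumption: records that do not match are passed through unchanged.
    static List<String> transform(List<String> records,
                                  Predicate<String> filter,
                                  UnaryOperator<String> transformer) {
        List<String> out = new ArrayList<>();
        for (String record : records) {
            boolean matches = filter == null || filter.test(record);
            out.add(matches ? transformer.apply(record) : record);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "bb", "ccc");
        // Upper-case only records longer than one character.
        System.out.println(transform(records, s -> s.length() > 1, String::toUpperCase));
        // prints [a, BB, CCC]
    }
}
```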

Data Pipeline 2.2.1 Now Available

Data Pipeline 2.2.1 is now available with the following enhancements:

  • added batch execution to JdbcWriter (see JdbcWriter.setBatchSize)
  • added callback mechanism to track job progress (see JobTemplate.transfer(R reader, W writer, boolean async, JobCallback callback))
  • early access to DeMuxReader

Data Pipeline 2.2 Released

We are happy to announce that Data Pipeline 2.2 is now released. We have added the enhancements below:

  • added an XPath-based XmlReader
  • updates so that Excel now defaults to Apache POI instead of JXL
  • updates so that the following classes use java.util.List instead of java.util.ArrayList in their public APIs: CompositeValue, FieldList, Lookup, LookupTransformer, Record, RecordList

Data Pipeline 2.1 Released

We are happy to announce that Data Pipeline 2.1 is now released. We have added many enhancements, listed below:

  • support for Excel 2003 XLS files
  • support for Excel XLSX (XML format) files
  • added FixedWidthReader.setLastFieldConsumesRemaining(boolean lastFieldConsumesRemaining) functionality
  • added ExcelReader.setUseSheetColumnCount(boolean useSheetColumnCount) functionality
  • added more string utils to BasicFieldTransformer
  • added ConditionalTransformer class
  • added TransformingReader.add(Filter filter, Transformer ... transformer) functionality
  • setField now has type-specific constructors
  • added Eclipse project files
  • added Ant build project
  • bug-fix: whitespace (like tab) can now be used as the field separator in CSVReader
  • bug-fix: handle null variable names in expressions