Data Pipeline 3.1.4 Now Available

Data Pipeline v3.1.4 is now available for download.  This release includes support for MySQL upserts, lower JSON and XML memory usage, bug fixes, and more.

Download the release here


CSV Writer

  • CSVWriter now supports different starting and ending quote strings via setStartingQuote() and setEndingQuote()
  • You can now change how nulls and empty strings are written using the new IValuePolicy/ValuePolicy types
    • See CSVWriter.setNullValuePolicy()
    • See CSVWriter.setEmptyStringValuePolicy()
  • The forceQuote flag now makes it possible to force all values to be quoted
  • The fieldSeparator and quote characters have been updated from char to String
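
To make the quoting options above more concrete, here is a minimal standalone sketch in plain Java. It does not use Data Pipeline and is not how CSVWriter is implemented: the class CsvFieldFormatter, the NullPolicy enum, and the rule of escaping an embedded ending quote by doubling it are all assumptions made for illustration. It also assumes the separator and quote strings are non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration only: hypothetical names, not Data Pipeline's CSVWriter. */
final class CsvFieldFormatter {

    /** Hypothetical stand-in for a policy that decides how nulls are written. */
    enum NullPolicy { EMPTY_STRING, QUOTED_EMPTY_STRING }

    private final String separator;
    private final String startingQuote;
    private final String endingQuote;
    private final boolean forceQuote;
    private final NullPolicy nullPolicy;

    CsvFieldFormatter(String separator, String startingQuote, String endingQuote,
                      boolean forceQuote, NullPolicy nullPolicy) {
        this.separator = separator;
        this.startingQuote = startingQuote;
        this.endingQuote = endingQuote;
        this.forceQuote = forceQuote;
        this.nullPolicy = nullPolicy;
    }

    /** Formats one value, quoting it when forced or when it contains special text. */
    String formatField(String value) {
        if (value == null) {
            return nullPolicy == NullPolicy.QUOTED_EMPTY_STRING
                    ? startingQuote + endingQuote
                    : "";
        }
        boolean needsQuote = forceQuote
                || value.contains(separator)
                || value.contains(startingQuote)
                || value.contains(endingQuote)
                || value.contains("\n")
                || value.contains("\r");
        if (!needsQuote) {
            return value;
        }
        // Assumption: an embedded ending quote is escaped by doubling it.
        String escaped = value.replace(endingQuote, endingQuote + endingQuote);
        return startingQuote + escaped + endingQuote;
    }

    /** Formats a whole record by joining the formatted fields with the separator. */
    String formatLine(List<String> values) {
        List<String> fields = new ArrayList<>();
        for (String value : values) {
            fields.add(formatField(value));
        }
        return String.join(separator, fields);
    }

    public static void main(String[] args) {
        CsvFieldFormatter formatter =
                new CsvFieldFormatter(",", "<<", ">>", false, NullPolicy.EMPTY_STRING);
        System.out.println(formatter.formatLine(Arrays.asList("a", "b,c", null)));
    }
}
```

Because the separator and quotes are strings rather than single characters, multi-character delimiters such as "<<" and ">>" fall out of the same code path as the usual double quote.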


Excel Documents

  • The Excel providers (PoiProvider, PoiXssfProvider, and JxlProvider) now guard against being used concurrently by multiple readers
  • The JavaDocs now clarify that ExcelDocument is not thread safe, but can be read and re-read very quickly since it uses an in-memory buffer

JDBC

  • MySQL upserts (INSERT … ON DUPLICATE KEY UPDATE) are now supported by passing com.northconcepts.datapipeline.jdbc.upsert.MySqlUpsert to JdbcUpsertWriter
  • JdbcWriter and JdbcUpsertWriter now support commits after each batch is sent to the database using setCommitBatch()
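
For readers who have not used the MySQL syntax mentioned above, the following sketch shows the general shape of an INSERT … ON DUPLICATE KEY UPDATE statement. It is plain Java with no Data Pipeline dependency and is not a description of how MySqlUpsert generates its SQL; the class MySqlUpsertSqlSketch and the method buildUpsert are hypothetical names, and the sketch assumes trusted table and column names that contain no backticks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Illustration only: builds the text of a MySQL upsert for a PreparedStatement. */
final class MySqlUpsertSqlSketch {

    /**
     * keyColumns identify the row (primary or unique key); valueColumns are
     * overwritten when a row with the same key already exists.
     */
    static String buildUpsert(String table, List<String> keyColumns, List<String> valueColumns) {
        if (valueColumns.isEmpty()) {
            throw new IllegalArgumentException("at least one non-key column is required");
        }
        List<String> all = new ArrayList<>(keyColumns);
        all.addAll(valueColumns);

        String columns = all.stream()
                .map(c -> "`" + c + "`")
                .collect(Collectors.joining(", "));
        String placeholders = all.stream()
                .map(c -> "?")
                .collect(Collectors.joining(", "));
        // VALUES(col) refers to the value that the INSERT part tried to write.
        String updates = valueColumns.stream()
                .map(c -> "`" + c + "` = VALUES(`" + c + "`)")
                .collect(Collectors.joining(", "));

        return "INSERT INTO `" + table + "` (" + columns + ") VALUES (" + placeholders + ")"
                + " ON DUPLICATE KEY UPDATE " + updates;
    }

    public static void main(String[] args) {
        System.out.println(buildUpsert("user", Arrays.asList("id"), Arrays.asList("name", "email")));
    }
}
```

The resulting string has one placeholder per column, so the same record values can be bound whether the row ends up inserted or updated.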

XML, JSON, Java Beans

  • Added isAddTextToParent() and setAddTextToParent() to XmlReader, JsonReader, and JavaBeanReader to control whether each child node’s text is concatenated to its parent during parsing. The flag now defaults to false; setting it to true will increase memory usage.

Text Writers

  • The following text-based writers now explicitly support file appending.  When the new append flag is set, these writers will no longer write the header line if a non-zero length file already exists.
    • CSVWriter, FixedWidthWriter, TextWriter, LinedTextWriter
  • The text writers now include two new flags:
    • autoCloseWriter – indicates whether the underlying writer should be closed when this writer is closed (defaults to true)
    • flushOnWrite – indicates whether the underlying writer should be flushed after each record is written (defaults to false)
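
The append behaviour described above (skip the header when a non-empty file already exists) can be sketched in plain Java as follows. This is an illustration of the rule only, not the library's code: AppendingLineWriter, write(), and demo() are hypothetical names, and the sketch uses java.nio directly instead of Data Pipeline's writers.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

/** Illustration only: header handling for an appending text writer. */
final class AppendingLineWriter {

    static void write(Path file, String header, List<String> lines, boolean append) {
        try {
            // The header is skipped only when appending to a file that already has content.
            boolean skipHeader = append && Files.exists(file) && Files.size(file) > 0;
            StandardOpenOption mode = append
                    ? StandardOpenOption.APPEND
                    : StandardOpenOption.TRUNCATE_EXISTING;
            try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE, mode)) {
                if (!skipHeader) {
                    out.write(header);
                    out.write('\n');
                }
                for (String line : lines) {
                    out.write(line);
                    out.write('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes two batches to a temporary file and returns the file's final lines. */
    static List<String> demo(boolean append) {
        try {
            Path file = Files.createTempFile("append-demo", ".csv");
            try {
                write(file, "id,name", Arrays.asList("1,Ann"), append);
                write(file, "id,name", Arrays.asList("2,Bob"), append);
                return Files.readAllLines(file, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(true));
        System.out.println(demo(false));
    }
}
```

With append enabled the second batch lands under the first without a repeated header; with it disabled the file is truncated and the header is written again.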


The Twitter package now includes the following new classes:

  • TwitterFilterStreamReader – continuously reads tweets matching several criteria (hashtag, user, etc.)
  • TwitterSampleStreamReader – continuously reads a small random sample of all tweets
  • TwitterFollowerIDsReader – reads the IDs of accounts following a specified account
  • TwitterFollowerListReader – reads details of accounts following a specified account
  • TwitterFollowingIDsReader – reads the IDs of accounts a specified account follows
  • TwitterFollowingListReader – reads the details of accounts a specified account follows
  • TwitterFollowWriter – follows accounts written to it
  • TwitterUnfollowWriter – unfollows accounts written to it

Bug Fixes

  • AsyncWriter now throws an exception in the next call to writeImpl(Record record) if the asynchronous writer thread failed
  • The default implementation of JobTemplate.transfer() now calls JobCallback.onFailure() if the transfer was successful, but the reader or writer failed to close
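
The AsyncWriter fix follows a common pattern: the background thread records its failure, and the next write on the caller's thread rethrows it. The sketch below illustrates that pattern in plain Java under stated assumptions. AsyncSinkSketch, write(), and awaitWorkerExit() are hypothetical names and this is not AsyncWriter's actual code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Illustration only: surfaces a background failure on the next write call. */
final class AsyncSinkSketch<T> {

    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final AtomicReference<Throwable> failure = new AtomicReference<>();
    private final Thread worker;

    AsyncSinkSketch(Consumer<T> target) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    target.accept(queue.take());
                }
            } catch (InterruptedException e) {
                // Interrupted by close(): treat as a normal shutdown.
            } catch (Throwable t) {
                // Remember the failure so the caller's thread can report it.
                failure.set(t);
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    /** Queues an item, or throws if the background thread has already failed. */
    void write(T item) {
        Throwable t = failure.get();
        if (t != null) {
            throw new IllegalStateException("asynchronous writer thread failed", t);
        }
        queue.add(item);
    }

    /** Waits up to the timeout for the worker to stop; returns true if it has stopped. */
    boolean awaitWorkerExit(long timeoutMillis) {
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    void close() {
        worker.interrupt();
    }
}
```

Without the check at the top of write(), a caller could keep queuing records indefinitely after the background thread had died, which is the kind of silent failure this fix addresses.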


  • Added TeeReader to operate like tee in UNIX and write every record passing through it to a DataWriter
  • Various error message improvements
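
The tee idea behind TeeReader can be shown with a small standalone sketch: every item that passes through is also handed to a side output. The code below is plain Java over an Iterator and a Consumer, not Data Pipeline's DataReader and DataWriter, and TeeIterator is a hypothetical name used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Illustration only: passes items through while copying each one to a side output. */
final class TeeIterator<T> implements Iterator<T> {

    private final Iterator<T> source;
    private final Consumer<? super T> sideOutput;

    TeeIterator(Iterator<T> source, Consumer<? super T> sideOutput) {
        this.source = source;
        this.sideOutput = sideOutput;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        T item = source.next();
        sideOutput.accept(item); // copy to the side output before handing the item on
        return item;
    }

    public static void main(String[] args) {
        List<String> copy = new ArrayList<>();
        Iterator<String> tee =
                new TeeIterator<>(Arrays.asList("a", "b", "c").iterator(), copy::add);
        while (tee.hasNext()) {
            System.out.println("downstream got " + tee.next());
        }
        System.out.println("side output got " + copy);
    }
}
```

This is useful for capturing an intermediate snapshot of a pipeline, for example for debugging, without changing what downstream steps receive.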


See the change log for more details.


Download Data Pipeline


About The DataPipeline Team

We make Data Pipeline, a lightweight ETL framework for Java. Use it to filter, transform, and aggregate data on-the-fly in your web, mobile, and desktop apps.
