Data Pipeline 3.1.4 Now Available

Data Pipeline v3.1.4 is now available for download. This release includes support for MySQL upserts, lower JSON and XML memory usage, bug fixes, and more.

Download the release here

CSV Writer

CSVWriter now supports different starting and ending quote strings via setStartingQuote() and setEndingQuote()
You can now change how nulls and empty strings are written using the new IValuePolicy/ValuePolicy types
- See CSVWriter.setNullValuePolicy()
- See CSVWriter.setEmptyStringValuePolicy()
The forceQuote flag now makes it possible to force all values to be quoted
The fieldSeparator and quote characters have been updated from char to String
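
As a rough illustration of the quoting options above, here is a standalone sketch in plain Java. It does not use the Data Pipeline API: the class QuoteSketch, its methods, and the rule of quoting only values that contain the separator are assumptions made for the example, not the behaviour of CSVWriter itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not the Data Pipeline CSVWriter.
final class QuoteSketch {

    // Wraps a value in the given starting and ending quote strings.
    // When forceQuote is false, only values containing the separator are quoted (an assumption).
    static String quote(String value, String separator, String startQuote, String endQuote,
                        boolean forceQuote) {
        boolean needsQuote = forceQuote || value.contains(separator);
        return needsQuote ? startQuote + value + endQuote : value;
    }

    // Joins values into one line, writing nullText in place of null values.
    static String line(List<String> values, String separator, String startQuote, String endQuote,
                       boolean forceQuote, String nullText) {
        List<String> out = new ArrayList<>();
        for (String value : values) {
            String text = (value == null) ? nullText : value;
            out.add(quote(text, separator, startQuote, endQuote, forceQuote));
        }
        return String.join(separator, out);
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("a", null, "b;c");
        System.out.println(line(row, ";", "<<", ">>", false, "NULL")); // a;NULL;<<b;c>>
        System.out.println(line(row, ";", "<<", ">>", true, "NULL"));  // <<a>>;<<NULL>>;<<b;c>>
    }
}
```

With forceQuote off, only the value containing the separator is wrapped; with it on, every value is wrapped, including the text substituted for null.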

Excel Documents

The Excel providers (PoiProvider, PoiXssfProvider, and JxlProvider) now guard against being used concurrently by multiple readers
The JavaDocs now clarify that ExcelDocument is not thread safe, but can be read and re-read very quickly since it uses an in-memory buffer

JDBC

MySQL upserts (INSERT … ON DUPLICATE KEY UPDATE) are now supported by passing com.northconcepts.datapipeline.jdbc.upsert.MySqlUpsert to JdbcUpsertWriter
JdbcWriter and JdbcUpsertWriter now support commits after each batch is sent to the database using setCommitBatch()
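
For readers unfamiliar with the MySQL syntax, the sketch below builds the kind of INSERT … ON DUPLICATE KEY UPDATE statement an upsert relies on. It is plain Java and does not show how MySqlUpsert generates its SQL: the class MySqlUpsertSketch, the method upsertSql, and the table and column names are hypothetical, and the sketch assumes at least one non-key column.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the SQL generator used by Data Pipeline.
final class MySqlUpsertSketch {

    // Builds a parameterized MySQL upsert: insert the row, or update the
    // non-key columns when a row with the same key already exists.
    static String upsertSql(String table, List<String> columns, List<String> keyColumns) {
        String columnList = String.join(", ", columns);
        String placeholders = columns.stream()
                .map(column -> "?")
                .collect(Collectors.joining(", "));
        String updates = columns.stream()
                .filter(column -> !keyColumns.contains(column))
                .map(column -> column + " = VALUES(" + column + ")")
                .collect(Collectors.joining(", "));
        return "INSERT INTO " + table + " (" + columnList + ") VALUES (" + placeholders + ")"
                + " ON DUPLICATE KEY UPDATE " + updates;
    }

    public static void main(String[] args) {
        System.out.println(upsertSql("users",
                Arrays.asList("id", "name", "email"),
                Arrays.asList("id")));
    }
}
```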

XML, JSON, Java Beans

Added isAddTextToParent() and setAddTextToParent() to XmlReader, JsonReader, and JavaBeanReader to indicate whether each child node’s text should be concatenated to its parent during parsing. This now defaults to false; setting it to true will increase memory usage.

Text Writers

The following text-based writers now explicitly support file appending. When the new append flag is set, these writers will no longer write the header line if a non-zero length file already exists.
- CSVWriter, FixedWidthWriter, TextWriter, LinedTextWriter
The text writers now include two new flags:
- autoCloseWriter – indicate if the underlying java.io.BufferedWriter should be closed when the writer is closed (defaults to true)
- flushOnWrite – indicate if the underlying java.io.BufferedWriter should be flushed after each record is written (defaults to false)
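
The append behaviour described above can be sketched with the standard library alone. This is not the Data Pipeline implementation: the class AppendSketch and its append method are hypothetical, and the sketch only mirrors the rule that the header line is skipped when a non-zero length file already exists.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not one of the Data Pipeline text writers.
final class AppendSketch {

    // Appends rows to the file, writing the header only if the file is missing or empty.
    static void append(Path file, String header, List<String> rows) throws IOException {
        boolean hasContent = Files.exists(file) && Files.size(file) > 0;
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            if (!hasContent) {
                writer.write(header);
                writer.newLine();
            }
            for (String row : rows) {
                writer.write(row);
                writer.newLine();
            }
        } // try-with-resources closes (and therefore flushes) the writer
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("append-sketch", ".csv");
        append(file, "id,name", Arrays.asList("1,Ann")); // empty file: header is written
        append(file, "id,name", Arrays.asList("2,Bob")); // file has content: header is skipped
        System.out.println(Files.readAllLines(file));    // [id,name, 1,Ann, 2,Bob]
        Files.delete(file);
    }
}
```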

Twitter

The Twitter package now includes the following new classes:

TwitterFilterStreamReader – continuously reads tweets matching several criteria (hashtag, user, etc.)
TwitterSampleStreamReader – continuously reads a small random sample of all tweets
TwitterFollowerIDsReader – reads the IDs of accounts following a specified account
TwitterFollowerListReader – reads details of accounts following a specified account
TwitterFollowingIDsReader – reads the IDs of accounts a specified account follows
TwitterFollowingListReader – reads the details of accounts a specified account follows
TwitterFollowWriter – follows accounts written to it
TwitterUnfollowWriter – unfollows accounts written to it

Bug Fixes

AsyncWriter now throws an exception in the next call to writeImpl(Record record) if the asynchronous writer thread failed
The default implementation of JobTemplate.transfer() now calls JobCallback.onFailure() if the transfer was successful, but the reader or writer failed to close

Misc

Added TeeReader to operate like tee in UNIX and write every record passing through it to a DataWriter
Various error message improvements
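
The tee idea behind TeeReader can be shown with a small standalone sketch: every item read from a source is also handed to a second destination. This uses plain Java iterators rather than the Data Pipeline reader and writer classes, and the names TeeSketch and tee are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; not the Data Pipeline TeeReader.
final class TeeSketch {

    // Returns an iterator that passes each item to the sink before returning it to the caller.
    static <T> Iterator<T> tee(Iterator<T> source, Consumer<? super T> sink) {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public T next() {
                T item = source.next();
                sink.accept(item); // side copy, like tee writing to a file
                return item;       // main flow continues unchanged
            }
        };
    }

    public static void main(String[] args) {
        List<String> copy = new ArrayList<>();
        Iterator<String> records = tee(Arrays.asList("r1", "r2").iterator(), copy::add);
        while (records.hasNext()) {
            System.out.println("read " + records.next());
        }
        System.out.println("copied " + copy); // copied [r1, r2]
    }
}
```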

See the change log for more details: https://northconcepts.com/changelog/
