Data Pipeline v3.1.4 is now available for download. This release includes support for MySQL upserts, lower JSON and XML memory usage, bug fixes, and more.
CSV Writer
- CSVWriter now supports different starting and ending quote strings via setStartingQuote() and setEndingQuote()
- You can now change how nulls and empty strings are written using the new IValuePolicy/ValuePolicy types
  - See CSVWriter.setNullValuePolicy()
  - See CSVWriter.setEmptyStringValuePolicy()
- The forceQuote flag now makes it possible to force all values to be quoted
- The fieldSeparator and quote characters have been updated from char to String
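The quoting options above can be illustrated with a small standalone sketch. It does not use the Data Pipeline API: the class CsvQuotingSketch and its formatField and formatLine methods are hypothetical names invented for this example, the single nullText setting is only a stand-in for the value policies, and the escaping rule (doubling an embedded ending quote) is an assumption rather than documented CSVWriter behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Standalone illustration only; this is NOT the Data Pipeline CSVWriter.
// All names here (CsvQuotingSketch, formatField, formatLine) are hypothetical.
class CsvQuotingSketch {
    private final String fieldSeparator; // a String rather than a char; must be non-empty
    private final String startingQuote;  // must be non-empty
    private final String endingQuote;    // must be non-empty
    private final boolean forceQuote;    // quote every non-null value, even when not required
    private final String nullText;       // assumed policy: text written in place of null

    CsvQuotingSketch(String fieldSeparator, String startingQuote, String endingQuote,
                     boolean forceQuote, String nullText) {
        this.fieldSeparator = fieldSeparator;
        this.startingQuote = startingQuote;
        this.endingQuote = endingQuote;
        this.forceQuote = forceQuote;
        this.nullText = nullText;
    }

    String formatField(String value) {
        if (value == null) {
            return nullText; // assumption: the null replacement is written unquoted
        }
        boolean needsQuote = forceQuote
                || value.contains(fieldSeparator)
                || value.contains(startingQuote)
                || value.contains(endingQuote)
                || value.contains("\n");
        if (!needsQuote) {
            return value;
        }
        // Assumed escaping rule: double any embedded ending quote.
        String escaped = value.replace(endingQuote, endingQuote + endingQuote);
        return startingQuote + escaped + endingQuote;
    }

    String formatLine(List<String> values) {
        return values.stream()
                .map(this::formatField)
                .collect(Collectors.joining(fieldSeparator));
    }

    public static void main(String[] args) {
        CsvQuotingSketch csv = new CsvQuotingSketch("|", "<<", ">>", false, "NULL");
        // prints: a|<<b|c>>|NULL|
        System.out.println(csv.formatLine(Arrays.asList("a", "b|c", null, "")));
    }
}
```

With forceQuote set to true in this sketch, the same row would have every non-null value wrapped in the starting and ending quote strings, including the empty string.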
Excel Documents
- The Excel providers (PoiProvider, PoiXssfProvider, and JxlProvider) now guard against being used concurrently by multiple readers
- The JavaDocs now clarify that ExcelDocument is not thread safe, but can be read and re-read very quickly since it uses an in-memory buffer
JDBC
- MySQL upserts (INSERT … ON DUPLICATE KEY UPDATE) are now supported by passing com.northconcepts.datapipeline.jdbc.upsert.MySqlUpsert to JdbcUpsertWriter
- JdbcWriter and JdbcUpsertWriter now support commits after each batch is sent to the database using setCommitBatch()
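For readers unfamiliar with the MySQL syntax, the following standalone sketch builds the general shape of an INSERT … ON DUPLICATE KEY UPDATE statement. It is not the statement MySqlUpsert necessarily generates: the class MySqlUpsertSqlSketch, the buildUpsert method, and the users table with its columns are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Standalone illustration only; this is NOT the Data Pipeline MySqlUpsert class.
class MySqlUpsertSqlSketch {

    // Builds a parameterized upsert: insert the row, or update the non-key
    // columns when a row with the same primary/unique key already exists.
    // Identifiers are concatenated unescaped, so pass trusted names only.
    static String buildUpsert(String table, List<String> keyColumns, List<String> valueColumns) {
        if (valueColumns.isEmpty()) {
            throw new IllegalArgumentException("at least one non-key column is required");
        }
        List<String> allColumns = new ArrayList<>(keyColumns);
        allColumns.addAll(valueColumns);

        String columnList = String.join(", ", allColumns);
        String placeholders = allColumns.stream()
                .map(c -> "?")
                .collect(Collectors.joining(", "));
        String updates = valueColumns.stream()
                .map(c -> c + " = VALUES(" + c + ")")
                .collect(Collectors.joining(", "));

        return "INSERT INTO " + table + " (" + columnList + ")"
                + " VALUES (" + placeholders + ")"
                + " ON DUPLICATE KEY UPDATE " + updates;
    }

    public static void main(String[] args) {
        // Hypothetical table and columns, used only to show the generated text.
        System.out.println(buildUpsert("users",
                Arrays.asList("id"), Arrays.asList("name", "email")));
    }
}
```

The resulting string can be passed to java.sql.Connection.prepareStatement(), with one parameter bound per column in the order listed.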
XML, JSON, Java Beans
- Added isAddTextToParent() and setAddTextToParent() to XmlReader, JsonReader, and JavaBeanReader to indicate whether each child node's text should be concatenated to its parent during parsing (now defaults to false). Setting this to true will increase your memory usage.
Text Writers
- The following text-based writers now explicitly support file appending. When the new append flag is set, these writers will no longer write the header line if a non-zero length file already exists.
  - CSVWriter, FixedWidthWriter, TextWriter, LinedTextWriter
- The text writers now include two new flags:
  - autoCloseWriter – indicates whether the underlying java.io.BufferedWriter should be closed when the writer is closed (defaults to true)
  - flushOnWrite – indicates whether the underlying java.io.BufferedWriter should be flushed after each record is written (defaults to false)
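The append rule described above (skip the header when a non-zero length file already exists) can be sketched with plain java.nio. This is an illustration of the rule only, not the library's implementation; AppendHeaderSketch, appendLines, and demo are hypothetical names, and the file contents are made up for the example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

// Standalone illustration only; this is NOT how the Data Pipeline writers are implemented.
class AppendHeaderSketch {

    // Appends lines to the file, writing the header first only when the
    // file is missing or has zero length.
    static void appendLines(Path file, String header, List<String> lines) throws IOException {
        boolean hasContent = Files.exists(file) && Files.size(file) > 0;
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            if (!hasContent) {
                writer.write(header);
                writer.newLine();
            }
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    // Appends to the same temporary file twice and returns its lines.
    static List<String> demo() {
        try {
            Path file = Files.createTempFile("append-sketch", ".csv");
            try {
                appendLines(file, "id,name", Arrays.asList("1,Ann")); // empty file: header written
                appendLines(file, "id,name", Arrays.asList("2,Bob")); // non-empty: header skipped
                return Files.readAllLines(file, StandardCharsets.UTF_8);
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this sketch the header appears once, followed by the rows from both calls.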
The Twitter package now includes the following new classes:
- TwitterFilterStreamReader – continuously reads tweets matching several criteria (hashtag, user, etc.)
- TwitterSampleStreamReader – continuously reads a small random sample of all tweets
- TwitterFollowerIDsReader – reads the IDs of accounts following a specified account
- TwitterFollowerListReader – reads details of accounts following a specified account
- TwitterFollowingIDsReader – reads the IDs of accounts a specified account follows
- TwitterFollowingListReader – reads the details of accounts a specified account follows
- TwitterFollowWriter – follows accounts written to it
- TwitterUnfollowWriter – unfollows accounts written to it
Bug Fixes
- AsyncWriter now throws an exception in the next call to writeImpl(Record record) if the asynchronous writer thread failed
- The default implementation of JobTemplate.transfer() now calls JobCallback.onFailure() if the transfer was successful, but the reader or writer failed to close
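The AsyncWriter fix follows a common pattern: a background thread records its failure, and the next write on the calling thread rethrows it. The sketch below shows that pattern in isolation, under the assumption that records are plain strings; AsyncFailureSketch, its write and awaitWorkerExit methods, and the Consumer sink are hypothetical and unrelated to the actual AsyncWriter source.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Standalone illustration only; this is NOT the Data Pipeline AsyncWriter.
class AsyncFailureSketch implements AutoCloseable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final AtomicReference<Throwable> failure = new AtomicReference<>();
    private final Thread worker;

    AsyncFailureSketch(Consumer<String> sink) {
        worker = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    sink.accept(queue.take()); // may throw if the sink rejects a record
                }
            } catch (InterruptedException e) {
                // close() was called; exit quietly
            } catch (RuntimeException e) {
                failure.set(e); // remember the failure for the calling thread
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Rethrows a recorded background failure instead of silently queueing more records.
    void write(String record) {
        Throwable t = failure.get();
        if (t != null) {
            throw new IllegalStateException("asynchronous writer thread failed", t);
        }
        queue.add(record);
    }

    // Waits up to the given time for the worker to stop; returns true if it has stopped.
    boolean awaitWorkerExit(long millis) {
        try {
            worker.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    @Override
    public void close() {
        worker.interrupt(); // in this sketch, records still queued are dropped
    }

    public static void main(String[] args) {
        AsyncFailureSketch writer = new AsyncFailureSketch(record -> {
            if (record.isEmpty()) {
                throw new IllegalArgumentException("empty record");
            }
        });
        writer.write("ok");
        writer.write("");             // makes the background thread fail
        writer.awaitWorkerExit(5000); // wait so the failure is recorded before the next write
        try {
            writer.write("next");
        } catch (IllegalStateException e) {
            System.out.println("surfaced: " + e.getCause().getMessage());
        }
        writer.close();
    }
}
```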
Misc
- Added TeeReader to operate like tee in UNIX and write every record passing through it to a DataWriter
- Various error message improvements
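The tee idea behind TeeReader can be shown with a plain Java iterator: each item is handed to a side output as it passes through to the consumer. This is a conceptual sketch only; TeeSketch is a hypothetical class, and a java.util.function.Consumer stands in for the DataWriter that the real TeeReader writes to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Standalone illustration only; this is NOT the Data Pipeline TeeReader.
class TeeSketch<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Consumer<? super T> sideOutput; // stand-in for a DataWriter

    TeeSketch(Iterator<T> source, Consumer<? super T> sideOutput) {
        this.source = source;
        this.sideOutput = sideOutput;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    // Copies each item to the side output before returning it downstream.
    @Override
    public T next() {
        T item = source.next();
        sideOutput.accept(item);
        return item;
    }

    public static void main(String[] args) {
        List<String> copy = new ArrayList<>();
        Iterator<String> records =
                new TeeSketch<String>(Arrays.asList("r1", "r2", "r3").iterator(), copy::add);
        while (records.hasNext()) {
            System.out.println("downstream: " + records.next());
        }
        System.out.println("side output: " + copy);
    }
}
```

Only items actually pulled by the downstream consumer reach the side output, so stopping early leaves the copy partial.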
See the change log for more details: https://northconcepts.com/changelog/