Data Pipeline 2.3.4 Now Available

A new release of Data Pipeline is now available for download:  https://northconcepts.com/downloads/.  This release includes a new Twitter search reader, custom aggregate operations, and much more.

Twitter Search Reader

Data Pipeline now provides a reader for searching Twitter.  You can pull tweets into your pipelines for processing and/or conversion to Excel, database, or other formats.

See the TwitterSearchReader JavaDoc.

Aggregate Operations

You can now add your own on-the-fly aggregate operations.  Create an instance of AggregateReader.AggregateOperation (as an anonymous class or a subclass) and pass it to AggregateReader.add(AggregateOperation ... operations).  You'll then be able to collect and compile statistics about your data as it streams by.
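To illustrate the idea of a custom streaming aggregate, here is a minimal, self-contained sketch in plain Java.  It does not use the Data Pipeline API: the RunningAggregate interface, the Range class, and the aggregate() helper are hypothetical stand-ins, and the real AggregateReader.AggregateOperation signature may differ, so check the JavaDoc before adapting it.

```java
import java.util.Arrays;

public class AggregateSketch {

    // Hypothetical stand-in for a custom aggregate operation.
    // This is NOT the real AggregateReader.AggregateOperation signature.
    interface RunningAggregate {
        void accept(double value);   // called once per value as it streams by
        double result();             // current value of the aggregate
    }

    // Example custom operation: range (max - min) of the values seen so far.
    static class Range implements RunningAggregate {
        private double min = Double.POSITIVE_INFINITY;
        private double max = Double.NEGATIVE_INFINITY;
        private boolean seen = false;

        @Override
        public void accept(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            seen = true;
        }

        @Override
        public double result() {
            // An empty stream has no range; report 0 instead of -Infinity.
            return seen ? max - min : 0.0;
        }
    }

    // Feeds every value to the operation in a single pass, without buffering.
    static double aggregate(Iterable<Double> stream, RunningAggregate op) {
        for (double value : stream) {
            op.accept(value);
        }
        return op.result();
    }

    public static void main(String[] args) {
        System.out.println(aggregate(Arrays.asList(3.0, 9.5, 4.0), new Range())); // prints 6.5
    }
}
```

The operation keeps only a small amount of running state (here, the minimum and maximum), which is what lets it work on data of any size as it streams past.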

XPath Debugging

All XPath-based readers (XmlReader, JsonReader, and JavaBeanReader) now support a debug flag.

Setting the flag forces the readers to log, in real time, every XPath they see in your data stream, regardless of whether it matches your field and break rules.

This allows you to quickly build new data extraction jobs or fix broken ones by seeing exactly what the engine sees.
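To give a feel for the kind of output such a trace produces, the sketch below walks an XML document with the JDK's standard StAX parser and collects every element and attribute path it encounters.  It is only an approximation written for this post: the XPathTraceSketch class and tracePaths() method are hypothetical, and the library's debug flag, log format, and internals are not shown here.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class XPathTraceSketch {

    // Hypothetical helper (not part of Data Pipeline): returns every element
    // and attribute path seen in the document, in document order.
    static List<String> tracePaths(String xml) {
        List<String> paths = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();   // names of the currently open elements
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    stack.addLast(reader.getLocalName());
                    String path = "/" + String.join("/", stack);
                    paths.add(path);
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        paths.add(path + "/@" + reader.getAttributeLocalName(i));
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    stack.removeLast();
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return paths;
    }

    public static void main(String[] args) {
        String xml = "<people><person id=\"1\"><firstName>Ann</firstName></person></people>";
        for (String path : tracePaths(xml)) {
            System.out.println(path);
        }
        // prints:
        // /people
        // /people/person
        // /people/person/@id
        // /people/person/firstName
    }
}
```

Comparing a listing like this against your field and break expressions is usually the quickest way to spot a path that is misspelled or one level too deep.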

Transformations

This release includes transformations to:

  • split datetime fields (BasicFieldTransformer.dateTimeToDate() and dateTimeToTime()).
  • set a default value on failure instead of throwing an exception (FieldTransformer.setValueOnException(Object valueOnException), getValueOnException(), and hasValueOnException()).
  • move fields around relative to other fields (added com.northconcepts.datapipeline.transform.MoveFieldBefore and MoveFieldAfter).
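The "default value on failure" behaviour in the list above can be pictured with a small plain-Java sketch.  It does not call the Data Pipeline API: withValueOnException() is a hypothetical helper built on java.util.function.Function, written only to show the pattern of substituting a fallback value when a transformation throws.

```java
import java.util.function.Function;

public class ValueOnExceptionSketch {

    // Hypothetical helper (not the FieldTransformer API): wraps a transformation
    // so that a runtime failure yields valueOnException instead of propagating.
    static <T, R> Function<T, R> withValueOnException(Function<T, R> transform, R valueOnException) {
        return input -> {
            try {
                return transform.apply(input);
            } catch (RuntimeException e) {
                return valueOnException;
            }
        };
    }

    public static void main(String[] args) {
        Function<String, Integer> parseAge =
                withValueOnException(s -> Integer.parseInt(s.trim()), -1);

        System.out.println(parseAge.apply(" 42 "));  // prints 42
        System.out.println(parseAge.apply("n/a"));   // prints -1
    }
}
```

Choose a fallback that cannot be mistaken for real data (here -1 for an age), so that failed conversions remain easy to find downstream.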

License

The software license has been updated to reflect the new plans and make the grant irrevocable (except as set forth in the termination provisions).  See https://northconcepts.com/license/commercial/.

Other Changes

  • added RecordList.addAll(DataReader reader)
  • added RecordList.RecordList(DataReader reader) constructor
  • JavaBeanReader now returns node/field names separate from values; use “//firstName/text()” instead of “//firstName” for values
  • Field.getValueAsString() now returns the first and last bytes for BLOBs instead of the array's default toString()
  • replaced “\n” with OS line separator in DataException.getPropertiesAsString() and getMessageWithProperties()
  • Record.clone now returns Record instead of Object
  • added Record.moveFieldBefore(String fieldName, String beforeFieldName) and moveFieldAfter(String columnName, String afterFieldName)
  • added CSVReader.getLineText() and getLineParser()
  • added filter rule: com.northconcepts.datapipeline.filter.rule.IsNull
  • added examples from blogs: com.northconcepts.datapipeline.examples.cookbook.blog.*
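As a rough picture of what moving a field relative to another field means, the sketch below reorders a plain list of field names.  It is not the Record implementation: FieldOrderSketch and its static moveFieldBefore() method are hypothetical, and the behaviour for missing fields is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FieldOrderSketch {

    // Hypothetical helper (not Record.moveFieldBefore): moves fieldName so that
    // it sits immediately before beforeFieldName in the list.
    static void moveFieldBefore(List<String> fields, String fieldName, String beforeFieldName) {
        if (fieldName.equals(beforeFieldName)) {
            return;  // nothing to do
        }
        // Assumption for this sketch: unknown field names are an error.
        if (!fields.contains(fieldName) || !fields.contains(beforeFieldName)) {
            throw new IllegalArgumentException("Unknown field");
        }
        fields.remove(fieldName);
        // Look up the target index after the removal, since removal may shift it.
        fields.add(fields.indexOf(beforeFieldName), fieldName);
    }

    public static void main(String[] args) {
        List<String> fields = new ArrayList<>(Arrays.asList("id", "firstName", "lastName", "email"));
        moveFieldBefore(fields, "email", "firstName");
        System.out.println(fields);  // prints [id, email, firstName, lastName]
    }
}
```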

Bug Fixes

  • XPath engine now matches root and ancestor attributes
  • JobTemplateImpl no longer tries to reopen supplied endpoints if they are already open
  • DataReaderLookup no longer tries to reopen supplied endpoints if they are already open
  • DeMux now prevents sinks/readers from blocking forever if open fails
  • DataException.getRecord() no longer throws ClassCastException

You can download your copy of Data Pipeline at https://northconcepts.com/downloads/.

About The DataPipeline Team

We make Data Pipeline — a lightweight ETL framework for Java. Use it to filter, transform, and aggregate data on-the-fly in your web, mobile, and desktop apps. Learn more about it at northconcepts.com.
