Data Pipeline 3.1 Now Available

Data Pipeline 3.1 is now available for download. This is a milestone release that adds native support for hierarchical data (nested records and multidimensional arrays).

Hierarchical Data Support

We’ve added first-class support for fields containing Records. We’ve also improved array support to include:

  • fields containing arrays of arrays.
  • fields containing arrays of records.

These changes mean you can now build apps to convert and transform complex data types. You’re no longer limited to mapping hierarchical data streams to tabular data using XPath.

Dynamic expression language

The built-in expression language has also been updated to support hierarchical expressions in your filters and calculated fields.

We’ve also added an evaluate() method you can call directly on each record.

The expression language now also allows fields to be accessed positionally like arrays.

The expression Phone[1][0] refers to the PhoneNumber property of the first phone record.

The expression language also allows:

  • negative array indexes — to retrieve elements starting from the end of the list.
  • string array indexes — to retrieve elements with spaces in their names or to retrieve elements dynamically by getting their property name from another variable.

For example, Phone[-1]['Type'] refers to the Type property of the last phone record.
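To make the indexing rules concrete, here is a rough sketch of how positional, negative, and string indexes could be resolved over nested data. It uses plain Java maps and lists rather than Data Pipeline's own classes. The names IndexLookupSketch, resolveIndex, get, and phone are hypothetical, and the sketch assumes zero-based positions, which may not match how the library counts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: plain Java collections, not Data Pipeline's own classes.
final class IndexLookupSketch {

    // Maps a possibly negative index onto [0, size); -1 means the last element.
    static int resolveIndex(int index, int size) {
        int resolved = index < 0 ? size + index : index;
        if (resolved < 0 || resolved >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return resolved;
    }

    // Walks a path made of Integer (positional) and String (named) keys.
    static Object get(Object node, Object... path) {
        Object current = node;
        for (Object key : path) {
            if (key instanceof String && current instanceof Map) {
                current = ((Map<?, ?>) current).get(key);
            } else if (key instanceof Integer && current instanceof List) {
                List<?> list = (List<?>) current;
                current = list.get(resolveIndex((Integer) key, list.size()));
            } else if (key instanceof Integer && current instanceof Map) {
                // Positional access to a record: the n-th field in insertion order.
                List<Object> values = new ArrayList<>(((Map<?, ?>) current).values());
                current = values.get(resolveIndex((Integer) key, values.size()));
            } else {
                throw new IllegalArgumentException("cannot apply key " + key);
            }
        }
        return current;
    }

    // Hypothetical helper that builds one phone record with ordered fields.
    static Map<String, Object> phone(String number, String type) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("PhoneNumber", number);
        record.put("Type", type);
        return record;
    }
}
```

With a list of phone records built by phone(), a call such as get(phones, -1, "Type") follows the same path as Phone[-1]['Type']: the last record first, then its Type field.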

JSON conversion

Nested records mean we can now support direct Record-to-JSON conversion.
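As a rough illustration of why nesting makes this conversion direct, the sketch below serializes a tree of plain Java maps and lists to JSON in a single recursive pass. It is not the library's converter. NestedJsonSketch and toJson are hypothetical names, and the sketch assumes values are maps, lists, strings, finite numbers, booleans, or null.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch only: a tiny recursive JSON writer over maps and lists.
final class NestedJsonSketch {

    static String toJson(Object value) {
        StringBuilder out = new StringBuilder();
        write(value, out);
        return out.toString();
    }

    private static void write(Object value, StringBuilder out) {
        if (value == null) {
            out.append("null");
        } else if (value instanceof Map) {
            // A nested record becomes a JSON object.
            out.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                if (!first) out.append(',');
                first = false;
                writeString(String.valueOf(entry.getKey()), out);
                out.append(':');
                write(entry.getValue(), out);
            }
            out.append('}');
        } else if (value instanceof List) {
            // An array field (of values, arrays, or records) becomes a JSON array.
            out.append('[');
            boolean first = true;
            for (Object element : (List<?>) value) {
                if (!first) out.append(',');
                first = false;
                write(element, out);
            }
            out.append(']');
        } else if (value instanceof Number || value instanceof Boolean) {
            out.append(value); // assumes finite numbers; NaN and Infinity are not valid JSON
        } else {
            writeString(value.toString(), out);
        }
    }

    private static void writeString(String text, StringBuilder out) {
        out.append('"');
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        out.append('"');
    }
}
```

Because each record already carries its children, no flattening or join step is needed before writing.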

Data conversion

The data conversion workhorse, BasicFieldTransformer, can now operate on trees of data and can also continue operating even when some operations fail.
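The idea of converting a tree while tolerating failures can be sketched in plain Java as below. This is not BasicFieldTransformer's API or implementation. TolerantTransformSketch and transform are hypothetical names, and the policy of keeping the original value and recording the failing path is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only: applies an operation to every leaf of a nested structure.
final class TolerantTransformSketch {

    // Returns a transformed copy of node. When op throws on a leaf, the leaf keeps
    // its original value and its path is appended to failures; the walk continues.
    static Object transform(Object node, Function<Object, Object> op,
                            String path, List<String> failures) {
        if (node instanceof Map) {
            Map<String, Object> result = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(entry.getKey());
                String childPath = path.isEmpty() ? key : path + "." + key;
                result.put(key, transform(entry.getValue(), op, childPath, failures));
            }
            return result;
        }
        if (node instanceof List) {
            List<?> list = (List<?>) node;
            List<Object> result = new ArrayList<>();
            for (int i = 0; i < list.size(); i++) {
                result.add(transform(list.get(i), op, path + "[" + i + "]", failures));
            }
            return result;
        }
        try {
            return op.apply(node);
        } catch (RuntimeException e) {
            failures.add(path); // remember where the conversion failed
            return node;        // keep the unconverted value
        }
    }
}
```

Calling transform(record, v -> Integer.valueOf((String) v), "", failures) on a record whose scores array holds "1" and "x" would convert the first element and report scores[1] as a failure instead of aborting the whole record.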

Parallel Reading

You can now read from several data sources concurrently in a single pipeline with AsyncMultiReader.

Read our blog post How to read data in parallel using AsyncMultiReader to see how to use it in your apps.
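For a feel of the general pattern, here is a minimal sketch of concurrent reading using only java.util.concurrent. It is not how AsyncMultiReader is implemented or called. ParallelReadSketch and readAll are hypothetical names, and the sketch assumes each source is an Iterable of non-null records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch only: one thread per source, merged through a shared queue.
final class ParallelReadSketch {

    // Sentinel each reader thread enqueues when its source is exhausted.
    private static final Object END = new Object();

    static List<Object> readAll(List<? extends Iterable<?>> sources) {
        List<Object> merged = new ArrayList<>();
        if (sources.isEmpty()) {
            return merged; // a fixed thread pool cannot have zero threads
        }
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(sources.size());
        for (Iterable<?> source : sources) {
            pool.execute(() -> {
                try {
                    for (Object record : source) {
                        queue.add(record); // records must be non-null
                    }
                } finally {
                    queue.add(END); // always signal completion, even on failure
                }
            });
        }
        pool.shutdown(); // no new tasks; running readers finish normally

        int finished = 0;
        try {
            while (finished < sources.size()) {
                Object item = queue.take();
                if (item == END) finished++;
                else merged.add(item);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading", e);
        }
        return merged;
    }
}
```

Records from one source keep their relative order, but records from different sources interleave in whatever order the threads deliver them.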

Streaming Aggregation Updates

The GroupByReader, introduced last year, can now perform streaming aggregations on array values.  It can also exclude null groups and null values from its results.
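The sketch below shows one way a streaming sum per group could treat array values and nulls. It is plain Java, not GroupByReader's API. StreamingSumSketch, accept, and result are hypothetical names, and reading "aggregations on array values" as "every element of the array contributes to its group's total" is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: keeps one running total per group, so rows are never buffered.
final class StreamingSumSketch {

    private final Map<String, Double> sums = new LinkedHashMap<>();

    // Consumes one row. Null groups and null values are skipped entirely.
    void accept(String group, Object value) {
        if (group == null || value == null) {
            return;
        }
        if (value instanceof List) {
            // Array value: feed each element into the same group.
            for (Object element : (List<?>) value) {
                accept(group, element);
            }
        } else if (value instanceof Number) {
            sums.merge(group, ((Number) value).doubleValue(), Double::sum);
        } else {
            throw new IllegalArgumentException("not a number or list: " + value);
        }
    }

    // Current totals, in the order groups were first seen.
    Map<String, Double> result() {
        return sums;
    }
}
```

In this sketch a group that only ever receives null values never appears in the result, which is one possible reading of excluding null values.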

Excel Updates

The ExcelWriter includes a couple of small changes to make your work easier. You can now style fields using ExcelWriter.setStyleFormat(). You can also add a filter row to the top of each column with ExcelWriter.setAutoFilterColumns(true).

Twitter Search Tool

We used some of the changes in this release to update our Twitter search exporter tool.

The search results now show a summary of the top 10 retweets, favourites, hashtags, mentions, URLs, and users for the tweets matching your search.

The results also include a link to download the Excel file which contains all of the raw tweets plus all of the top items mentioned above (in separate tabs).

Other Changes

View the CHANGELOG.txt file included in the download for a list of all the updates in this release.

 

Happy Coding!

About The DataPipeline Team

We make Data Pipeline — a lightweight ETL framework for Java. Use it to filter, transform, and aggregate data on-the-fly in your web, mobile, and desktop apps. Learn more about it at northconcepts.com.
