Data Pipeline 3.1 is now available for download. This is a milestone release that adds native support for hierarchical data (nested records and multidimensional arrays).
Hierarchical Data Support
We’ve added first-class support for fields containing Records. We’ve also expanded array support to include:
- fields containing arrays of arrays.
- fields containing arrays of records.
These changes mean you can now build apps to convert and transform complex data types. You’re no longer limited to mapping hierarchical data streams to tabular data using XPath.
Record phone1 = new Record()
    .setField("PhoneNumber", "515-555-1515")
    .setField("Type", "Work");

Record phone2 = new Record()
    .setField("PhoneNumber", "717-777-1717")
    .setField("Type", "Mobile");

Record contact = new Record()
    .setField("Name", "John Smith")
    .setField("Phone", new Object[]{phone1, phone2});

System.out.println(contact);
Output
Record (MODIFIED) (has child records) {
    0:[Name]:STRING=[John Smith]:String
    1:[Phone]:ARRAY of RECORD=[[
        Record (MODIFIED) (is child record) {
            0:[PhoneNumber]:STRING=[515-555-1515]:String
            1:[Type]:STRING=[Work]:String
        },
        Record (MODIFIED) (is child record) {
            0:[PhoneNumber]:STRING=[717-777-1717]:String
            1:[Type]:STRING=[Mobile]:String
        }]]:ArrayValue
}
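The example above covers arrays of records; arrays of arrays work the same way. The sketch below is only an illustration, assuming setField() accepts nested Object arrays the same way it accepts the array of records above; the Matrix field name and its values are made up.

// Sketch of an array-of-arrays field, reusing the Record.setField API shown above.
// The "Matrix" field and its values are made up for illustration.
Record grid = new Record()
    .setField("Name", "Sample grid")
    .setField("Matrix", new Object[]{
        new Object[]{1, 2, 3},
        new Object[]{4, 5, 6}
    });

System.out.println(grid);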
Dynamic expression language
The built-in expression language has also undergone an update to support hierarchical expressions in your filters and calculated fields.
reader = new TransformingReader(reader).add(
    new SetCalculatedField("discount", "order.amount * 0.10"));

reader = new FilteringReader(reader).add(
    new FilterExpression("order.amount > 1000.0"));
We’ve also added an evaluate() method you can call right on each record.
System.out.println(contact.evaluate(
    "Name + ' ' + Phone[0].PhoneNumber + ' (' + Phone[0].Type + ')'"));
Output
John Smith 515-555-1515 (Work) |
The expression language now also allows fields to be accessed positionally like arrays.
System.out.println(contact.evaluate(
    "Name + ' ' + Phone[1][0] + ' (' + Phone[-1]['Type'] + ')'"));
The expression Phone[1][0] refers to the PhoneNumber field of the second phone record.
The expression language also allows:
- negative array indexes — to retrieve elements starting from the end of the list.
- string array indexes — to retrieve elements with spaces in their names or to retrieve elements dynamically by getting their property name from another variable.
For example, Phone[-1]['Type'] refers to the Type field of the last phone record.
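String indexes are especially handy when a field name contains a space that dotted access can’t express. The sketch below is only an illustration: the Address record and its "Street Name" field are made up, and it assumes bracket access on a record-valued field behaves the same way it does after an array index, as in Phone[-1]['Type'] above.

// Sketch only: the Address record and its "Street Name" field are hypothetical.
Record address = new Record()
    .setField("Street Name", "123 Main St");   // field name containing a space

Record contact3 = new Record()
    .setField("Name", "Jane Doe")
    .setField("Address", address);

// Dotted access can't express the space, so a string index is used instead
// (assumed to work like Phone[-1]['Type'] above).
System.out.println(contact3.evaluate("Name + ' - ' + Address['Street Name']"));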
JSON conversion
Support for nested records means we can now perform direct Record-to-JSON and JSON-to-Record conversions.
String json = contact.toJson();
System.out.println(json);

Record contact2 = (Record) Record.fromJson(json);
System.out.println(contact2);
Output
{"Name":"John Smith","Phone":[{"PhoneNumber":"515-555-1515","Type":"Work"},{"PhoneNumber":"717-777-1717","Type":"Mobile"}]}

Record (MODIFIED) (has child records) {
    0:[Name]:STRING=[John Smith]:String
    1:[Phone]:ARRAY of RECORD=[[
        Record (MODIFIED) (is child record) {
            0:[PhoneNumber]:STRING=[515-555-1515]:String
            1:[Type]:STRING=[Work]:String
        },
        Record (MODIFIED) (is child record) {
            0:[PhoneNumber]:STRING=[717-777-1717]:String
            1:[Type]:STRING=[Mobile]:String
        }]]:ArrayValue
}
Data conversion
The data conversion workhorse, BasicFieldTransformer, can now operate on trees of data and can also continue operating even when some operations fail.
DataReader reader = new MemoryReader(new RecordList(contact));
reader = new TransformingReader(reader)
    .add(new BasicFieldTransformer("Phone")
        .lowerCase().replaceString("-", "").setContinueOnException(true));
DataWriter writer = new StreamWriter(System.out);
JobTemplate.DEFAULT.transfer(reader, writer);
Output
0 - Record (MODIFIED) (has child records) {
    0:[Name]:STRING=[John Smith]:String
    1:[Phone]:ARRAY of RECORD=[[
        Record (MODIFIED) (is child record) {
            0:[PhoneNumber]:STRING=[5155551515]:String
            1:[Type]:STRING=[work]:String
        },
        Record (MODIFIED) (is child record) {
            0:[PhoneNumber]:STRING=[7177771717]:String
            1:[Type]:STRING=[mobile]:String
        }]]:ArrayValue
}
Parallel Reading
You can now read from several data sources concurrently in a single pipeline with AsyncMultiReader.
DataReader reader = new AsyncMultiReader(reader1, reader2, reader3);
DataWriter writer = new CSVWriter(new FileWriter("multi.csv"));
JobTemplate.DEFAULT.transfer(reader, writer);
Read our blog post "How to read data in parallel using AsyncMultiReader" to see how to use it in your apps.
Streaming Aggregation Updates
The GroupByReader, introduced last year, can now perform streaming aggregations on array values. It can also exclude null groups and null values from its results.
reader = new GroupByReader(reader, "TweetHashtags", "TweetUserCountry")
    .setExcludeNulls(true)
    .count("CountOfHashtags", true);
Excel Updates
The ExcelWriter includes a couple of small changes to make your work easier. You can now style fields using ExcelWriter.setStyleFormat(). You can also add a filter row to the top of each column with ExcelWriter.setAutoFilterColumns(true).
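Since this post doesn’t include an Excel example, here’s a rough sketch of how the two new calls might slot into a transfer, reusing the contact record from earlier. The file-based ExcelWriter constructor and the argument to setStyleFormat() are assumptions, so check the API docs for the exact signatures.

// Sketch only: the file-based ExcelWriter constructor and the argument to
// setStyleFormat() are assumptions, not confirmed by this post.
DataReader reader = new MemoryReader(new RecordList(contact));

ExcelWriter writer = new ExcelWriter(new File("contacts.xlsx"));
writer.setAutoFilterColumns(true);   // adds a filter row to the top of each column
// writer.setStyleFormat(...);       // style fields here; the argument isn't shown in this post

JobTemplate.DEFAULT.transfer(reader, writer);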
Twitter Search Tool
We used some of the changes in this release to update our Twitter search exporter tool.
The search results now show a summary of the top 10 retweets, favourites, hashtags, mentions, URLs, and users for the tweets matching your search.
The results also include a link to download the Excel file which contains all of the raw tweets plus all of the top items mentioned above (in separate tabs).
Other Changes
View the CHANGELOG.txt file included in the download for a list of all the updates in this release.
Happy Coding!