Data Pipeline 3.1 is now available for download. This is a milestone release that adds native support for hierarchical data (nested records and multidimensional arrays).
Hierarchical Data Support
We’ve added first-class support for fields containing Records. We’ve also expanded array support to include:
- fields containing arrays of arrays.
- fields containing arrays of records.
These changes mean you can now build apps to convert and transform complex data types. You’re no longer limited to mapping hierarchical data streams to tabular data using XPath.
Record phone1 = new Record()
    .setField("PhoneNumber", "515-555-1515")
    .setField("Type", "Work");

Record phone2 = new Record()
    .setField("PhoneNumber", "717-777-1717")
    .setField("Type", "Mobile");

Record contact = new Record()
    .setField("Name", "John Smith")
    .setField("Phone", new Object[]{phone1, phone2});

System.out.println(contact);
Output
Record (MODIFIED) (has child records) {
    0:[Name]:STRING=[John Smith]:String
    1:[Phone]:ARRAY of RECORD=[[
        Record (MODIFIED) (is child record) {
            0:[PhoneNumber]:STRING=[515-555-1515]:String
            1:[Type]:STRING=[Work]:String
        },
        Record (MODIFIED) (is child record) {
            0:[PhoneNumber]:STRING=[717-777-1717]:String
            1:[Type]:STRING=[Mobile]:String
        }]]:ArrayValue
}
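The example above covers arrays of records; arrays of arrays work the same way. The sketch below is only an illustration, assuming setField() accepts nested Object arrays the same way it accepts the array of records above; the Matrix field name and its values are made up.

// Sketch of an array-of-arrays field, reusing the Record.setField API shown above.
// The "Matrix" field and its values are made up for illustration.
Record grid = new Record()
    .setField("Name", "Sample grid")
    .setField("Matrix", new Object[]{
        new Object[]{1, 2, 3},
        new Object[]{4, 5, 6}
    });

System.out.println(grid);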
Dynamic expression language
The built-in expression language has also undergone an update to support hierarchical expressions in your filters and calculated fields.
reader = new TransformingReader(reader).add(
    new SetCalculatedField("discount", "order.amount * 0.10"));

reader = new FilteringReader(reader).add(
    new FilterExpression("order.amount > 1000.0"));
We’ve also added an evaluate() method you can call right on each record.
System.out.println(contact.evaluate(
    "Name + ' ' + Phone[0].PhoneNumber + ' (' + Phone[0].Type + ')'"));
Output
John Smith 515-555-1515 (Work) |
The expression language now also allows fields to be accessed positionally like arrays.
System.out.println(contact.evaluate(
    "Name + ' ' + Phone[1][0] + ' (' + Phone[-1]['Type'] + ')'"));
The expression Phone[1][0] refers to the PhoneNumber field of the second phone record.
The expression language also allows:
- negative array indexes — to retrieve elements starting from the end of the list.
- string array indexes — to retrieve elements with spaces in their names or to retrieve elements dynamically by getting their property name from another variable.
For example, Phone[-1]['Type'] refers to the Type field of the last phone record.
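String indexes are especially handy when a field name contains a space that dotted access can’t express. The sketch below is only an illustration: the Address record and its "Street Name" field are made up, and it assumes bracket access on a record-valued field behaves the same way it does after an array index, as in Phone[-1]['Type'] above.

// Sketch only: the Address record and its "Street Name" field are hypothetical.
Record address = new Record()
    .setField("Street Name", "123 Main St");   // field name containing a space

Record contact3 = new Record()
    .setField("Name", "Jane Doe")
    .setField("Address", address);

// Dotted access can't express the space, so a string index is used instead
// (assumed to work like Phone[-1]['Type'] above).
System.out.println(contact3.evaluate("Name + ' - ' + Address['Street Name']"));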
JSON conversion
Support for nested records means we can now perform direct Record-to-JSON and JSON-to-Record conversions.
String json = contact.toJson();
System.out.println(json);

Record contact2 = (Record) Record.fromJson(json);
System.out.println(contact2);
Output
{"Name":"John Smith","Phone":[{"PhoneNumber":"515-555-1515","Type":"Work"},{"PhoneNumber":"717-777-1717","Type":"Mobile"}]}

Record (MODIFIED) (has child records) {
    0:[Name]:STRING=[John Smith]:String
    1:[Phone]:ARRAY of RECORD=[[
        Record (MODIFIED) (is child record) {
            0:[PhoneNumber]:STRING=[515-555-1515]:String
            1:[Type]:STRING=[Work]:String
        },
        Record (MODIFIED) (is child record) {
            0:[PhoneNumber]:STRING=[717-777-1717]:String
            1:[Type]:STRING=[Mobile]:String
        }]]:ArrayValue
}
Data conversion
The data conversion workhorse, BasicFieldTransformer, can now operate on trees of data and can also continue operating even when some operations fail.
DataReader reader = new MemoryReader(new RecordList(contact));
reader = new TransformingReader(reader)
    .add(new BasicFieldTransformer("Phone")
        .lowerCase().replaceString("-", "").setContinueOnException(true));
DataWriter writer = new StreamWriter(System.out);
JobTemplate.DEFAULT.transfer(reader, writer);
Output
0 - Record (MODIFIED) (has child records) {
    0:[Name]:STRING=[John Smith]:String
    1:[Phone]:ARRAY of RECORD=[[
        Record (MODIFIED) (is child record) {
            0:[PhoneNumber]:STRING=[5155551515]:String
            1:[Type]:STRING=[work]:String
        },
        Record (MODIFIED) (is child record) {
            0:[PhoneNumber]:STRING=[7177771717]:String
            1:[Type]:STRING=[mobile]:String
        }]]:ArrayValue
}
Parallel Reading
You can now read from several data sources concurrently in a single pipeline with AsyncMultiReader.
DataReader reader = new AsyncMultiReader(reader1, reader2, reader3);
DataWriter writer = new CSVWriter(new FileWriter("multi.csv"));
JobTemplate.DEFAULT.transfer(reader, writer);
Read our blog post "How to read data in parallel using AsyncMultiReader" to see how to use it in your apps.
Streaming Aggregation Updates
The GroupByReader, introduced last year, can now perform streaming aggregations on array values. It can also exclude null groups and null values from its results.
reader = new GroupByReader(reader, "TweetHashtags", "TweetUserCountry")
    .setExcludeNulls(true)
    .count("CountOfHashtags", true);
Excel Updates
The ExcelWriter includes a couple of small changes to make your work easier. You can now style fields using ExcelWriter.setStyleFormat(). You can also add a filter row to the top of each column with ExcelWriter.setAutoFilterColumns(true).
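Since this post doesn’t include an Excel example, here’s a rough sketch of how the two new calls might slot into a transfer, reusing the contact record from earlier. The file-based ExcelWriter constructor and the argument to setStyleFormat() are assumptions, so check the API docs for the exact signatures.

// Sketch only: the file-based ExcelWriter constructor and the argument to
// setStyleFormat() are assumptions, not confirmed by this post.
DataReader reader = new MemoryReader(new RecordList(contact));

ExcelWriter writer = new ExcelWriter(new File("contacts.xlsx"));
writer.setAutoFilterColumns(true);   // adds a filter row to the top of each column
// writer.setStyleFormat(...);       // style fields here; the argument isn't shown in this post

JobTemplate.DEFAULT.transfer(reader, writer);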
Twitter Search Tool
We used some of the changes in this release to update our Twitter search exporter tool.
The search results now show a summary of the top 10 retweets, favourites, hashtags, mentions, URLs, and users for the tweets matching your search.
The results also include a link to download the Excel file which contains all of the raw tweets plus all of the top items mentioned above (in separate tabs).
Other Changes
View the CHANGELOG.txt file included in the download for a list of all the updates in this release.
Happy Coding!