Data Pipeline Features and Formats

Data Pipeline's embedded data migration engine makes it easy to add data transfer, conversion, and transformation functionality to your Java applications.

Data Formats

CSV: Comma or user-defined delimited fields.
Excel: Microsoft Excel formats 97, 2003, 2007, and 2010.
Fixed Width / Fixed Length Records (FLR)
In-Memory
Java Beans: Read Java beans, arrays, and collections using XPath queries.
JDBC
JSON: Read streaming JSON using XPath queries. Write streaming JSON using code templates with the built-in expression language or using a simple writer.
Native: Built-in, binary serialization format.
PDF
RTF: Microsoft Word.
Template: FreeMarker templates.
Web Server Logs
XML: Read streaming XML using XPath queries. Write streaming XML using FreeMarker templates, code templates with the built-in expression language, or using a simple writer.

Features

Filter

Select records using the built-in expression language or programmatic rules.
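
The programmatic-rule style of filtering can be illustrated in plain Java. This is a minimal sketch, not Data Pipeline's own API: the RecordFilter class is hypothetical, and records are assumed to be simple maps from field name to value.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Data Pipeline.
final class RecordFilter {

    // Keeps only the records that satisfy the rule, preserving input order.
    static List<Map<String, Object>> filter(List<Map<String, Object>> records,
                                            Predicate<Map<String, Object>> rule) {
        return records.stream().filter(rule).collect(Collectors.toList());
    }
}
```

A rule such as r -> ((Number) r.get("balance")).intValue() > 100 would then select only the records whose balance field exceeds 100.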

Validate

Ensure records match your criteria by using the built-in expression language or programmatic rules.

Transform

Manipulate data using predefined or user-defined transformations.

Sort

Sort huge datasets with external, disk-based sorting. Smaller sets can be sorted in memory.
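
To show what disk-based sorting generally involves, here is a sketch of a classic external merge sort in plain Java. It is an assumption about the general technique, not a description of how Data Pipeline implements it. The ExternalSort class and its parameters are hypothetical, and each record is assumed to be a single line of text without embedded line breaks.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of Data Pipeline.
final class ExternalSort {

    // One sorted run on disk plus the line currently at its head.
    private static final class Run {
        final BufferedReader reader;
        String head;

        Run(Path file) throws IOException {
            reader = Files.newBufferedReader(file);
            head = reader.readLine();
        }
    }

    // Sorts single-line records while buffering at most chunkSize input lines.
    static void sort(Iterator<String> lines, int chunkSize, Consumer<String> sink) {
        List<Path> runFiles = new ArrayList<>();
        try {
            // Phase 1: sort each chunk in memory and spill it to a temp file.
            List<String> chunk = new ArrayList<>();
            while (lines.hasNext()) {
                chunk.add(lines.next());
                if (chunk.size() >= chunkSize || !lines.hasNext()) {
                    Collections.sort(chunk);
                    Path file = Files.createTempFile("sort-run", ".txt");
                    runFiles.add(file);
                    Files.write(file, chunk);
                    chunk.clear();
                }
            }
            // Phase 2: k-way merge that always emits the smallest head line.
            PriorityQueue<Run> queue =
                new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
            for (Path file : runFiles) {
                queue.add(new Run(file)); // every run holds at least one line
            }
            while (!queue.isEmpty()) {
                Run run = queue.poll();
                sink.accept(run.head);
                run.head = run.reader.readLine();
                if (run.head != null) {
                    queue.add(run);
                } else {
                    run.reader.close();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (Path file : runFiles) {
                try {
                    Files.deleteIfExists(file);
                } catch (IOException ignored) {
                    // Best-effort cleanup of temp files.
                }
            }
        }
    }
}
```

During the merge only one line per run is held in memory, which is why this approach scales to inputs larger than the heap. For small inputs, a single in-memory Collections.sort is simpler and faster.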

Exclude Fields

Remove fields from records using a black-list approach.

Include Fields

Select and arrange fields in records using a white-list approach.
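
The difference between excluding and including fields can be sketched in plain Java. The FieldSelector class below is hypothetical and is not Data Pipeline's API. It assumes a record is an ordered map of field names to values.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of Data Pipeline.
final class FieldSelector {

    // White-list: keeps only the named fields, arranged in the order given.
    static Map<String, Object> include(Map<String, Object> record, List<String> fieldNames) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (String name : fieldNames) {
            if (record.containsKey(name)) {
                result.put(name, record.get(name));
            }
        }
        return result;
    }

    // Black-list: drops the named fields and keeps the rest in their original order.
    static Map<String, Object> exclude(Map<String, Object> record, Set<String> fieldNames) {
        Map<String, Object> result = new LinkedHashMap<>(record);
        result.keySet().removeAll(fieldNames);
        return result;
    }
}
```

The include approach protects a target from unexpected new fields, while the exclude approach lets new fields pass through automatically.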

Lookups / Joins

Enrich data with information from secondary sources.
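
A lookup can be thought of as a left join against a secondary source. The sketch below uses plain Java with a hypothetical Lookup class, not Data Pipeline's API, and assumes the secondary source has already been loaded into a map keyed by the join value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Data Pipeline.
final class Lookup {

    // Copies each record and adds the fields of its match, if one exists.
    // Records without a match pass through unchanged (left-join behavior).
    static List<Map<String, Object>> enrich(List<Map<String, Object>> records,
                                            String keyField,
                                            Map<Object, Map<String, Object>> lookupTable) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> record : records) {
            Map<String, Object> enriched = new LinkedHashMap<>(record);
            Object key = record.get(keyField);
            Map<String, Object> match = (key == null) ? null : lookupTable.get(key);
            if (match != null) {
                enriched.putAll(match);
            }
            result.add(enriched);
        }
        return result;
    }
}
```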

Rename Field

Change field names to match targets.

Copy Field

Duplicate fields within records.

Assign Field

Create or update fields with specific values.

Calculated Field

Assign field values using the built-in expression language or programmatically.

DeMux

Split or duplicate a data source using one of the defined policies.

Meter

Record stats about your data transfer.

Throttle

Limit your data transfer to a specified number of bytes or records per second.
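
One simple way to cap throughput at a number of records per second is to delay each record until its earliest allowed time. The sketch below is plain Java with a hypothetical RecordThrottle class, not Data Pipeline's API, and it covers only the records-per-second case.

```java
// Hypothetical helper for illustration; not part of Data Pipeline.
final class RecordThrottle {
    private final double recordsPerSecond;
    private final long startNanos = System.nanoTime();
    private long passed; // records already let through

    RecordThrottle(double recordsPerSecond) {
        this.recordsPerSecond = recordsPerSecond;
    }

    // Milliseconds to wait before the next record, given how many records
    // have already passed and how much time has elapsed since the start.
    static long requiredDelayMillis(long passed, double recordsPerSecond, long elapsedMillis) {
        long earliestMillis = (long) Math.ceil(passed * 1000.0 / recordsPerSecond);
        return Math.max(0, earliestMillis - elapsedMillis);
    }

    // Call once per record before handing it to the writer.
    void acquire() throws InterruptedException {
        long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000;
        long delay = requiredDelayMillis(passed, recordsPerSecond, elapsedMillis);
        if (delay > 0) {
            Thread.sleep(delay);
        }
        passed++;
    }
}
```

Keeping the delay calculation in a pure static method makes the pacing logic testable without real sleeping.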

Remove Duplicates (Dedup)

Delete records where one or more fields contain repeating values.
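
Deduplication on selected fields can be sketched in plain Java by remembering the key values already seen. The Deduplicator class is hypothetical and not Data Pipeline's API. It assumes the first record with a given key is the one to keep, and it holds all seen keys in memory.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of Data Pipeline.
final class Deduplicator {

    // Keeps the first record for each distinct combination of key field values.
    static List<Map<String, Object>> removeDuplicates(List<Map<String, Object>> records,
                                                      List<String> keyFields) {
        Set<List<Object>> seen = new HashSet<>();
        List<Map<String, Object>> unique = new ArrayList<>();
        for (Map<String, Object> record : records) {
            List<Object> key = new ArrayList<>();
            for (String field : keyFields) {
                key.add(record.get(field));
            }
            if (seen.add(key)) { // add() returns false when the key was already present
                unique.add(record);
            }
        }
        return unique;
    }
}
```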

Sequence

Combine multiple data sources into one.

Aggregate

Perform calculations (count, max, etc.) using all records.
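
Aggregation over all records can be sketched with the Java standard library. The Aggregator class is hypothetical and not Data Pipeline's API. It assumes every record carries a numeric value in the chosen field.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Data Pipeline.
final class Aggregator {

    // Computes count, sum, min, max, and average of one numeric field in a single pass.
    static DoubleSummaryStatistics summarize(List<Map<String, Object>> records, String field) {
        return records.stream()
                .mapToDouble(record -> ((Number) record.get(field)).doubleValue())
                .summaryStatistics();
    }
}
```

The returned statistics object exposes getCount(), getMax(), getMin(), getSum(), and getAverage().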

Async

Read or write using additional threads.

JDBC Batch updates

Chunk database writes to improve performance.

JDBC Multi-Writer

Write to a single database using multiple connections.

Multi-Writer

Write to multiple targets simultaneously.

Streaming data

Start writing data as soon as the first record is read.

Expression language

Use the runtime expression language to change behavior without changing code.

Job management

Use automatic reader-writer transfer or create hooks for greater control and visibility.

Detailed exception reporting

Get vital state info along with the stack trace when exceptions occur.

Out-of-band data

Attach temporary, transient data to any field or record.