Build Pipelines in Code
Overview
DataPipeline provides the building blocks to add data processing functionality to your applications, services, and batch jobs. You can code pipelines in Java or any other language that runs on the Java Virtual Machine (JVM).
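For example, a minimal pipeline pairs a reader with a writer and runs them as a job. A sketch based on the library's getting-started pattern (the file path is a placeholder):

```java
import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.job.Job;

public class HelloDataPipeline {

    public static void main(String[] args) {
        // Read records from a CSV file, using the first row as field names
        DataReader reader = new CSVReader(new File("input.csv"))
                .setFieldNamesInFirstRow(true);

        // Write each record to standard output
        DataWriter writer = new StreamWriter(System.out);

        // Transfer all records from the reader to the writer
        Job.run(reader, writer);
    }
}
```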
Features
Data Readers and Writers
DataPipeline supports a variety of file formats, data sources, and APIs (a format-conversion sketch follows this list).
- Read and write flat files: CSV, Excel, JSON, XML, fixed-length, and more
- Columnar file support: Parquet, ORC
- API endpoints: Google, Twitter, Jira, Email, and more
- Custom connectors
- Learn more about data formats
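Because every format is exposed as a DataReader or DataWriter, converting between formats is a matter of swapping endpoints. A sketch of a CSV-to-Excel conversion, assuming the ExcelDocument and ExcelWriter classes from the Excel module and placeholder file paths:

```java
import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.excel.ExcelDocument;
import com.northconcepts.datapipeline.excel.ExcelWriter;
import com.northconcepts.datapipeline.job.Job;

public class CsvToExcel {

    public static void main(String[] args) {
        DataReader reader = new CSVReader(new File("input.csv"))
                .setFieldNamesInFirstRow(true);

        // Excel output is buffered in an in-memory document, then saved
        ExcelDocument document = new ExcelDocument();
        DataWriter writer = new ExcelWriter(document).setSheetName("data");

        Job.run(reader, writer);
        document.save(new File("output.xlsx"));
    }
}
```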
Databases
Built-in support for both generic JDBC and database-specific access (a table-copy sketch follows this list).
- JDBC readers and writers
- Configurable insert and upsert strategies
- JDBC multi-threaded writer
- MongoDB
- Learn more about database support
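In the same way, JdbcReader and JdbcWriter plug a SQL query and a target table into a pipeline. A sketch that copies rows between two databases, assuming placeholder H2 connection URLs, credentials, and table names:

```java
import java.sql.Connection;
import java.sql.DriverManager;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.jdbc.JdbcReader;
import com.northconcepts.datapipeline.jdbc.JdbcWriter;
import com.northconcepts.datapipeline.job.Job;

public class CopyTable {

    public static void main(String[] args) throws Exception {
        // Placeholder in-memory databases; substitute real JDBC URLs and credentials
        Connection source = DriverManager.getConnection("jdbc:h2:mem:source", "sa", "");
        Connection target = DriverManager.getConnection("jdbc:h2:mem:target", "sa", "");

        // Stream rows from a query into a target table
        DataReader reader = new JdbcReader(source, "SELECT id, name, price FROM product");
        DataWriter writer = new JdbcWriter(target, "product_copy");

        Job.run(reader, writer);
    }
}
```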
Transformers
Over 30 transformations are provided (a combined filter and calculated-field sketch follows this list).
- Filter and validate
- Lookups/joins
- Streaming window aggregation
- Calculated fields
- Remove duplicates
- Split/join
- Learn more about transformations
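Transformations are applied by wrapping one reader in another, so they compose into chains. A sketch combining a filter with a calculated field, assuming the FilteringReader/FilterExpression and TransformingReader/SetCalculatedField classes and placeholder field names:

```java
import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.filter.FilterExpression;
import com.northconcepts.datapipeline.filter.FilteringReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.transform.SetCalculatedField;
import com.northconcepts.datapipeline.transform.TransformingReader;

public class FilterAndCalculate {

    public static void main(String[] args) {
        DataReader reader = new CSVReader(new File("orders.csv"))
                .setFieldNamesInFirstRow(true);

        // Keep only records where the expression evaluates to true
        FilteringReader filter = new FilteringReader(reader);
        filter.add(new FilterExpression("quantity > 0"));

        // Add a new field computed from existing fields
        TransformingReader transformer = new TransformingReader(filter);
        transformer.add(new SetCalculatedField("total", "price * quantity"));

        Job.run(transformer, new StreamWriter(System.out));
    }
}
```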
Multi-threading
DataPipeline can use threads in a variety of ways to speed up processing (a sketch follows this list).
- Multi-threaded jobs
- Read and write concurrently
- Read and write to multiple sources and targets
- Event bus decoupled jobs
- Pause, resume, cancel running jobs
- Learn more about multi-threading
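As one sketch of the threading options: an AsyncReader prefetches records on a background thread, and a job started asynchronously can be paused, resumed, or cancelled from the caller's thread. The class and method names below follow the multi-threading docs but should be treated as assumptions:

```java
import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.multithreading.AsyncReader;

public class ConcurrentJob {

    public static void main(String[] args) throws Exception {
        // AsyncReader buffers records read on a separate thread,
        // decoupling the reading and writing sides of the pipeline
        DataReader reader = new AsyncReader(
                new CSVReader(new File("large.csv")).setFieldNamesInFirstRow(true));

        Job job = new Job(reader, new StreamWriter(System.out));
        job.runAsync();            // start the job on a background thread

        job.pause();               // temporarily halt processing
        job.resume();              // continue where it left off

        job.waitUntilFinished();   // block until the job completes
    }
}
```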
Expression Language
The DataPipeline Expression Language (DPEL) is a dynamically evaluated language similar to Java, with several additions that make expressions easier to read for both developers and analysts (a sketch follows this list).
- Dynamic expression language
- A large set of built-in functions
- Callouts to custom Java functions
- Restricted list of unsafe methods
- Learn more about the expression language
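DPEL expressions appear wherever the API accepts an expression string, such as the filters and calculated fields shown earlier. A brief sketch; the field names are placeholders, and toUpperCase stands in for one of the built-in functions:

```java
import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.filter.FilterExpression;
import com.northconcepts.datapipeline.filter.FilteringReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.transform.SetCalculatedField;
import com.northconcepts.datapipeline.transform.TransformingReader;

public class DpelExample {

    public static void main(String[] args) {
        DataReader reader = new CSVReader(new File("people.csv"))
                .setFieldNamesInFirstRow(true);

        // DPEL reads like Java: field names resolve against the current
        // record, and the usual comparison and boolean operators apply
        FilteringReader filter = new FilteringReader(reader);
        filter.add(new FilterExpression("age >= 18 && status == \"active\""));

        // Built-in functions can be called directly inside expressions
        TransformingReader transformer = new TransformingReader(filter);
        transformer.add(new SetCalculatedField("nameUpper", "toUpperCase(name)"));

        Job.run(transformer, new StreamWriter(System.out));
    }
}
```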