Build Pipelines in Code
Overview
DataPipeline provides the building blocks to add data processing functionality to your applications, services, and batch jobs. You can code pipelines in Java or any other language that runs on the Java Virtual Machine (JVM).
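For example, a minimal pipeline pairs a reader with a writer and runs them as a job. A sketch based on the library's getting-started pattern (the file path is a placeholder):

```java
import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.job.Job;

public class HelloDataPipeline {

    public static void main(String[] args) {
        // Read records from a CSV file, using the first row as field names
        DataReader reader = new CSVReader(new File("input.csv"))
                .setFieldNamesInFirstRow(true);

        // Write each record to standard output
        DataWriter writer = new StreamWriter(System.out);

        // Transfer all records from the reader to the writer
        Job.run(reader, writer);
    }
}
```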
Features
Data Readers and Writers
DataPipeline supports a variety of file formats, data sources, and APIs (a format-conversion sketch follows this list).
- Read and write flat files: CSV, Excel, JSON, XML, fixed-length, and more
- Columnar file support: Parquet, ORC
- API endpoints: Google, Twitter, Jira, Email, and more
- Custom connectors
- Learn more about data formats
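Because every format is exposed as a DataReader or DataWriter, converting between formats is a matter of swapping endpoints. A sketch of a CSV-to-Excel conversion, assuming the ExcelDocument and ExcelWriter classes from the Excel module and placeholder file paths:

```java
import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.excel.ExcelDocument;
import com.northconcepts.datapipeline.excel.ExcelWriter;
import com.northconcepts.datapipeline.job.Job;

public class CsvToExcel {

    public static void main(String[] args) {
        DataReader reader = new CSVReader(new File("input.csv"))
                .setFieldNamesInFirstRow(true);

        // Excel output is buffered in an in-memory document, then saved
        ExcelDocument document = new ExcelDocument();
        DataWriter writer = new ExcelWriter(document).setSheetName("data");

        Job.run(reader, writer);
        document.save(new File("output.xlsx"));
    }
}
```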
Databases
Built-in support for both generic JDBC and database-specific access (a table-copy sketch follows this list).
- JDBC readers and writers
- Configurable insert and upsert strategies
- JDBC multi-threaded writer
- MongoDB
- Learn more about database support
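In the same way, JdbcReader and JdbcWriter plug a SQL query and a target table into a pipeline. A sketch that copies rows between two databases, assuming placeholder H2 connection URLs, credentials, and table names:

```java
import java.sql.Connection;
import java.sql.DriverManager;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.jdbc.JdbcReader;
import com.northconcepts.datapipeline.jdbc.JdbcWriter;
import com.northconcepts.datapipeline.job.Job;

public class CopyTable {

    public static void main(String[] args) throws Exception {
        // Placeholder in-memory databases; substitute real JDBC URLs and credentials
        Connection source = DriverManager.getConnection("jdbc:h2:mem:source", "sa", "");
        Connection target = DriverManager.getConnection("jdbc:h2:mem:target", "sa", "");

        // Stream rows from a query into a target table
        DataReader reader = new JdbcReader(source, "SELECT id, name, price FROM product");
        DataWriter writer = new JdbcWriter(target, "product_copy");

        Job.run(reader, writer);
    }
}
```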
Transformers
Over 30 transformations are provided (a combined filter and calculated-field sketch follows this list).
- Filter and validate
- Lookups/joins
- Streaming window aggregation
- Calculated fields
- Remove duplicates
- Split/join
- Learn more about transformations
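Transformations are applied by wrapping one reader in another, so they compose into chains. A sketch combining a filter with a calculated field, assuming the FilteringReader/FilterExpression and TransformingReader/SetCalculatedField classes and placeholder field names:

```java
import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.filter.FilterExpression;
import com.northconcepts.datapipeline.filter.FilteringReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.transform.SetCalculatedField;
import com.northconcepts.datapipeline.transform.TransformingReader;

public class FilterAndCalculate {

    public static void main(String[] args) {
        DataReader reader = new CSVReader(new File("orders.csv"))
                .setFieldNamesInFirstRow(true);

        // Keep only records where the expression evaluates to true
        FilteringReader filter = new FilteringReader(reader);
        filter.add(new FilterExpression("quantity > 0"));

        // Add a new field computed from existing fields
        TransformingReader transformer = new TransformingReader(filter);
        transformer.add(new SetCalculatedField("total", "price * quantity"));

        Job.run(transformer, new StreamWriter(System.out));
    }
}
```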
Multi-threading
DataPipeline can use threads in a variety of ways to speed up processing (a sketch follows this list).
- Multi-threaded jobs
- Read and write concurrently
- Read and write to multiple sources and targets
- Event bus decoupled jobs
- Pause, resume, cancel running jobs
- Learn more about multi-threading
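As one sketch of the threading options: an AsyncReader prefetches records on a background thread, and a job started asynchronously can be paused, resumed, or cancelled from the caller's thread. The class and method names below follow the multi-threading docs but should be treated as assumptions:

```java
import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.multithreading.AsyncReader;

public class ConcurrentJob {

    public static void main(String[] args) throws Exception {
        // AsyncReader buffers records read on a separate thread,
        // decoupling the reading and writing sides of the pipeline
        DataReader reader = new AsyncReader(
                new CSVReader(new File("large.csv")).setFieldNamesInFirstRow(true));

        Job job = new Job(reader, new StreamWriter(System.out));
        job.runAsync();            // start the job on a background thread

        job.pause();               // temporarily halt processing
        job.resume();              // continue where it left off

        job.waitUntilFinished();   // block until the job completes
    }
}
```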
Expression Language
The DataPipeline Expression Language (DPEL) is a dynamically evaluated language similar to Java, with several additions that make expressions easier to read for both developers and analysts (a sketch follows this list).
- Dynamic expression language
- A large set of built-in functions
- Callouts to custom Java functions
- Restricted list of unsafe methods
- Learn more about the expression language
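DPEL expressions appear wherever the API accepts an expression string, such as the filters and calculated fields shown earlier. A brief sketch; the field names are placeholders, and toUpperCase stands in for one of the built-in functions:

```java
import java.io.File;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.filter.FilterExpression;
import com.northconcepts.datapipeline.filter.FilteringReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.transform.SetCalculatedField;
import com.northconcepts.datapipeline.transform.TransformingReader;

public class DpelExample {

    public static void main(String[] args) {
        DataReader reader = new CSVReader(new File("people.csv"))
                .setFieldNamesInFirstRow(true);

        // DPEL reads like Java: field names resolve against the current
        // record, and the usual comparison and boolean operators apply
        FilteringReader filter = new FilteringReader(reader);
        filter.add(new FilterExpression("age >= 18 && status == \"active\""));

        // Built-in functions can be called directly inside expressions
        TransformingReader transformer = new TransformingReader(filter);
        transformer.add(new SetCalculatedField("nameUpper", "toUpperCase(name)"));

        Job.run(transformer, new StreamWriter(System.out));
    }
}
```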