DataPipeline 10.1 Released

DataPipeline 10.1 includes new methods for working with arrays, collections, and filters. It adds Excel support for styling, hyperlinks, and configurable formula error handling. It improves data type detection and adds JDBC-backed dataset caching. It also adds DDL and DML code generation for the H2 Database, as well as new S3 operations.

ArrayValue

New methods added to ArrayValue class:

Async Processing

FieldPath

RecordList

New methods added to RecordList:

Sorting and Comparison

Excel

Failed Expression Handling

  • Added ExcelReader.FailedExpressionStrategy enum to control how failed formula evaluations are handled (default: FAIL)
    • FAIL – Throw exception when cell formula/expression evaluation fails (causes pipeline/job to abort)
    • SET_CACHED_VALUE – Use the last value cached in the Excel spreadsheet when evaluation fails
    • SET_EXPRESSION – Use the formula/expression as the field’s value when evaluation fails
    • SET_NULL – Use null as the field’s value when evaluation fails
    • SET_EXCEPTION_MESSAGE – Use the failure’s exception message as the field’s value
  • Added ExcelReader.setFailedExpressionStrategy(FailedExpressionStrategy) to configure the strategy
  • Added ExcelReader.getFailedExpressionStrategy() to retrieve the current strategy
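
The strategy can be applied as in the sketch below. This is an unverified example: the workbook path and field layout are made up, and the ExcelDocument/Job setup follows the library's usual reader pattern rather than a 10.1-specific snippet, so check the Javadoc before relying on it.

```java
import java.io.File;

import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.excel.ExcelDocument;
import com.northconcepts.datapipeline.excel.ExcelReader;
import com.northconcepts.datapipeline.job.Job;

public class ReadExcelWithFailedExpressionStrategy {

    public static void main(String[] args) {
        // Hypothetical workbook path for illustration only
        ExcelDocument document = new ExcelDocument()
                .open(new File("data/input/report.xlsx"));

        ExcelReader reader = new ExcelReader(document)
                .setFieldNamesInFirstRow(true);

        // Instead of aborting the job when a cell formula fails to evaluate
        // (the FAIL default), fall back to the value Excel last cached for
        // that cell.
        reader.setFailedExpressionStrategy(
                ExcelReader.FailedExpressionStrategy.SET_CACHED_VALUE);

        Job.run(reader, new StreamWriter(System.out));
    }
}
```

SET_NULL or SET_EXCEPTION_MESSAGE would be the more useful choices when downstream steps need to detect and report the failed cells rather than silently reuse stale values.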

Styling Support

New Excel styling classes for applying custom styles to cells:

Freeze Panes

Filtering

Grouping

JSON Processing

New BigInteger support for JSON readers:

New Readers and Lookups

Transformer

Retrying Operations

  • Added RetryingOperation.retryPredicate to allow conditional retry logic based on RetryContext
    • RetryContext provides access to retryCount, exceptionCount, and lastException
    • By default, retries until maxRetryCount (default 5) or maxErrorCount (default Long.MAX_VALUE-1) is reached
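
Wiring up the predicate might look roughly like this. It is an unverified sketch: the setRetryPredicate setter and the RetryContext accessor names are inferred from the property names above, so consult the Javadoc for the actual API.

```java
// Unverified sketch: retry only while the failure looks transient.
// setRetryPredicate(...) and the getter names are assumptions inferred
// from the retryPredicate / RetryContext properties described above.
operation.setRetryPredicate(context ->
        context.getRetryCount() < 3
        && context.getLastException() instanceof java.io.IOException);
```

A predicate like this short-circuits the default counters: permanent failures (for example, bad credentials) fail fast instead of burning through all five default retries.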

JdbcDataset

  • Added JdbcDataset for persistent database-backed dataset caching, providing an alternative to in-memory or file-based dataset storage
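
Since JdbcDataset is described only at a high level here, the snippet below is a hypothetical sketch: the constructor arguments and the load() call mirror the existing in-memory Dataset API and may differ in practice.

```java
// Hypothetical sketch -- the JdbcDataset constructor signature is an
// assumption. A JDBC connection to any supported database backs the
// cached records, so the dataset can outgrow the JVM heap in a way
// in-memory caching cannot.
Connection connection = DriverManager.getConnection(
        "jdbc:h2:./dataset-cache", "sa", "");

Dataset dataset = new JdbcDataset(reader, connection)  // assumed signature
        .load();  // load() mirrors the in-memory Dataset API; verify in the Javadoc
```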

Code Generation

Type Detection

  • Added maxColumnsToAnalyze property to limit the number of columns analyzed during type detection and schema inference, significantly improving performance in Dataset
  • Improved type inference in Dataset to determine field types for untyped file formats (such as CSV)

H2 Database Integration

  • H2InsertWriter class for converting Records into H2 INSERT statements with configurable batch sizes
  • CreateH2DdlFromSchemaDef class for generating complete H2 table definitions from SchemaDefinition objects
  • New DDL generation classes for programmatic database schema creation
  • Support for common H2 data types including INT, BIGINT, VARCHAR, TEXT, DECIMAL, DATE, TIME, TIMESTAMP, BLOB, BOOLEAN, and JSON
  • Fluent API for building H2 SQL statements with proper identifier escaping and pretty-printing options
  • Upsert support with ON DUPLICATE KEY UPDATE semantics for conditional insert/update operations
  • Follows the same architectural pattern as existing MySQL and PostgreSQL integrations
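
Given that the H2 integration follows the MySQL and PostgreSQL pattern, usage presumably looks like the sketch below. The table name is made up, and the setBatchSize setter is an assumption based on the "configurable batch sizes" note above.

```java
// Sketch modeled on the existing MySQL/PostgreSQL insert writers.
// "events" is a made-up table name; setBatchSize(...) is assumed from
// the "configurable batch sizes" feature note.
Connection connection = DriverManager.getConnection(
        "jdbc:h2:mem:demo", "sa", "");

H2InsertWriter writer = new H2InsertWriter(connection, "events")  // assumed signature
        .setBatchSize(500);

Job.run(reader, writer);  // reader is any DataPipeline DataReader
```

Batching the generated INSERT statements keeps round trips to H2 low, which matters most for the in-memory and embedded modes this integration targets.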

Amazon S3

New methods added to AmazonS3FileSystem:

For the complete list of changes and detailed information, please refer to the CHANGELOG.txt file.

About The DataPipeline Team

We make Data Pipeline — a lightweight ETL framework for Java. Use it to filter, transform, and aggregate data on-the-fly in your web, mobile, and desktop apps. Learn more about it at northconcepts.com.
