DataPipeline 10.1 Released

DataPipeline 10.1 includes new methods for working with arrays, collections, and filters. It adds Excel support for styling, hyperlinks, and configurable formula error handling. It improves data type detection and adds JDBC-backed dataset caching. It also adds DDL and DML code generation for the H2 Database, as well as new S3 operations.

ArrayValue

New methods added to ArrayValue class:

Async Processing

FieldPath

RecordList

New methods added to RecordList:

Sorting and Comparison

Excel

Failed Expression Handling

  • Added ExcelReader.FailedExpressionStrategy enum to control how failed formula evaluations are handled (default: FAIL)
    • FAIL – Throw exception when cell formula/expression evaluation fails (causes pipeline/job to abort)
    • SET_CACHED_VALUE – Use the last value cached in the Excel spreadsheet when evaluation fails
    • SET_EXPRESSION – Use the formula/expression as the field’s value when evaluation fails
    • SET_NULL – Use null as the field’s value when evaluation fails
    • SET_EXCEPTION_MESSAGE – Use the failure’s exception message as the field’s value
  • Added ExcelReader.setFailedExpressionStrategy(FailedExpressionStrategy) to configure the strategy
  • Added ExcelReader.getFailedExpressionStrategy() to retrieve the current strategy
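
The strategy can be applied as in the sketch below. This is an unverified example: the workbook path and field layout are made up, and the ExcelDocument/Job setup follows the library's usual reader pattern rather than a 10.1-specific snippet, so check the Javadoc before relying on it.

```java
import java.io.File;

import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.excel.ExcelDocument;
import com.northconcepts.datapipeline.excel.ExcelReader;
import com.northconcepts.datapipeline.job.Job;

public class ReadExcelWithFailedExpressionStrategy {

    public static void main(String[] args) {
        // Hypothetical workbook path for illustration only
        ExcelDocument document = new ExcelDocument()
                .open(new File("data/input/report.xlsx"));

        ExcelReader reader = new ExcelReader(document)
                .setFieldNamesInFirstRow(true);

        // Instead of aborting the job when a cell formula fails to evaluate
        // (the FAIL default), fall back to the value Excel last cached for
        // that cell.
        reader.setFailedExpressionStrategy(
                ExcelReader.FailedExpressionStrategy.SET_CACHED_VALUE);

        Job.run(reader, new StreamWriter(System.out));
    }
}
```

SET_NULL or SET_EXCEPTION_MESSAGE would be the more useful choices when downstream steps need to detect and report the failed cells rather than silently reuse stale values.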

Styling Support

New Excel styling classes for applying custom styles to cells:

Freeze Panes

Filtering

Grouping

JSON Processing

New BigInteger support for JSON readers:

New Readers and Lookups

Transformer

Retrying Operations

  • Added RetryingOperation.retryPredicate to allow conditional retry logic based on RetryContext
    • RetryContext provides access to retryCount, exceptionCount, and lastException
    • By default, retries until maxRetryCount (default 5) or maxErrorCount (default Long.MAX_VALUE-1) is reached
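
Wiring up the predicate might look roughly like this. It is an unverified sketch: the setRetryPredicate setter and the RetryContext accessor names are inferred from the property names above, so consult the Javadoc for the actual API.

```java
// Unverified sketch: retry only while the failure looks transient.
// setRetryPredicate(...) and the getter names are assumptions inferred
// from the retryPredicate / RetryContext properties described above.
operation.setRetryPredicate(context ->
        context.getRetryCount() < 3
        && context.getLastException() instanceof java.io.IOException);
```

A predicate like this short-circuits the default counters: permanent failures (for example, bad credentials) fail fast instead of burning through all five default retries.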

JdbcDataset

  • Added JdbcDataset for persistent database-backed dataset caching, providing an alternative to in-memory or file-based dataset storage
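
Since JdbcDataset is described only at a high level here, the snippet below is a hypothetical sketch: the constructor arguments and the load() call mirror the existing in-memory Dataset API and may differ in practice.

```java
// Hypothetical sketch -- the JdbcDataset constructor signature is an
// assumption. A JDBC connection to any supported database backs the
// cached records, so the dataset can outgrow the JVM heap in a way
// in-memory caching cannot.
Connection connection = DriverManager.getConnection(
        "jdbc:h2:./dataset-cache", "sa", "");

Dataset dataset = new JdbcDataset(reader, connection)  // assumed signature
        .load();  // load() mirrors the in-memory Dataset API; verify in the Javadoc
```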

Code Generation

Type Detection

  • Added maxColumnsToAnalyze property to limit the number of columns analyzed during type detection and schema inference, significantly improving performance in Dataset
  • Improved type inference in Dataset to determine field types for untyped file formats (such as CSV)

H2 Database Integration

  • H2InsertWriter class for converting Records into H2 INSERT statements with configurable batch sizes
  • CreateH2DdlFromSchemaDef class for generating complete H2 table definitions from SchemaDefinition objects
  • New DDL generation classes for programmatic database schema creation
  • Support for common H2 data types including INT, BIGINT, VARCHAR, TEXT, DECIMAL, DATE, TIME, TIMESTAMP, BLOB, BOOLEAN, and JSON
  • Fluent API for building H2 SQL statements with proper identifier escaping and pretty-printing options
  • Upsert support with ON DUPLICATE KEY UPDATE semantics for conditional insert/update operations
  • Follows the same architectural pattern as existing MySQL and PostgreSQL integrations
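
Given that the H2 integration follows the MySQL and PostgreSQL pattern, usage presumably looks like the sketch below. The table name is made up, and the setBatchSize setter is an assumption based on the "configurable batch sizes" note above.

```java
// Sketch modeled on the existing MySQL/PostgreSQL insert writers.
// "events" is a made-up table name; setBatchSize(...) is assumed from
// the "configurable batch sizes" feature note.
Connection connection = DriverManager.getConnection(
        "jdbc:h2:mem:demo", "sa", "");

H2InsertWriter writer = new H2InsertWriter(connection, "events")  // assumed signature
        .setBatchSize(500);

Job.run(reader, writer);  // reader is any DataPipeline DataReader
```

Batching the generated INSERT statements keeps round trips to H2 low, which matters most for the in-memory and embedded modes this integration targets.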

Amazon S3

New methods added to AmazonS3FileSystem:

For the complete list of changes and detailed information, please refer to the CHANGELOG.txt file.

About The DataPipeline Team

We make Data Pipeline — a lightweight ETL framework for Java. Use it to filter, transform, and aggregate data on-the-fly in your web, mobile, and desktop apps. Learn more about it at northconcepts.com.
