Welcome to the 9.2 release of DataPipeline.
FieldPath Improvements
- FieldPath now implements RecordSerializable, XmlSerializable, and JavaCodeGenerator. These allow it to be converted to and from JSON, XML, and records. It can also generate the appropriate Java code to represent a new instance of itself.
- FieldPath now includes a new toExpression() method to emit a string that can be passed to its parse() method.
- GroupByReader's operations now also accept FieldPath: GroupOperation, GroupAverage, GroupCount, GroupFirst, GroupLast, GroupMaximum, GroupMinimum, GroupSum.
- SetCalculatedField and SetField now also accept FieldPath.
- See the Field Path Expressions guide for details.
Excel Changes
- ExcelReader now supports Binary Excel files (.xlsb) via a new provider (ExcelDocument.ProviderType.POI_XSSFB and PoiXssfbProvider). See the Read a binary Excel file (.xlsb) example.
- ExcelWriter now fails early by default when a field/cell value is over 32767 characters in length.
- ExcelWriter also includes a new largeCellHandler property to configure how field/cell values over 32767 characters are handled (FAIL, TRUNCATE, SKIP). See the Truncate large Excel cell values example.
- Added ExcelPipelineOutput.largeCellHandler to configure how field/cell values over 32767 characters are handled (FAIL, TRUNCATE, SKIP) in declarative pipelines.
- BUGFIX: the streaming Excel reader (ExcelDocument.ProviderType.POI_XSSF_SAX / PoiXssfSaxProvider) now evaluates formulas if ExcelReader.evaluateExpressions is set.
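The three large-cell modes named above (FAIL, TRUNCATE, SKIP) can be pictured with a small standalone sketch. It does not call the DataPipeline API: the LargeCellSketch class, its Policy enum, and the apply method are hypothetical names, and representing a skipped cell as null is an assumption made only for illustration. The 32767 limit is the one quoted in the notes above.

```java
// Standalone illustration only; none of these names come from DataPipeline.
class LargeCellSketch {

    // Hypothetical stand-in for the FAIL / TRUNCATE / SKIP choices.
    enum Policy { FAIL, TRUNCATE, SKIP }

    static final int MAX_CELL_LENGTH = 32767;

    // Returns the text to write, or null when the cell is left blank (assumed SKIP behavior).
    static String apply(String value, Policy policy) {
        if (value == null || value.length() <= MAX_CELL_LENGTH) {
            return value; // within the limit: written unchanged
        }
        switch (policy) {
            case TRUNCATE:
                return value.substring(0, MAX_CELL_LENGTH);
            case SKIP:
                return null;
            default: // FAIL
                throw new IllegalArgumentException(
                    "Cell value has " + value.length() + " characters; limit is " + MAX_CELL_LENGTH);
        }
    }

    public static void main(String[] args) {
        String big = "x".repeat(40000);
        System.out.println(apply(big, Policy.TRUNCATE).length()); // 32767
        System.out.println(apply(big, Policy.SKIP));              // null
    }
}
```

Failing early is the safer default because an oversized value is reported when it is written rather than surfacing later as a corrupt or silently shortened cell.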
XML Changes
- The XPath engine now supports partial wildcards when matching XML, JSON, and Java Beans. See the Use partial wildcards in XPath when reading XML and Use partial wildcards in XPath when reading JSON examples.
- XmlReader and JsonReader now skip records that have record matches but no field matches, instead of returning empty records.
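As a rough picture of what partial-wildcard matching means, the sketch below tests a name against a pattern in which * stands for any run of characters. This is not the library's XPath engine: PartialWildcardSketch and matches are hypothetical names, and the assumption that * is the wildcard character and applies within a single element or field name should be checked against the examples mentioned above.

```java
import java.util.regex.Pattern;

// Standalone illustration only; not DataPipeline's XPath implementation.
class PartialWildcardSketch {

    // Assumes '*' matches any (possibly empty) run of characters; everything else is literal.
    static boolean matches(String pattern, String name) {
        StringBuilder regex = new StringBuilder();
        String[] literals = pattern.split("\\*", -1); // -1 keeps trailing empty parts
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each '*' in the pattern
            }
            regex.append(Pattern.quote(literals[i])); // literal text, regex metacharacters escaped
        }
        return Pattern.matches(regex.toString(), name);
    }

    public static void main(String[] args) {
        System.out.println(matches("price*", "priceUSD")); // true
        System.out.println(matches("*Name", "firstName")); // true
        System.out.println(matches("price*", "cost"));     // false
    }
}
```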
Foundations Changes
- Added parseFieldNamesAsFieldPath flag (defaults to true) to AggregateGroupFieldsAction, AddFieldsAction, and RenameFieldsAction.
- Added new RemoveDuplicatesAction for use in declarative pipelines.
Dataset Changes
- Dataset now has a jobExecutor property to allow changing the java.util.concurrent.Executor used to run the internal job. See Dataset examples.
- BUGFIX: Dataset now sets the dataLoadException property even if an exception occurs during cancellation.
- The Tree and TreeNode classes have been moved from the com.northconcepts.datapipeline.foundations.pipeline.dataset package to com.northconcepts.datapipeline.foundations.pipeline.tree.
- Tree has been improved to identify XML elements and JSON field names that should likely be treated as values instead of field names.
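For readers unfamiliar with java.util.concurrent.Executor, the sketch below shows the general pattern of a component whose internal job runs on a replaceable executor. The Loader class is a hypothetical stand-in, not Dataset itself, and its run-on-the-caller's-thread default is an arbitrary choice for this sketch; only the Executor, ExecutorService, and Executors types are real JDK APIs.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Standalone illustration only; Loader is not a DataPipeline class.
class JobExecutorSketch {

    static class Loader {
        // Arbitrary default for this sketch: run the job on the calling thread.
        private Executor jobExecutor = Runnable::run;
        private final AtomicReference<String> ranOn = new AtomicReference<>();

        void setJobExecutor(Executor jobExecutor) {
            this.jobExecutor = jobExecutor;
        }

        // Submits the internal job; here the job only records which thread ran it.
        void load() {
            jobExecutor.execute(() -> ranOn.set(Thread.currentThread().getName()));
        }

        String ranOn() {
            return ranOn.get();
        }
    }

    // Runs the loader's job on a single, explicitly named thread and reports that name.
    static String runOnNamedThread(String threadName) {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> new Thread(r, threadName));
        Loader loader = new Loader();
        loader.setJobExecutor(pool);
        loader.load();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the job", e);
        }
        return loader.ranOn();
    }

    public static void main(String[] args) {
        System.out.println(runOnNamedThread("dataset-loader"));
    }
}
```

Supplying the executor from outside lets an application reuse a shared, bounded thread pool instead of having each component create its own threads.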
Schema Changes
- Added trim property to allow automatic removal of leading or trailing whitespace in schema-based mappings and transformations. See schema examples.
- getFieldNames() now always returns the exact field name instead of the display name if one was set.
- Added getFieldNamesForErrorMessage() to retrieve the display names where one was set, or fall back to the field name otherwise.
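The difference between the two name lookups described above can be sketched in plain Java. The FieldDef class and both static methods are hypothetical stand-ins that mirror the described behavior, assuming a display name is optional and absent when null; they are not the schema API itself.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone illustration only; these are not DataPipeline's schema classes.
class SchemaNamesSketch {

    // Hypothetical field definition: name is required, displayName may be null.
    static class FieldDef {
        final String name;
        final String displayName;

        FieldDef(String name, String displayName) {
            this.name = name;
            this.displayName = displayName;
        }
    }

    // Always the exact field name, regardless of any display name.
    static List<String> fieldNames(List<FieldDef> fields) {
        List<String> names = new ArrayList<>();
        for (FieldDef field : fields) {
            names.add(field.name);
        }
        return names;
    }

    // The display name where one was set, otherwise the field name.
    static List<String> fieldNamesForErrorMessage(List<FieldDef> fields) {
        List<String> names = new ArrayList<>();
        for (FieldDef field : fields) {
            names.add(field.displayName != null ? field.displayName : field.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<FieldDef> fields = List.of(
            new FieldDef("cust_id", "Customer ID"),
            new FieldDef("email", null));
        System.out.println(fieldNames(fields));                // [cust_id, email]
        System.out.println(fieldNamesForErrorMessage(fields)); // [Customer ID, email]
    }
}
```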
PDF Changes
- BUGFIX: PdfWriter no longer fails on null values. See the Generate a PDF example.
See the CHANGELOG for the full set of updates in DP 9.2.0.
Also see the JavaDocs and examples for more info.
Happy coding!