Welcome to the fourth quarter release of DataPipeline for 2023.
Core Changes
- RecordList now implements the RecordSerializable interface and provides the to/fromRecord(), to/fromBinary(), to/fromJson(), and to/fromArrayValue() methods. See the examples of how to convert RecordList to JSON and XML.
- RecordList now includes an ensureCapacity(int minCapacity) method to optimize memory allocation.
- We added a CommaSeparatedValues class to hold a collection of string values that can be written to or parsed from a comma-delimited string. See the Read and Write CSV as String example.
- BUGFIX: CSVReader now excludes the configured fieldSeparators characters when trimming fields, so values in tab-separated files can be trimmed without stripping the tabs themselves. See the Read a Tab Separated Values File example.
- The expression language now allows nested properties to be surrounded by ${…} to better support field names containing whitespace and symbols.
- The SQL Select class now provides convenience setter methods to replace the following fields: selection, where, grouping, having, order.
- The DataReaderDecorator and DataWriterDecorator classes now use parameterized types to lock down the types supplied to them while still returning any DataReader/DataWriter.
- BUGFIX: SimpleXmlReader, XmlReader, and XmlRecordReader now close the supplied file or reader if an exception is raised while still in their constructor. Previously, you had to close any resources passed to these classes yourself if the constructor call failed and threw an exception.
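
The constructor fix in the last item follows a general Java pattern that can be sketched without DataPipeline itself. The example below is a minimal illustration under stated assumptions: HeaderCheckingReader and TrackingReader are hypothetical stand-ins written for this sketch, not DataPipeline classes, and the "must start with '<'" check is an invented validation step used only to make the constructor fail.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class CloseOnFailedConstruction {

    /** Hypothetical stand-in for a reader class; not part of DataPipeline. */
    static class HeaderCheckingReader implements AutoCloseable {
        private final Reader source;

        HeaderCheckingReader(Reader source) throws IOException {
            this.source = source;
            try {
                // Invented validation step: reject input that does not start with '<'.
                int first = source.read();
                if (first != '<') {
                    throw new IllegalArgumentException("input does not start with '<'");
                }
            } catch (Exception e) {
                // The caller never receives an instance, so release the resource here.
                try {
                    source.close();
                } catch (IOException closeError) {
                    e.addSuppressed(closeError);
                }
                throw e; // precise rethrow: IOException or an unchecked exception
            }
        }

        @Override
        public void close() throws IOException {
            source.close();
        }
    }

    /** Test helper that records whether close() was called. */
    static class TrackingReader extends StringReader {
        boolean closed;

        TrackingReader(String text) {
            super(text);
        }

        @Override
        public void close() {
            closed = true;
            super.close();
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingReader bad = new TrackingReader("not xml");
        try {
            new HeaderCheckingReader(bad);
        } catch (IllegalArgumentException e) {
            System.out.println("constructor failed, source closed: " + bad.closed);
        }

        TrackingReader good = new TrackingReader("<root/>");
        try (HeaderCheckingReader reader = new HeaderCheckingReader(good)) {
            System.out.println("constructor succeeded, source closed: " + good.closed);
        }
    }
}
```

With this pattern the caller only needs try-with-resources around the constructed object. A failed constructor cleans up after itself, and any error from the cleanup is attached as a suppressed exception rather than hiding the original failure.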
Foundations Changes
- DP Foundations now supports reading invalid expressions from Record, XML, and JSON without throwing an exception in the following classes: AbstractFieldMapping, FieldMapping, DecisionTableCondition, DecisionTableOutcome, DecisionTreeOutcome, CalculatedField, EntityDef. This allows you to store partial, broken, and work-in-progress expressions.
- BUGFIX: DataMapping no longer throws an exception if an excluded field doesn’t exist in the dataset. This allows you to exclude optional fields that might not exist in the source data.
- DataMappingEditor is now initialized with a DataReaderFactory instead of a Dataset.
- DataMappingEditor no longer provides sorting.
- FileType now supports a defaultFileExtension property and also adds JSON_LINES, PARQUET, AVRO, and ORC constants.
- Several data mapping and schema validation messages have been improved.
- Dataset now has separate flags for recordsLoaded and columnStatsLoaded to complement the existing dataLoading flag.
- Dataset also has new, overridable methods for afterRecordsLoaded() and afterColumnStatsLoaded().
- BUGFIX: DatasetReader now finishes after records are loaded instead of waiting for column stats to be processed.
- Tree now accepts Reader to detect candidate fields and record breaks in XML and JSON streams.
- The GenerateSchemaFromJdbc tool now skips primary key indexes in the source database.
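
The behavior described in the DataMapping fix above (excluding a field that may not exist in the source data) can be illustrated with plain Java collections. This is only a sketch of the idea, not DataMapping's implementation: the record is modeled as a java.util.Map, and excludeFields is a hypothetical helper introduced for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OptionalFieldExclusion {

    /**
     * Returns a copy of the record without the excluded fields.
     * Excluded names that are absent from the record are silently ignored.
     */
    static Map<String, Object> excludeFields(Map<String, Object> record, List<String> excluded) {
        Map<String, Object> result = new LinkedHashMap<>(record);
        for (String name : excluded) {
            result.remove(name); // Map.remove does nothing when the key is missing
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 1);
        record.put("name", "Ada");
        record.put("notes", "draft");

        // "legacyCode" is not in the record; the exclusion is tolerated, not an error.
        List<String> excluded = Arrays.asList("notes", "legacyCode");
        System.out.println(excludeFields(record, excluded));
    }
}
```

Treating a missing excluded field as a no-op lets one exclusion list serve sources whose optional columns come and go.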
Integration Changes
- DataPipeline now integrates with Shopify to read orders, inventory, products, locations, and customers. See the Shopify data extraction and conversion examples.
- Added a new MySQL integration to build DDL statements programmatically. This is similar to the existing PostgreSQL DDL integration. See the Generate MySQL DDL Programmatically example.
- Added overloaded RecordTemplateModel wrap() methods in template integration to wrap RecordList and Collection<Record> as FreeMarker models.
See the CHANGELOG for the full set of updates in DP 8.3.0.
Also see the JavaDocs and examples for more info.
Happy coding!