DataPipeline 7.1 is now available. It includes improvements in the areas of file I/O, data mapping, database integration, decisioning, debugging, and more. You can get started with Maven or Gradle, browse our Java examples, and review the changelog.
Upgrade Log4J from v1.2.17 to v2.17.0
DataPipeline had been using an older version of Log4J that was not affected by the recently reported vulnerabilities. However, since we were already moving from 1.x to 2.x, this upgrade steps past those issues entirely. This release uses the Log4J 1.2 API bridge, allowing the upgrade without code changes.
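If your own code still logs through the Log4J 1.2 API, the bridge is pulled in as a regular dependency alongside the 2.x core. A minimal Maven sketch (these are the standard Apache Log4J coordinates, not DataPipeline artifacts):

```xml
<!-- Log4J 2.x core -->
<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-core</artifactId>
  <version>2.17.0</version>
</dependency>
<!-- 1.2 API bridge: routes legacy org.apache.log4j calls to Log4J 2 -->
<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-1.2-api</artifactId>
  <version>2.17.0</version>
</dependency>
```

With the bridge on the classpath, existing org.apache.log4j imports keep compiling while the actual logging runs on Log4J 2.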
EventBus now holds listeners with strong references by default instead of SoftReferences, so listeners no longer disappear when the JVM runs low on memory. Use EventBus.setUseStrongListenerReference(boolean useStrongListenerReference) to toggle this behavior.
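To see why soft references could make listeners vanish, here is a self-contained sketch (not DataPipeline's EventBus implementation) of a listener list held through SoftReferences; calling clear() stands in for the GC reclaiming the referent under memory pressure:

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class SoftListenerDemo {
    // Deliver an event to every listener that is still reachable; returns how
    // many listeners actually received it.
    static int deliver(List<SoftReference<Consumer<String>>> listeners, String event) {
        int delivered = 0;
        for (SoftReference<Consumer<String>> ref : listeners) {
            Consumer<String> listener = ref.get();
            if (listener != null) {          // may be null after the GC clears it
                listener.accept(event);
                delivered++;
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        List<SoftReference<Consumer<String>>> listeners = new ArrayList<>();
        SoftReference<Consumer<String>> ref = new SoftReference<>(e -> { });
        listeners.add(ref);

        System.out.println(deliver(listeners, "first"));   // 1: listener still reachable
        ref.clear();                                       // simulate GC under memory pressure
        System.out.println(deliver(listeners, "second"));  // 0: listener silently gone
    }
}
```

With strong references as the new default, the listener list keeps its entries until they are explicitly removed.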
- Added a JdbcLookup.autoCloseConnection property to force database connection closing when the lookup is closed
- The dynamic SQL code builder classes under northconcepts.datapipeline.jdbc.sql now include a Delete class as well as new getParameterValues() methods to retrieve candidate values for insert and update.
- GenericUpsert, MergeUpsert, MySqlUpsert, PostgreSqlUpsert, and SybaseUpsert now support upserts with no non-key fields.
- Overloaded JdbcConnectionFactory.wrap() to accept driverClassName, url, username, and password for direct connection creation.
- JdbcMultiWriter, JdbcReader, JdbcUpsertWriter, JdbcWriter, and JdbcLookup now accept JdbcConnectionFactory to wrap DataSource and other connection creation options
- Added JdbcWriter.setJdbcType(Class<?> type, int jdbcType) to allow overriding of the JDBC type sent to the database based on the Java class type
- JdbcWriter now uses the type overrides for both null and non-null values
- Added Record.getFieldValueAsBytes(String fieldPath, byte[] defaultValue) : byte[]
- Overrode the setter methods to return the subclass instead of the superclass, enabling fluent chaining, in the following classes: CSVReader, CSVWriter, ExcelReader, ExcelWriter, FixedWidthReader, FixedWidthWriter, JdbcReader, JdbcWriter, JsonReader, JsonWriter, XmlReader, XmlWriter
- Added open() and close() methods to the Filter, Transformer, and Lookup classes as callbacks from the endpoints that use them
- Added a default constructor and an add(Record record) method to MemoryReader
- Excluded more method prefixes in DPEL (for example java.security, java.util.concurrent, and java.util.prefs)
- FileWriter now flushes on close if autoCloseWriter is false
- ParsingReader now accepts a charsetName
- Added support for XML 1.1 declarations to XmlSerializable interface: XmlSerializable.writeXml(T bean, StreamResult outputTarget, boolean closeStream, boolean addXml11Delcaration)
- Added an autoCloseReader property to JsonRecordPipelineInput, XmlPipelineInput, and XmlRecordPipelineInput
- Added DebugWriter to ease logging of outgoing records
- Added a JdbcWriter.debug property to turn on logging of the generated SQL
- The SQL builder classes under the northconcepts.datapipeline.jdbc.* packages now support a “debug” property
- toString() now includes its key-value properties
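The setter-override change above relies on covariant return types: each subclass re-declares the setter with a narrower return type so calls chain without casting. A self-contained sketch using hypothetical classes (not the real DataPipeline hierarchy):

```java
// Base class: setter returns the base type.
class TextReader {
    private String charset = "UTF-8";
    public TextReader setCharset(String charset) { this.charset = charset; return this; }
    public String getCharset() { return charset; }
}

// Subclass: overrides the inherited setter with a narrower (covariant) return
// type and adds its own fluent setter.
class CsvReaderSketch extends TextReader {
    private char separator = ',';
    @Override
    public CsvReaderSketch setCharset(String charset) { super.setCharset(charset); return this; }
    public CsvReaderSketch setSeparator(char separator) { this.separator = separator; return this; }
}

public class FluentSetterDemo {
    public static void main(String[] args) {
        // Without the covariant override, setCharset() would return TextReader
        // and setSeparator() would not compile on the result.
        CsvReaderSketch reader = new CsvReaderSketch()
                .setCharset("ISO-8859-1")
                .setSeparator(';');
        System.out.println(reader.getCharset());  // ISO-8859-1
    }
}
```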
DecisionTable and DecisionTree Improvements
- Added an optional defaultOutcomes property to DecisionTable to explicitly define the results when no rules match/fire
- Overloaded addField(String variable, String expression, boolean includeInOutcome) on DecisionTable and DecisionTree to easily include a calculated field in the results
- DecisionTableCondition now allows null variable names
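The defaultOutcomes semantics can be pictured with a tiny rule-evaluation sketch (hypothetical types, not the DecisionTable API): rules are tried in order, and the defaults apply only when nothing matches or fires:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class DefaultOutcomesDemo {
    // A rule pairs a condition with the outcomes it produces when it matches.
    record Rule(Predicate<Map<String, Object>> condition, Map<String, Object> outcomes) { }

    static Map<String, Object> evaluate(List<Rule> rules,
                                        Map<String, Object> defaultOutcomes,
                                        Map<String, Object> input) {
        for (Rule rule : rules) {
            if (rule.condition().test(input)) {
                return rule.outcomes();
            }
        }
        return defaultOutcomes;  // no rule matched/fired
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule(in -> (int) in.get("score") >= 700, Map.of("decision", "approve")));
        Map<String, Object> defaults = Map.of("decision", "review");

        System.out.println(evaluate(rules, defaults, Map.of("score", 720)));  // {decision=approve}
        System.out.println(evaluate(rules, defaults, Map.of("score", 500)));  // {decision=review}
    }
}
```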
Pipeline and Dataset Improvements
- Dataset has several new ways to read its cached data, including overloaded createDataReader() and getRecordList(long offset, int count) methods.
- Added an optional maxColumnStatsRecords property to Dataset to indicate the number of records to use when calculating column-based stats
- Since column stats are calculated asynchronously, Dataset adds a columnStatsReaderThreads property to set the number of threads used when processing column stats (default is 2)
- Several overloaded waitForColumnStatsToLoad() methods are now in Dataset to block until asynchronous column stats processing completes
- Dataset’s asynchronous data loading and column stats calculation can be terminated with the new cancelLoad() method
- Column now includes new counts and date-time inferencing for the data loaded: getNonNullCount(), getNonNullNonBlankCount(), getInferredNumericValueCount(), getInferredTemporalValueCount(), getInferredBooleanValueCount(), getTemporalPatternCount(), getTemporalPatterns()
- AbstractPipeline now includes a dateTimePatternDetector property to configure the date-time patterns tested during column processing
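The asynchronous column-stats behavior described above (a configurable thread count, a blocking wait, and cancellation) maps onto a familiar executor pattern. A self-contained sketch, not Dataset's actual internals:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ColumnStatsDemo {
    // Compute one stat per column (here: non-null count) on a fixed pool,
    // mirroring columnStatsReaderThreads.
    static List<Long> computeStats(List<List<Integer>> columns, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = columns.stream()
                    .map(col -> pool.submit(() ->
                            col.stream().filter(v -> v != null).count()))
                    .toList();
            List<Long> stats = new ArrayList<>();
            for (Future<Long> f : futures) {
                stats.add(f.get());  // like waitForColumnStatsToLoad(): block until done
            }
            return stats;
        } finally {
            pool.shutdownNow();  // like cancelLoad(): stop any remaining work
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<Integer>> columns = List.of(
                Arrays.asList(1, null, 3),
                Arrays.asList(4, 5, 6));
        System.out.println(computeStats(columns, 2));  // [2, 3]
    }
}
```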
Other DataPipeline Foundations Improvements
- FieldMapping now has an optional type property for automatic conversion
- FieldDef includes a new example property to aid in schema documentation
- 19-digit whole numbers are now mapped to long instead of BigDecimal in the JdbcTableColumn class and code generators
- Maps on Bean subclasses are now sorted by key when serialized to JSON
- DataMapping now implements the JavaCodeGenerator interface for use-cases that need to emit Java code
- Added getTablesSorted() and getTablesSortedTopologically() to JdbcConnection to retrieve tables sorted by name or by dependencies
- JdbcTable and JdbcTableColumn now have getNameAsJavaClassName() and getNameAsJavaIdentifier() methods to aid in code generation
- Added com.northconcepts.datapipeline.foundations.time.DateTimePatternMatch, DateTimePattern, and DateTimePatternDetector for use in dataset date-time inferencing or on their own
- Added GenerateTableDaoClasses and GenerateQueryDaoClasses to generate data access Java beans using the table metadata from a live database
- Added GenerateSpringDataJpaClasses to generate Spring Data JPA entities and repositories using the table metadata from a live database
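getTablesSortedTopologically() is described as ordering tables by dependencies. A self-contained illustration of that ordering using a hypothetical dependency map (not the JdbcConnection API); it assumes the foreign-key graph has no cycles:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TableSortDemo {
    // Returns tables ordered so every table appears after the tables it
    // references (its "parents").
    static List<String> topologicalSort(Map<String, List<String>> dependsOn) {
        List<String> sorted = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String table : new TreeSet<>(dependsOn.keySet())) {  // stable start order
            visit(table, dependsOn, visited, sorted);
        }
        return sorted;
    }

    private static void visit(String table, Map<String, List<String>> dependsOn,
                              Set<String> visited, List<String> sorted) {
        if (!visited.add(table)) return;
        for (String parent : dependsOn.getOrDefault(table, List.of())) {
            visit(parent, dependsOn, visited, sorted);  // emit dependencies first
        }
        sorted.add(table);
    }

    public static void main(String[] args) {
        // order_item references orders and product; orders references customer
        Map<String, List<String>> deps = Map.of(
                "customer", List.of(),
                "product", List.of(),
                "orders", List.of("customer"),
                "order_item", List.of("orders", "product"));
        System.out.println(topologicalSort(deps));  // [customer, orders, product, order_item]
    }
}
```

An ordering like this is what makes generated inserts (or DAO setup) runnable against foreign-key constraints without deferring checks.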
DataPipeline Integration Improvements
- Added AvroPipelineInput, AvroPipelineOutput, ParquetPipelineInput, ParquetPipelineOutput to allow Avro and Parquet endpoints to more easily participate in pipelines
- Added ParquetDataWriter.setSchema(String schema) to allow override of the schema used when writing Parquet files
- AsymmetricEncryptingReader and AsymmetricDecryptingReader now explicitly rely on PublicKey and PrivateKey respectively for clarity instead of their Key superclass
- AsyncWriter now ensures that its threads are shut down when exceptions are thrown during the nested endpoint's open() and close() methods. This fix also ensures that JobCallback.onFailure() is called if an asynchronous failure occurs while JdbcMultiWriter.close() is executing.
- XmlReader and XmlRecordReader now always use com.sun.xml.internal.stream.XMLInputFactoryImpl as the XML input factory, regardless of the JDK. This release also introduces an optional com.northconcepts.datapipeline.xml.XMLInputFactory JVM param to override the Sun class if needed.
- getTypedListenerCount() and getUntypedEventListenerCount() now account for nullified SoftReferences before they are removed by the event bus’s next cleanup. There was a small window in which the count would reflect the old numbers.
- Binary deserialization now returns DATETIME field values as java.util.Date instead of java.sql.Timestamp when reading in FileReader
- CombinedLogReader now explicitly uses Locale.ENGLISH to prevent month name parsing issues outside of English locales
- AbstractFieldMapping now clones more types to prevent side effects on the source data during mapping
- ExcelPipelineInput now escapes the file path in the generated Java code to open Excel files
- OrcDataReader and OrcDataWriter now treat Orc’s TIMESTAMP type as a local datetime instead of adjusting it for the current time zone
- Fixed an NPE thrown when a record doesn't contain an optional field during mapping
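The AsyncWriter fix above boils down to shutting worker threads down in a finally block, so a throwing close() cannot leak them. A self-contained sketch of that pattern (hypothetical names, not AsyncWriter's source):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SafeCloseDemo {
    // Run the nested endpoint's close logic; no matter whether it returns or
    // throws, the worker pool is always shut down.
    static void closeSafely(ExecutorService workers, Runnable nestedClose) {
        try {
            nestedClose.run();      // may throw, as a nested endpoint's close() can
        } finally {
            workers.shutdownNow();  // always stop worker threads
        }
    }

    public static void main(String[] args) {
        ExecutorService workers = Executors.newSingleThreadExecutor();
        try {
            closeSafely(workers, () -> { throw new IllegalStateException("close failed"); });
        } catch (IllegalStateException expected) {
            // the failure still propagates (e.g., on to a callback like onFailure)
        }
        System.out.println(workers.isShutdown());  // true: no leaked threads
    }
}
```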