DataPipeline 7.1 Released

DataPipeline 7.1 is now available. It includes improvements in the areas of file I/O, data mapping, database integration, decisioning, debugging, and more. You can get started with Maven or Gradle, browse our Java examples, and review the changelog.

Upgrade Log4J from v1.2.17 to v2.17.0

DataPipeline has been using an older version of Log4J that was not affected by the recent set of issues. However, since we were already moving from 1.x to 2.x, we’re going straight to v2.17.0 and stepping over the issues you’ve been hearing about. This release uses the Log4J 1.2 API bridge, allowing an upgrade without code changes.

EventBus Listeners

EventBus now uses strong references for listeners by default instead of SoftReferences. No more disappearing listeners when the JVM runs low on memory. See EventBus.setUseStrongListenerReference(boolean useStrongListenerReference) to toggle.
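To show why softly referenced listeners can vanish, here is a minimal sketch using only the JDK. It is not DataPipeline's EventBus: the ListenerRegistrySketch class, its Listener interface, and its methods are hypothetical names invented for illustration, and the garbage collector is simulated by clearing the SoftReference by hand.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration only; this is not DataPipeline's EventBus.
class ListenerRegistrySketch {

    interface Listener {
        void onEvent(String event);
    }

    private final boolean useStrongReferences;
    // Each entry yields the listener, or null once a soft reference has been cleared.
    private final List<Supplier<Listener>> entries = new ArrayList<>();

    ListenerRegistrySketch(boolean useStrongReferences) {
        this.useStrongReferences = useStrongReferences;
    }

    // Registers a listener. Returns the SoftReference that was used (null in strong mode)
    // so the demo can simulate the garbage collector clearing it.
    SoftReference<Listener> add(Listener listener) {
        if (useStrongReferences) {
            entries.add(() -> listener); // the lambda keeps a strong reference
            return null;
        }
        SoftReference<Listener> ref = new SoftReference<>(listener);
        entries.add(ref::get);
        return ref;
    }

    // Delivers the event to every listener that is still reachable and returns how many got it.
    int publish(String event) {
        int delivered = 0;
        for (Supplier<Listener> entry : entries) {
            Listener listener = entry.get();
            if (listener != null) {
                listener.onEvent(event);
                delivered++;
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        ListenerRegistrySketch soft = new ListenerRegistrySketch(false);
        SoftReference<Listener> ref = soft.add(event -> { });
        System.out.println("soft, before clear: " + soft.publish("first"));
        ref.clear(); // stands in for the JVM reclaiming the listener under memory pressure
        System.out.println("soft, after clear:  " + soft.publish("second"));

        ListenerRegistrySketch strong = new ListenerRegistrySketch(true);
        strong.add(event -> { });
        System.out.println("strong:             " + strong.publish("third"));
    }
}
```

With strong references the registry itself keeps each listener alive, so callers must remove listeners explicitly when they are no longer needed.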

Database Improvements

Added a JdbcLookup.autoCloseConnection property to force database connection closing when the lookup is closed
The dynamic SQL code builder classes under northconcepts.datapipeline.jdbc.sql now include a Delete class as well as new getParameterValues() methods to retrieve candidate values for insert and update.
GenericUpsert, MergeUpsert, MySqlUpsert, PostgreSqlUpsert, and SybaseUpsert now support upserts with no non-key fields.
Overloaded JdbcConnectionFactory.wrap() to accept driverClassName, url, username, and password for direct connection creation.
JdbcMultiWriter, JdbcReader, JdbcUpsertWriter, JdbcWriter, and JdbcLookup now accept JdbcConnectionFactory to wrap DataSource and other connection creation options
Added JdbcWriter.setJdbcType(Class<?> type, int jdbcType) to allow overriding of the JDBC type sent to the database based on the Java class type
JdbcWriter now uses the type overrides for both null and non-null values
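The JDBC type override idea above can be pictured with a small JDK-only sketch. Only the setJdbcType(Class<?> type, int jdbcType) signature comes from the release notes; the JdbcTypeOverridesSketch class and its resolve() and bind() methods are hypothetical and say nothing about how JdbcWriter is actually implemented. PreparedStatement.setNull() and setObject() are standard JDBC calls.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not DataPipeline's JdbcWriter.
class JdbcTypeOverridesSketch {

    private final Map<Class<?>, Integer> overrides = new HashMap<>();

    // Same shape as the setter named in the release notes.
    JdbcTypeOverridesSketch setJdbcType(Class<?> type, int jdbcType) {
        overrides.put(type, jdbcType);
        return this;
    }

    // Returns the overridden JDBC type for a Java class, or the supplied default.
    int resolve(Class<?> type, int defaultJdbcType) {
        return overrides.getOrDefault(type, defaultJdbcType);
    }

    // Applies the same resolved type whether the value is null or not.
    // The field's declared Java type is passed in because a null value has no class.
    void bind(PreparedStatement statement, int index, Class<?> fieldType, Object value,
              int defaultJdbcType) throws SQLException {
        int jdbcType = resolve(fieldType, defaultJdbcType);
        if (value == null) {
            statement.setNull(index, jdbcType);
        } else {
            statement.setObject(index, value, jdbcType);
        }
    }

    public static void main(String[] args) {
        JdbcTypeOverridesSketch overrides = new JdbcTypeOverridesSketch()
                .setJdbcType(java.util.Date.class, Types.TIMESTAMP);

        // Overridden class: TIMESTAMP is used instead of the DATE default.
        System.out.println(overrides.resolve(java.util.Date.class, Types.DATE) == Types.TIMESTAMP);
        // Class without an override: the default is kept.
        System.out.println(overrides.resolve(String.class, Types.VARCHAR) == Types.VARCHAR);
    }
}
```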

API Improvements

Added Record.getFieldValueAsBytes(String fieldPath, byte[] defaultValue):byte[]
Overrode the setter methods to return the superclass in the following classes: CSVReader, CSVWriter, ExcelReader, ExcelWriter, FixedWidthReader, FixedWidthWriter, JdbcReader, JdbcWriter, JsonReader, JsonWriter, XmlReader, XmlWriter
Added open() and close() methods to the Filter, Transformer, and Lookup classes as callbacks from the endpoints that use them
Added default constructor to MemoryReader and add(Record record) method
Excluded more method prefixes in DPEL (for example java.security, java.util.concurrent, java.util.prefs, and more)

I/O Improvements

FileWriter now flushes on close if autoCloseWriter is false
ParsingReader now accepts a charsetName
Added support for XML 1.1 declarations to XmlSerializable interface: XmlSerializable.writeXml(T bean, StreamResult outputTarget, boolean closeStream, boolean addXml11Delcaration)
Added an autoCloseReader property to JsonRecordPipelineInput, XmlPipelineInput, and XmlRecordPipelineInput

Debugging Improvements

Added DebugWriter to ease logging of outgoing records
Added a JdbcWriter.debug property to turn on logging of the generated SQL
The SQL builder classes under the northconcepts.datapipeline.jdbc.* packages now support a “debug” property
toString() now includes its key-value properties

DecisionTable and DecisionTree Improvements

Added an optional defaultOutcomes property to DecisionTable to explicitly define the results when no rules match/fire
Overloaded addField(String variable, String expression, boolean includeInOutcome) on DecisionTable and DecisionTree to easily include a calculated field in the results
DecisionTableCondition now allows null variable names
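As a rough picture of what default outcomes mean, the JDK-only sketch below returns a fallback result when no rule matches. DecisionTableSketch, addRule(), setDefaultOutcomes(), and evaluate() are hypothetical names for illustration; the real DecisionTable has its own API and rule semantics, and this sketch assumes a simple first-match policy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only; this is not DataPipeline's DecisionTable.
class DecisionTableSketch {

    private static final class Rule {
        final Predicate<Map<String, Object>> condition;
        final Map<String, Object> outcomes;

        Rule(Predicate<Map<String, Object>> condition, Map<String, Object> outcomes) {
            this.condition = condition;
            this.outcomes = outcomes;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private Map<String, Object> defaultOutcomes = Collections.emptyMap();

    DecisionTableSketch addRule(Predicate<Map<String, Object>> condition,
                                Map<String, Object> outcomes) {
        rules.add(new Rule(condition, outcomes));
        return this;
    }

    DecisionTableSketch setDefaultOutcomes(Map<String, Object> defaultOutcomes) {
        this.defaultOutcomes = defaultOutcomes;
        return this;
    }

    // Returns the outcomes of the first matching rule, or the defaults when nothing fires.
    Map<String, Object> evaluate(Map<String, Object> input) {
        for (Rule rule : rules) {
            if (rule.condition.test(input)) {
                return rule.outcomes;
            }
        }
        return defaultOutcomes;
    }

    // Builds a small sample table: seniors get one tier, everyone else the default.
    static DecisionTableSketch sample() {
        return new DecisionTableSketch()
                .addRule(in -> ((Integer) in.get("age")) >= 65,
                        Collections.<String, Object>singletonMap("tier", "senior"))
                .setDefaultOutcomes(Collections.<String, Object>singletonMap("tier", "standard"));
    }

    public static void main(String[] args) {
        DecisionTableSketch table = sample();
        System.out.println(table.evaluate(Collections.<String, Object>singletonMap("age", 70)));
        System.out.println(table.evaluate(Collections.<String, Object>singletonMap("age", 30)));
    }
}
```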

Pipeline and Dataset Improvements

Dataset has several new ways to read its cached data, including overloaded createDataReader() and getRecordList(long offset, int count) methods.
Added an optional maxColumnStatsRecords property to Dataset to indicate the number of records to use when calculating column-based stats
Since column stats calculation is done asynchronously, Dataset adds a columnStatsReaderThreads property for the number of threads to use when processing column stats (default is 2)
Several overloaded waitForColumnStatsToLoad() methods are now in Dataset to block until asynchronous column stats processing completes
Dataset’s asynchronous data loading and column stats calculation can be terminated with the new cancelLoad() method
Column now includes new counts and date-time inferencing for the data loaded: getNonNullCount(), getNonNullNonBlankCount(), getInferredNumericValueCount(), getInferredTemporalValueCount(), getInferredBooleanValueCount(), getTemporalPatternCount(), getTemporalPatterns()
AbstractPipeline now includes a dateTimePatternDetector property to configure the date-time patterns tested during column processing
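To give a feel for the kind of per-column counts listed above, here is a JDK-only sketch that tallies non-null, non-blank, and inferred numeric, boolean, and temporal values for a column of strings. ColumnStatsSketch and its inference rules (BigDecimal parsing, true/false literals, ISO dates) are assumptions made for illustration; Dataset and Column may infer types quite differently.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not DataPipeline's Column or Dataset.
class ColumnStatsSketch {

    int nonNullCount;
    int nonNullNonBlankCount;
    int inferredNumericCount;
    int inferredBooleanCount;
    int inferredTemporalCount;

    // Tallies one raw value; each non-blank value is counted under at most one inferred type.
    void add(String value) {
        if (value == null) {
            return;
        }
        nonNullCount++;
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return;
        }
        nonNullNonBlankCount++;
        if (isNumeric(trimmed)) {
            inferredNumericCount++;
        } else if (isBoolean(trimmed)) {
            inferredBooleanCount++;
        } else if (isIsoDate(trimmed)) {
            inferredTemporalCount++;
        }
    }

    static boolean isNumeric(String text) {
        try {
            new BigDecimal(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static boolean isBoolean(String text) {
        return text.equalsIgnoreCase("true") || text.equalsIgnoreCase("false");
    }

    // Assumption: only ISO-8601 dates such as 2022-01-05 count as temporal here.
    static boolean isIsoDate(String text) {
        try {
            LocalDate.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    static ColumnStatsSketch of(List<String> values) {
        ColumnStatsSketch stats = new ColumnStatsSketch();
        for (String value : values) {
            stats.add(value);
        }
        return stats;
    }

    public static void main(String[] args) {
        ColumnStatsSketch stats =
                of(Arrays.asList("42", " 3.14 ", "true", "", null, "abc", "2022-01-05"));
        System.out.println("non-null=" + stats.nonNullCount
                + " non-blank=" + stats.nonNullNonBlankCount
                + " numeric=" + stats.inferredNumericCount
                + " boolean=" + stats.inferredBooleanCount
                + " temporal=" + stats.inferredTemporalCount);
    }
}
```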

Other DataPipeline Foundations Improvements

FieldMapping now has an optional type property for automatic conversion
FieldDef includes a new example property to aid in schema documentation
19-digit whole numbers are mapped to long instead of BigDecimal in the JdbcTableColumn class and code generators
Maps on Bean subclasses are now sorted by key when serialized to JSON
DataMapping now implements the JavaCodeGenerator interface for use-cases that need to emit Java code
Added getTablesSorted() and getTablesSortedTopologically() to JdbcConnection to retrieve tables sorted by name or by dependencies
JdbcTable and JdbcTableColumn now have getNameAsJavaClassName() and getNameAsJavaIdentifier() methods to aid in code generation
Added com.northconcepts.datapipeline.foundations.time.DateTimePatternMatch, DateTimePattern, and DateTimePatternDetector for use in dataset date-time inferencing or on their own
Added GenerateTableDaoClasses and GenerateQueryDaoClasses to generate data access Java beans using the table metadata from a live database
Added GenerateSpringDataJpaClasses to generate Spring Data JPA entities and repositories using the table metadata from a live database
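Sorting tables by dependencies, as getTablesSortedTopologically() is described as doing, is useful when tables must be created or loaded parents-first. The sketch below shows one common way to do such a sort (Kahn's algorithm) using only the JDK. It is not the library's implementation: TableSortSketch, its input map of table name to referenced tables, and the sample schema are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical illustration only; this is not DataPipeline's JdbcConnection.
class TableSortSketch {

    // Orders tables so that every table appears after the tables it references.
    // Ties are broken alphabetically to keep the result deterministic.
    static List<String> sortTopologically(Map<String, Set<String>> dependsOn) {
        Map<String, Integer> unmet = new HashMap<>();           // table -> unresolved parents
        Map<String, List<String>> dependents = new HashMap<>(); // parent -> children

        for (Map.Entry<String, Set<String>> entry : dependsOn.entrySet()) {
            String table = entry.getKey();
            unmet.merge(table, 0, Integer::sum);
            for (String parent : entry.getValue()) {
                if (parent.equals(table)) {
                    continue; // self-references do not constrain the order
                }
                unmet.merge(table, 1, Integer::sum);
                unmet.merge(parent, 0, Integer::sum);
                dependents.computeIfAbsent(parent, key -> new ArrayList<>()).add(table);
            }
        }

        PriorityQueue<String> ready = new PriorityQueue<>();
        for (Map.Entry<String, Integer> entry : unmet.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }

        List<String> sorted = new ArrayList<>();
        while (!ready.isEmpty()) {
            String table = ready.poll();
            sorted.add(table);
            for (String child : dependents.getOrDefault(table, Collections.<String>emptyList())) {
                if (unmet.merge(child, -1, Integer::sum) == 0) {
                    ready.add(child);
                }
            }
        }
        if (sorted.size() != unmet.size()) {
            throw new IllegalStateException("Cyclic dependencies between tables");
        }
        return sorted;
    }

    // A made-up schema: order references customer; order_item references order and product.
    static Map<String, Set<String>> sampleSchema() {
        Map<String, Set<String>> schema = new HashMap<>();
        schema.put("customer", new HashSet<String>());
        schema.put("product", new HashSet<String>());
        schema.put("order", new HashSet<>(Arrays.asList("customer")));
        schema.put("order_item", new HashSet<>(Arrays.asList("order", "product")));
        return schema;
    }

    public static void main(String[] args) {
        // Prints [customer, order, product, order_item]
        System.out.println(sortTopologically(sampleSchema()));
    }
}
```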

DataPipeline Integration Improvements

Added AvroPipelineInput, AvroPipelineOutput, ParquetPipelineInput, ParquetPipelineOutput to allow Avro and Parquet endpoints to more easily participate in pipelines
Added ParquetDataWriter.setSchema(String schema) to allow override of the schema used when writing Parquet files
AsymmetricEncryptingReader and AsymmetricDecryptingReader now explicitly rely on PublicKey and PrivateKey respectively for clarity instead of their Key superclass

Bugfixes

AsyncWriter now ensures that its threads are shut down when exceptions are thrown during the nested endpoint’s open() and close() methods. This fix also ensures that the JobCallback.onFailure() method is called if an asynchronous failure occurs while JdbcMultiWriter.close() is executing.
XmlReader and XmlRecordReader will always use com.sun.xml.internal.stream.XMLInputFactoryImpl as the XML input factory regardless of the JDK. This release also introduces an optional com.northconcepts.datapipeline.xml.XMLInputFactory JVM param to override the Sun class if needed.
getTypedListenerCount() and getUntypedEventListenerCount() now account for nullified SoftReferences before they are removed by the event bus’s next cleanup. There was a small window in which the count would reflect the old numbers.
Binary deserialization now returns DATETIME field values as java.util.Date instead of java.sql.Timestamp while reading in FileReader
CombinedLogReader now explicitly uses Locale.ENGLISH to prevent month name parsing issues outside of English locales
AbstractFieldMapping now clones more types to prevent side effects on the source data during mapping
ExcelPipelineInput now escapes the file path in the generated Java code to open Excel files
OrcDataReader and OrcDataWriter now treat Orc’s TIMESTAMP type as a local datetime instead of adjusting it for the current time zone
Fixed an NPE that occurred when a record doesn’t contain an optional field during mapping
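The Locale.ENGLISH fix is easy to reproduce with plain JDK date parsing. The sketch below parses a combined-log-style timestamp with java.time under two locales. It does not show CombinedLogReader's internals (which date classes it uses is not stated here), and the LogTimestampLocaleSketch class, the pattern, and the sample timestamp are assumptions chosen for illustration.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical illustration only; this is not DataPipeline's CombinedLogReader.
class LogTimestampLocaleSketch {

    // Timestamp layout commonly seen in combined access logs, e.g. 10/Oct/2000:13:55:36 -0700
    static final String PATTERN = "dd/MMM/yyyy:HH:mm:ss Z";

    static OffsetDateTime parse(String text, Locale locale) {
        return OffsetDateTime.parse(text, DateTimeFormatter.ofPattern(PATTERN, locale));
    }

    static boolean canParse(String text, Locale locale) {
        try {
            parse(text, locale);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String timestamp = "10/Oct/2000:13:55:36 -0700";

        // An explicit English locale understands the English month abbreviation.
        System.out.println("ENGLISH: " + canParse(timestamp, Locale.ENGLISH));

        // A German formatter expects its own abbreviation for October, so "Oct" is rejected.
        System.out.println("GERMAN:  " + canParse(timestamp, Locale.GERMAN));
    }
}
```

Relying on the JVM's default locale would make the same log file parse on one machine and fail on another, which is why pinning the locale is the safer choice for a fixed log format.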

Happy 2022!
