DataPipeline 8.1 Released

DataPipeline 8.1.0 is now available. It adds support for multi-connection upserting to database tables, a configurable JDBC read fetch size, and more. Enjoy.

Java Stream Support

This release expands Java stream support to make it easier for you to work with records, fields, and datasets using modern Java idioms. See the Java Streams and Iterators user guide page for examples.


JDBC Improvements

DataPipeline 8.1 includes another wave of JDBC additions and improvements.

  1. Added a new JdbcMultiUpsertWriter for multi-connection, multi-threaded upserting. See the Upsert Records With Multiple JDBC Connections example.
  2. The existing JdbcMultiWriter now supports configurable insert strategies, a commitBatch option, a debug flag, and overridable JDBC types. See how to Configure the Insert Strategy used in JdbcMultiWriter.
  3. JdbcReader now has a fetchSize property to control how many rows are retrieved per network call to the database while reading (a short sketch follows this list).
  4. JdbcUpsertWriter now supports overridable JDBC types by name and by class.
  5. This release adds two Oracle-specific, multi-record insert strategies:
    1. OracleMultiRowInsertAllStatementInsert uses INSERT ALL
    2. OracleMultiRowSelectUnionAllStatementInsert uses INSERT INTO…SELECT…UNION ALL
  6. The JdbcConnection class now provides progress callbacks via a new JdbcConnectionListener as it loads schema metadata from a relational database.
  7. JdbcConnection also has several new methods including loadCatalogAndSchemas(), getCatalogNames(), and getSchemas().
  8. The loadTables() method now lets you configure what is loaded along with the basic table info (columns, indexes, etc.).
  9. GenerateTableDaoClasses now makes the emitted classes implement Serializable and RecordSerializable if not already implemented by their superclass.
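
Here's a minimal sketch of the new fetchSize property from item 3. The JdbcReader constructor and Job.run() usage follow the library's usual idiom; the setFetchSize(int) setter name is an assumption based on JavaBean conventions, and the connection details are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;

    import com.northconcepts.datapipeline.core.StreamWriter;
    import com.northconcepts.datapipeline.jdbc.JdbcReader;
    import com.northconcepts.datapipeline.job.Job;

    public class JdbcFetchSizeExample {

        public static void main(String[] args) throws Throwable {
            Connection connection = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/demo", "user", "password");  // placeholder connection

            JdbcReader reader = new JdbcReader(connection, "SELECT * FROM orders");
            reader.setFetchSize(1000);  // assumed setter for the new fetchSize property:
                                        // retrieve 1000 rows per network call while reading

            Job.run(reader, StreamWriter.newSystemOutWriter());

            connection.close();
        }
    }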


SQL Builders

Several of DataPipeline’s features require generating SQL on-the-fly. Instead of relying on string concatenation, we use a set of builder classes that extend SqlPart. This release includes improvements to those classes as well as a new PostgreSQL-specific package.

  1. Added a new northconcepts.datapipeline.sql.postgresql package for generating PostgreSQL statements, starting with the DDL to create tables, indexes, and more.
  2. SqlPart now has a pretty property to generate formatted SQL.
  3. The criteria() method on Select has been renamed to where().
  4. Select now has a having() method for generating group-by criteria (see the sketch after this list).
  5. Select also has new getJoins(), containsTable(String table), and clearXXX() methods.
  6. Select now explicitly supports multiple from tables, right joins, and full joins.
  7. QueryCriteria can now be used for the Select.where() and Select.having() clauses when generating SQL.
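
To illustrate items 2 through 4, here's a minimal sketch of building a query with the renamed where() and new having() methods. Only where(), having(), and the pretty property come from this release; the constructor, column and group-by helpers, fluent chaining, and toString() rendering are assumptions, so check the SqlPart Javadocs for the exact API.

    import com.northconcepts.datapipeline.sql.Select;  // package assumed

    public class SelectBuilderExample {

        public static void main(String[] args) {
            Select select = new Select("orders")                         // constructor assumed
                    .select("customer_id", "SUM(total) AS total_spent")  // assumed column helper
                    .where("status = 'SHIPPED'")                         // formerly criteria()
                    .groupBy("customer_id")                              // assumed group-by helper
                    .having("SUM(total) > 1000");                        // new in 8.1

            select.setPretty(true);  // assumed setter for the new pretty property

            System.out.println(select);  // assumes SqlPart renders its SQL via toString()
        }
    }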


Code Generation

In addition to the SQL builders, DataPipeline has classes for generating code in general. These classes have been used by the SQL builders and others for some time now. This release promotes them to the core, public API for wider usage.

  1. The CodeWriter, JavaCodeBuilder, and JavaCodeGenerator classes have been moved to the northconcepts.datapipeline.sourcecode package. This is a breaking change, but updating the import should be the only action needed (a sketch follows this list).
  2. The northconcepts.datapipeline.xml.builder.* classes now use CodeWriter instead of java.io.Writer (or the old, internal CodeWriter). This is a breaking change for anyone overriding the XML builder classes.
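
For most projects the migration in item 1 is just an import update. A sketch of the change (the old package path shown in the comment is an assumption; let your compiler errors point to the exact one):

    // Before 8.1 the builders lived in an internal package (exact old path is
    // an assumption; your compiler error will identify it):
    //   import com.northconcepts.datapipeline.internal.sourcecode.JavaCodeBuilder;

    // From 8.1 on, import from the promoted public package:
    import com.northconcepts.datapipeline.sourcecode.CodeWriter;
    import com.northconcepts.datapipeline.sourcecode.JavaCodeBuilder;
    import com.northconcepts.datapipeline.sourcecode.JavaCodeGenerator;

    public class CodeGenImports {
        // Nothing else should change -- existing CodeWriter/JavaCodeBuilder
        // usage compiles as before once the imports are updated.
    }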


Fast Classes

We’re starting an effort to introduce faster-performing transformation classes for well-structured, tabular data. This release kicks it off with a new FastRenameField class for use when you don't need the existing RenameField class's support for non-tabular, varying, and nested datasets. See the Rename Fields Quickly in Flat And Tabular Data example.
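
Here's a minimal sketch of the idea, assuming FastRenameField plugs into a TransformingReader the same way RenameField does; its constructor arguments and package are assumptions based on that parallel.

    import java.io.File;

    import com.northconcepts.datapipeline.csv.CSVReader;
    import com.northconcepts.datapipeline.csv.CSVWriter;
    import com.northconcepts.datapipeline.job.Job;
    import com.northconcepts.datapipeline.transform.FastRenameField;  // package assumed
    import com.northconcepts.datapipeline.transform.TransformingReader;

    public class FastRenameFieldExample {

        public static void main(String[] args) {
            CSVReader reader = new CSVReader(new File("input.csv"))
                    .setFieldNamesInFirstRow(true);

            TransformingReader transformingReader = new TransformingReader(reader);
            // Assumed to mirror RenameField's (oldName, newName) constructor, but
            // optimized for flat, tabular data where every record has the same layout.
            transformingReader.add(new FastRenameField("cust_id", "customerId"));

            Job.run(transformingReader, new CSVWriter(new File("output.csv")));
        }
    }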


Schema Validation

  1. When mapping and validating, EntityDef previously would not perform any validation if at least one mapping failed. With this release, EntityDef will now continue on and validate any successfully mapped fields, allowing you to see the mapping and validation failures together in the result. See how to perform schema-based validation in the Transform Records using Schema example.
  2. The GenerateSchemaFromJdbc class has a new generate(String catalog, String schemaPattern, List<String> tableNames, String… types) method to load metadata for specific tables and views instead of relying on name patterns (a sketch follows this list).
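
A minimal sketch of item 2's new overload, using the signature quoted above; the constructor, package, and surrounding setup are assumptions.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Arrays;

    import com.northconcepts.datapipeline.foundations.jdbc.GenerateSchemaFromJdbc;  // package assumed

    public class GenerateSchemaForTablesExample {

        public static void main(String[] args) throws Throwable {
            Connection connection = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/demo", "user", "password");  // placeholder connection

            GenerateSchemaFromJdbc generator = new GenerateSchemaFromJdbc(connection);  // constructor assumed

            // Load metadata for exactly these tables instead of a name pattern.
            generator.generate(null, "public", Arrays.asList("orders", "customers"), "TABLE");

            connection.close();
        }
    }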


Data Mapping

  1. DataMapping has a new autoMapping property to map incoming fields from source to target that don’t have an explicit mapping. Explicitly mapped fields will override any auto-mapped values. This allows your mappings to focus on special cases. See the Map Data Using Automatic Mapping And Exclude Fields example.
  2. DataMapping also includes a new excludedFields collection to remove/blacklist fields from mapping output regardless of whether they were auto or manually mapped. Since field exclusion is performed just before the mapping data is returned, excluded values can be used in expressions as temporary values. See the same example above or the declarative version using XML (a sketch follows this list).
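
Here's a minimal sketch of autoMapping and excludedFields working together. Beyond the property names given above, the setter, the excluded-fields accessor, and the FieldMapping expression syntax are assumptions.

    import com.northconcepts.datapipeline.foundations.datamapping.DataMapping;   // package assumed
    import com.northconcepts.datapipeline.foundations.datamapping.FieldMapping;

    public class AutoMappingExample {

        public static void main(String[] args) {
            DataMapping mapping = new DataMapping();

            mapping.setAutoMapping(true);  // assumed setter: copy unmapped source fields to the target

            // Explicit mappings override auto-mapped values (expression syntax assumed).
            mapping.addFieldMapping(new FieldMapping("fullName",
                    "source.firstName + ' ' + source.lastName"));

            // Drop the auto-mapped inputs from the output; they remain available
            // to the expression above because exclusion happens last.
            mapping.getExcludedFields().add("firstName");  // accessor assumed
            mapping.getExcludedFields().add("lastName");
        }
    }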


Dataset

  1. The Column class has a new nullValueFieldTypes property to track type-specific null values, helping determine the best type for a column.
  2. The algorithms used in Column.getBestFitFieldType() and getFieldType() have been improved to include null values when type info is available.


Parquet

  1. ParquetDataWriter has a new setMaxRecordsAnalyzed(Long) method to indicate how many records should be analyzed and cached in order to determine the Parquet schema. The property only applies when no Parquet schema is explicitly set on the writer. The default is 1000 and null means analyze everything. See the Generate Parquet schema by analyzing the first 500 records and Generate Parquet schema by analyzing all records examples, plus the sketch after this list.
  2. ParquetDataReader and ParquetDataWriter can now handle unsigned whole numbers, even though Java doesn’t directly support them. See the Read and Write Unsigned Numbers in Parquet Files example.
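
A minimal sketch of item 1, using the setMaxRecordsAnalyzed(Long) signature quoted above; the file names are placeholders and the CSV source is just one way to feed the writer.

    import java.io.File;

    import com.northconcepts.datapipeline.csv.CSVReader;
    import com.northconcepts.datapipeline.job.Job;
    import com.northconcepts.datapipeline.parquet.ParquetDataWriter;

    public class ParquetSchemaAnalysisExample {

        public static void main(String[] args) {
            CSVReader reader = new CSVReader(new File("input.csv"))
                    .setFieldNamesInFirstRow(true);

            ParquetDataWriter writer = new ParquetDataWriter(new File("output.parquet"));

            // Only applies when no Parquet schema is explicitly set on the writer.
            writer.setMaxRecordsAnalyzed(500L);   // infer the schema from the first 500 records
            // writer.setMaxRecordsAnalyzed(null);  // or analyze (and cache) every record

            Job.run(reader, writer);
        }
    }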


Jira

  1. The Jira integration now includes a service method to update issues using their ID or key: updateIssue(String issueIdOrKey, JiraIssue jiraIssue). See the Update Jira Issue example.
  2. A new setField(String fieldName, Object value) method was added to replace the now-deprecated addField(String fieldName, Object value). Both methods now overwrite the previous value for the given fieldName on subsequent calls (see the sketch after this list).
  3. BUGFIX: JiraSearch.searchIssuesById() was not adding the ID criteria to the JQL.
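
A minimal sketch of items 1 and 2 together. Only the updateIssue(String, JiraIssue) and setField(String, Object) signatures come from this release; the service class name, its constructor, and the package are assumptions.

    import com.northconcepts.datapipeline.jira.JiraIssue;    // package assumed
    import com.northconcepts.datapipeline.jira.JiraService;  // class name and constructor assumed

    public class UpdateJiraIssueExample {

        public static void main(String[] args) {
            JiraService service = new JiraService(
                    "https://example.atlassian.net", "user@example.com", "api-token");

            JiraIssue issue = new JiraIssue();
            issue.setField("summary", "Initial summary");
            issue.setField("summary", "Corrected summary");  // setField overwrites the previous value

            service.updateIssue("PROJ-123", issue);  // accepts an issue ID or key
        }
    }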


Bug Fixes

  1. The open() and close() methods on ProxyReader and ProxyWriter now check whether their nestedDataReader/Writer is already open/closed before attempting to open or close it.
  2. The FileReader.autoCloseWriter property has been renamed to autoCloseReader. The previous getter/setter methods are deprecated.
  3. The AsyncWriter(DataWriter nestedDataWriter) constructor now defaults the internal record queue to 500 elements instead of Integer.MAX_VALUE in order to reduce memory usage and prevent possible OutOfMemoryErrors.


Other Changes

See the CHANGELOG for the full set of updates in DP 8.1.0.

About The DataPipeline Team

We make Data Pipeline — a lightweight ETL framework for Java. Use it to filter, transform, and aggregate data on-the-fly in your web, mobile, and desktop apps. Learn more about it at northconcepts.com.
