DataPipeline 9.0 Released

Welcome to the 9.0 release of DataPipeline.

DataConverter.io

Before we go over the changes, we'd like to introduce you to https://DataConverter.io/. DataConverter is a place where you can view files and convert them from one format to another. Over the next few months, we plan to add more features and formats using DataPipeline as the core. Please reach out to us if you have any questions, requests, or feedback.

Core Changes

  1. Expression language (DPEL) improvements to method call performance and memory usage.
  2. BUGFIX: method calls in expression language now match against methods with char arguments instead of looking for strings.
  3. The expression language now blacklists the jakarta.activation package in the same way as javax.activation.
  4. UUID field types are now supported across DataPipeline.
  5. BUGFIX: DataException.wrap(Throwable) no longer throws an exception when passed a null exception or an InvocationTargetException with a null cause.
  6. TimedReader now has an isTimeExpired() method to indicate if reading stopped due to the time expiring.
  7. ExcelDocument now implements java.io.Closeable and adds close() and isClosed() methods to release resources immediately.
  8. ExcelReader now has an autoCloseDocument flag (defaults to false) to eagerly close the document.
  9. The Excel provider that streams writing (ExcelDocument.ProviderType.POI_SXSSF) has improved performance and memory usage.
  10. SimpleJsonWriter now supports writing nested records and arrays.
  11. The RenameField transformer now has an allowDuplicateFieldNames flag (defaults to false).
  12. XML processing now includes several security improvements.
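
To make the DataException.wrap(Throwable) fix in item 5 more concrete, here is a minimal sketch of null-safe exception wrapping. It is not DataPipeline's implementation: NullSafeWrapSketch and WrappedException are hypothetical names, and the fallback behavior (wrapping the InvocationTargetException itself when it has no cause) is an assumption made for illustration.

```java
import java.lang.reflect.InvocationTargetException;

public class NullSafeWrapSketch {

    // Hypothetical stand-in for a library exception type; not DataPipeline's DataException.
    public static class WrappedException extends RuntimeException {
        public WrappedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Wraps any Throwable without throwing, even for the two edge cases named in item 5.
    public static WrappedException wrap(Throwable t) {
        if (t == null) {
            // Edge case 1: a null exception still yields a usable wrapper.
            return new WrappedException("unknown error (null throwable)", null);
        }
        if (t instanceof WrappedException) {
            // Already wrapped; avoid double wrapping.
            return (WrappedException) t;
        }
        if (t instanceof InvocationTargetException && t.getCause() != null) {
            // Unwrap reflection noise only when there is a real cause to unwrap to.
            return wrap(t.getCause());
        }
        // Edge case 2 lands here: an InvocationTargetException with a null cause
        // is wrapped as-is instead of being dereferenced.
        return new WrappedException(t.getMessage(), t);
    }

    public static void main(String[] args) {
        System.out.println(wrap(null).getMessage());
        System.out.println(wrap(new InvocationTargetException(null)).getCause());
        System.out.println(wrap(new InvocationTargetException(new IllegalStateException("boom"))).getCause());
    }
}
```

The point of the sketch is simply that every branch checks for null before dereferencing, so the wrapper itself can never become the source of a second exception.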

Foundations Changes

  1. FileType now has constants for HTML, PDF, AVRO_SCHEMA, and FIXED_WIDTH.
  2. FileType also now has lookupByMimeType(String mimeType) and lookupByFileExtension(String fileExtension) methods.
  3. JdbcTable now has a convenience getNameAsJavaIdentifier() method to aid in code generation.
  4. JdbcTableColumn also now has a convenience getNameAsJavaClassName() method to aid in code generation as well as new getXXX and isXXX methods.
  5. Column has improved type inference/detection (and also detects UUIDs).
  6. Column now includes null records in its null value count.
  7. Dataset has improved exception handling while loading.
  8. Added new LocalFileDataset class to cache the dataset’s records on disk as binary data.
  9. ExcelPipelineInput and ExcelPipelineOutput now have improved ExcelDocument handling if the source/target is a LocalFileSource.
  10. PipelineInput and PipelineOutput now have methods to get the nested and root pipeline input/output.
  11. DateTimePatternDetector can now detect more datetime patterns and has improved performance and memory usage.
  12. GenerateTableDaoClasses now preserves custom code at the end of the old class.
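
As a rough illustration of the kind of lookup described in items 1 and 2, the sketch below shows an enum that can be searched by MIME type or file extension. SketchFileType is a hypothetical enum, not the library's FileType; the normalization rules (case-insensitive matching, ignoring MIME parameters and a leading dot) and the choice to return null when nothing matches are assumptions made for the example.

```java
import java.util.Locale;

public class FileTypeLookupSketch {

    // Hypothetical enum for illustration only; not DataPipeline's FileType.
    public enum SketchFileType {
        HTML("text/html", "html"),
        PDF("application/pdf", "pdf"),
        JSON("application/json", "json");

        private final String mimeType;
        private final String fileExtension;

        SketchFileType(String mimeType, String fileExtension) {
            this.mimeType = mimeType;
            this.fileExtension = fileExtension;
        }

        // Returns the matching type, or null when the MIME type is unknown (assumed behavior).
        public static SketchFileType lookupByMimeType(String mimeType) {
            if (mimeType == null) {
                return null;
            }
            // Drop parameters such as "; charset=utf-8" before comparing.
            String normalized = mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            for (SketchFileType type : values()) {
                if (type.mimeType.equals(normalized)) {
                    return type;
                }
            }
            return null;
        }

        // Accepts "pdf", ".pdf", or ".PDF"; returns null when the extension is unknown.
        public static SketchFileType lookupByFileExtension(String fileExtension) {
            if (fileExtension == null) {
                return null;
            }
            String normalized = fileExtension.trim().toLowerCase(Locale.ROOT);
            if (normalized.startsWith(".")) {
                normalized = normalized.substring(1);
            }
            for (SketchFileType type : values()) {
                if (type.fileExtension.equals(normalized)) {
                    return type;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(SketchFileType.lookupByMimeType("text/html; charset=utf-8"));
        System.out.println(SketchFileType.lookupByFileExtension(".PDF"));
    }
}
```

Check the JavaDocs for how the real FileType methods treat unknown or malformed inputs before relying on any of the behavior assumed here.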

Integration Changes

  1. AvroReader now has a maxInvalidRecords property that allows it to continue reading even when some records are broken.
  2. AvroWriter now supports BIG_DECIMAL, BIG_INTEGER, and UUID field types and has improved datetime handling.
  3. OrcDataReader is now better able to read bad ORC files with corrupt records and missing schemas.
  4. ParquetDataReader now has the following properties to support reading broken Parquet files: makeRequiredFieldsOptional, makeOptionalFieldsRequired, removeFieldsWithoutColumnMetadata, removeFieldsWithoutValues.
  5. ParquetDataWriter now has the following properties and methods to improve writing performance: maxRecordsAnalyzed, cacheFolder, recordsPerCacheFile, setSchema(String schema).
  6. PdfWriter now uses Apache PDFBox instead of the lowagie package.
  7. RtfWriter now uses Apache POI instead of the lowagie package.
  8. Added ShopifyEventReader to read events.
  9. The MySQL SQL modeling package (com.northconcepts.datapipeline.sql.mysql) has several improvements and additions, including upserts and identifier handling.
  10. Added MySqlInsertWriter and MySqlUpsertWriter to generate “insert into” and “insert into…on duplicate” statements.
  11. Built out the PostgreSQL expression modeling package (com.northconcepts.datapipeline.sql.postgresql) to model insert and upsert statements.
  12. Added PostgreSqlInsertWriter and PostgreSqlUpsertWriter to generate SQL statements.
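
For readers unfamiliar with the two upsert dialects mentioned in items 9 through 12, the sketch below builds the statement text by hand. It does not use or reproduce the new writer classes: UpsertSqlSketch and its methods are hypothetical, the choice to update every non-key column is an assumption, and the MySQL output uses the older VALUES(col) form, which recent MySQL 8 releases deprecate in favor of row aliases.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class UpsertSqlSketch {

    // Quotes an identifier, doubling any embedded quote character.
    private static String quote(String name, char q) {
        String s = String.valueOf(q);
        return s + name.replace(s, s + s) + s;
    }

    private static String placeholders(int count) {
        return String.join(", ", Collections.nCopies(count, "?"));
    }

    // MySQL dialect: INSERT INTO ... ON DUPLICATE KEY UPDATE col = VALUES(col)
    public static String mysqlUpsert(String table, List<String> columns, List<String> keyColumns) {
        String cols = columns.stream().map(c -> quote(c, '`')).collect(Collectors.joining(", "));
        String updates = columns.stream()
                .filter(c -> !keyColumns.contains(c)) // assumption: update all non-key columns
                .map(c -> quote(c, '`') + " = VALUES(" + quote(c, '`') + ")")
                .collect(Collectors.joining(", "));
        if (updates.isEmpty()) {
            throw new IllegalArgumentException("at least one non-key column is required");
        }
        return "INSERT INTO " + quote(table, '`') + " (" + cols + ") VALUES ("
                + placeholders(columns.size()) + ") ON DUPLICATE KEY UPDATE " + updates;
    }

    // PostgreSQL dialect: INSERT INTO ... ON CONFLICT (keys) DO UPDATE SET col = EXCLUDED.col
    public static String postgresUpsert(String table, List<String> columns, List<String> keyColumns) {
        String cols = columns.stream().map(c -> quote(c, '"')).collect(Collectors.joining(", "));
        String keys = keyColumns.stream().map(c -> quote(c, '"')).collect(Collectors.joining(", "));
        String updates = columns.stream()
                .filter(c -> !keyColumns.contains(c))
                .map(c -> quote(c, '"') + " = EXCLUDED." + quote(c, '"'))
                .collect(Collectors.joining(", "));
        if (updates.isEmpty()) {
            throw new IllegalArgumentException("at least one non-key column is required");
        }
        return "INSERT INTO " + quote(table, '"') + " (" + cols + ") VALUES ("
                + placeholders(columns.size()) + ") ON CONFLICT (" + keys + ") DO UPDATE SET " + updates;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("id", "name", "email");
        List<String> keys = Arrays.asList("id");
        System.out.println(mysqlUpsert("users", columns, keys));
        System.out.println(postgresUpsert("users", columns, keys));
    }
}
```

The practical difference is that MySQL infers the conflict from any unique key on the table, while PostgreSQL needs the conflict target spelled out, which is why only the PostgreSQL statement text includes the key columns.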

See the CHANGELOG for the full set of updates in DP 9.0.0.

Also see the JavaDocs and examples for more info.

Happy coding!

About The DataPipeline Team

We make DataPipeline, a lightweight ETL framework for Java. Use it to filter, transform, and aggregate data on-the-fly in your web, mobile, and desktop apps. Learn more about it at northconcepts.com.