Changelog



--------------------------------
8.3 - Nov 10, 2023
--------------------------------

Core Changes
- RecordList now implements the RecordSerializable interface and to/fromRecord(), to/fromBinary(), to/fromJson(), to/fromArrayValue() methods
- added RecordList.ensureCapacity(int minCapacity) 
- added CommaSeparatedValues as a collection of string values that can be represented as, or parsed from, a comma-delimited string
- BUGFIX: CSVReader now excludes the fieldSeparators when trimming fields to support trimming values in tab separated files
- Expression language now allows nested properties to be surrounded by ${...} to better support field names with whitespace and symbols
- the com.northconcepts.datapipeline.jdbc.sql.select.Select class now allows the following fields to be replaced using setters: selection, where, grouping, having, order
- DataReaderDecorator and DataWriterDecorator now use parameterized types to lock down the types supplied to them
- BUGFIX: SimpleXmlReader, XmlReader, and XmlRecordReader now close the supplied file or reader if an exception is raised while still in their constructors
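
The CSVReader trimming fix above can be illustrated with a minimal sketch. It uses only the JDK and is not CSVReader's actual code; the class and the trimExcept helper are hypothetical names introduced for illustration. The point is that a plain whitespace trim also removes tabs, so a tab-separated parser has to exclude its separator from the characters it trims.

```java
// Sketch only: illustrates separator-aware trimming, not CSVReader's implementation.
class SeparatorAwareTrim {

    // Removes leading and trailing whitespace from text, but never removes
    // the separator character (hypothetical helper, not a DataPipeline API).
    static String trimExcept(String text, char separator) {
        int start = 0;
        int end = text.length();
        while (start < end && isTrimmable(text.charAt(start), separator)) {
            start++;
        }
        while (end > start && isTrimmable(text.charAt(end - 1), separator)) {
            end--;
        }
        return text.substring(start, end);
    }

    private static boolean isTrimmable(char c, char separator) {
        return c != separator && Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        // String.trim() treats a tab as whitespace, so a lone separator vanishes.
        System.out.println("plain trim keeps tab: " + !"\t".trim().isEmpty());
        // The separator-aware version leaves the tab in place.
        System.out.println("aware trim keeps tab: " + !trimExcept("\t", '\t').isEmpty());
    }
}
```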


Foundations Changes
- Foundation classes now support reading invalid expressions in Record, XML, and JSON without throwing an exception: AbstractFieldMapping, FieldMapping, DecisionTableCondition, DecisionTableOutcome, DecisionTreeOutcome, CalculatedField, EntityDef
- BUGFIX: DataMapping no longer throws an exception if any excluded fields don't exist in the dataset
- DataMappingEditor is now initialized with a DataReaderFactory instead of a Dataset
- DataMappingEditor no longer provides sorting
- FileType now supports a defaultFileExtension property and now includes JSON_LINES, PARQUET, AVRO, and ORC
- Several data mapping and schema validation messages have been improved
- Dataset now has separate flags for recordsLoaded and columnStatsLoaded to complement the existing dataLoading flag
- Dataset also has new, overridable methods for afterRecordsLoaded() and afterColumnStatsLoaded()
- BUGFIX: DatasetReader now stops after records are loaded instead of waiting for column stats to be processed
- Tree now accepts Reader to detect candidate fields and record breaks in XML and JSON streams
- the GenerateSchemaFromJdbc tool now skips primary key indexes


Integration Changes
- added Shopify integration to read orders, inventory (items & levels), products, locations
- added MySQL integration to build DDL statements programmatically
- added RecordTemplateModel.wrap() in template integration to wrap RecordList and Collection as FreeMarker models




--------------------------------
8.2 - Jul 6, 2023
--------------------------------

Core Changes
- Added FieldList.contains(List fieldNames), containedWithin(String ... fieldNames), and containedWithin(List fieldNames)
- The expression languages now improve ClassCastException messages by adding the candidate's value and type
- The expression languages can now optionally produce expressions even when they contain syntax errors 
- SQL builder classes now support unions and sub-queries
- MergeUpsert now automatically terminates with a semicolon for Microsoft SQL Server and SAP ASE (Sybase) databases
- SybaseUpsert now terminates with a semicolon by default
- Added JdbcConnectionFactory.close() for scenarios where the factory needs to handle lifecycle termination
- Job now emits fewer debug logs


Foundations Changes
- DataMapping and FieldMapping now return any invalid expressions with syntax errors as problems
- Added DataMappingEditor for interactive, UI-based use-cases (in SimpleDataHub) that involve data mapping, sorting, and pagination of a source dataset 
- Added DataMappingReader/DataMappingWriter.onFailure(Record record, DataException exception, DataMappingResult result) to allow error handling to be intercepted/overridden
- Added DataMappingResult.getDataMappingResult(Record record) and setDataMappingResult(Record record, DataMappingResult result) to attach DataMappingResult to Record as session properties
- Added DataMappingValidator.checkValidExpression() to check for invalid expressions with syntax errors
- Added Column.getInferredTextualValueCount(), getTextual() to count text values in column
- Improved type detection algorithm in Column.getInferredFieldType()
- Added Dataset.isDataLoading(), getDataLoadException(), getMaxRecordsToLoad(), and afterLoad() to better support interactive, UI-based use-cases
- BUGFIX: DatasetReader.readImpl() will now wait until the dataset produces records or closes instead of returning null when no records are immediately available
- BUGFIX: SchemaValidator.checkEntityRelationshipCardinality now correctly checks for invalid one-to-many relationships
- Added initial release of DetectPrimaryKeysInDataset tool


Integration Changes
- Added JiraService.close() to explicitly, eagerly close the REST endpoint


FileSystem Changes
- Added AmazonS3FileSystem.getEndpointConfiguration(), setEndpointConfiguration(EndpointConfiguration endpointConfiguration), setClient(AmazonS3 client)
- AmazonS3FileSystem.getClient() now returns AmazonS3 instead of AmazonS3Client
- BUGFIX: AmazonS3FileSystem can now write empty files to S3


--------------------------------
8.1 - May 8, 2023
--------------------------------

Core Changes
- Added FieldList.toArray(), forEach(Consumer consumer), stream(), parallelStream()
- Added RecordList.forEach(Consumer consumer), stream(), parallelStream()
- Added Record.forEach(Consumer consumer), stream(), parallelStream()
- Added ConcurrentRecordList.forEach(Consumer consumer), stream(), parallelStream()
- Added ArrayValue.getValues(), getValueStream(), forEach(Consumer consumer), stream(), parallelStream()
- The AsyncWriter(DataWriter nestedDataWriter) constructor now defaults to a queue of 500 elements instead of Integer.MAX_VALUE to reduce memory usage
- BUGFIX: ProxyReader & ProxyWriter.open() now checks if the nestedDataReader is not open before attempting to open it
- BUGFIX: ProxyReader & ProxyWriter.close() now checks if the nestedDataWriter is not closed before attempting to close it
- BUGFIX: Renamed the FileReader.autoCloseWriter property to autoCloseReader (previous name is deprecated)
- IInsert and IUpsert are now cloneable
- MultiRowStatementInsert exposes more of its internal SQL generation methods for subclasses
- Added OracleMultiRowInsertAllStatementInsert to support multi-record insert using INSERT ALL
- Added OracleMultiRowSelectUnionAllStatementInsert to support multi-record insert using INSERT ALL...INSERT INTO...UNION ALL
- Added SqlPart.pretty to generate formatted SQL
- SqlPart now delegates SQL generation to collectSqlFragment(CodeWriter writer) instead of getSqlFragment() directly
- Select now always wraps AND and OR criteria in parentheses
- Select.criteria is now Select.where() (previous methods are deprecated)
- Select.offset and count are now nullable instead of relying on -1
- Added Select.having() and related methods for generating group-by criteria
- Added Select.getJoins(), containsTable(String table), clearXX()
- Select now supports multiple from tables, right joins, and full joins explicitly
- QueryCriteria can now be used for Select.where and Select.having clauses when generating SQL
- Added JdbcMultiUpsertWriter for multi connection upserting
- JdbcMultiWriter now supports configurable insert strategies, commitBatch, debug flag, and overridable JDBC types
- Added JdbcReader.fetchSize to configure the network calls to the database
- JdbcUpsertWriter now supports overridable JDBC types
- Promoted internal code generation classes to the public API: CodeWriter, JavaCodeBuilder, JavaCodeGenerator
- Added FastRenameField for use when the slower RenameField that handles non-tabular datasets is not needed (see the Javadoc for requirements)
- The com.northconcepts.datapipeline.xml.builder.* classes now use CodeWriter instead of java.io.Writer (or the old, internal CodeWriter).  This is a breaking change where these classes were being overridden
- XmlWriter now uses the new CodeWriter internally; this should have no visible side-effect
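
The AsyncWriter queue change above trades an effectively unbounded buffer for a bounded one. The sketch below uses only java.util.concurrent and is not AsyncWriter's implementation; the class name, the offerAll helper, and the choice of ArrayBlockingQueue are assumptions made for illustration. It shows that a queue capped at 500 elements stops accepting work once full, which is what keeps a fast producer from accumulating records in memory.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch only: demonstrates bounded-queue behaviour with JDK classes.
class BoundedQueueSketch {

    // Offers `count` items without blocking and returns how many the queue accepted.
    // A real asynchronous writer would more likely block (put) until a consumer
    // frees space, rather than drop items.
    static int offerAll(BlockingQueue<Integer> queue, int count) {
        int accepted = 0;
        for (int i = 0; i < count; i++) {
            if (queue.offer(i)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(500);
        // With no consumer draining the queue, only the first 500 offers succeed.
        System.out.println("accepted = " + offerAll(queue, 1000));
        System.out.println("remaining capacity = " + queue.remainingCapacity());
    }
}
```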


Foundations Changes
- The DP Foundation classes in com.northconcepts.datapipeline.foundations.sourcecode have been moved to com.northconcepts.datapipeline.sourcecode.CodeWriter in DP Core.  This is a breaking change, but updating the import should be the only action needed.
- Added DataMapping.autoMapping to map fields from source to target that don't have an explicit mapping
- Added DataMapping.excludedFields to blacklist fields from mapping (regardless of whether they were auto or manually mapped)
- DataMappingValidator no longer identifies source expressions with "source.", "target.", "previousSource.", or "previousTarget." as mapping problems 
- Added JdbcConnection.listener to provide progress callbacks as it loads schema metadata from a relational database
- Added JdbcConnection.getIsCatalogAtStart(), getSupportsCatalogsInDataManipulation(), getSupportsCatalogsInTableDefinitions(), getSupportsCatalogsInPrivilegeDefinitions(), getCatalogNames(), getSchemas()
- JdbcConnection.loadTables() can now configure what is loaded along with the basic table info (columns, indexes, etc.)
- Added JdbcConnection.loadCatalogAndSchemas()
- Added Column.nullValueFieldTypes to track type-specific nulls to help determine best type for column
- Improved the algorithm used in Column.getBestFitFieldType() and getFieldType() to include null values when type info is available
- Renamed DataTimePatternMatch to DateTimePatternMatch.  This is a breaking change where the old class name was used.
- Added Dataset.forEach(Consumer consumer) and stream()
- Added DatasetSpliterator to traverse records cached in a Dataset
- Added "private static final long serialVersionUID = 1L;" to various classes to ease compatibility when used in Java serialization
- Added GenerateSchemaFromJdbc.generate(String catalog, String schemaPattern, List tableNames, String... types) to load metadata for specific tables
- GenerateTableDaoClasses now makes the emitted classes Serializable and RecordSerializable if not already through their optional superclasses
- EntityDef will now validate successfully mapped fields instead of preventing validation on any mapping failure


Integration Changes
- Added JiraService.updateIssue(String issueIdOrKey, JiraIssue jiraIssue)
- JiraIssue.addField(String fieldName, Object value) now overwrites the previous value on subsequent calls
- Added JiraIssue.setField(String fieldName, Object value) and deprecated addField(String fieldName, Object value)
- BUGFIX: JiraSearch.searchIssuesById() was not adding the ID criteria to the JQL
- Added ParquetDataWriter.setMaxRecordsAnalyzed(Long) to indicate how many records should be analyzed and cached to generate the Parquet schema if no schema was explicitly set
- ParquetDataReader and ParquetDataWriter now handle unsigned whole numbers
- Added com.northconcepts.datapipeline.sql.postgresql for classes used to generate PostgreSQL DDL SQL



--------------------------------
8.0 - Dec 22, 2022
--------------------------------

Core Changes
- ArrayValue now has getValueAsInstant(int) to retrieve date-time values as Instant
- Field now has getValueAsInstant() and setValue(Instant value) to convert Instant to and from date-time values
- Record now has getFieldValueAsInstant(String fieldPath, Instant defaultValue) to retrieve date-time values as Instant
- SingleValue now has getValueAsInstant() to retrieve date-time values as Instant
- ValueNode now converts java.time.Instant to a DATETIME field instead of treating it as UNDEFINED
- BUGFIX: AsyncMultiReader no longer blocks when repeatedly attempting to read past the last record
- Added AsyncTaskReader as a convenient way to apply work to a DataReader using multiple threads 
- FieldType now has methods to determine if a type's byte count is fixed or varies, if a numeric type is an integer/whole or real type, and the number of bytes in numeric types: isArbitrarySizeInBytes(), isFixedSizeInBytes(), isIntegerNumber(), isRealNumber(), getNumericSizeInBytes(), getMinimumSizeInBytes()
- LineParser now has methods to getLineText() and getRemainingLineText()
- DP has started moving to use DateTimeFormatter instead of SimpleDateFormat
- Added convenience Record.getFieldOrNull(String fieldName) method 
- BUGFIX: Record.moveField(), moveFieldBefore(), moveFieldAfter() no longer decrements the target index if it's less than the source index
- Added Record.sortFieldsByName()
- RecordSerializable now has static methods to convert ArrayValue to/from collections
- XmlSerializable now uses a standard way to convert date, times, and bytes to/from string
- SingleValue now standardizes getValueAsString() by using FieldType.valueToString() for its conversion
- BUGFIX: ExcelReader now respects the startingColumn property
- ValueMatch is now cloneable and implements equals and hashCode
- FieldCount, FieldExists, FieldFilter, FieldNotExists, and FilterExpression now implement equals and hashCode
- The expression language (DPEL) now understands the "this" reference as the current expression context itself
- The expression language (DPEL) function "isOneOf(Object actual, Object ... expected)" now returns boolean instead of Object
- BUGFIX: SimpleJsonReader no longer skips fields containing only empty string
- The SQL builder classes under com.northconcepts.datapipeline.jdbc.sql now share a common SqlPart base class
- JdbcReader now uses a new OPINIONATED default JdbcValueReader to map and read database types to Java types
- JdbcValueReader now also contains built-in STRICT and ORIGINAL options for database-to-Java type mapping and reading
- Added JsonLinesWriter to support writing streaming JSON formats
- JsonReader has a new setUseBigDecimal() method to deprecate useBigDecimal()
- Expanded the field-level data lineage metadata provided for JDBC data sources
- BUGFIX: RetryingReader and RetryingWriter now retry up to maxRetryCount times instead of just once
- RetryingReader now includes more details when an exception occurs
- Added RetryStrategy.NO_BACKOFF to immediately retry without any delay
- Added BasicFieldTransformer.nullToValue(final Instant value) to handle java.time.Instant
- RenameField now allows FieldPath directly as the source/old field name
- SetField now supports java.time.Instant
- Added Transformer.getEndpoint() to deprecate geEndpoint()
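
Several entries above add java.time.Instant accessors alongside the existing date-time values. As a rough illustration of the underlying JDK conversions such accessors can build on, here is a minimal sketch. It is not DataPipeline code; the class and the toDate/toInstant helpers are hypothetical names. It also shows one practical caveat: java.util.Date stores milliseconds, so sub-millisecond precision in an Instant is lost on the way through.

```java
import java.time.Instant;
import java.util.Date;

// Sketch only: JDK-level conversion between Instant and java.util.Date.
class InstantConversionSketch {

    // Null-safe conversion from Instant to java.util.Date (millisecond precision).
    static Date toDate(Instant instant) {
        return instant == null ? null : Date.from(instant);
    }

    // Null-safe conversion from java.util.Date back to Instant.
    static Instant toInstant(Date date) {
        return date == null ? null : date.toInstant();
    }

    public static void main(String[] args) {
        Instant precise = Instant.parse("2022-12-22T10:15:30.123456789Z");
        Instant roundTripped = toInstant(toDate(precise));
        // The round trip keeps the milliseconds and drops the remaining nanoseconds.
        System.out.println(precise + " -> " + roundTripped);
    }
}
```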


Foundations Changes
- Added com.northconcepts.datapipeline.foundations.core.Attributes and Tags for use in Foundation classes
- Added getDataMappingProblems() and getSchemaProblems() methods to retrieve model issues in data mapping and schema classes for presentation to users
- BUGFIX: the Java code generator in DataMapping now generates escaped strings for non-printable characters 
- Added a com.northconcepts.datapipeline.foundations.difference package and classes to compare schemas, records, etc.
- Added com.northconcepts.datapipeline.foundations.schema.diff with schema-specific diff classes
- Added FileType.lookup(String fileTypeAsString) to find FileType by name
- JdbcConnection now uses JdbcValueReader as its sqlToJavaTypeMapper, defaulting to OPINIONATED
- Added JdbcIndex, JdbcFieldIndex, and JdbcQueryParameter to the com.northconcepts.datapipeline.foundations.jdbc package to model more RDBMS metadata
- Added parameters and cardinality to JdbcQuery
- Added javaType, className, and signed to JdbcQueryColumn
- Added className, signed, isNumericSqlType(), isBooleanSqlType(), and isTemporalSqlType() to JdbcTableColumn 
- Added JdbcResultPage for use in generated DAO classes for tables, views, and queries
- Added indexes to JdbcTable
- Added com.northconcepts.datapipeline.foundations.number.NumberDescriptor, NumberDetector, and NumberMatch to parse and analyze numbers
- Dataset analysis now includes arrayValueCount, minimumArrayElements, maximumArrayElements, numberDescriptor, bigNumberDescriptor, getBestFitFieldType(), and getInferredFieldType() in each Column
- Dataset analysis can now optionally disable detectTemporalValues, detectNumericValues, detectBooleanValues, detectBigNumberValues, collectUniqueValues
- Dataset now implements Iterable for easy use in enhanced for loops
- MvStoreDataset now has static convenience methods to create and open datasets on disk: createTempDataset(AbstractPipeline pipeline), createTempDataset(File databaseFolder, AbstractPipeline pipeline), createDataset(File databaseFile, AbstractPipeline pipeline), openDataset(File databaseFile)
- Added MvStoreDataset.deleteDatabaseFileOnClose flag for auto cleanup
- BUGFIX: ExcelPipelineInput now escapes the sheet name in generated Java code
- BUGFIX: FixedWidthPipelineInput now escapes the field names in generated Java code
- JdbcPipelineInput now generates JdbcReader Java code with the valueReader set to JdbcValueReader.OPINIONATED instead of DEFAULT
- XmlPipelineInput and XmlRecordPipelineInput now escape strings in generated Java code
- Added numberDetector to AbstractPipeline for use in dataset analysis
- Added NullPipeline, ProxyPipelineInput, ProxyPipelineOutput
- Pipeline now has convenience methods to setInputAsDataReaderFactory(DataReaderFactory factory), setInputAsDataReader(DataReader reader), setOutputAsDataWriterFactory(DataWriterFactory factory), setOutputAsDataWriter(DataWriter writer)
- The schema classes can now generate Java code: SchemaDef.generateJavaCode(JavaCodeBuilder), EntityDef.generateJavaCode(JavaCodeBuilder)
- The schema classes now implement equals() and hashCode()
- EntityDef can now extend another EntityDef in the same schema by its name (is-a)
- EntityDef can now have indexes like a database table's indexes
- SchemaDef can now have a list of EntityRelationshipDef to connect tables like relationships in an RDBMS
- FieldDef can now model primary keys and arrays
- NumericFieldDef now validates the precision and scale of values it sees
- Added RecordFieldDef to model nested structures (has-a and uses-a)
- TemporalFieldDef now uses DateTimeFormatter instead of SimpleDateFormat
- Added TemporalFieldDef.lenientPattern (default true) to indicate if parsing should interpret inputs that do not precisely match the pattern
- TextFieldDef.allowBlank now defaults to true
- ValidationMessage now includes a stacktrace field
- Added DataTimePatternMatch.fieldType to indicate if the match is a DATETIME, DATE, or TIME
- Added ConvertSnakeCaseToCamelCase tool to convert the names of entities, fields, indexes, etc. from database naming convention to Java naming convention
- Added GenerateEntityFromDataset tool to create schemas from a Dataset loaded from any source
- Added GenerateEqualsAndHashCodeMethods tool to create the equals() and hashCode() Java code for a set of classes based on their fields
- Added GenerateSchemaFromJdbc tool to create schemas from a database
- Added GenerateSchemaFromRecord tool to create schemas from JSON or any record with nested structures from any source
- The GenerateQueryDaoClasses tool is now public and can now be used to generate DAOs based on queries
- The GenerateRecordSerializers tool now better handles nested fields
- The GenerateSpringDataJpaClasses tool now allows entities and repositories to be generated to separate packages
- The GenerateTableDaoClasses tool now generates static count() and findAll() methods 


Integration Changes
- The JiraClient was expanded to create, update, delete, and transition issues; the client also now includes methods to create and retrieve issue comments
- Added JiraService to ease working with the JiraClient/API
- JiraSearch can now search by a set of Jira IDs
- Mailchimp now includes the "archived" status in addition to the existing subscribed, unsubscribed, cleaned, and pending
- Added GenerateSchemaDefFromParquet tool to generate a SchemaDef from a Parquet file or schema
- Added ParquetDataWriter.defaultBigNumberPrecision for writing BigDecimal and BigInteger schema where all values were null
- ParquetDataWriter now analyzes the entire dataset to generate a schema when no schema is explicitly provided
- Added ParquetDataWriter.columnStatsReaderThreads to indicate the number of threads to use analyzing values to generate the schema (default 2)


--------------------------------
7.1 - Jan 10, 2022
--------------------------------

Core Changes
- Upgrade Log4J dependency from v1.2.17 to v2.17.0
- Bugfix: ensure "com.sun.xml.internal.stream.XMLInputFactoryImpl" is always used as the XML input factory in XmlReader and XmlRecordReader
- ParsingReader now accepts a charsetName
- Added convenience Record.getFieldValueAsBytes(String fieldPath, byte[] defaultValue):byte[] method
- Added support for XML 1.1 declarations to XmlSerializable interface: XmlSerializable.writeXml(T bean, StreamResult outputTarget, boolean closeStream, boolean addXml11Delcaration)
- XmlSerializable.setAttribute() and getAttribute() now support non-empty, whitespace XML attribute values
- Overrode the setter methods to return the superclass in the following classes: CSVReader, CSVWriter, ExcelReader, ExcelWriter, FixedWidthReader, FixedWidthWriter, JdbcReader, JdbcWriter, JsonReader, JsonWriter, XmlReader, XmlWriter
- EventBus now uses strong references for listeners by default instead of SoftReferences (no more disappearing listeners when the JVM runs out of memory)
- Added EventBus.useStrongListenerReference property to select between strong references and SoftReferences for listeners
- Bugfix: EventBus.getTypedListenerCount() and getUntypedEventListenerCount() now account for nullified SoftReferences before they are removed by the event bus's next cleanup
- FileWriter now flushes on close if autoCloseWriter is false
- Bugfix: Binary deserialization now returns DATETIME field values as java.util.Date instead of java.sql.Timestamp while reading in FileReader
- Parsing explicitly logical expressions now handles untyped expressions and performs method coercion when necessary
- The SQL builder classes under the com.northconcepts.datapipeline.jdbc.* packages now support a "debug" property
- Added com.northconcepts.datapipeline.jdbc.sql.delete.Delete to SQL builder classes
- Added com.northconcepts.datapipeline.jdbc.sql.insert.Insert.getRows() and getLastRow() to access row data and retrieve candidate values to insert
- Added com.northconcepts.datapipeline.jdbc.sql.insert.InsertRow.getParameterValues() to retrieve candidate values to insert
- Added com.northconcepts.datapipeline.jdbc.sql.update.Update.getParameterValues() to retrieve candidate values to update
- GenericUpsert, MergeUpsert, MySqlUpsert, PostgreSqlUpsert, and SybaseUpsert now support upserts with no non-key fields
- Overloaded JdbcConnectionFactory.wrap() to accept driverClassName, url, username, and password
- JdbcMultiWriter, JdbcReader, JdbcUpsertWriter, JdbcWriter, and JdbcLookup now accept JdbcConnectionFactory to wrap DataSource and other connection creation options
- Added JdbcWriter.debug to optionally log generated SQL
- Added JdbcWriter.setJdbcType(Class type, int jdbcType) to allow overriding the JDBC type sent to the database based on the Java class type
- JdbcWriter now uses the type overrides for both null and non-null values
- Bugfix: CombinedLogReader now explicitly uses Locale.ENGLISH to prevent month name parsing issues outside of English locales
- Added open() and close() methods to Filter, Transformer, and Lookup to be called by the endpoints that use them
- Added JdbcLookup.autoCloseConnection to force database connection closing when the lookup is closed
- Added DebugWriter to ease logging of outgoing records
- Bugfix: Ensure AsyncWriter closes down threads when exceptions are thrown during the nested endpoint's open() and close().  This fix also ensures that the JobCallback.onFailure() method is called if an asynchronous failure occurs while JdbcMultiWriter.close() is executing.
- DataException.toString() now includes its key-value properties
- Added toTimestamp(Object value) to DPEL functions
- Improved to/from Timestamp and Moment handling in DPEL functions
- Added default constructor to MemoryReader and add(Record record) method
- Added Select.page(int page, int pageSize) to remove the need to calculate limit and offset 
- Excluded more method prefixes in DPEL (for example java.security, java.util.concurrent, java.util.prefs, and more)
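
The CombinedLogReader locale fix above reflects a general JDK behaviour: month names such as "Oct" are parsed according to the formatter's locale, so a formatter built with the platform default can fail on English log timestamps. The sketch below is not CombinedLogReader's code; the class name, the parse helper, and the pattern string are assumptions written against java.text.SimpleDateFormat only.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Sketch only: pins the locale when parsing English month abbreviations.
class LogDateParsingSketch {

    // Timestamp layout of the form 10/Oct/2000:13:55:36 -0700 (assumed for this sketch).
    static final String PATTERN = "dd/MMM/yyyy:HH:mm:ss Z";

    // Parses text with an explicit locale instead of relying on Locale.getDefault().
    static Date parse(String text, Locale locale) {
        try {
            return new SimpleDateFormat(PATTERN, locale).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse '" + text + "' with locale " + locale, e);
        }
    }

    public static void main(String[] args) {
        Date date = parse("10/Oct/2000:13:55:36 -0700", Locale.ENGLISH);
        // Prints the parsed moment in UTC: 2000-10-10T20:55:36Z
        System.out.println(date.toInstant());
    }
}
```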

Foundations Changes
- 19-digit whole numbers are mapped to Long instead of BigDecimal in JdbcTableColumn and code generators
- Map fields on a subclass of Bean are now sorted by key instead of random ordering when serialized to JSON
- Added optional AbstractFieldMapping.type property to auto convert values after mapping
- Bugfix: AbstractFieldMapping now clones other field value types besides records to prevent moving references from source to target during mapping
- DataMapping now implements JavaCodeGenerator for Java code emitting use-cases
- Added optional DecisionTable.defaultOutcomes to explicitly define the results when no rules match/fire
- Added overloaded DecisionTable/DecisionTree.addField(String variable, String expression, boolean includeInOutcome) to easily include a calculated field in the results
- Added CalculatedField.includeInOutcome property
- Allow DecisionTableCondition to have a null variable
- Added convenience constructor DecisionTableCondition(String expression) that doesn't explicitly require a variable 
- Added convenience overloaded DecisionTableRule.addCondition(String expression)
- Added FileSink/FileSource.getPath() since the subclasses already had getPath()
- Added JdbcConnection.connectionFactory property to support DataSource and other connection creation options
- Added JdbcConnection.getTablesSorted() and getTablesSortedTopologically()
- JdbcConnection.getJavaTypeOverride(String databaseTypeName) now treats databaseTypeName case insensitively
- Added JdbcImportedKey.getQualifiedPrimaryKeyTable() and getQualifiedForiegnKeyTable()
- Added JdbcImportedKey.isPrimaryKeyTable(JdbcTable table) and isForiegnKeyTable(JdbcTable table)
- Added JdbcTable.getQualifiedName(), getNameAsJavaClassName(), getNonPrimaryKeyColumns(), isDirectlyDependentOn(JdbcTable table), and isDependentOn(JdbcTable table)
- Added JdbcTableColumn.getNameAsJavaIdentifier(), isBinarySqlType(), and isTextSqlType()
- Added parameters to JdbcQuery for input arguments
- Added JdbcConnection.loadQueries() to retrieve query metadata
- Column now uses long instead of int for counts and LongAdder instead of LongPointer for histograms/summaries
- Added Column.getNonNullCount(), getNonNullNonBlankCount(), getInferredNumericValueCount(), getInferredTemporalValueCount(), getInferredBooleanValueCount(), getTemporalPatternCount(), getTemporalPatterns() 
- Added overloaded Dataset.createDataReader to read cached data
- Added Dataset.getRecordList(long offset, int count) to read a page of cached data
- Added Dataset.maxColumnStatsRecords property for the number of records to use when calculating column stats
- Added Dataset.columnStatsReaderThreads property for the number of threads to use to process column stats (default 2)
- Added Dataset.cancelLoad() to terminate the asynchronous data loading and column stats calculation
- Added overloaded Dataset.waitForColumnStatsToLoad() to block until asynchronous column stats processing completes
- DatasetReader can now be created with an offset and count to facilitate pagination
- Renamed MemoryDataset.reset() to clear()
- Improved internal concurrent dataset processing
- Tree now extends Bean
- Rename Tree.fromJson(File file) to loadJson(File file)
- Bugfix: ExcelPipelineInput now escapes the file path in the generated Java code to open Excel files
- Expanded the value types supported in the generated Java code from JdbcPipelineInput
- Added autoCloseReader property to JsonRecordPipelineInput, XmlPipelineInput, and XmlRecordPipelineInput
- Added AbstractPipeline.dateTimePatternDetector property
- Added AbstractPipeline.generateJavaCodePreProcess(JavaCodeBuilder code), generateJavaCodeImpl(JavaCodeBuilder code), generateJavaCodePostProcess(JavaCodeBuilder code), and getJavaCode()
- Added FieldDef.example property
- JavaCodeBuilder.addImport(Class importClass) now ignores primitive types, classes in the default package, and classes in the java.lang package
- Added com.northconcepts.datapipeline.foundations.time.DataTimePatternMatch, DateTimePattern, and DateTimePatternDetector
- Added GenerateTableDaoClasses and GenerateQueryDaoClasses to generate data access Java beans using the table and query metadata from a live database
- Added GenerateSpringDataJpaClasses to generate Spring Data JPA entities and repositories using the table metadata from a live database
- Bugfix: NPE when record doesn't contain an optional field during mapping
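
JdbcConnection.getTablesSortedTopologically(), listed above, suggests ordering tables so that each table appears after the tables it depends on. The sketch below shows one common way to compute such an order (Kahn's algorithm) over a plain map of table names. It is an illustration under stated assumptions, not the library's implementation: the class name, the sort helper, the map-based input, and the example table names are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch only: Kahn's algorithm over table names, not DataPipeline code.
class TableTopologicalSortSketch {

    // dependencies maps each table to the tables it references (for example via foreign keys).
    // Returns the tables ordered so that referenced tables come before their dependents.
    static List<String> sort(Map<String, Set<String>> dependencies) {
        Map<String, Integer> unresolved = new TreeMap<>();        // table -> unmet dependency count
        Map<String, List<String>> dependents = new HashMap<>();   // table -> tables that reference it
        for (Map.Entry<String, Set<String>> entry : dependencies.entrySet()) {
            String table = entry.getKey();
            unresolved.putIfAbsent(table, 0);
            for (String parent : entry.getValue()) {
                if (parent.equals(table)) {
                    continue; // a self-reference does not constrain the order
                }
                unresolved.putIfAbsent(parent, 0);
                unresolved.merge(table, 1, Integer::sum);
                dependents.computeIfAbsent(parent, k -> new ArrayList<>()).add(table);
            }
        }

        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : unresolved.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }

        List<String> sorted = new ArrayList<>();
        while (!ready.isEmpty()) {
            String table = ready.poll();
            sorted.add(table);
            for (String child : dependents.getOrDefault(table, Collections.<String>emptyList())) {
                if (unresolved.merge(child, -1, Integer::sum) == 0) {
                    ready.add(child);
                }
            }
        }
        if (sorted.size() != unresolved.size()) {
            throw new IllegalStateException("Circular dependency between tables");
        }
        return sorted;
    }
}
```

An order like this is useful when creating or loading tables, since parent tables must exist before the tables that reference them; the reverse order suits deletes and drops.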

Integration Changes
- Added AvroPipelineInput, AvroPipelineOutput to allow Avro to participate in pipelines
- Bugfix: OrcDataReader and OrcDataWriter now treat TIMESTAMP as a local datetime instead of adjusting it for the current time zone
- Added ParquetPipelineInput, ParquetPipelineOutput to allow Parquet to participate in pipelines
- Added ParquetDataWriter.setSchema(String schema) to override the schema used when reading
- Exposed the configuration property used in ParquetDataWriter
- AsymmetricDecryptingReader now explicitly relies on PrivateKey instead of Key for clarity
- AsymmetricEncryptingReader now explicitly relies on PublicKey instead of Key for clarity


--------------------------------
7.0 - Aug 10, 2021
--------------------------------

Core Changes
- DataPipeline Express no longer has a record limit; it is free to process unlimited amounts of data
* [New Feature] Data lineage -- many DataReaders can now add metadata to each record and field indicating where they were loaded from.  See DataReader.isLineageSupported(), setSaveLineage(boolean saveLineage), isSaveLineage()
- Added FieldLineage and RecordLineage convenience wrapper classes to access and modify data lineage properties.
- AbstractReader can now set field names using a collection
* Field and ArrayValue can now accept and retrieve temporal elements as LocalDateTime, LocalDate, LocalTime, converting them to java.util.Date, java.sql.Date, and java.sql.Time respectively
- Field and ArrayValue now have a hasValue() to tell if they are populated
- DataException.isCauseInterruptedException() now checks for FileLockInterruptionException and goes more than one level deep
- DebugReader has new constructors that make it easier to create instances that write to the default console
* BigDecimal and BigInteger are now supported as their own top-level types instead of labelled as double and long
- Updated ExcelWriters to support writing BigDecimal and BigInteger as strings since they are not supported in Microsoft Excel
- MultiRowStatementInsert, JdbcUpsertWriter, and JdbcWriter now support BigDecimal and BigInteger
- BasicFieldTransformer and SetField transformers now support BigDecimal, BigInteger, LocalDateTime, LocalDate, and LocalTime
- Rounder now handles BigInteger fields and negative rounding (left of the decimal)
- FieldType now contains a mapValue(Object value) method to convert objects and strings to the most appropriate Java type
- IParser and its implementations (LineParser, Parser) can now match until a terminating character or EOF
- Added new JsonSerializable, XmlSerializable, and RecordSerializable for classes that can be serialized to and from JSON, XML, and DataPipeline records
- Node.getSessionProperty(String name) now automatically typecasts to the assigned type
- ProxyReader and ProxyWriter now have overloaded map(..., Function mapper) methods to quickly apply functions to DataReaders
- Bugfix: Record's copy constructor now copies session properties from the supplied record
- Record can now retrieve fields as an ArrayList
- Record now has containsNonNullField() methods to check that every element in a path expression (like customer.address[0].city) is not null
- Added support, and new methods, for non-field values (like array elements) in Record.containsXXX and getFieldValueAsXXX
- Record can now produce pretty (formatted) JSON using the toJson() methods
- Records now have an easy way to retrieve values (simple or nested) or return a default using the getFieldValueAsXXX(fieldPath, defaultValue) methods
- Record now provides a way to create any fields that don't exist from a set using ensureFields()
- RecordList can now add all records from another RecordList or a collection using addAll()
- SingleValue now has getValueAsXXX() to retrieve its value as LocalDateTime, LocalDate, and LocalTime
- StreamWriter now has a RECORD_WITH_SESSION_PROPERTIES format and newSystemOutWriterWithSessionProperties() method to emit record and field session values in the output
- TextReader now has getFile() and getLineNumber() to return its associated file and current line number while reading
- Added a new TextStreamWriter that doesn't force the concept of fieldNamesInFirstRow and doesn't assume the output is line based
- Added a new TimedReader that reads an underlying DataReader for a maximum amount of time
- ValueNode now has isSingleValue() and isNotSingleValue() to better distinguish subclasses
- Overloaded CSVReader.parseLine() and parse() to accept a trimFields flag
- CSVReader now has getColumnNumber() to return its current column while reading
- CSVWriter can now change its line endings using setNewLine(String newLine)
- CSVWriter now allows quotes to be empty strings 
- Removed Diagnostic.send(), it can no longer be used by developers to send us troubleshooting data
- Diagnostic now has a main method allowing it to be called directly from the library jar
- Diagnostic now emits more data to help troubleshoot license and initialization issues
- Diagnostic now overrides toString() to retrieve its troubleshooting data as a string
- EventBus can now use setPublishLifecycleEvents(false) to disable publishing of its own lifecycle events
- ExceptionListeners can now be added to the EventBus without a filter
- Bugfix: EventBus now silently discards its lifecycle events while shutting down instead of logging an exception
- ExcelDocument can now retrieve its associated file using getFile()
- ValueMatch can now return its values as an ArrayList using getValues() 
- Expression language functions toDate(), toTime(), and toDatetime() now take Object instead of Date to convert from java.time.* classes
- Added toBoolean(Object), capitalize(String), uncapitalize(String), swapCase(String), substring(string, beginIndex) to expression language
- improved error messages when calling methods in the expression language
- Moment can now be created from the LocalDate/Time classes
- Moment now has getDateTime(), getDatePart(), and getTimePart() methods
- PostgreSqlUpsert now supports ConflictAction (UPDATE and DO_NOTHING)
- Added JdbcConnectionFactory and wrap() for code that needs to accept both Connections and DataSources
- Added JsonRecordWriter to emit JSON using the structure of each record's natural representation
- Added XmlRecordWriter to emit XML using the structure of each record's natural representation
- Added MapWriter to collect values from records into a java.util.Map
- FieldTransformer can now be constructed from a collection of names
- Added MoveFieldToIndex transformer to move a named field to a new position in each record
- Added RemoveDuplicateFields to collapse fields with the same name in each record
- TransformingReader now allows setting its optional condition as an expression language string
- Added Node.isNull() and isNotNull() since they are already implemented by all subclasses
- Bugfix: Record.getFieldValueAsChar(String, Character) no longer throws ClassCastException
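
Note on negative rounding: the Rounder entry above mentions rounding left of the decimal. The plain-JDK sketch below illustrates the idea with BigDecimal.setScale(). It does not use Rounder itself; the class name, the method names, and the HALF_UP rounding mode are assumptions made for illustration only.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

// Hypothetical illustration class; not part of DataPipeline.
public class NegativeRoundingSketch {

    // Rounds to the given number of decimal places.
    // A negative value rounds to the left of the decimal point (tens, hundreds, ...).
    // HALF_UP is an assumed rounding mode.
    public static BigDecimal round(BigDecimal value, int decimalPlaces) {
        return value.setScale(decimalPlaces, RoundingMode.HALF_UP);
    }

    // Same idea for BigInteger: round via BigDecimal, then convert back.
    public static BigInteger round(BigInteger value, int decimalPlaces) {
        return round(new BigDecimal(value), decimalPlaces).toBigIntegerExact();
    }

    public static void main(String[] args) {
        System.out.println(round(new BigDecimal("1234.5678"), 2).toPlainString());  // 1234.57
        System.out.println(round(new BigDecimal("1234.5678"), -2).toPlainString()); // 1200
        System.out.println(round(new BigInteger("987654"), -3));                    // 988000
    }
}
```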

Foundations Changes
- Bean now implements RecordSerializable and XmlSerializable
- Removed all overloaded Bean.fromJsonString() and toJsonString() methods
- Encoder now extends RecordSerializable 
- AbstractFieldMapping.setDefaultValueExpression now allows null & empty values instead of throwing exceptions
- Bugfix: AbstractFieldMapping.mapField() now clones the targetValue if it's a Record to prevent moving it from its source/parent
* Removed targetValidationEntity from DataMapping, use DataMappingPipeline instead
* DataMapping now has sourceEntity and targetEntity for automatic data conversion and validation
- DataMappingReader and DataMappingWriter now have discard writer functionality
- DecisionTableReader and DecisionTreeReader now have access to sourceRecord in the expression language
- FileSink now has an isAppend() property and the append argument has been removed from getOutputStream()
- LocalFile is now an abstract superclass of LocalFileSink and LocalFileSource
- FoundationMessages now extends CoreMessages from DataPipeline Core
- Bugfix: AggregateGroupFieldsAction now saves all group fields during record serialization
- Added CompositePipelineAction to group arbitrary actions in a pipeline
- Decoupled Dataset and Column from MVStore
- Dataset is now an abstract base for MemoryDataset and MvStoreDataset
- Added DatasetReader to read records from a Dataset
- Added DatasetPipelineInput to source pipeline data from an existing Dataset
- JdbcPipelineInput now uses a JdbcConnectionFactory instead of a JdbcConnection for its database connection
- Renamed JdbcPipelineInput.query to queryString to match JdbcReader; the old getter/setters are now deprecated
- Added JsonPipelineInput, JsonRecordPipelineInput, XmlPipelineInput, XmlRecordPipelineInput
- Added JsonRecordPipelineOutput, XmlRecordPipelineOutput
- Added AbstractPipeline as the base class for Pipeline and the new DataMappingPipeline
- Both PipelineInput and PipelineOutput are now abstract classes instead of interfaces
* EntityDef can now automatically convert and validate data using a set of new map*() and validate*() methods
- FieldDef can now define incoming fields by their position
- FieldDef can now initialize missing fields using a defaultValueExpression
- NumericFieldDef and TemporalFieldDef now support pattern validation
* Added SchemaTransformer to automatically convert and validate data using an entity's schema
- JavaCodeBuilder can now generate imports using a Class instance

Integration Changes
- Added Bloomberg integration: BloombergMessage and BloombergMessageReader
- Added Jira integration: JiraEpicReader, JiraIssueReader, JiraProjectReader, JiraSprintReader
- Added Apache Orc integration: OrcDataReader, OrcDataWriter
- Added Apache Parquet integration: ParquetDataReader, ParquetDataWriter
- Added Twitter v2 integration: TwitterFilterStreamReader, TwitterFollowerReader, TwitterFollowingReader, TwitterSearchReader, TwitterTimelineMentionsReader, TwitterTimelineTweetsReader
- TemplateWriter now supports nested expressions directly with ability to disable in constructor (to revert to previous behaviour)
- Added encryption and decryption readers: AsymmetricEncryptingReader, AsymmetricDecryptingReader, SymmetricEncryptingReader, SymmetricDecryptingReader

--------------------------------
6.0 - Oct 07, 2020
--------------------------------
- BigDecimal and BigInteger are now fully supported natively by new BIG_DECIMAL and BIG_INTEGER FieldTypes and endpoints throughout the engine
- CSVWriter now allows empty quote strings
- Added FieldType.getJavaType() to retrieve the default Java representation for any field
- MultiWriter now has a new REPLICATE_CLONE (ReplicateCloneWriteStrategy), default strategy
- Added Record.getFields():ArrayList
- Added Record.containsNonNullField(FieldPath path) and containsNonNullField(String fieldPathExpression) to determine if a record contains a field whose value is not null
- Record overrides the default Java serialization to improve performance converting to/from a byte stream
- Added several Record.getFieldValueAsXXX(fieldPath, defaultValue) methods to provide quick value retrieval
- Added a new TimedReader class
- The Diagnostic class now logs to the console or string and no longer posts to our support form
- The Diagnostic class now includes info to troubleshoot license loading issues
- Added an EventBus.publishLifecycleEvents flag to allow optional silencing of the bus' own lifecycle events
- Added EventBus.isNotAlive() to indicate if the bus has been shutdown or is in the process of shutting down
- Added EventBus.addUntypedEventListener(UntypedEventListener listener) and deprecated addListener(UntypedEventListener listener)
- Added EventBus.addUntypedEventListener(EventFilter filter, UntypedEventListener listener, Object topic) and deprecated addListener(EventFilter filter, UntypedEventListener listener, Object topic)
- Added EventBus.removeUntypedEventListener(UntypedEventListener listener) and deprecated removeListener(UntypedEventListener listener)
- Added EventBus.removeUntypedEventListenerReference(Reference reference) and deprecated removeUntypedListenerReference(Reference reference)
- Added EventBus.getUntypedEventListenerCount() and deprecated getUntypedListenerCount()
- The Excel providers (PoiProvider, PoiXssfProvider, and JxlProvider) write BIG_DECIMAL and BIG_INTEGER fields as String to prevent losing precision using native Excel types
- The binary FileReader and FileWriter classes now handle BigDecimal and BigInteger natively
- ValueMatch.get(int index) now returns the parameterized type instead of Object
- Added ValueMatch.getValues():ArrayList
- Added FilterExpression.getExpression() and getExpressionAsString()
- Added expression functions to capitalize(String value), uncapitalize(String value), and swapCase(String value)
- Replaced matchesRegex(String string, String regex, int flags) with the easier to use matchesRegex(String string, String regex, boolean ignoreCase, boolean dotAll, boolean multiLine) in the expression language
- Improved error message when a method call throws an exception in the expression language
- MultiRowStatementInsert now handles BigDecimal and BigInteger natively
- PostgreSqlUpsert now supports UPDATE and DO_NOTHING ConflictActions via a new setConflictAction(ConflictAction conflictAction) method
- JdbcUpsertWriter now handles BigDecimal and BigInteger natively
- Added a MapWriter to write key-value fields to a Map
- ThrottledReader and ThrottledWriter can now set and get their measures and can be directly constructed with their units and unitsPerSecond
- BasicFieldTransformer now natively supports BigDecimal and BigInteger 
- Added a new RemoveDuplicateFields to collapse multiple fields with the same name using one of several strategies: RETAIN_ALL, RETAIN_FIRST, RETAIN_LAST, MAKE_ARRAY, RENAME
- SetField now natively supports BigDecimal and BigInteger 
- TransformingReader now allows its condition to be set as an expression via a new setCondition(String condition) method
- DebugReader now has overloaded constructors that default output to system.out
- Improved how DataException detects thread interruptions via exceptions (InterruptedException, InterruptedIOException, ClosedByInterruptException)
- Rounder now supports BigInteger values
- Rounder now supports negative decimal places when dealing with non-BigDecimal/BigInteger values
- BigDecimal equality comparison in the expression language now uses "value1.compareTo(value2) == 0" instead of "value1.equals(value2)" to return true even if they have different scales (like 7.0 and 7.00)
- Record now contains methods to print formatted JSON: toJson(boolean pretty), toJson(ValueNode recordOrArray, boolean pretty), and toJson(ValueNode recordOrArray, Writer writer, boolean pretty)
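
Note on BigDecimal equality: the difference between equals() and compareTo() mentioned above can be seen with the JDK alone. The sketch below is plain Java, not expression language code, and the class and method names are made up for illustration.

```java
import java.math.BigDecimal;

// Hypothetical illustration class; not part of DataPipeline.
public class BigDecimalEqualitySketch {

    // Numeric equality that ignores scale, as in "value1.compareTo(value2) == 0".
    public static boolean sameNumber(BigDecimal value1, BigDecimal value2) {
        return value1.compareTo(value2) == 0;
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("7.0");
        BigDecimal b = new BigDecimal("7.00");
        System.out.println(a.equals(b));      // false: equals() also compares scale (1 vs 2)
        System.out.println(sameNumber(a, b)); // true: compareTo() compares the numeric value only
    }
}
```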


5.2 - Feb 09, 2020
- added ConcurrentRecordList for use in multi-threaded scenarios
- added ability to retrieve nested Endpoint and root Endpoint from any Endpoint
- added Endpoint.getSelfTime() and getSelfTimeAsString() for performance testing
- FieldType now has methods to determine if instances are numeric, textual, temporal, boolean, or binary
- DPEL now supports blacklisting and whitelisting methods and blacklists potentially harmful packages and classes such as System and Runtime
- EventBus now has addListener and getPublisher methods that operate without an explicit event source
- DPEL now treats null as zero in cases where it is appropriate (like addition and subtraction); otherwise, it returns null for expressions containing a null instead of throwing an NPE (like divide and multiply)
- added new functions to DPEL
- improved error messages in DPEL and moved them to a class-based resource bundle
- Upgraded Jackson 1 to version 2
- JsonReader can now return BigDecimals when the useBigDecimal flag is set
- JdbcWriter now supports configurable insert strategies via the IInsert interface
- JdbcUpsertWriter now includes specialized upsert strategies for Oracle, PostgreSQL and Sybase
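
Note on null arithmetic: the null-handling rule described for DPEL in this release can be pictured with the plain-Java sketch below. It is only an illustration of the stated rule, not DPEL's implementation. The class and method names are hypothetical, and treating two null operands in an addition as 0 is an assumption.

```java
// Hypothetical illustration class; not part of DataPipeline.
public class NullArithmeticSketch {

    // Addition: a null operand is treated as zero.
    public static Double add(Double a, Double b) {
        return (a == null ? 0.0 : a) + (b == null ? 0.0 : b);
    }

    // Subtraction: a null operand is treated as zero.
    public static Double subtract(Double a, Double b) {
        return (a == null ? 0.0 : a) - (b == null ? 0.0 : b);
    }

    // Multiplication: any null operand yields null instead of a NullPointerException.
    public static Double multiply(Double a, Double b) {
        return (a == null || b == null) ? null : Double.valueOf(a * b);
    }

    // Division: any null operand yields null instead of a NullPointerException.
    public static Double divide(Double a, Double b) {
        return (a == null || b == null) ? null : Double.valueOf(a / b);
    }

    public static void main(String[] args) {
        System.out.println(add(null, 17.0));      // 17.0
        System.out.println(subtract(20.0, null)); // 20.0
        System.out.println(multiply(null, 2.0));  // null
        System.out.println(divide(10.0, null));   // null
    }
}
```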

5.1 - Jul 30, 2019
- Added toJson(), toXML(), and toBinary() to ArrayValue
- NullReader -- extending java.io.Reader -- moved to internal package and replaced with one extending DataReader
- Removed NullDataReader -- use NullReader instead
- Added getDocument():ExcelDocument to ExcelReader and ExcelWriter
- Added md5() hashing function to expression language
- Added com.northconcepts.datapipeline.io.InputStreamFactory
- Added GenericUpsert.insertFirst to allow switching insert-update attempt order
- Added GenericUpsert.debug to log the generated SQL
- Added com.northconcepts.datapipeline.jdbc.upsert.MergeUpsert as a batch-able alternative to GenericUpsert for databases that support the SQL MERGE statement

5.0 - Feb 18, 2019
- Moved the following endpoints to separate add-on modules: AmazonS3FileSystem, AvroReader/Writer, EmailReader, InstagramReaders, KafkaReader/Writer, MailChimpReader, MongoDBReader/Writer, PdfWriter, RtfWriter, TemplateWriter, TwitterReaders/Writers
- Moved several methods from DataEndpoint into new superclass Endpoint
- BUGFIX: DataObject.resetID() no longer increments the instance's sequence ID
- added IParser/Parser.peekWhitespaceLength(int index), peekStringArray(CharSequence[] lookahead, int offset), and peekStringArray(char[][] lookahead, int offset)
- added ParsingReader.getLineNumber() and matchLINE()
- added SequenceWriter to write to a set of DataWriters created by a factory in turn
- BUGFIX: SortingReader could cause a memory leak if its underlying reader was not released after the sorting completed (normally pipelines hold all their readers until garbage collected -- this is now a special case)
- added CSVReader.allowQuoteInField to indicate if unescaped quotes are allowed in fields (defaults to false)
- CSVWriter now quotes numbers starting with zero where the length is greater than 1
- Added FieldCount filter to check the expected number of fields
- The FieldExists & FieldNotExists filters can now accept a variable number of fields, not just one
- Optimized use of BigDecimal to reduce instance count in GroupByReader
- Added AbstractJobCallback and DefaultJobCallback
- Added DataReaderFactory and DataWriterFactory to create DataReaders and DataWriters
- Added DataReaderIterator to read from a java.util.Iterator of DataReader and combine them into a single stream
- Added LookupTransformer.allowNoResults to indicate that no exception should be thrown if no results are found
- Added GetUrlQueryParam transformer to retrieve query param values from a URL
- Added BasicFieldTransformer.stringToDateTime(final String pattern, final String timeZoneId), stringToDate(final String pattern, final String timeZoneId), stringToTime(final String pattern, final String timeZoneId)
- Added BasicFieldTransformer.numberToDateTime(final long multiplier), millisecondsToDateTime(), secondsToDateTime(), minutesToDateTime(), hoursToDateTime(), daysToDateTime()
- Added BasicFieldTransformer.millisecondsToDate(), secondsToDate()
- Added BasicFieldTransformer.dateTimeToString(final String pattern, final String timeZoneId), dateToString(final String pattern, final String timeZoneId), timeToString(final String pattern, final String timeZoneId)
- Added BasicFieldTransformer.emptyToNull(), valueToNull(final Object object)
- Added CopyField(String sourceFieldName, String targetFieldName) constructor that always overwrites
- Added FieldTransformer.throwExceptionOnMissingField to indicate if an exception should be thrown if the source field is missing (defaults to true)
- Added SetBatchSequenceNumberField transformer to add a sequence (auto increment) field that is incremented when the value(s) in the specified watch fields differ from the previous record
- Added SetGroupSequenceNumberField transformer to add a sequence (auto increment) field that is incremented when the value(s) in the specified watch fields remain the same as the previous record; the sequence restarts when the watch fields differ from the previous record
- Added SetSequenceNumberField transformer to add a sequence (auto increment) field with the specified increment (step)
- Added SetUuidField transformer to add a field with a randomly generated UUID
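
Note on batch vs. group sequences: the two sequence transformers described above differ only in when the counter advances. The sketch below shows that difference in plain Java over a list of watch values. It does not use the transformers themselves. The class and method names are hypothetical, and the starting value of 1 and step of 1 are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration class; not part of DataPipeline.
public class SequenceNumberSketch {

    // Batch style: the number advances only when the watch value changes.
    public static List<Integer> batchSequence(List<String> watchValues) {
        List<Integer> result = new ArrayList<>();
        int sequence = 0;
        String previous = null;
        boolean first = true;
        for (String value : watchValues) {
            if (first || !Objects.equals(value, previous)) {
                sequence++;
            }
            result.add(sequence);
            previous = value;
            first = false;
        }
        return result;
    }

    // Group style: the number advances while the watch value stays the same
    // and restarts (at 1, by assumption) when it changes.
    public static List<Integer> groupSequence(List<String> watchValues) {
        List<Integer> result = new ArrayList<>();
        int sequence = 0;
        String previous = null;
        boolean first = true;
        for (String value : watchValues) {
            if (first || !Objects.equals(value, previous)) {
                sequence = 1;
            } else {
                sequence++;
            }
            result.add(sequence);
            previous = value;
            first = false;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> watch = Arrays.asList("A", "A", "B", "B", "B", "A");
        System.out.println(batchSequence(watch)); // [1, 1, 2, 2, 2, 3]
        System.out.println(groupSequence(watch)); // [1, 2, 1, 2, 3, 1]
    }
}
```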


4.4 - Jul 30, 2018
- added AmazonS3FileSystem with support for streaming, multipart uploads
- added BufferedReader
- BUGFIX: AbstractReader now skips empty rows from read() instead of just readImpl() 
- AsyncReader now shows both the current thread's stack trace along with the async thread's stack trace when failures occur
- AsyncWriter now shows both the current thread's stack trace along with the async thread's stack trace when failures occur
- AsyncWriter now rethrows any async exceptions that occur during close
- added CompositeValue.getValuesAsString()
- DataException uses the nested exception's message when wrapping and rethrowing InvocationTargetException
- added DataObject.resetID() to restart the sequence number used to identify readers and writers
- DataObject  now uses the superclass' name when retrieving the name of anonymous classes
- added DebugReader.includeElapsedTime option
- BUGFIX: corrected property name from FieldPath.getSearator() to getSeparator()
- BUGFIX: updated the bufferSize param passed to SortingReader from int to long and corrected the docs to identify it as bytes instead of megabytes
- added StreamWriter.format, .newJsonSystemOutWriter(), and newXmlSystemOutWriter()
- added more classloader info to Diagnostic class
- FieldFilter now supports multiple field path expressions
- added CloseWindowStrategy.isPollingRequired() to identify strategies that need to be checked even when no new data is available
- added CloseWindowStrategy.isCloseBeforeAddingRecord()
- added CreateWindowStrategy.isPollingRequired() to identify strategies that need to be checked even when no new data is available
- added GroupByReader.getOperations(), getOpenedWindows(), getClosedWindows()
- GroupByReader can now return aggregated data on a schedule even when no input data is present (instead of waiting for new data)
- the expression language now returns the first non-null value in an addition expression, for example "null + 17" now returns 17 instead of null
- improved support for BigDecimal and BigInteger
- BUGFIX: Job and AbstractJob now check for cancellation using their flag instead of Thread.interrupted() which can be cleared by non-DataPipeline code
- added JsonRecordReader to return an entire branch of JSON as records -- including nested objects and arrays
- added XmlRecordReader to return an entire branch of XML as records -- including nested elements
- SplitWriter.removeClosedTargets() is now public
- BasicFieldTransformer can now perform the same sequence of operations on multiple fields instead of just one
- added BasicFieldTransformer.nullToValue(BigInteger value)
- added BasicFieldTransformer.nullToValue(BigDecimal value)
- added BasicFieldTransformer.nullToExpression(String expressionAsString)
- added BasicFieldTransformer.stringToBoolean(final boolean lenient)
- FieldTransformer can now operate on multiple fields instead of just one
- added com.northconcepts.datapipeline.transform.Ngrams to extract each run/sequence of N words from a field into an array
- SetField's constructors are now public
- BUGFIX: the Twitter readers no longer throw exceptions extracting unexpected data
- Twitter readers can now extract tweets longer than 140 characters
- TwitterUserLookupReader can now search by screen name, not just ID
- improved XPath expression handling
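
Note on n-grams: the "run/sequence of N words" idea behind the Ngrams transformer listed above can be sketched in plain Java as follows. This is not the transformer's code. The class and method names are hypothetical, and splitting words on whitespace is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of DataPipeline.
public class NgramsSketch {

    // Returns every run of n consecutive words in the text, in order.
    // Words are split on whitespace (an assumption).
    public static List<String> ngrams(String text, int n) {
        List<String> result = new ArrayList<>();
        if (text == null || text.trim().isEmpty() || n < 1) {
            return result;
        }
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i + n <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("the quick brown fox", 2)); // [the quick, quick brown, brown fox]
        System.out.println(ngrams("the quick brown fox", 3)); // [the quick brown, quick brown fox]
    }
}
```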


4.3.1 - Aug 19, 2017
- BUGFIX: RetryingReader & RetryingWriter do not retry when the exception is InterruptedException, InterruptedIOException, or ClosedByInterruptException
- BUGFIX: TwitterUserLookupReader now respects Twitter's ratelimit and API policy settings


4.3 - Aug 14, 2017
- BUGFIX: AsyncReader now records separate details for exceptions occurring on the main thread versus the reader thread
- AsyncReader now rethrows the first exception occurring on the reader thread on the main thread instead of the last exception
- BUGFIX: AsyncWriter now records separate details for exceptions occurring on the main thread versus the writer thread
- AsyncWriter now rethrows the first exception occurring on the writer thread on the main thread instead of the last exception
- DataObject (and DataEndpoint) now contains a sequential id and name (simple class name + id)
- most endpoints now use their name property when adding exception properties to group property names in different classes with the most specific instance
- renamed CSVReader.parse():Record to CSVReader.parseLine():Record
- added CSVReader.parse():RecordList to parse multiple lines of CSV records
- added EmailReader.debug property
- added FileReader.autoCloseWriter to indicate if the underlying input stream should be closed when the reader closes (defaults to true).
- added Instagram reader classes
- renamed JmsSettings.getName() to getInstanceName()
- added IJob/AbstractJob/Job.getRunningTimeAsString(boolean shortForm) to conditionally return the long form (2 Years, 1 Second, 12 Millisecond) or short form (2y 1s 12ms)
- added MailChimpListMemberReader to read the list of subscribed, unsubscribed, and cleaned members from a MailChimp list
- MemoryWriter now returns the count of records in its toString() instead of the actual records
- added a constructor to allow appending to TemplateWriter
- added BasicFieldTransformer.flattenToString(fieldSeparator) to convert any arrays or records in a field to string
- added FlattenRecord transformer to convert any fields in a record containing arrays or records to string
- added SplitWords transformer to create a new record for each element in an array
- added RetweetStatus* fields containing the original tweet to TwitterSearchReader
- added TwitterErrorCode enum with all standard error codes
- added TwitterListOwnershipReader to read the lists belonging to a Twitter account
- added TwitterListMemberReader to read the accounts in a Twitter list
- added TwitterProvider.getConfig() to expose configuration settings
- added TwitterUserLookupReader to hydrate Twitter account IDs with the full details
- added isRecord() to all nodes: ArrayValue, Field, Record


4.2 - Dec 23, 2016
- added com.northconcepts.datapipeline.avro.AvroReader and AvroWriter to read and write Apache Avro formatted files and streams
- improved parsing error messages in expression language
- added AbstractReader.setSkipEmptyRows(boolean) to prevent returning records containing only null values (affects Excel, CSV, FixedWidth readers)
- ArrayValue.getType() now ignores UNDEFINED element types whose values are null when determining the overall type to return
- ArrayValue.fromArray(Object) now uses the array's component type as the default type for elements to prevent null values being treated as UNDEFINED types
- BUGFIX: converting a field with no value set to an array no longer sets the array's default type to STRING
- added new com.northconcepts.datapipeline.core.DataObject as superclass of DataEndpoint and the JMS connection and settings classes
- moved DataEndpoint.exception(...) methods to new parent DataObject
- unquoted identifiers in FieldPath can now include @ and $ signs
- all aliased functions known to the expression language can now be logged by calling Functions.log()
- added offset to LimitReader to allow it to skip a number of upstream records, limit the number of records sent downstream, or both.  Similar to MySQL's SELECT ... LIMIT offset, row_count statement
- BUGFIX: ProxyReader.setNestedDataReader() now opens the new nested reader if the proxy was already open and the manageLifecycle flag is true
- ProxyReader and ProxyWriter's constructors now throw an exception if a null nested reader/writer is passed in (like setNestedDataReaderWriter())
- BUGFIX: Record.toJson(ValueNode, Writer) now closes the passed in java.io.Writer if an exception occurs
- BUGFIX: Record.toXml(ValueNode, Writer) now closes the passed in java.io.Writer if an exception occurs
- added overloaded Record.toXml() methods with ability to write pretty XML output with line breaks and indentation
- added Record.isEmpty() to return true if all fields in the record are null
- added RemoveDuplicatesReader.discardWriter to optionally collect non-unique/duplicate records
- added RemoveDuplicatesReader.onUnique() and onDuplicate() to intercept unique and duplicate records
- SortingReader now performs record collection and sorting when read() is first called instead of in open() to improve job handling (pause, resume, cancel) and monitoring
- SequenceReader now opens each reader passed to it one at a time, after the previous reader is finished and closed
- added StreamWriter.newSystemOutWriter() and newSystemErrWriter() convenience methods to write to STDOUT and STDERR respectively
- added a new com.northconcepts.datapipeline.diagnostic.Diagnostic class to help us diagnose environmental issues (see the javadocs for the data it collects)
- event bus now allows any interface for listening and publishing events -- they no longer need to extend java.util.EventListener
- BUGFIX: EventBusReader.readImpl() now treats InterruptedException while taking records from the queue as EOF
- added ExcelDocument.ProviderType.POI_SXSSF to write .xlsx files (Excel 2007+) using Apache POI's streaming API (SXSSF) and temporary files to reduce memory usage when writing large files
- added ExcelDocument.ProviderType.POI_XSSF_SAX to read .xlsx files (Excel 2007+) using Apache POI's streaming API (XSSF_SAX) to reduce memory usage when reading large files
- added FileWriter.compressed to indicate if data should be compressed as it is being written
- added FileReader.compressed to indicate if data is compressed and should be decompressed as it is being read
- FileReader now uses an internal buffer to reduce memory usage
- BUGFIX: native FileWriter and FileReader now saves and loads the field types of null array elements - they previously appeared as UNDEFINED
- GroupByReader no longer requires group fields to run -- allowing it to operate on the entire stream as one group
- added built-in format/parse functions to the expression language that accept patterns: parseDate, formatDate, parseLong, parseDouble, formatDouble, formatLong
- added substring(String string, int beginIndex, int endIndex) function to the expression language
- added com.northconcepts.datapipeline.jms.JmsReader and JmsWriter to read and write publish-subscribe topics and message queues
- Job now implements new IJob interface
- added RunnableJobAdapter to convert any Runnable into a Job that can be managed (pause, resume, cancel) and monitored
- Updated Job with methods to access the parent and child instance (getParent(), getChild()), lifecycle events (onStart(), onFinish(), onPause(), onResume())
- running child jobs are now cancelled when their parents are cancelled
- added Job.getJobCount() to return the current number of live jobs in the JVM
- added SplitWriter as a replacement for DeMux to convert a single stream into many
- added RetryingReader and RetryingWriter to re-attempt failed reads and writes using configurable strategies
- added TwitterReader.rateLimitExceeded flag and onRateLimitExceeded() callback
- added TwitterSearchReader.maxId and sinceId to configure which tweets are returned
- added XmlReader.setIgnoreNamespaces(boolean) which defaults to true (and throws an exception if passed false) since XPath 1.0 is unable to handle namespaces
- JdbcMultiWriter's constructor now throws an exception if either the number of connections or maxQueuedRecordsPerConnection is less than 1
- JdbcMultiWriter.write() now throws exception if called after one of the internal AsyncWriters has failed
- added JdbcMultiWriter.getException() to return an exception thrown by one of the internal AsyncWriters if any
- added JdbcMultiWriter.getExceptions() to return all exceptions thrown by the internal AsyncWriters
- added JdbcMultiWriter.rethrowAsyncException() to rethrow an exception thrown by one of the internal AsyncWriters or return silently if no exceptions were thrown
- added AsyncWriter.rethrowAsyncException() to rethrow the exception thrown by the internal thread or return silently if no exception was thrown
- BUGFIX: MultiWriter.writeImpl() no longer throws NullPointerException when no downstream writers were set
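
Note on offset and limit: the LimitReader entry above compares its behaviour to MySQL's "LIMIT offset, row_count". The plain-Java sketch below shows the same skip-then-limit idea on an in-memory list using java.util.stream. It does not use LimitReader, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of DataPipeline.
public class OffsetLimitSketch {

    // Skips the first "offset" items, then passes along at most "limit" items.
    public static <T> List<T> offsetLimit(List<T> upstream, long offset, long limit) {
        return upstream.stream()
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> records = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(offsetLimit(records, 3, 4));  // [4, 5, 6, 7]
        System.out.println(offsetLimit(records, 8, 5));  // [9, 10]
    }
}
```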


4.1 - May 7, 2016
- added MongoDB endpoints -- MongoReader and MongoWriter 
- added CollectionWriter to add all the values from one column/field into a Java collection (List, Set)
- added DataEndpoint.getOpenElapsedTimeAsString()
- added DataException.isCausedBy(Class) and getRootCause()
- all FieldList.add and remove methods ignore null field names
- added Functions.addAll(Class, boolean, String...) to add all static methods in a class as aliases in the expression language
- the dynamic expression language now contains function aliases for all methods in java.lang.Math in addition to other new built-in functions
- added Record.toBinary and fromBinary to serialize and deserialize from the built-in binary format
- added TeeReader.cloneRecord to indicate if a clone/copy or the original record should be sent to the DataWriter
- BUGFIX: CSVReader no longer throws a NullPointerException while adding properties to another valid exception that was thrown
- CSVWriter now throws exception if a null or empty string is used as a quote char/string
- EmailReader now allows you to specify the port
- BUGFIX: EmailReader no longer throws exceptions when retrieving null email addresses
- EventBus.addEventBusLifecycleListener() no longer requires an EventFilter
- renamed EventBusReader.stopOnReaderEOF to stopOnWriterEOF
- EventBus can now remove listeners by topic
- added FileWriter.autoCloseWriter
- added FilteringReader.getDiscardReasonFieldName()
- the expression language now behaves like databases and returns null when adding or subtracting null from a number
- the expression language now handles methods with variable length arguments
- BUGFIX: JMX monitoring now returns a value for the endpoints.elapsedTimeAsString property
- added JdbcUpsertWriter.nonUpdateFields to indicate fields which should only be written once and not updated during an upsert
- added static convenience methods Job.run(reader, writer) and Job.runAsync(reader, writer)
- several Job methods, like pause(), resume(), and cancel(), now return the "this" instance for fluent API usage
- added SimpleJsonReader to mirror SimpleJsonWriter
- added SimpleXmlReader to mirror SimpleXmlWriter
- added XmlWriter.pretty to indicate if line breaks and indentations should be added to output (default is false)
- added CachedLookup.maxEntries to limit the number of elements cached and CachedLookup.resetSchedule to indicate when the cache should be completely cleared
- added CachedLookup.getEntries(), getHits(), getMisses(), and getRequests()
- BUGFIX: RenameField no longer throws exception when old and new fields are the same or have different cases
- added TransformingWriter to mirror TransformingReader -- all Transformer classes can now be used with TransformingWriter
- added Transformer.getEndpoint() and getWriter() to enable usage in reading and writing scenarios
- The binary FileWriter now supports appending to an existing file via a new overloaded constructor
- added CollectionWriter to receive all values from specific fields
- BUGFIX: AsyncWriter no longer hangs when an exception occurs while it's already blocking
- lookup classes (BasicLookup, CachedLookup, & DataReaderLookup) no longer display individual entries in their toString() methods, only their count
- both TransformingReader and TransformingWriter contain discardWriter and discardReasonFieldName functionality like filters and validators
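
The null-handling entry above (adding or subtracting null from a number returns null) can be illustrated with a small standalone sketch. It does not use the Data Pipeline expression engine; the class and method names are hypothetical and only mimic the database-style rule described.

```java
// Hypothetical helper, not part of Data Pipeline: mimics SQL-style null propagation.
final class NullArithmeticSketch {

    // Returns null if either operand is null, otherwise the sum.
    static Long add(Long a, Long b) {
        if (a == null || b == null) {
            return null;
        }
        return a + b;
    }

    // Returns null if either operand is null, otherwise the difference.
    static Long subtract(Long a, Long b) {
        if (a == null || b == null) {
            return null;
        }
        return a - b;
    }

    public static void main(String[] args) {
        System.out.println(add(2L, 3L));        // prints 5
        System.out.println(add(2L, null));      // prints null
        System.out.println(subtract(null, 1L)); // prints null
    }
}
```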


4.0 - Mar 17, 2016
- added DataEndpoint.getLastRecord() to return the most recent record seen by this endpoint while it is open
- added DataEndpoint.enableJmx() to turn on JMX (Java Management Extensions) monitoring & management and expose running Jobs and EventBuses as managed beans
- added DataEndpoint.getElapsedTimeAsString() to return a human readable string of the total time this endpoint spent reading or writing
- added DataException.isCauseInterruptedException() to indicate if a thread was interrupted
- added DataReader.isExhausted() to indicate that this stream has already returned a null and no further reads are possible
- DataReader.read() now guarantees that no further calls to the underlying DataReader.readImpl() will occur once it returns a null
- added Messages.getCurrent(boolean) to allow for conditional thread-local instance creation
- added Messages.clearCurrent() to allow thread-local instances to be explicitly removed
- java.io.InputStream and java.io.Reader values are now converted to byte[] and String when added to records
- CSVReader now uses strings (instead of a single char) for fieldSeparator and quote
- CSVReader now allows for different starting and ending quote strings
- CSVReader now supports configurable line (or record) separators using its new setLineSeparators(String...) and setLineSeparator(String) methods
- added com.northconcepts.datapipeline.email.EmailReader to read emails (and their attachments) from IMAP mailboxes
- added com.northconcepts.datapipeline.eventbus.EventBus, EventBusReader, and EventBusWriter to allow pipelines to be loosely coupled and used in one-to-many (publish-subscribe) scenarios
- FilteringReader now accepts an optional discardDataWriter and discardReasonFieldName to easily capture or route filtered out records
- ValidatingReader now accepts an optional discardDataWriter and discardReasonFieldName to easily capture or route records failing validation
- CreateWindowStrategy and CloseWindowStrategy now include additional info in their API to help make the sliding window open/close decision
- added CreateWindowStrategy.recordPeriod(int) to open new sliding windows at set record counts
- added more convenience methods to GroupByReader
- added new Job class to run, manage, and track pipelines -- replaces JobTemplate.DEFAULT.transfer(reader, writer);
- the existing JobTemplate transfer methods now return Job, but otherwise behave the same
- SimpleJsonWriter and SimpleXmlWriter now have setPretty(boolean) and isPretty() methods to indicate if line breaks and indentations should be added to output (default is false)
- added DeMux.runAsync() to mirror the new Job class
- added more BigDecimal and BigInteger operations to BasicFieldTransformer
- deprecated the TransformingReader.filter property in favour of the new name TransformingReader.condition to clarify the conditional transformation intent
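
The DataReader.read() / isExhausted() guarantee listed above (no further readImpl() calls once a null has been returned) can be sketched without the library as follows. The class below is a hypothetical stand-in that reads strings from an in-memory list; it is not the actual DataReader implementation.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for a reader; illustrates the "exhausted" guard only.
final class ExhaustedReaderSketch {

    private final Iterator<String> source;
    private boolean exhausted;
    private int readImplCalls;

    ExhaustedReaderSketch(String... values) {
        this.source = Arrays.asList(values).iterator();
    }

    // The underlying read; counts how often it is invoked.
    private String readImpl() {
        readImplCalls++;
        return source.hasNext() ? source.next() : null;
    }

    // Public read: once readImpl() has returned null, it is never called again.
    String read() {
        if (exhausted) {
            return null;
        }
        String value = readImpl();
        if (value == null) {
            exhausted = true;
        }
        return value;
    }

    boolean isExhausted() {
        return exhausted;
    }

    int getReadImplCalls() {
        return readImplCalls;
    }

    public static void main(String[] args) {
        ExhaustedReaderSketch reader = new ExhaustedReaderSketch("a");
        reader.read(); // "a"
        reader.read(); // null, reader is now exhausted
        reader.read(); // null again, readImpl() is not invoked
        System.out.println(reader.getReadImplCalls()); // prints 2
    }
}
```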

3.1.4.2 - Jan 29, 2016
- added Messages.setEnabled() and isEnabled() to control whether silent failures in TransformingReader and ValidatingReader are stored in Messages
- added TransformingReader.onFailure to explicitly intercept transformation failures

3.1.4 - Jan 19, 2016
- BUGFIX: AsyncWriter now throws an exception in the next call to writeImpl(Record record) if the asynchronous writer thread failed
- added DataEndpoint.DEFAULT_READ_BUFFER_SIZE for use in AsyncMultiReader and other classes
- DebugReader now emits the current record count along with each record
- The following writers now support appending to an existing file via a new overloaded constructor: TextWriter, LinedTextWriter, CSVWriter, FixedWidthWriter
- renamed Node.isValue() to Node.isValueNode()
- added TextWriter.autoCloseWriter to indicate if the underlying java.io.BufferedWriter should be closed when this stream closes (defaults to true).
- added TextWriter.flushOnWrite to indicate if the underlying java.io.BufferedWriter should be flushed after each record is written (defaults to false).
- CSVWriter now supports different starting and ending quote strings via setStartingQuote() and setEndingQuote()
  - the existing setQuoteChar() method assigns both startingQuote and endingQuote to the same value
  - the existing getQuoteChar() method returns the startingQuote value
  - updated CSVWriter.quoteChar from char to String
- added CSVWriter.forceQuote to indicate if all values should be quoted, even if they are not null or empty strings.
- updated CSVWriter.fieldSeparator from char to String
- added CSVWriter.nullValuePolicy to handle how null values are written (defaults to ValuePolicy.EMPTY_STRING, an unquoted blank value)
- added CSVWriter.emptyStringValuePolicy to handle how empty string values are written (defaults to ValuePolicy.QUOTED_EMPTY_STRING, a quoted blank value)
- updated the JavaDocs to clarify that ExcelDocument is not thread safe, but can be reread very quickly since it uses an in-memory buffer
- the Excel providers (PoiProvider, PoiXssfProvider, and JxlProvider) now guard against reusing an ExcelDocument while it is still open 
- added FileWriter.flushOnWrite to indicate if the underlying java.io.BufferedWriter should be flushed after each record is written (defaults to false).
- added several convenience methods to GroupByReader that use the source field name for the target: sum, first, last, max, min
- added isBatchSupported() to IUpsert and GenericUpsert
- MySQL upserts (INSERT ... ON DUPLICATE KEY UPDATE) are now supported using the new com.northconcepts.datapipeline.jdbc.upsert.MySqlUpsert
  - for use with JdbcUpsertWriter
- added setCommitBatch() to JdbcWriter and JdbcUpsertWriter to indicate if a commit should occur after each batch is sent to the database
- added JobCallback.NULL for use by implementers of JobTemplate
- BUGFIX: the default implementation of JobTemplate.transfer() now calls JobCallback.onFailure() if the transfer was successful, but the reader or writer failed to close
- the following DeMux methods are now public: getStrategy(), getSource(), getSink(), getThread()
- onLimitExceeded() in IApiLimitPolicy and ApiLimitPolicy now accepts TwitterRateLimit instead of TwitterReader
- expandEntities() in IEntityExpansionPolicy and EntityExpansionPolicy now accepts ITwitterConverter instead of TwitterReader
- added the following new Twitter classes to com.northconcepts.datapipeline.twitter: 
  - ITwitterConverter, TwitterConverter
  - TwitterFilterStreamReader, TwitterSampleStreamReader
  - TwitterFollowerIDsReader, TwitterFollowerListReader, TwitterFollowingIDsReader, TwitterFollowingListReader
  - TwitterFollowWriter, TwitterUnfollowWriter
  - TwitterProvider, TwitterStreamProvider
  - TwitterRateLimit
- updated TwitterReader, TwitterSearchReader, and TwitterStreamReader to delegate work to TwitterProvider, ITwitterConverter, and TwitterRateLimit
- added isAddTextToParent() and setAddTextToParent() to XmlReader, JsonReader, and JavaBeanReader to indicate if each child node's text should be concatenated to its parent during parsing (defaults to false).
- added TeeReader to operate like tee in Unix and write every record passing through it to a DataWriter
- improved various error message text
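
The CSVWriter nullValuePolicy and emptyStringValuePolicy defaults described above (null written as an unquoted blank, empty string written as a quoted blank) can be pictured with the following standalone sketch. The Policy enum and all method names here are assumptions made for illustration; they only borrow the two policy names from the entries above and are not the library's classes. Quoting of values that contain separators is deliberately left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of null vs. empty-string handling when writing CSV values.
final class CsvValuePolicySketch {

    enum Policy { EMPTY_STRING, QUOTED_EMPTY_STRING }

    // Renders a blank value according to the given policy.
    static String renderBlank(Policy policy, String startingQuote, String endingQuote) {
        return policy == Policy.QUOTED_EMPTY_STRING ? startingQuote + endingQuote : "";
    }

    // Applies the null policy to nulls and the empty-string policy to "".
    static String formatValue(String value, Policy nullPolicy, Policy emptyPolicy,
                              String startingQuote, String endingQuote) {
        if (value == null) {
            return renderBlank(nullPolicy, startingQuote, endingQuote);
        }
        if (value.isEmpty()) {
            return renderBlank(emptyPolicy, startingQuote, endingQuote);
        }
        return value;
    }

    // Formats a row using the defaults named in the changelog entries.
    static String formatRow(List<String> values) {
        return values.stream()
                .map(v -> formatValue(v, Policy.EMPTY_STRING, Policy.QUOTED_EMPTY_STRING, "\"", "\""))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(formatRow(Arrays.asList("a", null, "", "b"))); // prints a,,"",b
    }
}
```

With these defaults a consumer of the file can still tell a null apart from an empty string, which is presumably why the two policies are configured separately.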


3.1.3 - Jul 24, 2015
- transformations can now be applied to hierarchical records.  Field names passed to operators are represented internally as FieldPaths allowing them to reference fields in child records and arrays.  For example, the field path "customer.address[0].city" can be created in two ways [FieldPath.parse("customer.address[0].city") or new FieldPath().name("customer").name("address").index(0).name("city")] before being passed to the Record.getField() or Record.getValue() methods.
- added ArrayValue.addAll(ArrayValue array), indexOfNode(Node child), indexOfValue(Object value)
- BUGFIX: Field.compareTo() now prevents ClassCastException by first comparing the field value's node type and value type before comparing the actual value
- BUGFIX: Field now overrides getParentRecord() to return its record (same as Field.getRecord()) instead of its grandparent record
- FieldComparator can now compare nested/child fields.  Its internal fieldName property has been replaced with a FieldPath object.
- FieldList now treats field names as FieldPath internally which can be accessed via FieldList.getFieldPath(int)
- added FieldPath to represent an abstract location of a field within a record
- added Node.isValue() to identify value containers (Record, ArrayValue, and SingleValue instances)
- added Node.DuplicateNodeAction to indicate how duplicate fields should be handled during copy/merge/lookup operations
- updated ProxyReader and ProxyWriter to ensure offending records are added to any exceptions that are thrown by subclasses.
- added Record.copyFrom(Record, DuplicateNodeAction) to handle merging of hierarchical records
- updated Record.getField(int) to add the size of the list to negative indexes, so fields can be obtained from the end of the list
- added new Record.getField(), getValue(), contains(), containsField(), removeField(), moveFieldBefore(), moveFieldAfter(), excludeFields(), and removeFields() methods accepting FieldPath
- updated Record.fromJson() and fromXml() to automatically typecast to the left side variable by returning a generic type parameter instead of ValueNode
- updated the following classes to handle hierarchical fields
  - RemoveDuplicatesReader
  - FieldFilter
  - GroupByReader
  - GroupOperation
  - BasicLookup
  - DataReaderLookup
  - LookupTransformer
  - BasicFieldTransformer
  - CopyField
  - FieldTransformer
  - MoveFieldAfter
  - MoveFieldBefore
  - RemoveFields
  - RenameField
  - SetCalculatedField
  - SetField
  - SplitArrayField
- added FieldExists and FieldNotExists filters
- AggregateReader is now deprecated, use GroupByReader instead. GroupByReader provides the same functionality and can return continuous results
- BUGFIX: GroupByReader now fails on open if no group-by fields are specified
- added GroupOperation.getSourceFieldPath() and getTargetFieldPath()
- added new LookupTransformer(FieldList, Lookup, DuplicateNodeAction) constructor and deprecated LookupTransformer.LookupTransformer(FieldList, Lookup, boolean) constructor
- added new XmlField(String, String, String) constructor and cascadeResetLocationPath Java bean property to XmlField
- added overloaded addField(String, String, String) to identify when cascaded values (values repeated when new matches aren't found) should be cleared to XmlReader, JsonReader, and JavaBeanReader
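
To make the field path idea from this release more concrete, here is a standalone sketch that parses a path such as "customer.address[0].city" and resolves it against nested maps and lists. It does not use the library's FieldPath or Record classes; every name below is hypothetical. The negative-index handling imitates the Record.getField(int) entry above, and whether the real FieldPath indexes accept negative values is not stated in this changelog.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: path parsing and lookup over plain Map/List structures.
final class FieldPathSketch {

    // Splits "customer.address[0].city" into [customer, address, 0, city];
    // names are Strings and indexes are Integers.
    static List<Object> parse(String path) {
        List<Object> segments = new ArrayList<>();
        for (String part : path.split("\\.")) {
            int bracket = part.indexOf('[');
            if (bracket < 0) {
                segments.add(part);
                continue;
            }
            segments.add(part.substring(0, bracket));
            // A name may carry several indexes, e.g. matrix[1][2].
            while (bracket >= 0) {
                int close = part.indexOf(']', bracket);
                segments.add(Integer.valueOf(part.substring(bracket + 1, close)));
                bracket = part.indexOf('[', close);
            }
        }
        return segments;
    }

    // Walks the segments; negative indexes count back from the end of the list.
    static Object resolve(Object root, List<Object> segments) {
        Object current = root;
        for (Object segment : segments) {
            if (segment instanceof Integer) {
                List<?> list = (List<?>) current;
                int index = (Integer) segment;
                if (index < 0) {
                    index += list.size();
                }
                current = list.get(index);
            } else {
                current = ((Map<?, ?>) current).get(segment);
            }
        }
        return current;
    }

    // Sample data: one customer with two addresses.
    static Map<String, Object> sample() {
        Map<String, Object> first = new HashMap<>();
        first.put("city", "Toronto");
        Map<String, Object> second = new HashMap<>();
        second.put("city", "Halifax");
        Map<String, Object> customer = new HashMap<>();
        customer.put("address", Arrays.asList(first, second));
        Map<String, Object> root = new HashMap<>();
        root.put("customer", customer);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(resolve(sample(), parse("customer.address[0].city")));  // prints Toronto
        System.out.println(resolve(sample(), parse("customer.address[-1].city"))); // prints Halifax
    }
}
```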


3.1.2 - Jul 9, 2015
- The dynamic expression language (DPEL) now automatically handles BigDecimal and BigInteger operations.  For example, if either operand in the expression "A + B" is BigXXX, then both operands will be promoted to BigXXX before applying the operator.
- Added convenience methods for converting records to and from XML
  - static Record.toXml(ValueNode recordOrArray)
  - static Record.toXml(ValueNode recordOrArray, Writer writer)
  - static Record.fromXml(String xml) 
  - static Record.fromXml(Reader reader)
  - Record.toXml()
- AsyncMultiReader.getException() now always returns the first exception, even when AsyncMultiReader.failOnException is set to false
- BUGFIX: ExcelWriter was limiting number of autofilter columns in subsequent sheets to number of columns in first sheet
- BUGFIX: Record.toJson() was throwing NullPointerException for uninitialized field value (i.e. fields not explicitly set to null)
- BUGFIX: BasicFieldTransformer was throwing NullPointerException for uninitialized field value (i.e. fields not explicitly set to null)
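
The BigDecimal/BigInteger promotion rule for "A + B" described in this release can be sketched in plain Java as shown below. This is not DPEL's implementation; the class name and the choice to fall back to long arithmetic for other numbers are assumptions, and how the real engine treats a BigInteger mixed with a double is not covered by this changelog.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch of operand promotion before addition.
final class BigPromotionSketch {

    static BigDecimal toBigDecimal(Number n) {
        if (n instanceof BigDecimal) {
            return (BigDecimal) n;
        }
        if (n instanceof BigInteger) {
            return new BigDecimal((BigInteger) n);
        }
        if (n instanceof Double || n instanceof Float) {
            return BigDecimal.valueOf(n.doubleValue());
        }
        return BigDecimal.valueOf(n.longValue());
    }

    // Assumes the other operand is an integer type when one side is a BigInteger.
    static BigInteger toBigInteger(Number n) {
        if (n instanceof BigInteger) {
            return (BigInteger) n;
        }
        return BigInteger.valueOf(n.longValue());
    }

    // If either operand is a Big type, both are promoted to it before adding.
    static Number add(Number a, Number b) {
        if (a instanceof BigDecimal || b instanceof BigDecimal) {
            return toBigDecimal(a).add(toBigDecimal(b));
        }
        if (a instanceof BigInteger || b instanceof BigInteger) {
            return toBigInteger(a).add(toBigInteger(b));
        }
        return a.longValue() + b.longValue();
    }

    public static void main(String[] args) {
        System.out.println(add(new BigDecimal("1.5"), 2)); // prints 3.5
        System.out.println(add(BigInteger.TEN, 5L));       // prints 15
        System.out.println(add(1, 2));                     // prints 3
    }
}
```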


3.1.1 - Jun 30, 2015
- Added AsyncMultiReader.getException() to retrieve exceptions when the failOnException flag is set to false
- updated AsyncMultiReader's buffer and threads fields to protected from private
- Added AsyncMultiReader.ReaderThread.getReader()
- BUGFIX: ArrayValue now distinguishes between Iterable and ValueNodes that implement Iterable (like Record and ArrayValue)
- BUGFIX: Field.getSizeInBytes() now handles nested records and multidimensional arrays
- API change: FieldType.getSizeInBytes(Object value) is now FieldType.getSizeInBytes(ValueNode valueNode).  This change is part of the support for nested records and multidimensional arrays.
- Added ability to visit all nodes in a record, array, or field via com.northconcepts.datapipeline.core.NodeVisitor.visit(Node, INodeVisitor)
- API change: com.northconcepts.datapipeline.core.Node.NodeType.ARRAY_VALUE renamed to ARRAY
- Added ExcelDocument.getSheetNames()
- Added ExcelWriter.isAutofilterColumns() and setAutoFilterColumns(boolean)
- BasicFieldTransformer now supports nested records and multidimensional array values
- Added BasicFieldTransformer.isContinueOnException(), setContinueOnException(boolean), and getException() to allow field transformation chains to continue with subsequent steps even when failures occur
- Added BasicFieldTransformer.SingleValueOperation to support automatic array and record traversal for single valued objects
- API change: BasicFieldTransformer.StringOperation now extends SingleValueOperation instead of Operation
- API change: removed BasicFieldTransformer.NullableStringOperation.  Usage now replaced with existing BasicFieldTransformer.StringOperation


3.1.0 - Jun 19, 2015
- Added native support for nested records and multi-dimensional arrays in the API and dynamic expression language (think JSON on steroids)
  - Added the following new supporting classes:
    - com.northconcepts.datapipeline.core.Node
    - com.northconcepts.datapipeline.core.ValueNode
    - com.northconcepts.datapipeline.core.ArrayValue
    - com.northconcepts.datapipeline.core.SingleValue
  - The following methods now have built-in handling for Record, Collection, Iterable, Node subclasses, and arrays (primitive and object)
    - Record.addField(...)
    - Record.setField(...)
    - Field.addValue(...)
    - Field.setValue(...)
    - ArrayValue.addValue(...)
    - ArrayValue.setValue(...)
  - Record now extends ValueNode and implements methods from super classes
  - Record adds the following methods for working with nested/descendent fields:
    - getField(FieldList fieldNamePath, boolean createField)
    - containsField(FieldList fieldNamePath)
    - removeField(FieldList fieldNamePath)
  - Record now includes an overloaded method to distinguish between adding/ensuring a unique field name and adding to an array: addField(String fieldName, Object value, boolean arrayField)
  - Added convenience methods for converting records to and from JSON
    - static Record.toJson(ValueNode recordOrArray)
    - static Record.toJson(ValueNode recordOrArray, Writer writer)
    - static Record.fromJson(String json) 
    - static Record.fromJson(Reader reader)
    - Record.toJson()
  - Field now extends Node and implements methods from super classes
  - Added the following methods to Field
    - isNotArray()
    - forceValueAsArray()
    - getValueAsArray()
    - getValueAsArray(boolean force)
    - getValueAsRecord()
    - getValueAsSingleValue()
  - Updated FileReader and FileWriter to support sub records and multi-dimensional arrays
- Added Record.evaluate(String expressionString) to allow dynamic expressions to be evaluated using a specific record
- Several classes now implement Iterable to be used in "foreach" statement: Record, RecordList, FieldList, ArrayValue
- Added Field.remove() to remove this field from its parent Record
- Added com.northconcepts.datapipeline.core.AsyncMultiReader to read from one or more DataReaders concurrently, in separate threads
- Added CompositeValue.isNull()
- Updated DataException to use the class name of its nested exception as the message if the message is null
- Added RECORD to FieldType enum
- Updated FieldType.getSizeInBytes(Object value) to return long instead of int
- RecordList is now Serializable
- Added Session.keySet(), containsSessionProperty(...), getSessionProperty(...), setSessionProperty(...);
- BUGFIX: ExcelWriter.close() no longer throws NullPointerException when autofitColumns is true and no data was written
- Added GroupByReader.setExcludeNulls(boolean excludeNulls) to indicate if results for null groups should be skipped
- Updated GroupByReader to handle arrays in grouping fields.  For example, each country in the country field is treated as a separate value if grouping on it.
- Added TemplateWriter.setAlwaysShowFooter(boolean alwaysShowFooter) and setAlwaysShowHeader(boolean alwaysShowHeader) to indicate if header and footer should be written when no records were written (defaults to true)
- Updated string operators in BasicFieldTransformer to handle arrays


3.0.5 - May 29, 2015
- Added ExcelWriter.setStyleFormat to allow user-defined column formatting
- ExcelWriter.setAutofitColumns() now uses the POI providers' (ExcelDocument.ProviderType.POI and ExcelDocument.ProviderType.POI_XSSF) native autoSizeColumn function


3.0.4 - May 27, 2015
- BUGFIX: AsyncReader now guards against hanging the main thread when exceptions occur


3.0.3 - May 5, 2015
- using latest license agreement (v1.81) - no functional changes


3.0.2 - May 2, 2015
- using latest license agreement (v1.8) - no functional changes


3.0.1 - May 1, 2015
- using latest trial library - no functional changes


3.0.0 - Apr 30, 2015
- Improved performance of XPath-based readers (JsonReader, XmlReader, and JavaBeanReader)
- Added DataEndpoint.getElapsedTime(), isCaptureElapsedTime(), and setCaptureElapsedTime(boolean captureElapsedTime)
- Added DataEndpoint.getOpenedOn() and getClosedOn() to return system time in milliseconds when readers and writers were opened and closed
- Added DataEndpoint.getOpenElapsedTime() to return the number of milliseconds this endpoint was (or has been) opened for
- Added DataEndpoint.assertClosed() for operations that need to ensure the endpoint has finished reading
- Exceptions now include timestamps for when endpoints were opened and closed if relevant
- Added Field.getValueAsBigDecimal(), getValueAsBigInteger(), setValue(BigDecimal value), and setValue(BigInteger value)
- Updated Field.toString() and getValueAsString() to strip insignificant zeros after decimals in BigDecimal values
- Added Record.getCreatedOn() to return the time in milliseconds when the record was created
- BUGFIX: fixed ClassCastException in Record.copySessionPropertiesFrom(Session)
- BUGFIX: CSVWriter now quotes values starting with zero (0) to prevent other apps from trimming numbers with zeros
- Added com.northconcepts.datapipeline.filter.rule.DateIsBefore, DateIsAfter, and DateIs
- Added FieldFilter.isThrowExceptionOnMissingField() and setThrowExceptionOnMissingField(boolean)
- Added convenience methods to FieldFilter
- Added com.northconcepts.datapipeline.transform.BasicFieldTransformer.replaceAll() and replaceFirst()
- updated GroupAverage and GroupSum to hold their running total and return value as BigDecimal
- updated GroupFirst to allow null first elements
- BUGFIX: GroupMaximum and GroupMinimum no longer throw ClassCastException when comparing different types of numbers (i.e. Double vs Long, BigDecimal vs short, etc.)
- BUGFIX: GroupMaximum and GroupMinimum no longer throw ClassCastException when comparing Strings with different types (i.e. String vs Long).  Both values are treated as Strings
- GroupByReader now provides sliding window aggregations via its new setNewWindowStrategy(NewWindowStrategy) and setCloseWindowStrategy(CloseWindowStrategy) methods
- Added convenience methods to GroupByReader
- Renamed GroupField to GroupOperationField
- Added GroupOperation.getFilter(), setFilter(Filter filter), getExcludeNulls(), setExcludeNulls(boolean excludeNulls), getTargetFieldName(), and setTargetFieldName(String targetFieldName)
- Updated JobTemplateImpl to log original and secondary exceptions as errors; original behaviour is not affected
- Updated SimpleJsonWriter to write date, time, and datetime fields using the following patterns respectively: "yyyy-MM-dd", "HH:mm:ss.SSS", "yyyy-MM-dd'T'HH:mm:ss.SSS"
- BUGFIX: SimpleJsonWriter now writes well-formed JSON, even when an exception occurs.
- Added TailProxyReader to keep the last n records in-memory for asynchronous retrieval as data flows through
- Added TailWriter to keep the last n records written to it in-memory for asynchronous retrieval
- Added BasicFieldTransformer.intervalToXXX convenience methods
- BUGFIX: XmlReader now matches unclosed nodes early on record break if they are used in the current record
- Updated DataEndpoint.toString() to include record count, opened time, closed time, and description if available
- DataEndpoint.VENDOR, PRODUCT, and PRODUCT_VERSION are now public
- BUGFIX: TransformingReader now skips subsequent transformers if a record gets marked as deleted
- Added SplitArrayField transformer
- Added convenience methods SortingReader.asc(String fieldName) and desc(String fieldName)
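
Two formatting entries in this release can be tried out with the JDK alone: the SimpleJsonWriter date, time, and datetime patterns, and the stripping of insignificant zeros from BigDecimal values. The sketch below is an illustration only. It uses java.time and BigDecimal.stripTrailingZeros() as assumed equivalents; the library's own implementation is not shown in this changelog and may differ.

```java
import java.math.BigDecimal;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch of the output formats named in the entries above.
final class FormattingSketch {

    // The three patterns quoted in the SimpleJsonWriter entry.
    static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HH:mm:ss.SSS");
    static final DateTimeFormatter DATETIME = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS");

    // Drops trailing zeros and avoids scientific notation.
    static String plain(BigDecimal value) {
        return value.stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        LocalDateTime moment = LocalDateTime.of(2015, 4, 30, 13, 5, 9, 120_000_000);
        System.out.println(DATE.format(moment));             // prints 2015-04-30
        System.out.println(TIME.format(moment));             // prints 13:05:09.120
        System.out.println(DATETIME.format(moment));         // prints 2015-04-30T13:05:09.120
        System.out.println(plain(new BigDecimal("12.3400"))); // prints 12.34
        System.out.println(plain(new BigDecimal("100")));     // prints 100
    }
}
```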



2.3.6.6 - Feb 17, 2015
- Field is now Serializable
- Record is now Serializable
- BUGFIX: PipedReader.available() now adds its queued element count to its buffer count
- Added Contains field filter rule
- Added FieldFilter.addNotRule(FieldFilterRule rule)
- DeMux now implements Runnable
- DeMux.getSource() and getSink() promoted from private to protected
- Added DeMux.run(boolean async) to conditionally run the feeder process in the current thread (false) or a new thread (true)
- Added DeMux.join(long millis), join(long millis, int nanos), and join()
- Added DataReaderClient
- Added DataReaderServer


2.3.6.4 - Dec 24, 2014
- BUGFIX: AsyncReader now stops filling its cache immediately when stop is called
- Field now supports array types
- Added LimitReader to cap the number of records read from another DataReader
- Added Record.addField(String, Object)
- Added GroupByReader to provide whole stream aggregations as records
- Updated JavaBeanReader to extend XmlReader instead of ProxyReader; API remains unchanged
- Updated JsonReader to extend XmlReader instead of ProxyReader; API remains unchanged
- BUGFIX: DeMux now removes closed readers from its list of targets
- BUGFIX: DeMuxReader now clears its queue on close to prevent possible blocking in parent DeMux
- Added DeMux.getThread(), isFinished(), and stop()
- Replaced ExcludeFields with RemoveFields; ExcludeFields is now deprecated and extends RemoveFields
- Replaced IncludeFields with SelectFields; IncludeFields is now deprecated and extends SelectFields
- Improved logging and rate limit handling in TwitterSearchReader
- Added getDuplicateFieldPolicy() and setDuplicateFieldPolicy(DuplicateFieldPolicy) to XmlReader, JsonReader, JavaBeanReader


2.3.5 - Oct 6, 2014
- new license agreement v1.4


2.3.4 - Sep 9, 2014
- added com.northconcepts.datapipeline.twitter.TwitterSearchReader
- updated license to reflect new plans + clarified revocation terms
- replaced "\n" with OS line separator in DataException.getPropertiesAsString() and getMessageWithProperties()
- Field.getValueAsString() now returns the first and last bytes for BLOBS instead of the array's default toString()
- Record.clone now returns Record instead of Object
- added Record.moveFieldBefore(String fieldName, String beforeFieldName) and moveFieldAfter(String columnName, String afterFieldName)
- added RecordList.addAll(DataReader reader)
- added RecordList.RecordList(DataReader reader) constructor
- added CSVReader.getLineText() and getLineParser()
- added filter rule: com.northconcepts.datapipeline.filter.rule.IsNull
- AggregateReader.add(AggregateOperation ... operations) is now public
- inner class AggregateReader.AggregateOperation is now public
- BUGFIX: XPath engine now matches root and ancestor attributes
- JavaBeanReader now returns node/field names separate from values; use "//firstName/text()" instead of "//firstName" for values
- added JavaBeanReader.setDebug(boolean debug) and isDebug() to display all potential paths seen by the reader
- added JsonReader.setDebug(boolean debug) and isDebug() to display all potential paths seen by the reader
- added XmlReader.setDebug(boolean debug) and isDebug() to display all potential paths seen by the reader
- BUGFIX: JobTemplateImpl no longer tries to reopen supplied endpoints if they are already open
- BUGFIX: DataReaderLookup no longer tries to reopen supplied endpoints if they are already open
- BUGFIX: DeMux now prevents sinks/readers from blocking forever if open fails
- added BasicFieldTransformer.dateTimeToDate() and dateTimeToTime() to split out just the date or time from a datetime field
- added FieldTransformer.setValueOnException(Object valueOnException), getValueOnException(), and hasValueOnException() to set a default value when an exception occurs instead of rethrowing it
- added com.northconcepts.datapipeline.transform.MoveFieldBefore and MoveFieldAfter transformations
- added examples from blogs: com.northconcepts.datapipeline.examples.cookbook.blog.*


2.3.3.1 - Apr 18, 2014
- BUGFIX: DataException.getRecord() no longer throws ClassCastException


2.3.3 - Oct 2, 2013
- Added XmlReader.setExpandDuplicateFields(boolean) to return multiple records instead of overwriting repeating fields


2.3.2.1 - Sep 30, 2013
- Increased records in Free version 


2.3.2 - Sep 24, 2013
- added Record.selectFields(FieldList, boolean) and IncludeFields.lenient to continue even if fields don't exist
- SetCalculatedField & SetField now have overwrite flags to prevent changing existing values


2.3.1 - Sep 13, 2013
- Added generic JDBC upsert writer: JdbcUpsertWriter
- Added PipedReader and PipedWriter to allow the push model of piping a writer to a reader
- Added convenience Record.setField(String fieldName, Object value) and Record.setFieldNull(String fieldName, FieldType type) methods.
- JsonReader XPath matching and exception reporting improvements
- Early access to SQL builder classes


2.3 - Aug 2, 2013
- added streaming JSON reading and writing (simple and template based)
- added SimpleXmlWriter
- improved handling of recursive XML-to-records 
- added user-definable demux strategies
- DeMuxReader is no longer a public class since it should not be referenced directly
- improved exception handling in JdbcReader
- BUGFIX: JavaBeanReader now handles xpath for recursive text children
- updated Apache POI to v3.9


2.2.9.3 - July 8, 2013
- updated licenses (change to number of developers and applications in each tier)
- IncludeFields & ExcludeFields now accept a collection of field names in their constructor and add method


2.2.9.2 - June 7, 2013
- updated distributed Eclipse project to include new Data Pipeline jar


2.2.9.1 - May 13, 2013
- added JdbcReader.useColumnLabel property to allow fields to be named using the column labels (or aliases) instead of the underlying, real column names


2.2.9 - May 6, 2013
- added Excel 2007 provider (POI_XSSF)
- Excel handling now defaults to the Apache POI_XSSF (Excel 2007) provider, instead of POI (Excel 2003)
- added FixedWidthField.align to allow left-filled (right aligned) fields
- added FixedWidthField.fillChar to allow fields to specify a different filler from their reader/writer
- reduced memory overhead for fields and records
- CSV performance improvements
- exception property values now truncated to 256 chars
- using StringBuilder (instead of StringBuffer) internally to improve performance


2.2.8 - Nov 28, 2012
- added TemplateWriter for writing text streams using FreeMarker templates
- added new examples for writing XML and HTML files using TemplateWriter
- BUGFIX: XmlWriter's (XmlTemplate, File) constructor now calls setFieldNamesInFirstRow(false) by default
- BUGFIX: The JxlProvider now converts intervals and user-defined types to string when generating Excel files
- Intervals are no longer converted to strings when added to a field/record
- BasicFieldTransformer can now convert numbers to intervals (seconds, months, days, minutes, etc.)
- JdbcWriter now has public accessors for connection, tableName, batchMode, and jdbcTypes
- individual fields can now be removed from a FieldList
- FieldList can now accept collections of strings
- updated Apache POI to v3.8


2.2.7 - July 14, 2012
- added JdbcMultiWriter for multi-threaded writing to 1 or more database connections
- added multi-threaded AsyncWriter to complement AsyncReader
- data writers now have an available() method to indicate the number of records that can probably be written without blocking
- MultiWriter now supports configurable write strategies (ReplicateWriteStrategy, RoundRobinWriteStrategy, AvailableCapacityWriteStrategy, and user defined)
- added support for CLOB fields (see JdbcValueReader.DEFAULT)
- Field and Record's toString() methods now limit displayed strings to the first 128 characters
- RecordMeter is now public and returned by MeteredReader and MeteredWriter's getMeter() method
- BUGFIX: record count is no longer off by 1 in some cases 


2.2.6.1 - May 3, 2012
- BUGFIX: POI provider now handles null rows in Excel file
- BUGFIX: Excel reader exception logging no longer fails when a record has no fields


2.2.6 - April 22, 2012
- performance improvements in CSV and fixed width handling
- untyped expression evaluation is now based on the value's type, instead of the field's declared type
- BUGFIX: now handles untyped expressions between primitive and object values
- float expressions are now upgraded to doubles during evaluation
- all numbers other than doubles and floats are now upgraded to longs during evaluation
- expressions can now reference Java beans, not just primitive values
- method call expression now finds the most appropriate method based on the runtime argument types (http://en.wikipedia.org/wiki/Multiple_dispatch)
- improved handling for collections and arrays in DataException properties
- Apache PoiProvider can now distinguish between date, time, and datetime fields in Excel
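
The method-call entry above (choosing the most appropriate method from the runtime argument types) can be illustrated with plain Java reflection. The sketch below is not the expression engine's resolver; all names are hypothetical, it handles only single-argument static methods, and it assumes the candidate parameter types form a simple inheritance chain so that "most specific" is unambiguous.

```java
import java.lang.reflect.Method;

// Hypothetical sketch: pick an overload by the argument's runtime type.
final class RuntimeDispatchSketch {

    public static String describe(Object value) { return "object"; }
    public static String describe(Number value) { return "number"; }
    public static String describe(Integer value) { return "integer"; }

    // Finds the one-argument method whose parameter type is the most specific match.
    static Method findBest(Class<?> owner, String name, Object arg) {
        Method best = null;
        for (Method candidate : owner.getDeclaredMethods()) {
            if (!candidate.getName().equals(name) || candidate.getParameterCount() != 1) {
                continue;
            }
            Class<?> parameterType = candidate.getParameterTypes()[0];
            if (!parameterType.isInstance(arg)) {
                continue;
            }
            // Prefer the candidate whose parameter type is a subtype of the current best.
            if (best == null || best.getParameterTypes()[0].isAssignableFrom(parameterType)) {
                best = candidate;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no matching method: " + name);
        }
        return best;
    }

    // Invokes the selected static method with the given argument.
    static Object call(String name, Object arg) {
        try {
            return findBest(RuntimeDispatchSketch.class, name, arg).invoke(null, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object value = 42; // static type Object, runtime type Integer
        System.out.println(describe(value));         // prints object (compile-time overload choice)
        System.out.println(call("describe", value)); // prints integer (runtime type choice)
        System.out.println(call("describe", 1.5));   // prints number
    }
}
```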


2.2.5 - Jan 8, 2012
- added JavaBeanReader which uses XPath expressions to identify field values and break records
- AbstractReader's setStartingRow and setLastRow now return this
- Filter rule IsInstanceOfJavaType now returns false for null values
- added number-to-date methods to BasicFieldTransformer (numberToDate(), minutesToDate(), hoursToDate(), and daysToDate())
- BasicFieldTransformer.Operation and BasicFieldTransformer.StringOperation are now public classes
- BasicFieldTransformer.add(Operation ... operation) is now public
- ConditionalTransformer is now private (use TransformingReader.filter instead)
- TransformingReader now contains an optional Filter, allowing any transformer to be conditionally applied
- Removed TransformingReader.add(Filter filter, Transformer ... transformer) method


2.2.4 - May 31, 2011
- POI provider for ExcelWriter now caches cell styles.  This fixes the "Too many different cell formats" Excel message when opening a spreadsheet with more than 4000 styles


2.2.3 - May 14, 2011
- added JdbcValueReader to allow clients to override column reading strategy
- added JdbcReader.valueReader property


2.2.2 - May 11, 2011
- added XmlTemplate
- XmlWriter now uses XmlTemplate to describe output pattern


2.2.1 - December 9, 2010
- added batch execution to JdbcWriter (see JdbcWriter.setBatchSize)
- added callback mechanism to track job progress (see JobTemplate.transfer(R reader, W writer, boolean async, JobCallback callback))
- early access to DeMuxReader


2.2.0 - September 9, 2010
- Added XPath-based XmlReader
- Excel now defaults to the Apache POI instead of JXL
- The following classes now use java.util.List instead of java.util.ArrayList in their public APIs: CompositeValue, FieldList, Lookup, LookupTransformer, Record, RecordList


2.1.0 - August 26, 2010
- Added support for Excel 2003 XLS files
- Added support for Excel XLSX (XML format) files
- BUGFIX: whitespace (like tab) can now be used as the field separator in CSVReader
- Added FixedWidthReader.setLastFieldConsumesRemaining(boolean lastFieldConsumesRemaining)   
- Added ExcelReader.setUseSheetColumnCount(boolean useSheetColumnCount)
- BUGFIX: handle null variable names in expressions
- Added more string utils to BasicFieldTransformer
- Added ConditionalTransformer class
- Added TransformingReader.add(Filter filter, Transformer ... transformer)
- SetField now has type-specific constructors
- Added Eclipse project files
- Added Ant build project


2.0.2 - Jan 18, 2009
- RecordList now has a varargs constructor
- BUGFIX: Lookup now has a get(ArrayList) method; was previously passing ArrayList to get(Object...)
- BUGFIX: XmlWriter no longer treats the field names as the first record
- Examples now include input files
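
The Lookup fix above addresses a general Java varargs pitfall, sketched below. The classes Before and After and the keyCount method are hypothetical and are not the Lookup API. They only show how a list passed to an Object... parameter arrives as one element, and how a list overload avoids that.

```java
import java.util.ArrayList;
import java.util.List;

public class VarargsOverloadSketch {

    // Only a varargs method: a list argument is wrapped as a single key.
    public static class Before {
        public static int keyCount(Object... keys) {
            return keys.length;
        }
    }

    // Adds a list overload that unpacks the elements before delegating.
    public static class After {
        public static int keyCount(Object... keys) {
            return keys.length;
        }

        public static int keyCount(List<?> keys) {
            return keyCount(keys.toArray());
        }
    }

    // Builds a two-element key list for the comparison.
    public static ArrayList<String> sampleKeys() {
        ArrayList<String> keys = new ArrayList<>();
        keys.add("CA");
        keys.add("Toronto");
        return keys;
    }
}
```

Before.keyCount(sampleKeys()) sees one key (the list itself), while After.keyCount(sampleKeys()) sees two, because the compiler prefers the non-varargs overload when it applies.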


2.0.1 - Jan 9, 2009
- BUGFIX: ValueMatch.add(Object ... values) now adds individual elements of values
- ValueMatch is now a parameterized type
- OrRule, PatternMatch now use varargs

	
2.0.0 - Oct 26, 2008
- Java 5 support (var-args, generics, enums)
- Dual licensing:  GPL and commercial
- Added CSVReader.trimFields property (default is true)
- Added workaround for the JXL (Excel provider) timezone issue


1.2.3
- Added ASTNode, ExpressionHelper, and ParseException to the public API
- Expression now extends ASTNode
- Added RtfWriter
- Added PdfWriter
- Added XmlWriter
- Added CombinedLogReader
- Added BinaryWriter
- Added SequenceReader


1.2.2
- RecordComparator now compares all fields when none are specified
- LookupTransformer now handles "no results" and "too many results" through overridable methods
- Added session properties to record and field
- Renamed ProxyWriter.setTargetDataSink to setNestedDataWriter
- Renamed ProxyReader.setTargetDataReader to setNestedDataReader
- ParsingReader now accepts a file parameter
- Interval now implements Comparable
- Moment now implements Comparable and accepts a Date
- FixedWidthWriter now extends LinedTextWriter (instead of TextWriter)
- CSVWriter now extends LinedTextWriter (instead of TextWriter)
- Added FieldFilterRule.toString()


1.2.1
- Added fixed width reader & writer


1.2.0
- JdbcReader & JdbcWriter now show the current record on exception
- AbstractWriter now shows current record (instead of null) on exception
- ProxyReader respects the Record.isDeleted() flag when testing for record removal from ProxyReader.interceptRecord
- DataReader.read() now checks for records pushed into the buffer just before EOF