ParquetDataWriter (Data Pipeline JavaDoc)

java.lang.Object
- com.northconcepts.datapipeline.core.DataObject
  - com.northconcepts.datapipeline.core.Endpoint
    - com.northconcepts.datapipeline.core.DataEndpoint
      - com.northconcepts.datapipeline.core.DataWriter
        - com.northconcepts.datapipeline.internal.lang.IntegrationWriter
          - com.northconcepts.datapipeline.parquet.ParquetDataWriter

```
public class ParquetDataWriter
extends IntegrationWriter
```
Writes records to Apache Parquet columnar files. See Apache Parquet columnar storage.

Nested Class Summary
- Nested classes/interfaces inherited from class com.northconcepts.datapipeline.core.DataEndpoint
  DataEndpoint.State

Field Summary
- Fields inherited from class com.northconcepts.datapipeline.core.DataEndpoint
  lastRecord, PRODUCT, PRODUCT_VERSION, VENDOR, XML_INPUT_FACTORY_KEY
- Fields inherited from class com.northconcepts.datapipeline.core.Endpoint
  BUFFER_SIZE, captureElapsedTime, DEFAULT_READ_BUFFER_SIZE
- Fields inherited from class com.northconcepts.datapipeline.core.DataObject
  id, log, name, TIMESTAMP_FORMAT

Constructor Summary

Constructors
Constructor and Description
`ParquetDataWriter(File file)` Writes Parquet data to a file.
`ParquetDataWriter(OutputFile outputFile)` Writes Parquet data to an `OutputFile`.

Method Summary

Modifier and Type	Method and Description
`DataException`	`addExceptionProperties(DataException exception)` Adds this endpoint's current state to a `DataException`.
`void`	`close()` Indicates that this endpoint has finished reading or writing.
`String`	`getCacheFolder()` Returns the folder used to store cached files during dynamic schema generation with `LocalFileDataset`, or null if schema generation is performed completely in memory using `MemoryDataset`. See also `getRecordsPerCacheFile()`.
`protected int`	`getColumnStatsReaderThreads()`
`CompressionCodecName`	`getCompressionCodecName()` Indicates the compression used for writing (default UNCOMPRESSED).
`Configuration`	`getConfiguration()` Returns the Parquet configuration parameters.
`int`	`getDefaultBigDecimalScale()` Returns the default scale used when writing BigDecimal values (default 5).
`int`	`getDefaultBigNumberPrecision()` Returns the default precision used when writing BigDecimal & BigInteger values (default 25).
`Long`	`getMaxRecordsAnalyzed()` Indicates how many records should be analyzed and cached to generate the Parquet schema if no schema was explicitly set on this writer (default is 1000).
`int`	`getRecordsPerCacheFile()` Indicates how many records should be cached in memory to generate the Parquet schema if no schema was explicitly set on this writer (default is 10_000).
`RoundingMode`	`getRoundingMode()` Indicates the rounding algorithm used for all BigDecimal values (default is `RoundingMode.HALF_UP`).
`MessageType`	`getSchema()` Returns the schema used to write the file.
`boolean`	`isDefaulAdjustToUTC()` Deprecated. See `isDefaultAdjustedToUTC()`.
`boolean`	`isDefaultAdjustedToUTC()` Indicates if all datetime fields should be marked as AdjustedToUTC.
`boolean`	`isRemoveUnsupportedChars()` Indicates if unsupported characters should be removed from field names (default is true).
`void`	`open()` Makes this endpoint ready for reading or writing.
`ParquetDataWriter`	`setCacheFolder(String cacheFolder)` Sets the folder used to store cached files during dynamic schema generation with `LocalFileDataset`, or null if schema generation should be performed completely in memory using `MemoryDataset`. See also `setRecordsPerCacheFile(int)`.
`protected ParquetDataWriter`	`setColumnStatsReaderThreads(int columnStatsReaderThreads)`
`ParquetDataWriter`	`setCompressionCodecName(CompressionCodecName compressionCodecName)` Sets the compression used for writing (default UNCOMPRESSED).
`ParquetDataWriter`	`setConfiguration(Configuration configuration)` Sets the Parquet configuration parameters.
`ParquetDataWriter`	`setDefaultAdjustedToUTC(boolean defaultAdjustedToUTC)` Sets whether all datetime fields should be marked as AdjustedToUTC.
`ParquetDataWriter`	`setDefaultAdjustToUTC(boolean defaultAdjustToUTC)` Deprecated. See `setDefaultAdjustedToUTC(boolean)`.
`ParquetDataWriter`	`setDefaultBigDecimalScale(int defaultBigDecimalScale)` Sets the default scale used when writing BigDecimal values (default 5).
`ParquetDataWriter`	`setDefaultBigNumberPrecision(int defaultBigNumberPrecision)` Sets the default precision used when writing BigDecimal & BigInteger values (default 25).
`ParquetDataWriter`	`setMaxRecordsAnalyzed(Long maxRecordsAnalyzed)` Sets how many records should be analyzed and cached to generate the Parquet schema if no schema was explicitly set on this writer (default is 1000).
`ParquetDataWriter`	`setRecordsPerCacheFile(int recordsPerCacheFile)` Sets how many records should be cached in memory to generate the Parquet schema if no schema was explicitly set on this writer (default is 10_000).
`ParquetDataWriter`	`setRemoveUnsupportedChars(boolean removeUnsupportedChars)` Sets whether unsupported characters should be removed from field names (default is true).
`ParquetDataWriter`	`setRoundingMode(RoundingMode roundingMode)` Sets the rounding algorithm used for all BigDecimal values (default is `RoundingMode.HALF_UP`).
`ParquetDataWriter`	`setSchema(Connection connection, JdbcValueReader jdbcValueReader, String query, Object... queryParameters)` Sets the schema used to write the file by copying it from the metadata of an SQL query.
`ParquetDataWriter`	`setSchema(Connection connection, JdbcValueReader sqlToJavaTypeMapper, String databaseCatalog, String databaseSchema, String databaseTable)` Sets the schema used to write the Parquet file by copying it from the schema of a database table.
`ParquetDataWriter`	`setSchema(Connection connection, String query, Object... queryParameters)` Sets the schema used to write the file by copying it from the metadata of an SQL query.
`ParquetDataWriter`	`setSchema(Connection connection, String databaseCatalog, String databaseSchema, String databaseTable)` Sets the schema used to write the Parquet file by copying it from the schema of a database table.
`ParquetDataWriter`	`setSchema(JdbcConnectionFactory jdbcConnectionFactory, JdbcValueReader jdbcValueReader, String query, Object... queryParameters)` Sets the schema used to write the file by copying it from the metadata of an SQL query.
`ParquetDataWriter`	`setSchema(JdbcConnectionFactory jdbcConnectionFactory, String query, Object... queryParameters)` Sets the schema used to write the file by copying it from the metadata of an SQL query.
`ParquetDataWriter`	`setSchema(MessageType schema)` Sets the schema used to write the file.
`protected void`	`writeImpl(Record record)` Overridden by subclasses to write the specified record to this `DataWriter`.

Methods inherited from class com.northconcepts.datapipeline.core.DataWriter
available, getNestedEndpoint, getNestedWriter, getRootEndpoint, getRootWriter, write

Methods inherited from class com.northconcepts.datapipeline.core.DataEndpoint
decrementRecordCount, enableJmx, getLastRecord, getRecordCount, getRecordCountAsBigInteger, getRecordCountAsString, incrementRecordCount, isRecordCountBigInteger, resetRecordCount, toString

Methods inherited from class com.northconcepts.datapipeline.core.Endpoint
addElapsedtime, assertClosed, assertNotOpened, assertOpened, finalize, getClosedOn, getDescription, getElapsedTime, getElapsedTimeAsString, getOpenedOn, getOpenElapsedTime, getOpenElapsedTimeAsString, getSelfTime, getSelfTimeAsString, getState, isCaptureElapsedTime, isClosed, isOpen, setCaptureElapsedTime, setDescription

Methods inherited from class com.northconcepts.datapipeline.core.DataObject
exception, exception, exception, getId, getName, resetID

Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - ParquetDataWriter
```
public ParquetDataWriter(File file)
```
    Writes Parquet data to a file.
  - ParquetDataWriter
```
public ParquetDataWriter(OutputFile outputFile)
```
    Writes Parquet data to an OutputFile.
    
    Parameters:
    
    outputFile - OutputFile with FileSystem.
- Method Detail
  - open
```
public void open()
          throws DataException
```
    Description copied from class: DataEndpoint
    
    Makes this endpoint ready for reading or writing.
    
    Overrides:
    
    open in class IntegrationWriter
    
    Throws:
    
    DataException
  - writeImpl
```
protected void writeImpl(Record record)
                  throws Throwable
```
    Description copied from class: DataWriter
    
    Overridden by subclasses to write the specified record to this DataWriter.
    
    Specified by:
    
    writeImpl in class DataWriter
    
    Throws:
    
    Throwable
  - close
```
public void close()
           throws DataException
```
    Description copied from class: DataEndpoint
    
    Indicates that this endpoint has finished reading or writing.
    
    Overrides:
    
    close in class DataEndpoint
    
    Throws:
    
    DataException
  - getSchema
```
public MessageType getSchema()
```
    Returns the schema used to write the file.
  - setSchema
```
public ParquetDataWriter setSchema(MessageType schema)
```
    Sets the schema used to write the file.
  - setSchema
```
public ParquetDataWriter setSchema(Connection connection,
                                   String query,
                                   Object... queryParameters)
```
    Sets the schema used to write the file by copying it from the metadata of an SQL query.
  - setSchema
```
public ParquetDataWriter setSchema(Connection connection,
                                   JdbcValueReader jdbcValueReader,
                                   String query,
                                   Object... queryParameters)
```
    Sets the schema used to write the file by copying it from the metadata of an SQL query. Since no data is read from the query (only its metadata is used), take care to write an optimized query for which the database does minimal work and returns no data. For example, consider a query like: `SELECT * FROM invoices WHERE 1<0`.
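    The advice on schema-only queries can be sketched with plain JDBC. This is an illustration under stated assumptions, not the writer's internal implementation: `SchemaOnlyQuery`, `forTable`, and `describeColumns` are hypothetical names, and the table name is assumed to be a trusted identifier.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Data Pipeline.
final class SchemaOnlyQuery {

    // Builds a query whose WHERE clause is always false, so the database can
    // return column metadata without returning any rows.
    // The table name is concatenated as-is and must be a trusted identifier.
    static String forTable(String table) {
        return "SELECT * FROM " + table + " WHERE 1<0";
    }

    // Executes the query and reads only its metadata; no rows are iterated.
    static List<String> describeColumns(Connection connection, String query) throws SQLException {
        List<String> columns = new ArrayList<>();
        try (Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery(query)) {
            ResultSetMetaData metaData = resultSet.getMetaData();
            for (int i = 1; i <= metaData.getColumnCount(); i++) {
                columns.add(metaData.getColumnLabel(i) + " " + metaData.getColumnTypeName(i));
            }
        }
        return columns;
    }
}
```

    A string such as `SchemaOnlyQuery.forTable("invoices")` could then be passed as the `query` argument of the query-based `setSchema` overloads.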
  - setSchema
```
public ParquetDataWriter setSchema(JdbcConnectionFactory jdbcConnectionFactory,
                                   String query,
                                   Object... queryParameters)
```
    Sets the schema used to write the file by copying it from the metadata of an SQL query. Since no data is read from the query (only its metadata is used), take care to write an optimized query for which the database does minimal work and returns no data. For example, consider a query like: `SELECT * FROM invoices WHERE 1<0`.
  - setSchema
```
public ParquetDataWriter setSchema(JdbcConnectionFactory jdbcConnectionFactory,
                                   JdbcValueReader jdbcValueReader,
                                   String query,
                                   Object... queryParameters)
```
    Sets the schema used to write the file by copying it from the metadata of an SQL query.
  - setSchema
```
public ParquetDataWriter setSchema(Connection connection,
                                   String databaseCatalog,
                                   String databaseSchema,
                                   String databaseTable)
```
    Sets the schema used to write the Parquet file by copying it from the schema of a database table.
  - setSchema
```
public ParquetDataWriter setSchema(Connection connection,
                                   JdbcValueReader sqlToJavaTypeMapper,
                                   String databaseCatalog,
                                   String databaseSchema,
                                   String databaseTable)
```
    Sets the schema used to write the Parquet file by copying it from the schema of a database table.
  - getDefaultBigDecimalScale
```
public int getDefaultBigDecimalScale()
```
    Returns the default scale used when writing BigDecimal values (default 5).
  - setDefaultBigDecimalScale
```
public ParquetDataWriter setDefaultBigDecimalScale(int defaultBigDecimalScale)
```
    Sets the default scale used when writing BigDecimal values (default 5).
  - getDefaultBigNumberPrecision
```
public int getDefaultBigNumberPrecision()
```
    Returns the default precision used when writing BigDecimal & BigInteger values (default 25).
  - setDefaultBigNumberPrecision
```
public ParquetDataWriter setDefaultBigNumberPrecision(int defaultBigNumberPrecision)
```
    Sets the default precision used when writing BigDecimal & BigInteger values (default 25).
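    As a reminder of what precision and scale mean, the sketch below uses only `java.math.BigDecimal`. Precision is the total number of digits and scale is the number of digits after the decimal point, so precision 25 with scale 5 leaves 20 integer digits. `PrecisionScaleDemo` and `fits` are hypothetical names, and how the writer itself treats values that do not fit is not stated here.

```java
import java.math.BigDecimal;

// Hypothetical demo class, not part of Data Pipeline.
final class PrecisionScaleDemo {

    // Returns true if the value's integer digits fit in (precision - scale) digits.
    // Rounding effects at the boundary are deliberately ignored in this sketch.
    static boolean fits(BigDecimal value, int precision, int scale) {
        int integerDigits = value.precision() - value.scale();
        return integerDigits <= precision - scale;
    }

    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("12345.678");
        System.out.println(price.precision()); // total digits: 8
        System.out.println(price.scale());     // digits after the decimal point: 3
        System.out.println(fits(price, 25, 5));
    }
}
```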
  - addExceptionProperties
```
public DataException addExceptionProperties(DataException exception)
```
    Description copied from class: Endpoint
    
    Adds this endpoint's current state to a DataException. Since this method is called whenever an exception is thrown, subclasses should override it to add their specific information.
    
    Overrides:
    
    addExceptionProperties in class DataWriter
  - getCompressionCodecName
```
public CompressionCodecName getCompressionCodecName()
```
    Indicates the compression used for writing (default UNCOMPRESSED).
  - setCompressionCodecName
```
public ParquetDataWriter setCompressionCodecName(CompressionCodecName compressionCodecName)
```
    Sets the compression used for writing (default UNCOMPRESSED).
  - isDefaulAdjustToUTC
```
@Deprecated
public boolean isDefaulAdjustToUTC()
```
    Deprecated. See isDefaultAdjustedToUTC().
  - setDefaultAdjustToUTC
```
@Deprecated
public ParquetDataWriter setDefaultAdjustToUTC(boolean defaultAdjustToUTC)
```
    Deprecated. See setDefaultAdjustedToUTC(boolean).
  - isDefaultAdjustedToUTC
```
public boolean isDefaultAdjustedToUTC()
```
    Indicates if all datetime fields should be marked as AdjustedToUTC.
  - setDefaultAdjustedToUTC
```
public ParquetDataWriter setDefaultAdjustedToUTC(boolean defaultAdjustedToUTC)
```
    Sets whether all datetime fields should be marked as AdjustedToUTC.
  - getRoundingMode
```
public RoundingMode getRoundingMode()
```
    Indicates the rounding algorithm used for all BigDecimal values (default is RoundingMode.HALF_UP).
  - setRoundingMode
```
public ParquetDataWriter setRoundingMode(RoundingMode roundingMode)
```
    Sets the rounding algorithm used for all BigDecimal values (default is RoundingMode.HALF_UP).
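    The effect of the rounding mode can be seen with the JDK alone. The sketch below reduces a value to scale 5 (the documented default scale) under different modes. `RoundingDemo` and `toScale` are hypothetical names; this shows standard `BigDecimal.setScale` behavior and is not taken from the writer's code.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical demo class, not part of Data Pipeline.
final class RoundingDemo {

    // Reduces (or extends) a value to the given scale using the given rounding mode.
    static BigDecimal toScale(BigDecimal value, int scale, RoundingMode mode) {
        return value.setScale(scale, mode);
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("1.234565");
        System.out.println(toScale(value, 5, RoundingMode.HALF_UP));   // 1.23457
        System.out.println(toScale(value, 5, RoundingMode.HALF_EVEN)); // 1.23456
        System.out.println(toScale(value, 5, RoundingMode.DOWN));      // 1.23456
    }
}
```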
  - getConfiguration
```
public Configuration getConfiguration()
```
    Returns the Parquet configuration parameters.
  - setConfiguration
```
public ParquetDataWriter setConfiguration(Configuration configuration)
```
    Sets the Parquet configuration parameters.
  - getColumnStatsReaderThreads
```
protected int getColumnStatsReaderThreads()
```
  - setColumnStatsReaderThreads
```
protected ParquetDataWriter setColumnStatsReaderThreads(int columnStatsReaderThreads)
```
  - getMaxRecordsAnalyzed
```
public Long getMaxRecordsAnalyzed()
```
    Indicates how many records should be analyzed and cached to generate the Parquet schema if no schema was explicitly set on this writer (default is 1000). This value will not be used if a schema was set on this writer.
    
    Passing in null will cause all records to be read and cached to determine the schema.
    
    The value will be set to 1 if a value less than 1 is passed in.
    
    Note: Using null or a high record count can significantly slow down processing and cause an OutOfMemoryError.
  - setMaxRecordsAnalyzed
```
public ParquetDataWriter setMaxRecordsAnalyzed(Long maxRecordsAnalyzed)
```
    Sets how many records should be analyzed and cached to generate the Parquet schema if no schema was explicitly set on this writer (default is 1000). This value will not be used if a schema was set on this writer.
    
    Passing in null will cause all records to be read and cached to determine the schema.
    
    The value will be set to 1 if a value less than 1 is passed in.
    
    Note: Using null or a high record count can significantly slow down processing and cause an OutOfMemoryError.
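    The two rules above (null means all records, values below 1 become 1) can be restated as a small function. This is a sketch of the documented behavior only; `MaxRecordsRule` and `normalize` are hypothetical names and the writer's actual code may differ.

```java
// Hypothetical helper, not part of Data Pipeline.
final class MaxRecordsRule {

    // null is kept as null, meaning "analyze every record";
    // any value below 1 is raised to 1; other values pass through unchanged.
    static Long normalize(Long maxRecordsAnalyzed) {
        if (maxRecordsAnalyzed == null) {
            return null;
        }
        return Math.max(1L, maxRecordsAnalyzed);
    }

    public static void main(String[] args) {
        System.out.println(normalize(null));  // null
        System.out.println(normalize(0L));    // 1
        System.out.println(normalize(1000L)); // 1000
    }
}
```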
  - getRecordsPerCacheFile
```
public int getRecordsPerCacheFile()
```
    Indicates how many records should be cached in memory to generate the Parquet schema if no schema was explicitly set on this writer (default is 10_000). This value will not be used if a schema was set on this writer.
    
    The value will be set to 10_000 if a value less than 1 is passed in.
    
    Note: A small record count can significantly slow down processing because of the many I/O operations it causes, while a very large record count can reduce performance and consume more memory because more records are held in memory.
    
    Used only if a cache folder is set with setCacheFolder(String).
  - setRecordsPerCacheFile
```
public ParquetDataWriter setRecordsPerCacheFile(int recordsPerCacheFile)
```
    Sets how many records should be cached in memory to generate the Parquet schema if no schema was explicitly set on this writer (default is 10_000). This value will not be used if a schema was set on this writer.
    
    The value will be set to 10_000 if a value less than 1 is passed in.
    
    Note: A small record count can significantly slow down processing because of the many I/O operations it causes, while a very large record count can reduce performance and consume more memory because more records are held in memory.
    
    Used only if a cache folder is set with setCacheFolder(String).
  - getCacheFolder
```
public String getCacheFolder()
```
    Returns the folder used to store cached files during dynamic schema generation with LocalFileDataset, or null if schema generation is performed completely in memory using MemoryDataset.
    See also getRecordsPerCacheFile().
  - setCacheFolder
```
public ParquetDataWriter setCacheFolder(String cacheFolder)
```
    Sets the folder used to store cached files during dynamic schema generation with LocalFileDataset, or null if schema generation should be performed completely in memory using MemoryDataset.
    See also setRecordsPerCacheFile(int).
  - isRemoveUnsupportedChars
```
public boolean isRemoveUnsupportedChars()
```
    Indicates if unsupported characters should be removed from field names (default is true).
  - setRemoveUnsupportedChars
```
public ParquetDataWriter setRemoveUnsupportedChars(boolean removeUnsupportedChars)
```
    Sets whether unsupported characters should be removed from field names (default is true).
