public class ParquetDataWriter extends IntegrationWriter
DataEndpoint.State
lastRecord, PRODUCT, PRODUCT_VERSION, VENDOR, XML_INPUT_FACTORY_KEY
BUFFER_SIZE, captureElapsedTime, DEFAULT_READ_BUFFER_SIZE
id, log, name, TIMESTAMP_FORMAT
Constructor and Description |
---|
ParquetDataWriter(File file)
Write parquet data to a file.
|
ParquetDataWriter(OutputFile outputFile)
Write parquet data to an
OutputFile . |
Modifier and Type | Method and Description |
---|---|
DataException |
addExceptionProperties(DataException exception)
Adds this endpoint's current state to a
DataException . |
void |
close()
Indicates that this endpoint has finished reading or writing.
|
String |
getCacheFolder()
Indicates the folder to store cached files during dynamic schema generation with
LocalFileDataset
or null if schema generation should be performed completely in memory using MemoryDataset />
See also getRecordsPerCacheFile() |
protected int |
getColumnStatsReaderThreads() |
CompressionCodecName |
getCompressionCodecName()
Indicates the compression used for writing (default UNCOMPRESSED).
|
Configuration |
getConfiguration()
Returns the Parquet configuration parameters.
|
int |
getDefaultBigDecimalScale()
Returns the default scale used when writing BigDecimal values (default 5).
|
int |
getDefaultBigNumberPrecision()
Returns the default precision used when writing BigDecimal & BigInteger values (default 25).
|
Long |
getMaxRecordsAnalyzed()
Indicates how many records should be analyzed and cached to generate the Parquet schema if no schema was explicitly set
on this writer (default is 1000).
|
int |
getRecordsPerCacheFile()
Indicates how many records should be cached in memory to generate the Parquet schema if no schema was explicitly set
on this writer (default is 10_000L).
|
RoundingMode |
getRoundingMode()
Indicates the rounding algorithm used for all BigDecimal values (default is
RoundingMode.HALF_UP ). |
MessageType |
getSchema()
Returns the schema used to write the file.
|
boolean |
isDefaulAdjustToUTC()
Deprecated.
|
boolean |
isDefaultAdjustedToUTC()
Indicates if all datetime fields should be marked as AdjustedToUTC.
|
void |
open()
Makes this endpoint ready for reading or writing.
|
ParquetDataWriter |
setCacheFolder(String cacheFolder)
Indicates the folder to store cached files during dynamic schema generation with
LocalFileDataset
or null if schema generation should be performed completely in memory using MemoryDataset .See also setRecordsPerCacheFile(int) |
protected ParquetDataWriter |
setColumnStatsReaderThreads(int columnStatsReaderThreads) |
ParquetDataWriter |
setCompressionCodecName(CompressionCodecName compressionCodecName)
Indicates the compression used for writing (default UNCOMPRESSED).
|
ParquetDataWriter |
setConfiguration(Configuration configuration)
Sets the Parquet configuration parameters.
|
ParquetDataWriter |
setDefaultAdjustedToUTC(boolean defaultAdjustedToUTC)
Indicates if all datetime fields should be marked as AdjustedToUTC.
|
ParquetDataWriter |
setDefaultAdjustToUTC(boolean defaultAdjustToUTC)
Deprecated.
|
ParquetDataWriter |
setDefaultBigDecimalScale(int defaultBigDecimalScale)
Sets the default scale used when writing BigDecimal values (default 5).
|
ParquetDataWriter |
setDefaultBigNumberPrecision(int defaultBigNumberPrecision)
Sets the default precision used when writing BigDecimal & BigInteger values (default 25).
|
ParquetDataWriter |
setMaxRecordsAnalyzed(Long maxRecordsAnalyzed)
Indicates how many records should be analyzed and cached to generate the Parquet schema if no schema was explicitly set
on this writer (default is 1000).
|
ParquetDataWriter |
setRecordsPerCacheFile(int recordsPerCacheFile)
Indicates how many records should be cached in memory to generate the Parquet schema if no schema was explicitly set
on this writer (default is 10_000L).
|
ParquetDataWriter |
setRoundingMode(RoundingMode roundingMode)
Indicates the rounding algorithm used for all BigDecimal values (default is
RoundingMode.HALF_UP ). |
ParquetDataWriter |
setSchema(Connection connection,
JdbcValueReader jdbcValueReader,
String query,
Object... queryParameters)
Sets the schema used to write the file by copying it from the metadata of an SQL query.
|
ParquetDataWriter |
setSchema(Connection connection,
JdbcValueReader sqlToJavaTypeMapper,
String databaseCatalog,
String databaseSchema,
String databaseTable)
Sets the schema used to write the Parquet file by copying it from the schema of a database table.
|
ParquetDataWriter |
setSchema(Connection connection,
String query,
Object... queryParameters)
Sets the schema used to write the file by copying it from the metadata of an SQL query.
|
ParquetDataWriter |
setSchema(Connection connection,
String databaseCatalog,
String databaseSchema,
String databaseTable)
Sets the schema used to write the Parquet file by copying it from the schema of a database table.
|
ParquetDataWriter |
setSchema(JdbcConnectionFactory jdbcConnectionFactory,
JdbcValueReader jdbcValueReader,
String query,
Object... queryParameters)
Sets the schema used to write the file by copying it from the metadata of an SQL query.
|
ParquetDataWriter |
setSchema(JdbcConnectionFactory jdbcConnectionFactory,
String query,
Object... queryParameters)
Sets the schema used to write the file by copying it from the metadata of an SQL query.
|
ParquetDataWriter |
setSchema(MessageType schema)
Sets the schema used to write the file.
|
protected void |
writeImpl(Record record)
Overridden by subclasses to write the specified record to this
DataWriter . |
available, getNestedEndpoint, getNestedWriter, getRootEndpoint, getRootWriter, write
decrementRecordCount, enableJmx, getLastRecord, getRecordCount, getRecordCountAsBigInteger, getRecordCountAsString, incrementRecordCount, isRecordCountBigInteger, resetRecordCount, toString
addElapsedtime, assertClosed, assertNotOpened, assertOpened, finalize, getClosedOn, getDescription, getElapsedTime, getElapsedTimeAsString, getOpenedOn, getOpenElapsedTime, getOpenElapsedTimeAsString, getSelfTime, getSelfTimeAsString, getState, isCaptureElapsedTime, isClosed, isOpen, setCaptureElapsedTime, setDescription
public ParquetDataWriter(File file)
public ParquetDataWriter(OutputFile outputFile)
OutputFile
.outputFile
- - OutputFile with FileSystem.public void open() throws DataException
DataEndpoint
open
in class IntegrationWriter
DataException
protected void writeImpl(Record record) throws Throwable
DataWriter
DataWriter
.writeImpl
in class DataWriter
Throwable
public void close() throws DataException
DataEndpoint
close
in class DataEndpoint
DataException
public MessageType getSchema()
public ParquetDataWriter setSchema(MessageType schema)
public ParquetDataWriter setSchema(Connection connection, String query, Object... queryParameters)
public ParquetDataWriter setSchema(Connection connection, JdbcValueReader jdbcValueReader, String query, Object... queryParameters)
SELECT * FROM invoices WHERE 1<0
.public ParquetDataWriter setSchema(JdbcConnectionFactory jdbcConnectionFactory, String query, Object... queryParameters)
SELECT * FROM invoices WHERE 1<0
.public ParquetDataWriter setSchema(JdbcConnectionFactory jdbcConnectionFactory, JdbcValueReader jdbcValueReader, String query, Object... queryParameters)
public ParquetDataWriter setSchema(Connection connection, String databaseCatalog, String databaseSchema, String databaseTable)
public ParquetDataWriter setSchema(Connection connection, JdbcValueReader sqlToJavaTypeMapper, String databaseCatalog, String databaseSchema, String databaseTable)
public int getDefaultBigDecimalScale()
public ParquetDataWriter setDefaultBigDecimalScale(int defaultBigDecimalScale)
public int getDefaultBigNumberPrecision()
public ParquetDataWriter setDefaultBigNumberPrecision(int defaultBigNumberPrecision)
public DataException addExceptionProperties(DataException exception)
Endpoint
DataException
. Since this method is called whenever an
exception is thrown, subclasses should override it to add their specific information.addExceptionProperties
in class DataWriter
public CompressionCodecName getCompressionCodecName()
public ParquetDataWriter setCompressionCodecName(CompressionCodecName compressionCodecName)
@Deprecated public boolean isDefaulAdjustToUTC()
isDefaultAdjustedToUTC()
@Deprecated public ParquetDataWriter setDefaultAdjustToUTC(boolean defaultAdjustToUTC)
setDefaultAdjustedToUTC(boolean)
public boolean isDefaultAdjustedToUTC()
public ParquetDataWriter setDefaultAdjustedToUTC(boolean defaultAdjustedToUTC)
public RoundingMode getRoundingMode()
RoundingMode.HALF_UP
).public ParquetDataWriter setRoundingMode(RoundingMode roundingMode)
RoundingMode.HALF_UP
).public Configuration getConfiguration()
public ParquetDataWriter setConfiguration(Configuration configuration)
protected int getColumnStatsReaderThreads()
protected ParquetDataWriter setColumnStatsReaderThreads(int columnStatsReaderThreads)
public Long getMaxRecordsAnalyzed()
null
will cause all records to be read and cached to determine the schema.null
or a high record count can significantly slow down processing and cause an OutOfMemoryError
. public ParquetDataWriter setMaxRecordsAnalyzed(Long maxRecordsAnalyzed)
null
will cause all records to be read and cached to determine the schema.null
or a high record count can significantly slow down processing and cause an OutOfMemoryError
. public int getRecordsPerCacheFile()
setCacheFolder(String)
is set. public ParquetDataWriter setRecordsPerCacheFile(int recordsPerCacheFile)
setCacheFolder(String)
is set. public String getCacheFolder()
LocalFileDataset
or null if schema generation should be performed completely in memory using MemoryDataset
/>
See also getRecordsPerCacheFile()
public ParquetDataWriter setCacheFolder(String cacheFolder)
LocalFileDataset
or null if schema generation should be performed completely in memory using MemoryDataset
.setRecordsPerCacheFile(int)
Copyright (c) 2006-2024 North Concepts Inc. All Rights Reserved.