public class ParquetDataReader extends IntegrationReader
DataEndpoint.State
fieldLineage, recordLineage
lastRecord, PRODUCT, PRODUCT_VERSION, VENDOR, XML_INPUT_FACTORY_KEY
BUFFER_SIZE, captureElapsedTime, DEFAULT_READ_BUFFER_SIZE
id, log, name, TIMESTAMP_FORMAT
Constructor and Description |
---|
ParquetDataReader(InputFile inputFile)
Reads Parquet data from an InputFile. |
Modifier and Type | Method and Description |
---|---|
DataException |
addExceptionProperties(DataException exception)
Adds this endpoint's current state to a DataException. |
protected Record |
addLineage(Record record) |
void |
close()
Indicates that this endpoint has finished reading or writing.
|
Configuration |
getConfiguration()
Returns the Parquet configuration parameters.
|
MetadataFilter |
getFilter()
Returns the filter settings.
|
MessageType |
getModifiedSchema()
Indicates the modified schema used to read the file.
|
MessageType |
getSchema()
Indicates the schema used to read the file.
|
protected Date |
int96ToTimestamp(byte[] int96Bytes) |
boolean |
isDebug()
Indicates if debugging is enabled to print log statements.
|
boolean |
isLineageSupported() |
boolean |
isMakeOptionalFieldsRequired()
Indicates whether to check for optional columns that can be made required using ColumnChunkMetaData#getStatistics() during open(). |
boolean |
isMakeRequiredFieldsOptional()
When true, checks for required columns that can be made optional using ColumnChunkMetaData#getStatistics(). |
boolean |
isRemoveFieldsWithoutColumnMetadata()
When true, removes fields in the schema that do not have a corresponding ColumnChunkMetaData in the file. |
boolean |
isRemoveFieldsWithoutValues()
When true, removes fields in the schema whose value count is <= 0. See getFieldsWithoutColumnMetadata() for details. |
void |
open()
Makes this endpoint ready for reading or writing.
|
protected void |
readGroupValue(Group group,
int depth,
int fieldIndex,
ValueNodeContainer valueContainer) |
protected Record |
readImpl()
Overridden by subclasses to read the next record from this DataReader. |
protected void |
readPrimitiveValue(Group group,
int fieldIndex,
ValueNodeContainer valueContainer,
Type fieldType) |
protected Record |
readRecord(Group group,
int depth) |
ParquetDataReader |
setConfiguration(Configuration configuration)
Sets the Parquet configuration parameters.
|
ParquetDataReader |
setDebug(boolean debug)
Sets whether debugging is enabled to print log statements.
|
ParquetDataReader |
setFilter(MetadataFilter filter)
Sets the filter settings.
|
ParquetDataReader |
setMakeOptionalFieldsRequired(boolean makeOptionalFieldsRequired)
Sets whether to check for optional columns that can be made required using ColumnChunkMetaData#getStatistics() during open(). |
ParquetDataReader |
setMakeRequiredFieldsOptional(boolean makeRequiredFieldsOptional)
When true, checks for required columns that can be made optional using ColumnChunkMetaData#getStatistics(). |
ParquetDataReader |
setRemoveFieldsWithoutColumnMetadata(boolean removeFieldsWithoutColumnMetadata)
Sets whether to remove fields in the schema that do not have a corresponding ColumnChunkMetaData in the file during open(). |
ParquetDataReader |
setRemoveFieldsWithoutValues(boolean removeFieldsWithoutValues)
Sets whether to remove fields in the schema whose value count is <= 0 during open(). |
ParquetDataReader |
setSchema(MessageType schema)
Sets the schema used to read the file.
|
available, getBufferSize, getNestedEndpoint, getNestedReader, getRootEndpoint, getRootReader, isExhausted, isSaveLineage, peek, pop, push, read, setSaveLineage, skip
decrementRecordCount, enableJmx, getLastRecord, getRecordCount, getRecordCountAsBigInteger, getRecordCountAsString, incrementRecordCount, isRecordCountBigInteger, resetRecordCount, toString
addElapsedtime, assertClosed, assertNotOpened, assertOpened, finalize, getClosedOn, getDescription, getElapsedTime, getElapsedTimeAsString, getOpenedOn, getOpenElapsedTime, getOpenElapsedTimeAsString, getSelfTime, getSelfTimeAsString, getState, isCaptureElapsedTime, isClosed, isOpen, setCaptureElapsedTime, setDescription
public ParquetDataReader(InputFile inputFile)
Reads Parquet data from an InputFile.
public boolean isDebug()
public ParquetDataReader setDebug(boolean debug)
public Configuration getConfiguration()
public ParquetDataReader setConfiguration(Configuration configuration)
public MetadataFilter getFilter()
public ParquetDataReader setFilter(MetadataFilter filter)
public MessageType getSchema()
public MessageType getModifiedSchema()
public ParquetDataReader setSchema(MessageType schema)
public boolean isMakeRequiredFieldsOptional()
When true, checks for required columns that can be made optional using ColumnChunkMetaData#getStatistics(). See getRequiredColumnsWithNullValues() for details.
public ParquetDataReader setMakeRequiredFieldsOptional(boolean makeRequiredFieldsOptional)
When true, checks for required columns that can be made optional using ColumnChunkMetaData#getStatistics(). See getRequiredColumnsWithNullValues() for details.
public boolean isMakeOptionalFieldsRequired()
Indicates whether to check for optional columns that can be made required using ColumnChunkMetaData#getStatistics() during open(). See getOptionalColumnsWithoutNullValues() for details.
public ParquetDataReader setMakeOptionalFieldsRequired(boolean makeOptionalFieldsRequired)
Sets whether to check for optional columns that can be made required using ColumnChunkMetaData#getStatistics() during open(). See getOptionalColumnsWithoutNullValues() for details.
public boolean isRemoveFieldsWithoutColumnMetadata()
When true, removes fields in the schema that do not have a corresponding ColumnChunkMetaData in the file. See getFieldsWithoutColumnMetadata() for details.
public ParquetDataReader setRemoveFieldsWithoutColumnMetadata(boolean removeFieldsWithoutColumnMetadata)
Sets whether to remove fields in the schema that do not have a corresponding ColumnChunkMetaData in the file during open(). See getFieldsWithoutColumnMetadata() for details.
public boolean isRemoveFieldsWithoutValues()
When true, removes fields in the schema whose value count is <= 0. See getFieldsWithoutColumnMetadata() for details.
public ParquetDataReader setRemoveFieldsWithoutValues(boolean removeFieldsWithoutValues)
Sets whether to remove fields in the schema whose value count is <= 0 during open(). See getFieldsWithoutColumnMetadata() for details.
public void open() throws DataException
Description copied from class: DataEndpoint
Makes this endpoint ready for reading or writing.
open in class IntegrationReader
Throws: DataException
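The four schema-adjustment flags above describe decisions taken per column while the reader opens the file. The following is a minimal sketch of how such rules could be expressed, assuming each column can be summarized by a presence flag, a value count, and a null count. `SchemaAdjustmentSketch`, `ColumnInfo`, and `adjust` are hypothetical names for illustration only; they are not part of ParquetDataReader or Parquet, and the library's actual logic may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in only; not the library's implementation.
final class SchemaAdjustmentSketch {
    enum Repetition { REQUIRED, OPTIONAL }

    /** Hypothetical per-column summary (assumed shape, not a Parquet type). */
    static final class ColumnInfo {
        final String name;
        final Repetition repetition;
        final boolean hasColumnMetadata;
        final long valueCount;
        final long nullCount;

        ColumnInfo(String name, Repetition repetition, boolean hasColumnMetadata,
                   long valueCount, long nullCount) {
            this.name = name;
            this.repetition = repetition;
            this.hasColumnMetadata = hasColumnMetadata;
            this.valueCount = valueCount;
            this.nullCount = nullCount;
        }
    }

    /** Applies the four flags and returns the surviving fields as "name:REPETITION". */
    static List<String> adjust(List<ColumnInfo> columns,
                               boolean makeRequiredFieldsOptional,
                               boolean makeOptionalFieldsRequired,
                               boolean removeFieldsWithoutColumnMetadata,
                               boolean removeFieldsWithoutValues) {
        List<String> result = new ArrayList<>();
        for (ColumnInfo column : columns) {
            // Drop fields that have no column chunk metadata in the file.
            if (removeFieldsWithoutColumnMetadata && !column.hasColumnMetadata) {
                continue;
            }
            // Drop fields whose value count is <= 0.
            if (removeFieldsWithoutValues && column.valueCount <= 0) {
                continue;
            }
            Repetition repetition = column.repetition;
            if (makeRequiredFieldsOptional
                    && repetition == Repetition.REQUIRED && column.nullCount > 0) {
                // A required column that actually contains nulls is relaxed.
                repetition = Repetition.OPTIONAL;
            } else if (makeOptionalFieldsRequired
                    && repetition == Repetition.OPTIONAL && column.nullCount == 0) {
                // An optional column with no nulls can be tightened.
                repetition = Repetition.REQUIRED;
            }
            result.add(column.name + ":" + repetition);
        }
        return result;
    }

    static List<ColumnInfo> sampleColumns() {
        return Arrays.asList(
                new ColumnInfo("id", Repetition.REQUIRED, true, 10, 0),
                new ColumnInfo("name", Repetition.OPTIONAL, true, 10, 0),
                new ColumnInfo("note", Repetition.REQUIRED, true, 10, 3),
                new ColumnInfo("ghost", Repetition.OPTIONAL, false, 0, 0),
                new ColumnInfo("empty", Repetition.OPTIONAL, true, 0, 0));
    }

    public static void main(String[] args) {
        // With every flag enabled, "ghost" and "empty" are removed.
        System.out.println(adjust(sampleColumns(), true, true, true, true));
    }
}
```

With all flags disabled the sketch returns every field unchanged, which mirrors the idea that each setter opts in to one adjustment independently.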
public void close() throws DataException
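Description copied from class: DataEndpoint
Indicates that this endpoint has finished reading or writing.
close in class DataEndpoint
Throws: DataException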
protected Record readImpl() throws Throwable
Description copied from class: DataReader
Overridden by subclasses to read the next record from this DataReader. The default implementation of DataReader.read() now ensures that this method will not be called again after it returns null. If no record is available, null will be returned.
readImpl in class DataReader
Throws: Throwable
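The contract described for readImpl() (return null when no record is available, and never be called again afterwards) can be sketched with a small stand-in base class. `ReaderContractSketch` and `ListBackedReader` are hypothetical types written for this illustration; they are not DataReader or any other library class, and the real read() does more than this.

```java
import java.util.Arrays;
import java.util.Iterator;

// Illustrative stand-in for the read()/readImpl() contract; not library code.
abstract class ReaderContractSketch<T> {
    private boolean exhausted;

    /** Subclasses return the next record, or null when none is available. */
    protected abstract T readImpl() throws Throwable;

    /** Guards readImpl() so it is never invoked again after returning null. */
    public final T read() {
        if (exhausted) {
            return null;
        }
        try {
            T record = readImpl();
            if (record == null) {
                exhausted = true;
            }
            return record;
        } catch (RuntimeException e) {
            throw e;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}

// Hypothetical subclass that serves records from a fixed list.
final class ListBackedReader extends ReaderContractSketch<String> {
    private final Iterator<String> source;
    int readImplCalls; // counts invocations so the guard can be observed

    ListBackedReader(String... values) {
        this.source = Arrays.asList(values).iterator();
    }

    @Override
    protected String readImpl() {
        readImplCalls++;
        return source.hasNext() ? source.next() : null;
    }
}

final class ReaderContractDemo {
    public static void main(String[] args) {
        ListBackedReader reader = new ListBackedReader("a", "b");
        String record;
        // Typical read loop: stop at the first null.
        while ((record = reader.read()) != null) {
            System.out.println(record);
        }
    }
}
```

Because the guard lives in read(), a subclass only needs to signal exhaustion once by returning null.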
protected Record readRecord(Group group, int depth)
protected void readGroupValue(Group group, int depth, int fieldIndex, ValueNodeContainer valueContainer)
protected void readPrimitiveValue(Group group, int fieldIndex, ValueNodeContainer valueContainer, Type fieldType)
protected Date int96ToTimestamp(byte[] int96Bytes)
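The page does not document how int96ToTimestamp converts its input. As a rough illustration, here is the encoding conventionally used for Parquet INT96 timestamps: 12 little-endian bytes holding the nanoseconds within the day (8 bytes) followed by the Julian day number (4 bytes). This sketch assumes that layout; `Int96TimestampSketch` is a hypothetical class, and the library's actual method may handle time zones, precision, or validation differently.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Date;

// Illustrative decoder for the conventional INT96 timestamp layout; not library code.
final class Int96TimestampSketch {
    // Julian day number of the Unix epoch (1970-01-01).
    private static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    private static final long MILLIS_PER_DAY = 86_400_000L;
    private static final long NANOS_PER_MILLI = 1_000_000L;

    static Date int96ToTimestamp(byte[] int96Bytes) {
        if (int96Bytes == null || int96Bytes.length != 12) {
            throw new IllegalArgumentException("INT96 value must be exactly 12 bytes");
        }
        ByteBuffer buffer = ByteBuffer.wrap(int96Bytes).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buffer.getLong(); // bytes 0-7: nanoseconds since midnight
        int julianDay = buffer.getInt();    // bytes 8-11: Julian day number
        // java.util.Date has millisecond precision, so sub-millisecond nanos are dropped.
        long millis = (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
                + nanosOfDay / NANOS_PER_MILLI;
        return new Date(millis);
    }

    public static void main(String[] args) {
        // Julian day 2440588 at midnight is the Unix epoch.
        byte[] epoch = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(0L).putInt(2_440_588).array();
        System.out.println(int96ToTimestamp(epoch).getTime()); // prints 0
    }
}
```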
public boolean isLineageSupported()
isLineageSupported in class DataReader
protected Record addLineage(Record record)
addLineage in class DataReader
public DataException addExceptionProperties(DataException exception)
Description copied from class: Endpoint
Adds this endpoint's current state to a DataException. Since this method is called whenever an exception is thrown, subclasses should override it to add their specific information.
addExceptionProperties in class DataReader
Copyright (c) 2006-2024 North Concepts Inc. All Rights Reserved.