public class ParquetDataReader extends IntegrationReader
Inherited nested classes: DataEndpoint.State

Inherited fields:
fieldLineage, recordLineage
lastRecord
PRODUCT, PRODUCT_VERSION, VENDOR, XML_INPUT_FACTORY_KEY
BUFFER_SIZE, captureElapsedTime, DEFAULT_READ_BUFFER_SIZE
id, log, name, TIMESTAMP_FORMAT

| Constructor and Description |
|---|
| ParquetDataReader(InputFile inputFile) Reads Parquet data from an InputFile. |
| Modifier and Type | Method and Description |
|---|---|
| DataException | addExceptionProperties(DataException exception) Adds this endpoint's current state to a DataException. |
| protected Record | addLineage(Record record) |
| void | close() Indicates that this endpoint has finished reading or writing. |
| Configuration | getConfiguration() Returns the Parquet configuration parameters. |
| MetadataFilter | getFilter() Returns the filter settings. |
| MessageType | getModifiedSchema() Returns the modified schema used to read the file. |
| MessageType | getSchema() Returns the schema used to read the file. |
| protected Date | int96ToTimestamp(byte[] int96Bytes) |
| boolean | isDebug() Indicates whether debugging is enabled to print log statements. |
| boolean | isLineageSupported() |
| boolean | isMakeOptionalFieldsRequired() Indicates whether to check, during open(), for optional columns that can be made required using ColumnChunkMetaData#getStatistics(). |
| boolean | isMakeRequiredFieldsOptional() When true, check for required columns that can be made optional using ColumnChunkMetaData#getStatistics(). |
| boolean | isRemoveFieldsWithoutColumnMetadata() When true, remove fields in the schema that do not have a corresponding ColumnChunkMetaData in the file. |
| boolean | isRemoveFieldsWithoutValues() When true, remove fields in the schema whose value count is <= 0. See getFieldsWithoutColumnMetadata() for details. |
| void | open() Makes this endpoint ready for reading or writing. |
| protected void | readGroupValue(Group group, int depth, int fieldIndex, ValueNodeContainer valueContainer) |
| protected Record | readImpl() Overridden by subclasses to read the next record from this DataReader. |
| protected void | readPrimitiveValue(Group group, int fieldIndex, ValueNodeContainer valueContainer, Type fieldType) |
| protected Record | readRecord(Group group, int depth) |
| ParquetDataReader | setConfiguration(Configuration configuration) Sets the Parquet configuration parameters. |
| ParquetDataReader | setDebug(boolean debug) Sets whether debugging is enabled to print log statements. |
| ParquetDataReader | setFilter(MetadataFilter filter) Sets the filter settings. |
| ParquetDataReader | setMakeOptionalFieldsRequired(boolean makeOptionalFieldsRequired) Sets whether to check, during open(), for optional columns that can be made required using ColumnChunkMetaData#getStatistics(). |
| ParquetDataReader | setMakeRequiredFieldsOptional(boolean makeRequiredFieldsOptional) When true, check for required columns that can be made optional using ColumnChunkMetaData#getStatistics(). |
| ParquetDataReader | setRemoveFieldsWithoutColumnMetadata(boolean removeFieldsWithoutColumnMetadata) Sets whether to remove, during open(), fields in the schema that do not have a corresponding ColumnChunkMetaData in the file. |
| ParquetDataReader | setRemoveFieldsWithoutValues(boolean removeFieldsWithoutValues) Sets whether to remove, during open(), fields in the schema whose value count is <= 0. |
| ParquetDataReader | setSchema(MessageType schema) Sets the schema used to read the file. |
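The isMakeRequiredFieldsOptional() and isMakeOptionalFieldsRequired() rows above decide a column's repetition from its statistics. A minimal sketch of that decision in plain Java follows. It assumes each column is reduced to a hypothetical ColumnSummary (name, whether the schema declares it required, and its null count taken from the file statistics, negative when statistics are absent); ColumnSummary and RepetitionAdjuster are illustrative names, not part of this class or of Parquet's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-column summary. In a real file the null count would come from
// the column chunk statistics summed over all row groups; a negative value here
// means "statistics absent, null count unknown".
record ColumnSummary(String name, boolean required, long nullCount) {}

final class RepetitionAdjuster {

    // Required columns whose statistics report nulls: candidates to make optional.
    // Columns with unknown counts (negative) are skipped.
    static List<String> requiredColumnsWithNullValues(List<ColumnSummary> columns) {
        List<String> result = new ArrayList<>();
        for (ColumnSummary c : columns) {
            if (c.required() && c.nullCount() > 0) {
                result.add(c.name());
            }
        }
        return result;
    }

    // Optional columns whose statistics report zero nulls: candidates to make required.
    // Columns with unknown counts (negative) are skipped.
    static List<String> optionalColumnsWithoutNullValues(List<ColumnSummary> columns) {
        List<String> result = new ArrayList<>();
        for (ColumnSummary c : columns) {
            if (!c.required() && c.nullCount() == 0) {
                result.add(c.name());
            }
        }
        return result;
    }
}
```

Treating missing statistics as "unknown" rather than as zero nulls matters: otherwise a column whose statistics were never written would be wrongly promoted to required.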
Inherited methods:
available, getBufferSize, getNestedEndpoint, getNestedReader, getReader, getRootEndpoint, getRootReader, isExhausted, isSaveLineage, peek, pop, push, read, setSaveLineage, skip
decrementRecordCount, enableJmx, getLastRecord, getRecordCount, getRecordCountAsBigInteger, getRecordCountAsString, incrementRecordCount, isRecordCountBigInteger, resetRecordCount, toString
addElapsedtime, assertClosed, assertNotOpened, assertOpened, finalize, getClosedOn, getDescription, getElapsedTime, getElapsedTimeAsString, getOpenedOn, getOpenElapsedTime, getOpenElapsedTimeAsString, getSelfTime, getSelfTimeAsString, getState, isCaptureElapsedTime, isClosed, isOpen, setCaptureElapsedTime, setDescription

public ParquetDataReader(InputFile inputFile)
Reads Parquet data from an InputFile.

public boolean isDebug()
Indicates whether debugging is enabled to print log statements.
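public ParquetDataReader setDebug(boolean debug)
Sets whether debugging is enabled to print log statements.

public Configuration getConfiguration()
Returns the Parquet configuration parameters.

public ParquetDataReader setConfiguration(Configuration configuration)
Sets the Parquet configuration parameters.

public MetadataFilter getFilter()
Returns the filter settings.

public ParquetDataReader setFilter(MetadataFilter filter)
Sets the filter settings.

public MessageType getSchema()
Returns the schema used to read the file.

public MessageType getModifiedSchema()
Returns the modified schema used to read the file.

public ParquetDataReader setSchema(MessageType schema)
Sets the schema used to read the file.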
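getModifiedSchema() returns a schema distinct from getSchema(). Assuming (the reference does not say so explicitly) that the difference comes from the removeFieldsWithoutColumnMetadata and removeFieldsWithoutValues options, the pruning can be sketched over a flat list of field names. SchemaPruner and its valueCounts map are hypothetical stand-ins for Parquet's MessageType and ColumnChunkMetaData; a missing map key plays the role of a field with no column metadata in the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SchemaPruner {

    // Returns the fields that survive pruning, preserving schema order.
    // valueCounts is a hypothetical stand-in for the file's column metadata:
    // a missing key means the file has no column chunk for that field.
    static List<String> prune(List<String> schemaFields,
                              Map<String, Long> valueCounts,
                              boolean removeFieldsWithoutColumnMetadata,
                              boolean removeFieldsWithoutValues) {
        List<String> kept = new ArrayList<>();
        for (String field : schemaFields) {
            Long count = valueCounts.get(field);
            if (count == null) {
                // No column metadata for this field.
                if (removeFieldsWithoutColumnMetadata) {
                    continue;
                }
            } else if (count <= 0 && removeFieldsWithoutValues) {
                // Column exists but its value count is <= 0.
                continue;
            }
            kept.add(field);
        }
        return kept;
    }
}
```

Real Parquet schemas are nested, so an actual implementation would have to walk group types and rebuild them; the flat list keeps the two filtering rules visible.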
public boolean isMakeRequiredFieldsOptional()
When true, check for required columns that can be made optional using ColumnChunkMetaData#getStatistics().
See getRequiredColumnsWithNullValues() for details.

public ParquetDataReader setMakeRequiredFieldsOptional(boolean makeRequiredFieldsOptional)
When true, check for required columns that can be made optional using ColumnChunkMetaData#getStatistics().
See getRequiredColumnsWithNullValues() for details.

public boolean isMakeOptionalFieldsRequired()
Indicates whether to check, during open(), for optional columns that can be made required using ColumnChunkMetaData#getStatistics().
See getOptionalColumnsWithoutNullValues() for details.

public ParquetDataReader setMakeOptionalFieldsRequired(boolean makeOptionalFieldsRequired)
Sets whether to check, during open(), for optional columns that can be made required using ColumnChunkMetaData#getStatistics().
See getOptionalColumnsWithoutNullValues() for details.

public boolean isRemoveFieldsWithoutColumnMetadata()
When true, remove fields in the schema that do not have a corresponding ColumnChunkMetaData in the file.
See getFieldsWithoutColumnMetadata() for details.

public ParquetDataReader setRemoveFieldsWithoutColumnMetadata(boolean removeFieldsWithoutColumnMetadata)
Sets whether to remove, during open(), fields in the schema that do not have a corresponding ColumnChunkMetaData in the file.
See getFieldsWithoutColumnMetadata() for details.

public boolean isRemoveFieldsWithoutValues()
When true, remove fields in the schema whose value count is <= 0.
See getFieldsWithoutColumnMetadata() for details.

public ParquetDataReader setRemoveFieldsWithoutValues(boolean removeFieldsWithoutValues)
Sets whether to remove, during open(), fields in the schema whose value count is <= 0.
See getFieldsWithoutColumnMetadata() for details.

public void open()
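throws DataException
Description copied from class: DataEndpoint
Makes this endpoint ready for reading or writing.
Overrides: open in class IntegrationReader
Throws: DataException

public void close()
throws DataException
Description copied from class: DataEndpoint
Indicates that this endpoint has finished reading or writing.
Overrides: close in class DataEndpoint
Throws: DataException

protected Record readImpl() throws Throwable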
Description copied from class: DataReader
Overridden by subclasses to read the next record from this DataReader. The default implementation of DataReader.read() now ensures that this method will not be called again after it returns null.
If no record is available, null is returned.
Overrides: readImpl in class DataReader
Throws: Throwable

protected Record readRecord(Group group, int depth)
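The readImpl() description above states that DataReader.read() will not call readImpl() again once it has returned null. One way a base class can enforce that guarantee is to latch an exhausted flag, as in the sketch below. SketchReader and ListReader are hypothetical classes written for illustration; they are not DataPipeline's DataReader, whose actual implementation is not shown in this reference.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical base class: read() wraps readImpl() and latches exhaustion.
abstract class SketchReader<T> {
    private boolean exhausted;

    // Public entry point; never calls readImpl() again after it has returned null.
    public final T read() {
        if (exhausted) {
            return null;
        }
        T record = readImpl();
        if (record == null) {
            exhausted = true;
        }
        return record;
    }

    public final boolean isExhausted() {
        return exhausted;
    }

    // Subclasses return the next record, or null when no record is available.
    protected abstract T readImpl();
}

// Example subclass that also counts how often readImpl() is invoked.
final class ListReader extends SketchReader<String> {
    private final Iterator<String> source;
    int readImplCalls;

    ListReader(List<String> records) {
        this.source = records.iterator();
    }

    @Override
    protected String readImpl() {
        readImplCalls++;
        return source.hasNext() ? source.next() : null;
    }
}
```

Making read() final keeps subclasses from bypassing the latch, so a subclass only has to implement readImpl() and can assume it will not be polled after signalling the end of its data.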
protected void readGroupValue(Group group,
int depth,
int fieldIndex,
ValueNodeContainer valueContainer)
protected void readPrimitiveValue(Group group,
int fieldIndex,
ValueNodeContainer valueContainer,
Type fieldType)
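readRecord(), readGroupValue() and readPrimitiveValue() take a group together with a depth and/or a field index, which suggests a recursive walk in which nested groups descend one level and primitive fields are copied directly. The sketch below models that pattern with hypothetical Node, GroupNode and PrimitiveNode types in place of Parquet's Group and Type, and builds nested maps instead of DataPipeline Records. It illustrates the recursion only; the depth limit is this sketch's own safeguard, not a documented behavior of the class.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for a Parquet group and its leaf values.
interface Node {}

record GroupNode(List<Map.Entry<String, Node>> fields) implements Node {}

record PrimitiveNode(Object value) implements Node {}

final class RecordWalker {
    // This sketch's own guard against pathologically deep nesting.
    static final int MAX_DEPTH = 64;

    // Convenience factory for a named field.
    static Map.Entry<String, Node> field(String name, Node node) {
        return Map.entry(name, node);
    }

    // Converts a group into nested maps: groups recurse one level deeper,
    // primitive values are copied as-is. Field order is preserved.
    static Map<String, Object> readRecord(GroupNode group, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalStateException("nesting deeper than " + MAX_DEPTH);
        }
        Map<String, Object> record = new LinkedHashMap<>();
        for (Map.Entry<String, Node> f : group.fields()) {
            Node node = f.getValue();
            if (node instanceof GroupNode child) {
                record.put(f.getKey(), readRecord(child, depth + 1));
            } else if (node instanceof PrimitiveNode leaf) {
                record.put(f.getKey(), leaf.value());
            }
        }
        return record;
    }
}
```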
protected Date int96ToTimestamp(byte[] int96Bytes)
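int96ToTimestamp(byte[]) converts Parquet's legacy INT96 timestamps. In the widely used Impala/Hive layout, the 12 bytes hold a little-endian 8-byte nanoseconds-of-day value followed by a little-endian 4-byte Julian day number. The sketch below decodes that layout with java.nio, assuming UTC; the reference does not state which convention or time zone this class uses, and Int96Timestamps is a hypothetical helper that returns java.time.Instant rather than Date.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

final class Int96Timestamps {
    // Julian day number of 1970-01-01, the Unix epoch.
    private static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    private static final long SECONDS_PER_DAY = 86_400L;

    // Decodes the common INT96 layout: 8 bytes nanos-of-day, then 4 bytes
    // Julian day, both little-endian. Assumes the stored value is UTC.
    static Instant int96ToInstant(byte[] int96Bytes) {
        if (int96Bytes == null || int96Bytes.length != 12) {
            throw new IllegalArgumentException("INT96 value must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96Bytes).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        long julianDay = buf.getInt();
        long epochDay = julianDay - JULIAN_DAY_OF_EPOCH;
        // ofEpochSecond normalizes the nanosecond adjustment into seconds.
        return Instant.ofEpochSecond(epochDay * SECONDS_PER_DAY, nanosOfDay);
    }
}
```

Date.from(instant) converts the result to the Date type that int96ToTimestamp(byte[]) returns, dropping sub-millisecond precision.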
public boolean isLineageSupported()
Overrides: isLineageSupported in class DataReader

protected Record addLineage(Record record)
Overrides: addLineage in class DataReader

public DataException addExceptionProperties(DataException exception)
Description copied from class: Endpoint
Adds this endpoint's current state to a DataException. Since this method is called whenever an exception is thrown, subclasses should override it to add their specific information.
Overrides: addExceptionProperties in class DataReader

Copyright (c) 2006-2025 North Concepts Inc. All Rights Reserved.