public class OrcDataReader extends IntegrationReader
Inherited nested class: DataEndpoint.State
Inherited fields:
fieldLineage, recordLineage
lastRecord, PRODUCT, PRODUCT_VERSION, VENDOR, XML_INPUT_FACTORY_KEY
BUFFER_SIZE, captureElapsedTime, DEFAULT_READ_BUFFER_SIZE
id, log, name, TIMESTAMP_FORMAT
Constructor | Description
---|---
OrcDataReader(File file) | Reads ORC data from a file.
OrcDataReader(Path path) | Reads ORC data from a Path.
Modifier and Type | Method | Description
---|---|---
DataException | addExceptionProperties(DataException exception) | Adds this endpoint's current state to a DataException.
protected Record | addLineage(Record record) |
void | close() | Indicates that this endpoint has finished reading or writing.
int | getBatchSize() | Returns the maximum number of records to buffer when reading ORC data (default 1024).
FieldList | getColumns() | Returns the columns to read from the ORC file.
Configuration | getConfig() | Returns the ORC configuration parameters.
Path | getPath() | Returns the Path of the ORC file being read.
TypeDescription | getSchema() | Returns the schema used to read the file.
boolean | isLineageSupported() |
void | open() | Makes this endpoint ready for reading or writing.
protected Record | readImpl() | Reads the next record from this DataReader, or returns null if no record is available.
OrcDataReader | setBatchSize(int batchSize) | Sets the maximum number of records to buffer when reading ORC data (default 1024).
OrcDataReader | setColumns(FieldList columns) | Sets the columns to read from the ORC file.
OrcDataReader | setConfig(Configuration config) | Sets the ORC configuration parameters.
OrcDataReader | setSchema(String schema) | Sets the schema used to read the file, given as a schema string.
OrcDataReader | setSchema(TypeDescription schema) | Sets the schema used to read the file.
Inherited methods:
available, getBufferSize, getNestedEndpoint, getNestedReader, getRootEndpoint, getRootReader, isExhausted, isSaveLineage, peek, pop, push, read, setSaveLineage, skip
decrementRecordCount, enableJmx, getLastRecord, getRecordCount, getRecordCountAsBigInteger, getRecordCountAsString, incrementRecordCount, isRecordCountBigInteger, resetRecordCount, toString
addElapsedtime, assertClosed, assertNotOpened, assertOpened, finalize, getClosedOn, getDescription, getElapsedTime, getElapsedTimeAsString, getOpenedOn, getOpenElapsedTime, getOpenElapsedTimeAsString, getSelfTime, getSelfTimeAsString, getState, isCaptureElapsedTime, isClosed, isOpen, setCaptureElapsedTime, setDescription
public OrcDataReader(File file)
Reads ORC data from a file.
public OrcDataReader(Path path)
Reads ORC data from a Path.
public void open() throws DataException
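Description copied from class: DataEndpoint
Makes this endpoint ready for reading or writing.
Overrides: open in class IntegrationReader
Throws: DataException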
public void close() throws DataException
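Description copied from class: DataEndpoint
Indicates that this endpoint has finished reading or writing.
Overrides: close in class DataEndpoint
Throws: DataException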
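The open() and close() entries describe a lifecycle: make the endpoint ready, read records, then signal that reading is finished. The sketch below models that lifecycle in plain Java so it runs without the library. ModelReader, its State enum, and the rule that reading before open() fails are assumptions chosen for illustration; the class does inherit assertOpened, assertClosed, and getState, but this page does not describe their exact behavior. The try/finally in readAll is the part worth copying: the reader is closed even if a read fails.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the open/read/close lifecycle; it does not use Data Pipeline.
public class LifecycleSketch {

    // Assumed states for illustration only.
    public enum State { CREATED, OPENED, CLOSED }

    public static class ModelReader {
        private final Iterator<String> rows;
        private State state = State.CREATED;

        public ModelReader(List<String> rows) {
            this.rows = rows.iterator();
        }

        public void open() {
            if (state != State.CREATED) {
                throw new IllegalStateException("open() may only be called once");
            }
            state = State.OPENED; // a real reader would open the ORC file here
        }

        // Returns the next row, or null at end of data; reading requires an open reader.
        public String read() {
            if (state != State.OPENED) {
                throw new IllegalStateException("reader is not open");
            }
            return rows.hasNext() ? rows.next() : null;
        }

        public void close() {
            state = State.CLOSED; // a real reader would release file handles here
        }

        public State getState() {
            return state;
        }
    }

    // Opens the reader, drains it, and always closes it, even if a read fails.
    public static List<String> readAll(ModelReader reader) {
        List<String> out = new ArrayList<>();
        reader.open();
        try {
            String row;
            while ((row = reader.read()) != null) {
                out.add(row);
            }
        } finally {
            reader.close();
        }
        return out;
    }

    public static void main(String[] args) {
        ModelReader reader = new ModelReader(List.of("a", "b", "c"));
        System.out.println(readAll(reader) + " " + reader.getState()); // [a, b, c] CLOSED
    }
}
```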
protected Record readImpl() throws Throwable
Description copied from class: DataReader
Overridden by subclasses to read the next record from this DataReader. The default implementation of DataReader.read() ensures that this method is not called again after it returns null. If no record is available, null is returned.
Specified by: readImpl in class DataReader
Throws: Throwable
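The readImpl() contract can be illustrated with a small model. In this sketch (hypothetical names, not the Data Pipeline source), a final read() method delegates to readImpl() and remembers when readImpl() first returns null, so readImpl() is never called again after the end of data. Wrapping checked exceptions in a RuntimeException is also an assumption made to keep the model simple. The ListReader subclass counts its calls so the guarantee can be checked.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical model of the read()/readImpl() contract described above.
public class ReadContractSketch {

    public abstract static class ModelDataReader {
        private boolean exhausted;

        // Subclasses return the next record, or null when no record is available.
        protected abstract String readImpl() throws Exception;

        public final String read() {
            if (exhausted) {
                return null; // end already reached: do not call readImpl() again
            }
            String record;
            try {
                record = readImpl();
            } catch (Exception e) {
                throw new RuntimeException("read failed", e); // unchecked wrapper (assumption)
            }
            if (record == null) {
                exhausted = true; // remember the end of data
            }
            return record;
        }

        public boolean isExhausted() {
            return exhausted;
        }
    }

    // A subclass that counts readImpl() calls so the contract can be verified.
    public static class ListReader extends ModelDataReader {
        private final Deque<String> rows;
        public int readImplCalls;

        public ListReader(List<String> rows) {
            this.rows = new ArrayDeque<>(rows);
        }

        @Override
        protected String readImpl() {
            readImplCalls++;
            return rows.pollFirst(); // null once the deque is empty
        }
    }

    public static void main(String[] args) {
        ListReader reader = new ListReader(List.of("r1", "r2"));
        while (reader.read() != null) {
            // drain all records
        }
        reader.read(); // an extra call after the end does not reach readImpl()
        System.out.println(reader.readImplCalls); // 3: two records plus the null
    }
}
```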
public int getBatchSize()
public OrcDataReader setBatchSize(int batchSize)
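As an illustration of what a batch size controls, the sketch below models a reader that pulls up to batchSize rows from its source at once and then hands them out one at a time. Only the default of 1024 comes from this page; BatchingReader and its refill logic are hypothetical, and OrcDataReader's internal buffering may work differently.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of batch buffering; only the 1024 default comes from the docs.
public class BatchBufferSketch {

    public static final int DEFAULT_BATCH_SIZE = 1024;

    public static class BatchingReader<T> {
        private final Iterator<T> source;
        private final int batchSize;
        private final List<T> batch = new ArrayList<>();
        private int position;
        public int batchesLoaded;

        public BatchingReader(Iterator<T> source, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.source = source;
            this.batchSize = batchSize;
        }

        // Returns the next row, or null when the source is exhausted.
        public T read() {
            if (position == batch.size()) { // current batch used up: refill it
                batch.clear();
                position = 0;
                while (batch.size() < batchSize && source.hasNext()) {
                    batch.add(source.next());
                }
                if (batch.isEmpty()) {
                    return null;
                }
                batchesLoaded++;
            }
            return batch.get(position++);
        }
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            rows.add(i);
        }
        BatchingReader<Integer> reader = new BatchingReader<>(rows.iterator(), DEFAULT_BATCH_SIZE);
        int count = 0;
        while (reader.read() != null) {
            count++;
        }
        System.out.println(count + " rows in " + reader.batchesLoaded + " batches"); // 2500 rows in 3 batches
    }
}
```

With 2,500 rows and the default batch size, the model fills its buffer three times (1024 + 1024 + 452 rows). A larger batch means fewer refills but more rows held in memory at once.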
public FieldList getColumns()
public OrcDataReader setColumns(FieldList columns)
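Column selection can be pictured as a projection over each record. The sketch below models records as ordered maps and keeps only the requested columns, in the requested order. It uses plain Java types in place of FieldList and Record, and treating a null column list as "read every column" is an assumption rather than documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of column projection using plain maps instead of Record/FieldList.
public class ColumnProjectionSketch {

    public static Map<String, Object> project(Map<String, Object> record, List<String> columns) {
        if (columns == null) {
            return new LinkedHashMap<>(record); // no projection requested (assumption)
        }
        Map<String, Object> projected = new LinkedHashMap<>();
        for (String column : columns) {
            if (record.containsKey(column)) {
                projected.put(column, record.get(column)); // keep the requested order
            }
        }
        return projected;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 7);
        row.put("name", "Ada");
        row.put("email", "ada@example.com");
        System.out.println(project(row, List.of("name", "id"))); // {name=Ada, id=7}
    }
}
```

Because ORC stores data column by column, restricting the columns generally lets a reader skip data it does not need; this page does not state how much OrcDataReader saves in practice.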
public TypeDescription getSchema()
public OrcDataReader setSchema(String schema)
public OrcDataReader setSchema(TypeDescription schema)
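setSchema(String) presumably accepts ORC's type-string notation, for example struct<id:bigint,name:string>, which the ORC library's TypeDescription.fromString parses into a TypeDescription. To show how such a string is structured, the simplified sketch below splits out the top-level fields of a struct and keeps nested types such as array<string> or decimal(10,2) as unparsed text. SchemaStringSketch is hypothetical and illustrative only; real code would pass the string to setSchema or let the ORC library parse it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified splitter for ORC-style "struct<...>" schema strings.
public class SchemaStringSketch {

    // Returns field name -> type text for the top level of a struct schema.
    public static Map<String, String> topLevelFields(String schema) {
        String s = schema.replace(" ", "");
        if (!s.startsWith("struct<") || !s.endsWith(">")) {
            throw new IllegalArgumentException("expected struct<...>: " + schema);
        }
        String body = s.substring("struct<".length(), s.length() - 1);
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : splitTopLevel(body)) {
            int colon = part.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("bad field: " + part);
            }
            fields.put(part.substring(0, colon), part.substring(colon + 1));
        }
        return fields;
    }

    // Splits on commas that are not nested inside <...> or (...).
    private static List<String> splitTopLevel(String body) {
        List<String> parts = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '<' || c == '(') {
                depth++;
            } else if (c == '>' || c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                parts.add(body.substring(start, i));
                start = i + 1;
            }
        }
        if (start < body.length()) {
            parts.add(body.substring(start));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(topLevelFields("struct<id:bigint,price:decimal(10,2),tags:array<string>>"));
        // {id=bigint, price=decimal(10,2), tags=array<string>}
    }
}
```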
public Configuration getConfig()
public OrcDataReader setConfig(Configuration config)
public boolean isLineageSupported()
Overrides: isLineageSupported in class DataReader
protected Record addLineage(Record record)
Overrides: addLineage in class DataReader
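This page lists isLineageSupported() and addLineage(Record) without describing them. As an assumption for illustration, the sketch below treats lineage as metadata attached to each record that says where it came from (here a source path and a running record number), recorded only when lineage saving is enabled, loosely mirroring the inherited setSaveLineage flag. ModelRecord, LineageReader, and the lineage keys are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of record lineage; keys and defaults are illustrative assumptions.
public class LineageSketch {

    public static class ModelRecord {
        public final Map<String, Object> values = new LinkedHashMap<>();
        public final Map<String, Object> lineage = new LinkedHashMap<>();
    }

    public static class LineageReader {
        private final String sourcePath;
        private long recordNumber;
        private boolean saveLineage; // off unless enabled (assumed default)

        public LineageReader(String sourcePath) {
            this.sourcePath = sourcePath;
        }

        public void setSaveLineage(boolean saveLineage) {
            this.saveLineage = saveLineage;
        }

        // Counts every record and, when enabled, tags it with its origin.
        public ModelRecord addLineage(ModelRecord record) {
            recordNumber++;
            if (saveLineage) {
                record.lineage.put("source", sourcePath);
                record.lineage.put("recordNumber", recordNumber);
            }
            return record;
        }
    }

    public static void main(String[] args) {
        LineageReader reader = new LineageReader("/data/people.orc");
        reader.setSaveLineage(true);
        reader.addLineage(new ModelRecord());
        ModelRecord second = reader.addLineage(new ModelRecord());
        System.out.println(second.lineage); // {source=/data/people.orc, recordNumber=2}
    }
}
```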
public DataException addExceptionProperties(DataException exception)
Description copied from class: Endpoint
Adds this endpoint's current state to a DataException. Since this method is called whenever an exception is thrown, subclasses should override it to add their specific information.
Overrides: addExceptionProperties in class DataReader
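The addExceptionProperties description implies a layered pattern: each class adds its own state to the exception while keeping what its superclass added. The sketch below models that pattern with a hypothetical ModelException that carries a property map; the property names (recordCount, path, batchSize) are illustrative and not necessarily what OrcDataReader actually reports.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the addExceptionProperties pattern; not Data Pipeline API.
public class ExceptionPropertiesSketch {

    public static class ModelException extends RuntimeException {
        public final Map<String, Object> properties = new LinkedHashMap<>();

        public ModelException(String message, Throwable cause) {
            super(message, cause);
        }

        public ModelException set(String name, Object value) {
            properties.put(name, value);
            return this;
        }
    }

    public static class BaseEndpoint {
        protected long recordCount;

        // The base class contributes generic endpoint state.
        public ModelException addExceptionProperties(ModelException e) {
            return e.set("recordCount", recordCount);
        }
    }

    public static class ModelOrcReader extends BaseEndpoint {
        private final String path;
        private final int batchSize = 1024;

        public ModelOrcReader(String path) {
            this.path = path;
        }

        // The subclass keeps the parent's properties and appends its own.
        @Override
        public ModelException addExceptionProperties(ModelException e) {
            return super.addExceptionProperties(e)
                    .set("path", path)
                    .set("batchSize", batchSize);
        }

        // Simulated read that fails, to show where the properties are attached.
        public void readRow() {
            recordCount++;
            try {
                decode();
            } catch (RuntimeException cause) {
                throw addExceptionProperties(new ModelException("read failed", cause));
            }
        }

        private void decode() {
            throw new IllegalStateException("corrupt data");
        }
    }

    public static void main(String[] args) {
        try {
            new ModelOrcReader("/data/people.orc").readRow();
        } catch (ModelException e) {
            System.out.println(e.properties); // {recordCount=1, path=/data/people.orc, batchSize=1024}
        }
    }
}
```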
public Path getPath()
Returns the Path of the ORC file being read.
Copyright (c) 2006-2024 North Concepts Inc. All Rights Reserved.