public abstract class Dataset extends FoundationObject implements Iterable<Record>, Closeable
Caches records loaded from a Pipeline or DataMappingPipeline. This class handles asynchronous loading, column metadata, and statistics.
Modifier and Type | Class and Description |
---|---|
static class | Dataset.ColumnsDataReader |
internalId, internalName, log, TIMESTAMP_FORMAT
SERIALIZED_CLASS_NAME, TYPE
XML_SERIALIZED_CLASS_NAME
Constructor and Description |
---|
Dataset(AbstractPipeline pipeline) |
Modifier and Type | Method and Description |
---|---|
protected Column | addField(Record record, Field field, int fieldIndex) |
protected void | afterColumnStatsLoaded() Called during the data loading process after all the column stats have been loaded. |
protected abstract void | afterLoad() Called at the end of the data loading process after all the records and column stats have been loaded. |
protected void | afterRecordsLoaded() Called during the data loading process after all the records have been loaded. |
protected abstract void | beforeLoad() Called at the start of the data loading process, but before any records or column stats have been loaded. |
void | cancelLoad() Gracefully terminate the asynchronous data loading and column stats calculation, waiting up to 10 seconds in total. |
void | close() |
DataReader | createColumnsDataReader() |
DataReader | createDataReader() Reads all data cached in this Dataset. |
DataReader | createDataReader(long offset, int count) Reads a subset of data cached in this Dataset. |
protected abstract DataWriter | createDataWriter() Writes records to this dataset's cache after clearing it. |
protected void | finalize() |
void | forEach(Consumer<? super Record> consumer) Performs the given action for each record cached in this dataset. |
DataWriterPipelineOutput | fromRecord(Record source) Loads this instance's state from a record and returns this (for fluent API call chaining). |
abstract Column | getColumn(int index) |
abstract Column | getColumn(String name) |
abstract long | getColumnCount() |
abstract List<String> | getColumnNames() |
abstract List<Column> | getColumns() |
Throwable | getColumnStatsException() |
int | getColumnStatsReaderThreads() The number of threads to use to process column stats (default 2). |
Throwable | getDataLoadException() |
Job | getJob() |
Long | getMaxColumnStatsRecords() The number of records to use when calculating column stats or null for all records (default null). |
Integer | getMaxRecordsToLoad() The maximum records to load parameter (maxRecordsToLoad) passed to the last call to load(Integer) or load(Integer, JobCallback). |
protected abstract Column | getOrCreateColumn(String name, int index) |
AbstractPipeline | getPipeline() |
abstract Record | getRecord(long index) |
abstract long | getRecordCount() |
RecordList | getRecordList(long offset, int count) Gets a subset of the records cached in this dataset. |
boolean | isCollectUniqueValues() Indicates if unique values in the dataset should be collected (default false). |
boolean | isColumnStatsLoaded() Returns true if all the column stats have been loaded. |
boolean | isDataLoaded() Returns true if all the records have been loaded and all the column stats have been loaded. |
boolean | isDataLoading() Returns true if the records or column stats are currently being loaded. |
boolean | isDetectBigNumberValues() Indicates if big decimals and big integers should be analyzed to determine their precision and scale (default true). |
boolean | isDetectBooleanValues() Indicates if boolean values should be looked for in strings and undefined types (default true). |
boolean | isDetectNumericValues() Indicates if numeric values should be looked for in strings and undefined types (default true). |
boolean | isDetectTemporalValues() Indicates if date/time patterns should be looked for in strings and undefined types (default true). |
boolean | isDetectUuidValues() Indicates if UUID values should be looked for in strings and undefined types (default true). |
boolean | isInferStringTypes() Indicates if string and undefined types should be analyzed to determine if they represent a numeric, boolean, or date/time value (default true). |
boolean | isRecordsLoaded() Returns true if all the records have been loaded. |
Iterator<Record> | iterator() |
Dataset | load() Starts the asynchronous loading of records from the pipeline into this dataset. |
Dataset | load(Integer maxRecords) Starts the asynchronous loading of records from the pipeline into this dataset. |
Dataset | load(Integer maxRecordsToLoad, JobCallback<DataReader,DataWriter> callback) Starts the asynchronous loading of records from the pipeline into this dataset. |
Dataset | setCollectUniqueValues(boolean collectUniqueValues) Indicates if unique values in the dataset should be collected (default false). |
protected Dataset | setColumnStatsLoaded(boolean columnStatsLoaded) |
Dataset | setColumnStatsReaderThreads(int columnStatsReaderThreads) The number of threads to use to process column stats (default 2). |
Dataset | setDetectBigNumberValues(boolean detectBigNumberValues) Indicates if big decimals and big integers should be analyzed to determine their precision and scale (default true). |
Dataset | setDetectBooleanValues(boolean detectBooleanValues) Indicates if boolean values should be looked for in strings and undefined types (default true). |
Dataset | setDetectNumericValues(boolean detectNumericValues) Indicates if numeric values should be looked for in strings and undefined types (default true). |
Dataset | setDetectTemporalValues(boolean detectTemporalValues) Indicates if date/time patterns should be looked for in strings and undefined types (default true). |
Dataset | setDetectUuidValues(boolean detectUuidValues) Indicates if UUID values should be looked for in strings and undefined types (default true). |
Dataset | setInferStringTypes(boolean inferStringTypes) Indicates if string and undefined types should be analyzed to determine if they represent a numeric, boolean, or date/time value (default true). |
Dataset | setMaxColumnStatsRecords(Long maxColumnStatsRecords) The number of records to use when calculating column stats or null for all records (default null). |
Dataset | setPipeline(AbstractPipeline pipeline) |
protected Dataset | setRecordsLoaded(boolean recordsLoaded) |
Stream<Record> | stream() Returns a Stream over records cached in this dataset. |
Record | toRecord() |
protected void | updateColumns(Record record, DataWriter writer) |
Dataset | waitForColumnStatsToLoad() |
Dataset | waitForColumnStatsToLoad(long minRecords, long maxWaitTimeMillis) |
Dataset | waitForRecordsToLoad() |
Dataset | waitForRecordsToLoad(long minRecords, long maxWaitTimeMillis) |
Dataset | waitUntilJobFinished() |
addExceptionProperties, assertValid, assertValid, clone, exception, exception, exception, getInternalId, getInternalName, resetInternalId
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
spliterator
fromArrayValue, fromArrayValue, fromArrayValue, fromJson, fromJson, newInstanceFromRecord, newInstanceFromRecord, toArrayValue, toJson, toJson, toJson, toRecord
newInstanceFromJson
forEachGrandChildElement, fromXml, fromXml, fromXmlElement, getAttribute, getAttribute, getAttribute, getAttribute, getAttribute, getAttribute, getAttribute, getAttribute, getAttribute, getAttribute, getChildElement, getChildElements, newInstanceFromXml, newInstanceFromXml, setAttribute, toXml, toXml, toXml, toXml, toXml, toXmlElement, writeXml, writeXml
addExceptionProperties, contributeExceptionProperties, contributeExceptionProperties
public Dataset(AbstractPipeline pipeline)
public void close()
Specified by: close in interface Closeable
Specified by: close in interface AutoCloseable
protected void finalize() throws Throwable
public AbstractPipeline getPipeline()
public Dataset setPipeline(AbstractPipeline pipeline)
public abstract long getRecordCount()
public abstract Record getRecord(long index)
public DataReader createDataReader()
public DataReader createDataReader(long offset, int count)
public RecordList getRecordList(long offset, int count)
public void forEach(Consumer<? super Record> consumer)
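createDataReader(long offset, int count) and getRecordList(long offset, int count) both address a window of the cached records by a long offset and an int count. The sketch below illustrates that windowing pattern only. A plain List stands in for the dataset's cache, PagingSketch and window are hypothetical names that are not part of the library, and clamping an out-of-range window is an assumption, since this page does not say how the real methods treat such arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: a List plays the role of the dataset's record cache.
final class PagingSketch {

    // Returns up to `count` items starting at `offset`.
    // Assumption: windows that run past the end are clamped, not rejected.
    static <T> List<T> window(List<T> cache, long offset, int count) {
        if (offset < 0 || count < 0) {
            throw new IllegalArgumentException("offset and count must be non-negative");
        }
        long start = Math.min(offset, cache.size());
        long end = Math.min(start + count, cache.size());
        return new ArrayList<>(cache.subList((int) start, (int) end));
    }

    public static void main(String[] args) {
        List<String> cache = Arrays.asList("r0", "r1", "r2", "r3", "r4");
        int pageSize = 2;
        // Walk the cache one page at a time, as a caller might with
        // getRecordList(offset, count).
        for (long offset = 0; offset < cache.size(); offset += pageSize) {
            System.out.println(offset + " -> " + window(cache, offset, pageSize));
        }
        // prints:
        // 0 -> [r0, r1]
        // 2 -> [r2, r3]
        // 4 -> [r4]
    }
}
```

The offset is a long because getRecordCount() returns a long, while the count stays an int because a single window is expected to fit in memory.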
public abstract long getColumnCount()
public abstract Column getColumn(int index)
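public Long getMaxColumnStatsRecords()
The number of records to use when calculating column stats or null for all records (default null).
public Dataset setMaxColumnStatsRecords(Long maxColumnStatsRecords)
The number of records to use when calculating column stats or null for all records (default null).
public boolean isInferStringTypes()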
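Indicates if string and undefined types should be analyzed to determine if they represent a numeric, boolean, or date/time value (default true).
public Dataset setInferStringTypes(boolean inferStringTypes)
Indicates if string and undefined types should be analyzed to determine if they represent a numeric, boolean, or date/time value (default true).
public boolean isDetectTemporalValues()
Indicates if date/time patterns should be looked for in strings and undefined types (default true).
public Dataset setDetectTemporalValues(boolean detectTemporalValues)
Indicates if date/time patterns should be looked for in strings and undefined types (default true).
public boolean isDetectNumericValues()
Indicates if numeric values should be looked for in strings and undefined types (default true).
public Dataset setDetectNumericValues(boolean detectNumericValues)
Indicates if numeric values should be looked for in strings and undefined types (default true).
public boolean isDetectBooleanValues()
Indicates if boolean values should be looked for in strings and undefined types (default true).
public Dataset setDetectBooleanValues(boolean detectBooleanValues)
Indicates if boolean values should be looked for in strings and undefined types (default true).
public boolean isDetectBigNumberValues()
Indicates if big decimals and big integers should be analyzed to determine their precision and scale (default true).
public Dataset setDetectBigNumberValues(boolean detectBigNumberValues)
Indicates if big decimals and big integers should be analyzed to determine their precision and scale (default true).
public boolean isDetectUuidValues()
Indicates if UUID values should be looked for in strings and undefined types (default true).
public Dataset setDetectUuidValues(boolean detectUuidValues)
Indicates if UUID values should be looked for in strings and undefined types (default true).
public boolean isCollectUniqueValues()
Indicates if unique values in the dataset should be collected (default false).
public Dataset setCollectUniqueValues(boolean collectUniqueValues)
Indicates if unique values in the dataset should be collected (default false).
public boolean isDataLoading()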
Returns true if the records or column stats are currently being loaded.
public boolean isDataLoaded()
Returns true if all the records have been loaded and all the column stats have been loaded.
public boolean isRecordsLoaded()
Returns true if all the records have been loaded. The column stats might not have been loaded even when this method returns true, since they require additional processing.
See also: isColumnStatsLoaded()
protected Dataset setRecordsLoaded(boolean recordsLoaded)
public boolean isColumnStatsLoaded()
Returns true if all the column stats have been loaded. The records will already have been loaded by the time this method returns true, since column stats require additional processing.
See also: isRecordsLoaded()
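The four status methods above are related: column stats finish only after the records do, and isDataLoaded() requires both. The following sketch models that relationship with a hypothetical LoadStateSketch class and Phase enum, neither of which exists in the library. Treating loading as strictly sequential phases is a simplifying assumption made for illustration and may not match how the library tracks state internally.

```java
// Hypothetical model of the four status flags; not the library's implementation.
final class LoadStateSketch {

    // Assumption: loading moves through these phases in order.
    enum Phase { NOT_STARTED, LOADING_RECORDS, LOADING_COLUMN_STATS, DONE }

    private final Phase phase;

    LoadStateSketch(Phase phase) {
        this.phase = phase;
    }

    // True while either records or column stats are still being produced.
    boolean isDataLoading() {
        return phase == Phase.LOADING_RECORDS || phase == Phase.LOADING_COLUMN_STATS;
    }

    // True once the records phase is finished, even if stats are pending.
    boolean isRecordsLoaded() {
        return phase.compareTo(Phase.LOADING_COLUMN_STATS) >= 0;
    }

    // True only at the very end, so it implies isRecordsLoaded().
    boolean isColumnStatsLoaded() {
        return phase == Phase.DONE;
    }

    // Both records and column stats are available.
    boolean isDataLoaded() {
        return isRecordsLoaded() && isColumnStatsLoaded();
    }

    public static void main(String[] args) {
        for (Phase p : Phase.values()) {
            LoadStateSketch s = new LoadStateSketch(p);
            System.out.println(p
                    + " loading=" + s.isDataLoading()
                    + " records=" + s.isRecordsLoaded()
                    + " stats=" + s.isColumnStatsLoaded()
                    + " all=" + s.isDataLoaded());
        }
    }
}
```

In this model a caller that only needs the records can proceed as soon as isRecordsLoaded() is true, while a caller that needs column metadata should check isColumnStatsLoaded() or isDataLoaded().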
protected Dataset setColumnStatsLoaded(boolean columnStatsLoaded)
public Throwable getDataLoadException()
public Throwable getColumnStatsException()
public DataReader createColumnsDataReader()
public Job getJob()
public int getColumnStatsReaderThreads()
public Dataset setColumnStatsReaderThreads(int columnStatsReaderThreads)
public Integer getMaxRecordsToLoad()
The maximum records to load parameter (maxRecordsToLoad) passed to the last call to load(Integer) or load(Integer, JobCallback). This value is null if load() was called last or null was passed to the other load methods.
protected abstract void beforeLoad()
protected abstract void afterLoad()
protected void afterRecordsLoaded()
protected void afterColumnStatsLoaded()
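The method summary describes when each of these four hooks is called: beforeLoad() at the start, afterRecordsLoaded() once the records are in, afterColumnStatsLoaded() once the stats are in, and afterLoad() at the very end. The sketch below shows that ordering as a template method. HookOrderSketch, RecordingDataset, and runLoad are hypothetical names used only for illustration, and the driver runs synchronously for simplicity, whereas the real class loads asynchronously.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class that mirrors only the hook names documented above.
abstract class HookOrderSketch {

    protected abstract void beforeLoad();

    protected abstract void afterLoad();

    // Optional hooks with empty default bodies, matching the non-abstract
    // declarations in the summary.
    protected void afterRecordsLoaded() {
    }

    protected void afterColumnStatsLoaded() {
    }

    // Assumption: a synchronous driver standing in for the asynchronous load.
    final void runLoad() {
        beforeLoad();
        // ... records would be read from the pipeline here ...
        afterRecordsLoaded();
        // ... column stats would be calculated here ...
        afterColumnStatsLoaded();
        afterLoad();
    }
}

// Records the order in which the hooks fire.
final class RecordingDataset extends HookOrderSketch {

    final List<String> calls = new ArrayList<>();

    @Override
    protected void beforeLoad() {
        calls.add("beforeLoad");
    }

    @Override
    protected void afterRecordsLoaded() {
        calls.add("afterRecordsLoaded");
    }

    @Override
    protected void afterColumnStatsLoaded() {
        calls.add("afterColumnStatsLoaded");
    }

    @Override
    protected void afterLoad() {
        calls.add("afterLoad");
    }

    public static void main(String[] args) {
        RecordingDataset dataset = new RecordingDataset();
        dataset.runLoad();
        System.out.println(dataset.calls);
        // prints: [beforeLoad, afterRecordsLoaded, afterColumnStatsLoaded, afterLoad]
    }
}
```

A concrete subclass must implement the two abstract hooks and can override the other two only when it needs to react to those intermediate points.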
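public Dataset load()
Starts the asynchronous loading of records from the pipeline into this dataset. See waitForRecordsToLoad() and waitForRecordsToLoad(long, long).
public Dataset load(Integer maxRecords)
Starts the asynchronous loading of records from the pipeline into this dataset. See waitForRecordsToLoad() and waitForRecordsToLoad(long, long).
Parameters:
maxRecords - the maximum number of records to load, or null to load all records.
public Dataset load(Integer maxRecordsToLoad, JobCallback<DataReader,DataWriter> callback)
Starts the asynchronous loading of records from the pipeline into this dataset. See waitForRecordsToLoad() and waitForRecordsToLoad(long, long).
Parameters:
maxRecordsToLoad - the maximum number of records to load, or null to load all records.
callback - the object to notify as data is being loaded.
public void cancelLoad()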
public Dataset waitForRecordsToLoad(long minRecords, long maxWaitTimeMillis)
public Dataset waitForRecordsToLoad()
public Dataset waitUntilJobFinished()
public Dataset waitForColumnStatsToLoad(long minRecords, long maxWaitTimeMillis)
public Dataset waitForColumnStatsToLoad()
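Because load() only starts the loading, callers pair it with one of the wait methods above, and each method returns the dataset so the calls can be chained. The sketch below shows that start-then-wait pattern with a hypothetical AsyncLoadSketch class built on standard java.util.concurrent types. It is not the library's implementation. In particular, the meaning given to minRecords and maxWaitTimeMillis (return once that many records are cached, loading has finished, or the timeout has elapsed) is an assumption, since this page lists only the signatures.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an asynchronously loading dataset.
final class AsyncLoadSketch {

    private final List<Integer> records = new CopyOnWriteArrayList<>();
    private final CountDownLatch recordsLoaded = new CountDownLatch(1);

    // Starts loading on a background thread and returns immediately.
    AsyncLoadSketch load(int total) {
        Thread worker = new Thread(() -> {
            for (int i = 0; i < total; i++) {
                records.add(i);
            }
            recordsLoaded.countDown();
        });
        worker.setDaemon(true);
        worker.start();
        return this;
    }

    // Blocks until the background load has finished.
    AsyncLoadSketch waitForRecordsToLoad() {
        try {
            recordsLoaded.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return this;
    }

    // Assumption: returns once minRecords are cached, loading has finished,
    // or maxWaitTimeMillis has elapsed, whichever comes first.
    AsyncLoadSketch waitForRecordsToLoad(long minRecords, long maxWaitTimeMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitTimeMillis);
        try {
            while (records.size() < minRecords
                    && recordsLoaded.getCount() > 0
                    && System.nanoTime() < deadline) {
                Thread.sleep(1);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return this;
    }

    long getRecordCount() {
        return records.size();
    }

    boolean isRecordsLoaded() {
        return recordsLoaded.getCount() == 0;
    }

    public static void main(String[] args) {
        AsyncLoadSketch dataset = new AsyncLoadSketch().load(1000).waitForRecordsToLoad();
        System.out.println(dataset.getRecordCount()); // prints: 1000
    }
}
```

The two-argument form suits a caller that wants to show a first batch of records quickly, while the no-argument form suits batch code that needs everything before continuing.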
protected abstract DataWriter createDataWriter()
protected void updateColumns(Record record, DataWriter writer)
public Record toRecord()
Specified by: toRecord in interface RecordSerializable
Overrides: toRecord in class Bean
public DataWriterPipelineOutput fromRecord(Record source)
Description copied from interface: RecordSerializable
Loads this instance's state from a record and returns this (for fluent API call chaining). For fluent API call chaining, the overridden method should change the declared return type to its class.
Specified by: fromRecord in interface RecordSerializable
Overrides: fromRecord in class Bean
Copyright (c) 2006-2024 North Concepts Inc. All Rights Reserved.