Store Dataset In Memory

Updated: Jan 12, 2023

In this example you will learn how to store your dataset in memory using MemoryDataset.

This example can be easily modified to show how to store a dataset on disk.

Input CSV file

Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating
101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A
312,Butler,Gerard,90.00,1000.00,8/6/2003,B
868,Hewitt,Jennifer Love,0,17000.00,5/25/1985,B
761,Pinkett-Smith,Jada,49654.87,100000.00,12/5/2006,A
317,Murray,Bill,789.65,5000.00,2/5/2007,C

Java code listing

package com.northconcepts.datapipeline.foundations.examples.pipeline;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.Record;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.foundations.pipeline.Pipeline;
import com.northconcepts.datapipeline.foundations.pipeline.dataset.Dataset;
import com.northconcepts.datapipeline.foundations.pipeline.dataset.MemoryDataset;

import java.io.File;

public class StoreDatasetInMemory {

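    public static void main(String[] args) {
        // Read the CSV file, taking field names from its first row
        DataReader reader = new CSVReader(new File("example/data/input/credit-balance-01.csv"))
                .setFieldNamesInFirstRow(true);

        // The pipeline supplies the dataset's records
        Pipeline pipeline = new Pipeline().setInputAsDataReader(reader);

        Dataset dataset = new MemoryDataset(pipeline);

        // Start asynchronous loading, then block until every record is loaded
        dataset.load().waitForRecordsToLoad();

        // Iterate over the records now held in memory
        for (Record record : dataset) {
            System.out.println(record);
        }
        dataset.close();
    }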

}

Code walkthrough

A CSVReader is created using the file path of the input file credit-balance-01.csv.
The CSVReader.setFieldNamesInFirstRow(true) method is invoked to specify that the values in the first row should be used as field names.
An instance of Pipeline is created which receives the reader as input (Pipeline().setInputAsDataReader(reader)).
new MemoryDataset(pipeline) creates the dataset in memory. It accepts the pipeline as a parameter, which is the source of the dataset's data.
dataset.load() starts the asynchronous loading of records from the pipeline into this dataset. This method returns immediately and does not wait for loading to complete.
.waitForRecordsToLoad() blocks until all the records have been loaded before proceeding.
Since Dataset is Iterable, it is looped over and the records are printed from memory.
dataset.close() terminates any asynchronous data loading and column statistics calculation.
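
The load, wait, iterate, close sequence described above can be illustrated without the DataPipeline library. The sketch below is a plain-Java analogy and not how MemoryDataset is implemented: the class InMemoryRowsSketch and its methods load() and waitForRowsToLoad() are hypothetical names chosen for this illustration, and rows are plain strings rather than Record objects. It assumes a single background thread copies the rows and a latch signals completion.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for an in-memory dataset; NOT the DataPipeline MemoryDataset class.
class InMemoryRowsSketch implements Iterable<String>, AutoCloseable {

    private final List<String> source;
    private final List<String> rows = new CopyOnWriteArrayList<>();
    private final CountDownLatch loaded = new CountDownLatch(1);
    private volatile Thread loader;

    InMemoryRowsSketch(List<String> source) {
        this.source = source;
    }

    // Starts copying rows on a background thread and returns immediately.
    InMemoryRowsSketch load() {
        Thread thread = new Thread(() -> {
            try {
                for (String row : source) {
                    if (Thread.currentThread().isInterrupted()) {
                        return; // close() asked the loader to stop early
                    }
                    rows.add(row);
                }
            } finally {
                loaded.countDown(); // always release waiting callers
            }
        });
        thread.setDaemon(true);
        loader = thread;
        thread.start();
        return this;
    }

    // Blocks until the background thread has finished; call load() first.
    InMemoryRowsSketch waitForRowsToLoad() throws InterruptedException {
        loaded.await();
        return this;
    }

    @Override
    public Iterator<String> iterator() {
        return rows.iterator();
    }

    // Interrupts the loader thread if it was started.
    @Override
    public void close() {
        Thread thread = loader;
        if (thread != null) {
            thread.interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> csvRows = List.of(
                "Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating",
                "101,Reeves,Keanu,9315.45,10000.00,1/17/1998,A",
                "312,Butler,Gerard,90.00,1000.00,8/6/2003,B");

        // try-with-resources calls close() even if iteration throws
        try (InMemoryRowsSketch rowsInMemory = new InMemoryRowsSketch(csvRows)) {
            rowsInMemory.load().waitForRowsToLoad();
            for (String row : rowsInMemory) {
                System.out.println(row);
            }
        }
    }
}
```

Skipping the wait step in this sketch would let the loop start while the background thread is still copying, so it could see only part of the data. That is the situation the walkthrough's waitForRecordsToLoad() step guards against.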
