Write to Amazon S3 Using Multipart Streaming

Updated: Jul 1, 2023

This example writes data files to Amazon S3 using multipart streaming, providing an efficient and scalable way to upload large data files to S3 in a streaming fashion.

Organizations often need to store and archive large volumes of data in cloud storage systems like Amazon S3. With this library, users can easily stream and write data files to S3, making it suitable for data archiving and backup scenarios where data integrity, resumability, and scalability are essential.


Input CSV File

stock,time,price,shares
JHX,09:30:00.00,57,95
JNJ,09:30:00.00,91.14,548
OPK,09:30:00.00,8.3,300
OPK,09:30:00.00,8.3,63
OMC,09:30:00.00,74.53,100
OMC,09:30:00.00,74.53,24
...


Java Code

package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;
import java.io.OutputStream;
import java.io.OutputStreamWriter;

import com.northconcepts.datapipeline.amazons3.AmazonS3FileSystem;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.csv.CSVWriter;
import com.northconcepts.datapipeline.job.Job;

public class WriteToAmazonS3UsingMultipartStreaming {
    
    private static final String ACCESS_KEY = "YOUR ACCESS KEY";
    private static final String SECRET_KEY = "YOUR SECRET KEY";

    public static void main(String[] args) throws Throwable {
        AmazonS3FileSystem s3 = new AmazonS3FileSystem();
        s3.setBasicAWSCredentials(ACCESS_KEY, SECRET_KEY);
//        s3.setDebug(true);
        s3.open();
        try {
            // Create AWS S3 streaming, multi-part OutputStream 
            OutputStream outputStream = s3.writeMultipartFile("datapipeline-test-01", "output/trades.csv");

            DataReader reader = new CSVReader(new File("example/data/input/trades.csv"))
                    .setFieldNamesInFirstRow(true);
                
            DataWriter writer = new CSVWriter(new OutputStreamWriter(outputStream, "utf-8"))
                    .setFieldNamesInFirstRow(true);
            
            Job.run(reader, writer);
            
            System.out.println("Done.");
        } finally {
            s3.close();
        }
    }

}


Code Walkthrough

  1. First, an AmazonS3FileSystem object is instantiated.
  2. The access credentials declared at the class level are assigned to the AmazonS3FileSystem object and a connection to S3 is opened. (The first sketch after this list shows one way to supply the credentials from environment variables instead of hard-coded constants.)
  3. A multipart OutputStream is created via the writeMultipartFile() method to write file data to the specified path within the declared bucket. (The second sketch after this list shows that this stream accepts any data, not just CSV.)
  4. A CSVReader is created to read data from the local file trades.csv.
  5. A CSVWriter is created to write CSV records through an OutputStreamWriter wrapping the multipart OutputStream.
  6. Data is transferred from the CSVReader to the CSVWriter via the Job.run() method.
  7. After successful execution, the message "Done." is printed to the console.
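
The following is a minimal sketch (not part of the original example) of step 2 with the credentials read from environment variables instead of hard-coded constants. The variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are an assumption here; use whatever names your environment actually provides.

AmazonS3FileSystem s3 = new AmazonS3FileSystem();
s3.setBasicAWSCredentials(
        System.getenv("AWS_ACCESS_KEY_ID"),       // access key from the environment (assumed variable name)
        System.getenv("AWS_SECRET_ACCESS_KEY"));  // secret key from the environment (assumed variable name)
s3.open();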

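The stream returned by writeMultipartFile() in step 3 is an ordinary java.io.OutputStream, so it can carry any data, not just CSV. Below is a minimal sketch assuming s3 has already been opened as in the example above; the key output/hello.txt is a hypothetical name, and treating the upload as finalized when the stream and file system are closed is an assumption about the library's behavior.

// Write plain text directly to a multipart S3 stream (hypothetical key name).
OutputStream out = s3.writeMultipartFile("datapipeline-test-01", "output/hello.txt");
OutputStreamWriter textWriter = new OutputStreamWriter(out, "utf-8");
textWriter.write("hello from multipart streaming\n");
textWriter.close();  // flush and close the stream; the upload is assumed to complete on close
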

Output

The records obtained from the input CSV file are written to output/trades.csv in the specified Amazon S3 bucket.
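
One way to spot-check the uploaded object is to read it back. The sketch below uses the plain AWS SDK for Java v1 (com.amazonaws:aws-java-sdk-s3), which is a separate dependency from this example and an assumption here; it also assumes a region and credentials are available through the SDK's default provider chains.

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class VerifyTradesUpload {

    public static void main(String[] args) throws Throwable {
        // Build a client using the default region and credential providers.
        AmazonS3 client = AmazonS3ClientBuilder.defaultClient();

        // Download the object written by the example and print its header row.
        S3Object object = client.getObject("datapipeline-test-01", "output/trades.csv");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(object.getObjectContent(), "utf-8"))) {
            System.out.println(in.readLine());  // expected: stock,time,price,shares
        }
    }

}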
