Write to Amazon S3 Using Multipart Streaming
Updated: Jul 1, 2023
This example shows how to write data files to Amazon S3 using multipart streaming, an efficient and scalable way to upload large data files to S3 without building the entire file in memory first.
Organizations often need to store and archive large volumes of data in cloud storage systems like Amazon S3. With this library, users can stream data files directly to S3, making it well suited to archiving and backup scenarios where data integrity, resumability, and scalability are essential.
Input CSV File
stock,time,price,shares
JHX,09:30:00.00,57,95
JNJ,09:30:00.00,91.14,548
OPK,09:30:00.00,8.3,300
OPK,09:30:00.00,8.3,63
OMC,09:30:00.00,74.53,100
OMC,09:30:00.00,74.53,24
...
Java Code
package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;
import java.io.OutputStream;
import java.io.OutputStreamWriter;

import com.northconcepts.datapipeline.amazons3.AmazonS3FileSystem;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.csv.CSVWriter;
import com.northconcepts.datapipeline.job.Job;

public class WriteToAmazonS3UsingMultipartStreaming {

    private static final String ACCESS_KEY = "YOUR ACCESS KEY";
    private static final String SECRET_KEY = "YOUR SECRET KEY";

    public static void main(String[] args) throws Throwable {
        AmazonS3FileSystem s3 = new AmazonS3FileSystem();
        s3.setBasicAWSCredentials(ACCESS_KEY, SECRET_KEY);
//        s3.setDebug(true);
        s3.open();
        try {
            // Create AWS S3 streaming, multi-part OutputStream
            OutputStream outputStream = s3.writeMultipartFile("datapipeline-test-01", "output/trades.csv");

            DataReader reader = new CSVReader(new File("example/data/input/trades.csv"))
                    .setFieldNamesInFirstRow(true);

            DataWriter writer = new CSVWriter(new OutputStreamWriter(outputStream, "utf-8"))
                    .setFieldNamesInFirstRow(true);

            Job.run(reader, writer);

            System.out.println("Done.");
        } finally {
            s3.close();
        }
    }
}
Code Walkthrough
- First, an AmazonS3FileSystem object is instantiated.
- The access credentials declared at the class level are assigned to the AmazonS3FileSystem object via setBasicAWSCredentials(), and a connection is established with open().
- A multipart OutputStream is created with the writeMultipartFile() method, which writes to the specified file path ("output/trades.csv") inside the declared bucket ("datapipeline-test-01").
- A CSVReader is created to read data from the local file trades.csv, treating the first row as field names.
- An OutputStreamWriter is then wrapped around the multipart OutputStream. Since the output file is CSV, a CSVWriter is used to write the records.
- Data is transferred from the CSVReader to the CSVWriter via the Job.run() method.
- After successful execution, the "Done." message is printed to the console, and the connection is closed in the finally block.
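DataPipeline's internals are not shown here, but the general buffering strategy behind any multipart upload can be sketched in plain Java. The class below (MultipartBufferSketch, a hypothetical name, not part of the library) accumulates bytes until a part-size threshold is reached, then hands off each full part; a real S3 client would send each flushed part with an UploadPart request and finish with CompleteMultipartUpload when the stream is closed. S3 requires every part except the last to be at least 5 MB; a tiny 4-byte threshold is used here purely so the behavior is easy to observe.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the buffering behind multipart streaming:
// bytes accumulate until the part-size threshold is reached, then the
// full part is "uploaded" (here, just collected) and the buffer resets.
public class MultipartBufferSketch extends OutputStream {

    private final int partSize;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final List<byte[]> uploadedParts = new ArrayList<>();

    public MultipartBufferSketch(int partSize) {
        this.partSize = partSize;
    }

    @Override
    public void write(int b) throws IOException {
        buffer.write(b);
        if (buffer.size() >= partSize) {
            flushPart(); // a real client would issue an UploadPart request here
        }
    }

    private void flushPart() {
        uploadedParts.add(buffer.toByteArray());
        buffer.reset();
    }

    @Override
    public void close() throws IOException {
        if (buffer.size() > 0) {
            flushPart(); // final, possibly short, part; then CompleteMultipartUpload
        }
    }

    public List<byte[]> getUploadedParts() {
        return uploadedParts;
    }

    public static void main(String[] args) throws IOException {
        MultipartBufferSketch out = new MultipartBufferSketch(4); // 4-byte parts for demo
        out.write("stock,time,price".getBytes("UTF-8")); // 16 bytes -> 4 full parts
        out.write("X".getBytes("UTF-8"));                // 1 leftover byte
        out.close();
        System.out.println(out.getUploadedParts().size()); // prints 5
    }
}
```

Because each part is flushed as soon as it fills, only one part's worth of data is ever held in memory, which is what lets multipart streaming handle files larger than available RAM.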
Output
The records read from the input CSV file are written to the trades.csv file in the Amazon S3 bucket.