Write an ORC File to Amazon S3 Using a Temporary File
Updated: Jun 7, 2023
In this example, you will learn how to convert CSV data to a temporary ORC file and then write it to an Amazon S3 bucket.
Check out CSV Examples for similar examples.
CSV input
Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating
101,Reeves,Keanu,9315.45,10000,17-01-1998,A
312,Butler,Gerard,90,1000,06-08-2003,B
101,Hewitt,Jennifer Love,0,17000,25-05-1985,B
312,Pinkett-Smith,Jada,49654.87,100000,05-12-2006,A
317,Murray,Bill,789.65,5000,05-02-2007,C
317,Murray,Bill,1,5000,05-02-2007,D
Java Code Listing
package com.northconcepts.datapipeline.examples.amazons3;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;

import com.northconcepts.datapipeline.amazons3.AmazonS3FileSystem;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.orc.OrcDataWriter;

public class WriteAnOrcFileToAmazonS3UsingATemporaryFile {

    private static final String ACCESS_KEY = "YOUR ACCESS KEY";
    private static final String SECRET_KEY = "YOUR SECRET KEY";

    public static void main(String[] args) throws Throwable {
        File orcFile = File.createTempFile("credit-balance", ".orc");
        orcFile.deleteOnExit();

        try {
            DataReader reader = new CSVReader(new File("example/data/input/credit-balance.csv"))
                    .setFieldNamesInFirstRow(true);

            OrcDataWriter writer = new OrcDataWriter(orcFile);

            Job.run(reader, writer);

            uploadFileToS3(orcFile);
        } finally {
            orcFile.delete();
        }
    }

    private static void uploadFileToS3(File orcFile) throws Throwable {
        AmazonS3FileSystem s3 = new AmazonS3FileSystem();
        try {
            s3.setBasicAWSCredentials(ACCESS_KEY, SECRET_KEY);
            s3.open();

            // Close both streams once the copy finishes so that buffered data is flushed
            try (OutputStream out = s3.writeMultipartFile("bucket-name", "output/credit-balance.orc");
                 InputStream in = new BufferedInputStream(new FileInputStream(orcFile))) {
                byte[] buffer = new byte[1024];
                int lengthRead;
                while ((lengthRead = in.read(buffer)) > 0) {
                    out.write(buffer, 0, lengthRead);
                }
            }
        } finally {
            s3.close();
        }
    }
}
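The temporary-file handling in the listing can also be looked at in isolation. The following is a minimal sketch using only the JDK, not part of the original example: the class name TempFileLifecycleSketch, the FileConsumer interface, and the withTempFile helper are hypothetical names introduced for illustration, and the consumer stands in for the upload step. It assumes java.nio.file is acceptable in place of java.io.File.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempFileLifecycleSketch {

    /** Hypothetical callback standing in for "upload this file somewhere". */
    public interface FileConsumer {
        void accept(Path file) throws IOException;
    }

    /**
     * Creates a temporary file, fills it with the given bytes, hands it to the
     * consumer, and deletes it afterwards even if the consumer throws.
     * Returns the path so callers can confirm the file is gone.
     */
    public static Path withTempFile(String prefix, String suffix, byte[] content,
                                    FileConsumer consumer) throws IOException {
        Path tempFile = Files.createTempFile(prefix, suffix);
        try {
            Files.write(tempFile, content);
            consumer.accept(tempFile);
        } finally {
            // Explicit cleanup; does not rely on JVM exit hooks
            Files.deleteIfExists(tempFile);
        }
        return tempFile;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes only; this sketch does not produce real ORC data
        byte[] data = "placeholder bytes".getBytes(StandardCharsets.UTF_8);
        Path used = withTempFile("credit-balance", ".orc", data,
                file -> System.out.println("would upload " + Files.size(file) + " bytes"));
        System.out.println("still exists: " + Files.exists(used));
    }
}
```

Deleting in a finally block, as the listing does, removes the file as soon as the upload finishes or fails; deleteOnExit() in the listing acts only as a fallback for when the JVM shuts down normally.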
Code Walkthrough
- A temporary orcFile is created and set to delete itself on exit.
- A CSVReader is created to read from the local file credit-balance.csv. The setFieldNamesInFirstRow(true) method is invoked so that the reader takes the field names from the first row of the CSV file.
- An OrcDataWriter is then created to write to the temporary file, and Job.run(reader, writer) transfers the records from the reader to the writer.
- The file is uploaded to the Amazon S3 bucket by invoking the uploadFileToS3() method, explained below:
  - The credentials are passed to the setBasicAWSCredentials() method and a connection is opened.
  - Since we are using a temporary file, two streams are created: an InputStream that reads the temporary file and an OutputStream that writes to the bucket.
  - A buffer of 1024 bytes is used to copy data from the temporary file's InputStream to the OutputStream for the bucket.
  - In the finally block, the connection to Amazon S3 is closed.
- Back in main(), the temporary file is deleted in the finally block, whether or not the upload succeeded.
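The buffered copy described in the walkthrough can be sketched with plain JDK streams. This is an illustrative sketch rather than the library's own code: StreamCopySketch and copy are hypothetical names, and a ByteArrayOutputStream stands in for the stream returned by writeMultipartFile() so the example runs without AWS credentials. It also shows try-with-resources, which closes both streams when the copy ends.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamCopySketch {

    /** Copies everything from in to out through a fixed-size buffer; returns the bytes copied. */
    public static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int lengthRead;
        // read() returns -1 once the end of the stream is reached
        while ((lengthRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, lengthRead);
            total += lengthRead;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("copy-sketch", ".bin");
        try {
            Files.write(source, new byte[3000]);

            // Assumption: an in-memory stream stands in for the S3 output stream
            ByteArrayOutputStream target = new ByteArrayOutputStream();
            try (InputStream in = new BufferedInputStream(Files.newInputStream(source));
                 OutputStream out = target) {
                long copied = copy(in, out, 1024);
                System.out.println("copied " + copied + " bytes");
            }
        } finally {
            Files.deleteIfExists(source);
        }
    }
}
```

The 1024-byte buffer matches the listing; a larger buffer would mean fewer read and write calls per file, at the cost of a little more memory.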
Code Output
An ORC file will be created in the Amazon S3 bucket (bucket-name in the listing) under the key output/credit-balance.orc.
