Write an ORC File to Amazon S3 Using a Temporary File
Updated: Jun 7, 2023
In this example, you will learn how to convert CSV data to a temporary ORC file and then write it to an Amazon S3 bucket.
Check out CSV Examples for similar examples.
CSV input
Account,LastName,FirstName,Balance,CreditLimit,AccountCreated,Rating
101,Reeves,Keanu,9315.45,10000,17-01-1998,A
312,Butler,Gerard,90,1000,06-08-2003,B
101,Hewitt,Jennifer Love,0,17000,25-05-1985,B
312,Pinkett-Smith,Jada,49654.87,100000,05-12-2006,A
317,Murray,Bill,789.65,5000,05-02-2007,C
317,Murray,Bill,1,5000,05-02-2007,D
Java Code Listing
package com.northconcepts.datapipeline.examples.amazons3;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;

import com.northconcepts.datapipeline.amazons3.AmazonS3FileSystem;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.orc.OrcDataWriter;

public class WriteAnOrcFileToAmazonS3UsingATemporaryFile {

    private static final String ACCESS_KEY = "YOUR ACCESS KEY";
    private static final String SECRET_KEY = "YOUR SECRET KEY";

    public static void main(String[] args) throws Throwable {
        // Temporary local file that holds the ORC data until it is uploaded
        File orcFile = File.createTempFile("credit-balance", ".orc");
        orcFile.deleteOnExit();

        try {
            // Read the CSV input, taking field names from the first row
            DataReader reader = new CSVReader(new File("example/data/input/credit-balance.csv"))
                    .setFieldNamesInFirstRow(true);
            OrcDataWriter writer = new OrcDataWriter(orcFile);

            // Transfer the records from the CSV reader to the ORC writer
            Job.run(reader, writer);

            uploadFileToS3(orcFile);
        } finally {
            orcFile.delete();
        }
    }

    private static void uploadFileToS3(File orcFile) throws Throwable {
        AmazonS3FileSystem s3 = new AmazonS3FileSystem();
        try {
            s3.setBasicAWSCredentials(ACCESS_KEY, SECRET_KEY);
            s3.open();

            // try-with-resources closes both streams when the copy finishes, even if it fails
            try (OutputStream out = s3.writeMultipartFile("bucket-name", "output/credit-balance.orc");
                    InputStream in = new BufferedInputStream(new FileInputStream(orcFile))) {
                byte[] buffer = new byte[1024];
                int lengthRead;
                while ((lengthRead = in.read(buffer)) > 0) {
                    out.write(buffer, 0, lengthRead);
                }
            }
        } finally {
            s3.close();
        }
    }
}
Code Walkthrough
- A temporary file, orcFile, is created and set to delete itself on exit.
- A CSVReader is created to read from the local file credit-balance.csv. Its setFieldNamesInFirstRow(true) method is invoked so that the reader takes the field names from the first row of the CSV file.
- An OrcDataWriter is then created to write to the temporary file.
- Job.run(reader, writer) transfers the records from the reader to the writer, filling the temporary ORC file.
- The file is uploaded to the Amazon S3 bucket by invoking the uploadFileToS3() method, explained below:
  - The credentials are passed to the setBasicAWSCredentials() method and a connection is opened.
  - Since we are using a temporary file, two streams, an InputStream and an OutputStream, are created.
  - A buffer of 1024 bytes is used to copy data from the temporary file, read through the InputStream, to the bucket, written through the OutputStream.
  - In the finally block, the connection to Amazon S3 is closed.
- Back in main(), the finally block deletes the temporary file whether or not the upload succeeded.
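The temporary-file and buffered-copy steps described above can be tried in isolation, without AWS credentials or the DataPipeline library. The sketch below is only an illustration under stated assumptions: the class name TempFileCopySketch and its copy() helper are hypothetical, the file holds placeholder bytes rather than real ORC data, and a ByteArrayOutputStream stands in for the stream returned by writeMultipartFile().

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;

// Hypothetical helper class, not part of DataPipeline.
class TempFileCopySketch {

    // Copies everything from in to out in 1024-byte chunks; returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int lengthRead;
        // read() returns -1 once the end of the stream is reached
        while ((lengthRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, lengthRead);
            total += lengthRead;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Same lifecycle as the listing: create, use, then delete the temporary file
        File tempFile = File.createTempFile("credit-balance", ".tmp");
        tempFile.deleteOnExit();
        try {
            // Placeholder content (assumption): 5000 zero bytes instead of ORC data
            Files.write(tempFile.toPath(), new byte[5000]);

            // In-memory stand-in (assumption) for the S3 output stream
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (InputStream in = new BufferedInputStream(Files.newInputStream(tempFile.toPath()));
                 OutputStream out = sink) {
                long copied = copy(in, out);
                System.out.println("Copied " + copied + " bytes");
            }
        } finally {
            tempFile.delete();
        }
    }
}
```

The try-with-resources block closes both streams even if the copy throws, and the outer finally removes the temporary file in either case.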
Code Output
An ORC file, credit-balance.orc, will be created in the Amazon S3 bucket under the key output/credit-balance.orc.
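One way to sanity-check the result, either on the temporary file before upload or on a copy downloaded from the bucket, is to look at its first bytes: per the ORC format specification, an ORC file starts with the three ASCII bytes ORC. The sketch below assumes a local file path; the class name OrcMagicCheck and its method are hypothetical, and the check only confirms the header, not that the rest of the file is valid.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;

// Hypothetical helper class, not part of DataPipeline.
class OrcMagicCheck {

    // The ORC header is the 3-byte ASCII sequence "ORC"
    private static final byte[] ORC_MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the file is at least 3 bytes long and begins with "ORC".
    static boolean startsWithOrcMagic(File file) throws IOException {
        if (file.length() < ORC_MAGIC.length) {
            return false;
        }
        byte[] header = new byte[ORC_MAGIC.length];
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file.toPath()))) {
            in.readFully(header);
        }
        return Arrays.equals(header, ORC_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        // Path is an assumption: pass the local copy to check as the first argument
        File file = new File(args.length > 0 ? args[0] : "credit-balance.orc");
        if (!file.isFile()) {
            System.out.println("No such file: " + file);
            return;
        }
        System.out.println(file + " starts with ORC magic: " + startsWithOrcMagic(file));
    }
}
```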