Write an ORC File to Amazon S3 Using a Temporary File

Updated: Jun 7, 2023

In this example, you will learn how to convert CSV data to a temporary ORC file and then write it to an Amazon S3 bucket.

CSV input

101,Hewitt,Jennifer Love,0,17000,25-05-1985,B


Java Code Listing

package com.northconcepts.datapipeline.examples.amazons3;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;

import com.northconcepts.datapipeline.amazons3.AmazonS3FileSystem;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.orc.OrcDataWriter;

public class WriteAnOrcFileToAmazonS3UsingATemporaryFile {

    private static final String ACCESS_KEY = "YOUR ACCESS KEY";
    private static final String SECRET_KEY = "YOUR SECRET KEY";

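    public static void main(String[] args) throws Throwable {
        // Create the temporary ORC file and mark it for deletion when the JVM exits
        File orcFile = File.createTempFile("credit-balance", ".orc");
        orcFile.deleteOnExit();

        try {
            // Read the CSV, taking the field names from its first row
            DataReader reader = new CSVReader(new File("example/data/input/credit-balance.csv"))
                    .setFieldNamesInFirstRow(true);
            OrcDataWriter writer = new OrcDataWriter(orcFile);

            // Convert the CSV records into the temporary ORC file
            Job.run(reader, writer);

            uploadFileToS3(orcFile);
        } finally {
            // Remove the temporary file without waiting for the JVM to exit
            orcFile.delete();
        }
    }

    private static void uploadFileToS3(File orcFile) throws Throwable {
        AmazonS3FileSystem s3 = new AmazonS3FileSystem();
        try {
            s3.setBasicAWSCredentials(ACCESS_KEY, SECRET_KEY);
            s3.open();

            // Closing the streams lets the multipart upload finish and releases the file handle
            try (OutputStream out = s3.writeMultipartFile("bucket-name", "output/credit-balance.orc");
                    InputStream in = new BufferedInputStream(new FileInputStream(orcFile))) {
                byte[] buffer = new byte[1024];
                int lengthRead;
                while ((lengthRead = in.read(buffer)) > 0) {
                    out.write(buffer, 0, lengthRead);
                }
            }
        } finally {
            // Close the connection to Amazon S3
            s3.close();
        }
    }
}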


Code Walkthrough

  1. A temporary file, orcFile, is created and set to delete itself when the JVM exits.
  2. A CSVReader is created to read from the local file credit-balance.csv.
  3. The setFieldNamesInFirstRow(true) method is invoked so that the reader takes the field names from the first row of the CSV file.
  4. An OrcDataWriter is then created to write to the temporary file.
  5. Job.run() transfers the records from the reader to the writer, producing the ORC data in the temporary file.
  6. The file is uploaded to the Amazon S3 bucket by invoking the uploadFileToS3() method, which works as follows:
    • The credentials are passed to the setBasicAWSCredentials() method and a connection is opened.
    • Because the ORC data is staged in a temporary file, two streams are created: an InputStream that reads the temporary file and an OutputStream, returned by writeMultipartFile(), that writes to the bucket.
    • A 1024-byte buffer is used to copy the data from the InputStream to the OutputStream.
    • In the finally block, the connection to Amazon S3 is closed.
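
The staging and copy steps above can be tried without an AWS account. The following is a minimal sketch that uses only the Java standard library: a ByteArrayOutputStream stands in for the stream returned by writeMultipartFile(), and placeholder bytes stand in for real ORC data. The class name TempFileUploadSketch and the helpers copy() and stageAndCopy() are hypothetical names introduced for illustration; they are not part of DataPipeline.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class TempFileUploadSketch {

    // Copies all bytes from in to out through a 1024-byte buffer
    // and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int lengthRead;
        // read() returns -1 at end of stream
        while ((lengthRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, lengthRead);
            total += lengthRead;
        }
        out.flush();
        return total;
    }

    // Hypothetical helper: stages the content in a temporary file, copies the
    // file to the target stream, and always removes the temporary file.
    static long stageAndCopy(byte[] content, OutputStream target) throws IOException {
        File tempFile = File.createTempFile("credit-balance", ".orc");
        tempFile.deleteOnExit(); // safety net if delete() below is never reached
        try {
            Files.write(tempFile.toPath(), content);
            try (InputStream in = new BufferedInputStream(new FileInputStream(tempFile))) {
                return copy(in, target);
            }
        } finally {
            tempFile.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes, not a valid ORC file
        byte[] content = "placeholder bytes standing in for ORC data".getBytes(StandardCharsets.UTF_8);
        // In-memory stand-in for the Amazon S3 output stream
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        long copied = stageAndCopy(content, target);
        System.out.println("Copied " + copied + " bytes");
    }
}
```

Deleting the file in a finally block, in addition to calling deleteOnExit(), keeps temporary files from piling up in a long-running process, where JVM exit may be far away.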


Code Output

An ORC file named credit-balance.orc will be created under the output/ prefix of the Amazon S3 bucket.

