Configure AmazonS3FileSystem Using Client

Updated: Jun 26, 2023

This example shows how to integrate with Amazon S3-compatible storage offered by vendors other than Amazon by supplying your own configured client. This lets you use S3-style storage without being locked into a specific cloud storage provider.

This demo uses the AWS AmazonS3 client to read records from a CSV file, print them to the console, and then print the number of records obtained.

Java code listing

package com.northconcepts.datapipeline.examples.amazons3;

import java.io.InputStream;
import java.io.InputStreamReader;

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.northconcepts.datapipeline.amazons3.AmazonS3FileSystem;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.job.Job;

public class ConfigureAmazonS3FileSystemUsingClient {
    
    private static final String ACCESS_KEY = "YOUR ACCESS KEY";
    private static final String SECRET_KEY = "YOUR SECRET KEY";
    
    public static void main(String[] args) throws Throwable {
        BasicAWSCredentials basicCredentials = new BasicAWSCredentials(ACCESS_KEY, SECRET_KEY);

        AmazonS3 s3Client = AmazonS3ClientBuilder
            .standard()
            .withCredentials(new AWSStaticCredentialsProvider(basicCredentials))
            .withRegion(Regions.US_EAST_2)
            .build();
        
        AmazonS3FileSystem s3 = new AmazonS3FileSystem();
        s3.setClient(s3Client);
        s3.open();

        try {
            InputStream inputStream = s3.readFile("datapipeline-test-01", "output/orders-records.csv");

            DataReader reader = new CSVReader(new InputStreamReader(inputStream));
            DataWriter writer = StreamWriter.newSystemOutWriter();

            Job.run(reader, writer);

            System.out.println("Records read: " + writer.getRecordCount());
        } finally {
            s3.close();
        }
    }

}

Code Walkthrough

  1. Create an instance of AmazonS3 with credentials and region.
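    1. Set your AWS account credentials using an AWSCredentialsProvider (here, an AWSStaticCredentialsProvider wrapping BasicAWSCredentials). The access key and secret key can be found in "Profile Name > My Security Credentials > Access Keys" of the AWS Management Console.
    2. Set your S3 region.
    3. Set any additional properties or parameters on this client. For an S3-compatible service from another vendor, this is where the vendor's endpoint would be configured, for example with withEndpointConfiguration() in place of withRegion().
  2. Create an instance of AmazonS3FileSystem and pass it the client above using setClient().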
  3. s3.open() is used to open a connection to the Amazon S3 file system.
  4. s3.readFile() returns an InputStream for the file named in the second parameter, read from the S3 bucket named in the first parameter.
  5. You can print the obtained records to the console by using StreamWriter.newSystemOutWriter().
  6. Job.run() transfers the data from the reader to the writer, that is, from the S3 file to the console. See how to compile and run data pipeline jobs.
  7. writer.getRecordCount() returns the number of records written by the DataWriter, which here equals the number of records read from the S3 file.
  8. s3.close() closes the Amazon S3 connection.
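
Step 1.1 uses placeholder keys hard-coded in the class. One common alternative is to read them from the environment at startup. The following is a minimal plain-Java sketch of that idea, not part of DataPipeline or the AWS SDK: the class name CredentialSettings, the helper require(), and the variable names S3_ACCESS_KEY and S3_SECRET_KEY are all assumptions chosen for illustration.

```java
import java.util.Map;

// Hypothetical helper for this sketch; not part of DataPipeline or the AWS SDK.
class CredentialSettings {

    // Returns the trimmed value of the named setting, or fails fast
    // with a clear message when it is missing or blank.
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing required setting: " + name);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        try {
            Map<String, String> env = System.getenv();
            // S3_ACCESS_KEY and S3_SECRET_KEY are assumed names, not a standard.
            String accessKey = require(env, "S3_ACCESS_KEY");
            String secretKey = require(env, "S3_SECRET_KEY");
            // Report only that the values were found; never print the secret.
            System.out.println("Credentials loaded (access key length: "
                    + accessKey.length() + ", secret key length: "
                    + secretKey.length() + ")");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

The two strings returned here could then be passed to BasicAWSCredentials in place of the ACCESS_KEY and SECRET_KEY constants in the listing above.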

AmazonS3FileSystem

Class for accessing an Amazon S3 file system. It extends FileSystem and provides methods such as listBuckets() and listFolders().

CSVReader

Obtains records from a comma-separated values (CSV) or other delimited stream. It extends the TextReader class and can be created from a File or Reader object. Passing true to setFieldNamesInFirstRow() makes CSVReader use the values in the first row of the input data as field names.
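
To illustrate what treating the first row as field names means, here is a small plain-Java sketch. It is not CSVReader's implementation and does not use DataPipeline at all: HeaderRowSketch and parse() are hypothetical names, and the naive comma split assumes simple input with no quoted or escaped fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; real CSV parsing (quotes, escapes) is more involved.
class HeaderRowSketch {

    // Reads the first line as field names, then maps each later line to name/value pairs.
    static List<Map<String, String>> parse(Reader input) {
        List<Map<String, String>> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String headerLine = reader.readLine();
            if (headerLine == null) {
                return records; // empty input: no field names, no records
            }
            String[] fieldNames = headerLine.split(",", -1);
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] values = line.split(",", -1);
                Map<String, String> record = new LinkedHashMap<>();
                for (int i = 0; i < fieldNames.length; i++) {
                    // Missing trailing values become empty strings.
                    String value = i < values.length ? values[i].trim() : "";
                    record.put(fieldNames[i].trim(), value);
                }
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "id,customer\n1,Alice\n2,Bob\n";
        List<Map<String, String>> records = parse(new StringReader(csv));
        for (Map<String, String> record : records) {
            System.out.println(record);
        }
        System.out.println("Records read: " + records.size());
    }
}
```

Because the header row supplies the names, it is not counted as a record: the sample input has three lines but yields two records.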

Console output

All records from the S3 file are printed to the console, followed by the number of records obtained.
