Use a Custom AWS S3 Endpoint with AmazonS3FileSystem

Updated: Jun 27, 2023

This example shows how to integrate with an Amazon S3 compatible storage solution offered by vendors other than Amazon. Pointing the client at a custom endpoint lets you keep the convenience of the S3 API without being tied to a single provider, giving you flexibility in choosing your cloud storage infrastructure.

The demo code below connects to a custom S3 endpoint, reads records from a CSV file stored in a bucket, prints them to the console, and then prints the number of records read.

Java code listing

package com.northconcepts.datapipeline.examples.amazons3;

import java.io.InputStream;
import java.io.InputStreamReader;

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.northconcepts.datapipeline.amazons3.AmazonS3FileSystem;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.job.Job;

public class UseACustomAmazonS3EndpointWithAmazonS3FileSystem {

	private static final String ACCESS_KEY = "YOUR ACCESS KEY";
	private static final String SECRET_KEY = "YOUR SECRET KEY";

	private static final String AWS_S3_CUSTOM_ENDPOINT = "YOUR S3 CUSTOM ENDPOINT";
	private static final String REGION = "YOUR AWS S3 REGION";

	public static void main(String[] args) {
		BasicAWSCredentials credentials = new BasicAWSCredentials(ACCESS_KEY, SECRET_KEY);
		// Point the client at a custom, S3-compatible endpoint instead of AWS
		AmazonS3FileSystem s3 = new AmazonS3FileSystem()
				.setCredentialsProvider(new AWSStaticCredentialsProvider(credentials))
				.setEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(AWS_S3_CUSTOM_ENDPOINT, REGION))
				.setDebug(true); // log request details while testing

		s3.open();

		try {
			// Open a stream to output/trades.csv in the datapipeline-test-01 bucket
			InputStream inputStream = s3.readFile("datapipeline-test-01", "output/trades.csv");

			DataReader reader = new CSVReader(new InputStreamReader(inputStream));
			DataWriter writer = StreamWriter.newSystemOutWriter();

			// Transfer every record from the CSV reader to the console writer
			Job.run(reader, writer);

			System.out.println("Records read: " + writer.getRecordCount());
		} finally {
			s3.close();
		}
	}
}

Code Walkthrough

  1. An instance of AmazonS3FileSystem is created with the necessary access parameters.
    1. Set your AWS account credentials using AWSCredentialsProvider. The access keys and secret keys can be found in "Profile Name > My Security Credentials > Access Keys" of the AWS Management Console.
    2. Set your custom endpoint and region using AwsClientBuilder.EndpointConfiguration (see the sketch after this list).
  2. s3.open() is used to open a connection to the Amazon S3 file system.
  3. s3.readFile() opens an InputStream for the file named in the second parameter, read from the S3 bucket named in the first parameter.
  4. The obtained records are printed to the console using StreamWriter.newSystemOutWriter().
  5. Records are transferred from the reader to the writer via the Job.run() method. See how to compile and run data pipeline jobs.
  6. writer.getRecordCount() returns the number of records written by the DataWriter (which here equals the number of records read from the S3 file).
  7. s3.close() closes the Amazon S3 connection.
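For example, the same builder calls can point the file system at a self-hosted, S3-compatible server such as MinIO. This is a minimal sketch; the endpoint URL, region, and credential values below are illustrative placeholders, not values taken from this example:

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.northconcepts.datapipeline.amazons3.AmazonS3FileSystem;

public class CustomEndpointSketch {

	public static void main(String[] args) {
		// Placeholder credentials for an S3-compatible server (e.g. MinIO)
		BasicAWSCredentials credentials = new BasicAWSCredentials("minio-access-key", "minio-secret-key");

		// Many non-AWS providers accept any valid region string, but
		// EndpointConfiguration still requires one for request signing
		AmazonS3FileSystem s3 = new AmazonS3FileSystem()
				.setCredentialsProvider(new AWSStaticCredentialsProvider(credentials))
				.setEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration("http://localhost:9000", "us-east-1"));

		s3.open();
		try {
			// ... read or write files as in the listing above ...
		} finally {
			s3.close();
		}
	}
}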

AmazonS3FileSystem

Class for accessing an Amazon S3 file system. It extends FileSystem and contains several useful methods, such as listBuckets() and listFolders().
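As a rough sketch, the snippet below reuses the s3 instance from the listing above and calls listBuckets() to print the buckets visible to the configured credentials. The element type of the returned collection isn't shown in this example, so the loop is assumed to work over an iterable result and relies on each element's toString():

s3.open();
try {
	// Assumes listBuckets() returns an iterable collection of bucket
	// descriptions; each element is printed via its toString()
	for (Object bucket : s3.listBuckets()) {
		System.out.println(bucket);
	}
} finally {
	s3.close();
}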

CSVReader

Obtains records from a Comma-Separated Value (CSV) or delimited stream. It extends the TextReader class and can be created using a File or Reader object. Passing true to its setFieldNamesInFirstRow() method enables CSVReader to use the names in the first row of the input data as field names.
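For instance, if the first row of the input file holds column headers, the reader can be told to use them as field names. The file name below is illustrative, and the snippet assumes the CSVReader import from the listing above:

CSVReader reader = new CSVReader(new java.io.File("trades.csv"));
reader.setFieldNamesInFirstRow(true); // row 1 supplies field names instead of data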

Console output

All the records from the S3 file are printed to the console, followed by the total number of records read.
