Use a Custom AWS S3 Endpoint with AmazonS3FileSystem
This example shows how to integrate with an Amazon S3-compatible storage solution offered by vendors other than Amazon. It lets users leverage the power and convenience of the S3 API without being tied to a single provider, giving them flexibility and choice in their cloud storage infrastructure.
The demo code below uses a custom S3 endpoint to read records from a CSV file and write the number of records obtained to the console.
Java code listing
package com.northconcepts.datapipeline.examples.amazons3;

import java.io.InputStream;
import java.io.InputStreamReader;

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.northconcepts.datapipeline.amazons3.AmazonS3FileSystem;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.job.Job;

public class UseACustomAmazonS3EndpointWithAmazonS3FileSystem {

    private static final String ACCESS_KEY = "YOUR ACCESS KEY";
    private static final String SECRET_KEY = "YOUR SECRET KEY";
    private static final String AWS_S3_CUSTOM_ENDPOINT = "YOUR S3 CUSTOM ENDPOINT";
    private static final String REGION = "YOUR AWS S3 REGION";

    public static void main(String[] args) {
        BasicAWSCredentials credentials = new BasicAWSCredentials(ACCESS_KEY, SECRET_KEY);

        AmazonS3FileSystem s3 = new AmazonS3FileSystem()
                .setCredentialsProvider(new AWSStaticCredentialsProvider(credentials))
                .setEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(AWS_S3_CUSTOM_ENDPOINT, REGION))
                .setDebug(true);

        s3.open();
        try {
            InputStream inputStream = s3.readFile("datapipeline-test-01", "output/trades.csv");

            DataReader reader = new CSVReader(new InputStreamReader(inputStream));
            DataWriter writer = StreamWriter.newSystemOutWriter();

            Job.run(reader, writer);

            System.out.println("Records read: " + writer.getRecordCount());
        } finally {
            s3.close();
        }
    }

}
Code Walkthrough
- An instance of AmazonS3FileSystem is created with the necessary access parameters.
- Set your AWS account credentials using AWSStaticCredentialsProvider. The access key and secret key can be found in "Profile Name > My Security Credentials > Access Keys" of the AWS Management Console.
- Set your custom endpoint and region using EndpointConfiguration.
- s3.open() opens a connection to the Amazon S3 file system.
- s3.readFile() reads the file at the path given in the second parameter from the S3 bucket named in the first parameter.
- You can print the obtained records to the console using StreamWriter.newSystemOutWriter().
- Data is transferred from AmazonS3FileSystem to the console via the Job.run() method. See how to compile and run data pipeline jobs.
- writer.getRecordCount() returns the number of records read by the DataWriter (indirectly, the number of records obtained from AmazonS3FileSystem).
- s3.close() closes the Amazon S3 connection.
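The read-and-count flow above can be sketched with plain JDK classes, with the S3 stream replaced by an in-memory CSV so the sketch stays self-contained. The class and method names here (CsvCountSketch, countRecords) are illustrative and not part of the DataPipeline API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class CsvCountSketch {

    // Count data records in a CSV stream whose first row holds field names,
    // mirroring what getRecordCount() reports after the job runs
    static int countRecords(BufferedReader reader) {
        try {
            reader.readLine();                  // skip the field-names row
            int count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the stream returned by s3.readFile(...)
        String csv = "symbol,price\nAAPL,100\nMSFT,200\n";
        int records = countRecords(new BufferedReader(new StringReader(csv)));
        System.out.println("Records read: " + records);   // Records read: 2
    }
}
```

In the real example, Job.run() pumps each record from the reader to the writer before the count is printed; this sketch collapses that pipeline into a single loop.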
AmazonS3FileSystem
Class for accessing an Amazon S3 file system. It extends FileSystem and contains several useful methods such as listBuckets() and listFolders().
CSVReader
Obtains records from a Comma-Separated Values (CSV) or delimited stream. It extends the TextReader class and can be created with a File or Reader object. Passing true to the setFieldNamesInFirstRow() method enables the CSVReader to use the names specified in the first row of the input data as field names.
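The first-row-as-field-names behavior can be illustrated with a simplified plain-Java version (no quoted-field handling). FieldNamesSketch and its read() method are hypothetical names for this sketch, not DataPipeline classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldNamesSketch {

    // Split a delimited line into values (simplified: no quoted fields)
    static List<String> split(String line) {
        return Arrays.asList(line.split(","));
    }

    // Map each data row to the field names taken from the first row,
    // mimicking what setFieldNamesInFirstRow(true) enables
    static List<Map<String, String>> read(List<String> lines) {
        List<String> fields = split(lines.get(0));
        List<Map<String, String>> records = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            List<String> values = split(line);
            Map<String, String> record = new LinkedHashMap<>();
            for (int i = 0; i < fields.size(); i++) {
                record.put(fields.get(i), values.get(i));
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records =
                read(Arrays.asList("symbol,price", "AAPL,100"));
        System.out.println(records);   // [{symbol=AAPL, price=100}]
    }
}
```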
Console output
All the records from the S3 file are printed to the console, followed by the number of records read.