Read from Amazon S3
Updated: Jun 29, 2023
This example shows how you can use Data Pipeline to read files from an Amazon S3 bucket. It provides a convenient and efficient way to access and process data stored in S3, which is a popular cloud-based storage service offered by Amazon Web Services (AWS).
Real-life use cases for this example can vary across different domains and industries. For example, in data analytics and business intelligence, the library can be used to extract data from S3 buckets for further analysis and reporting. Data engineers can leverage it to build data pipelines that involve reading and processing large volumes of data stored in S3.
Java Code Listing
package com.northconcepts.datapipeline.examples.cookbook;

import java.io.InputStream;
import java.io.InputStreamReader;

import com.northconcepts.datapipeline.amazons3.AmazonS3FileSystem;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.NullWriter;
import com.northconcepts.datapipeline.csv.CSVReader;
import com.northconcepts.datapipeline.job.Job;

public class ReadFromAmazonS3 {

    private static final String ACCESS_KEY = "YOUR ACCESS KEY";
    private static final String SECRET_KEY = "YOUR SECRET KEY";

    public static void main(String[] args) throws Throwable {
        AmazonS3FileSystem s3 = new AmazonS3FileSystem();
        s3.setBasicAWSCredentials(ACCESS_KEY, SECRET_KEY);
        s3.open();
        try {
            // Stream the file directly from the S3 bucket
            InputStream inputStream = s3.readFile("datapipeline-test-01", "output/trades.csv");

            // Parse the incoming stream as CSV
            DataReader reader = new CSVReader(new InputStreamReader(inputStream));

//            DataWriter writer = StreamWriter.newSystemOutWriter();
            DataWriter writer = new NullWriter();

            // Transfer all records from the reader to the writer
            Job.run(reader, writer);

            System.out.println("Records read: " + writer.getRecordCount());
        } finally {
            s3.close();
        }
    }

}
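The listing hardcodes placeholder credentials for brevity. In practice you would avoid committing real keys to source; one common alternative is to read them from the standard AWS environment variables. A minimal sketch, assuming AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set in your environment:

    // Load credentials from the conventional AWS environment variables
    // instead of hardcoding them in source
    private static final String ACCESS_KEY = System.getenv("AWS_ACCESS_KEY_ID");
    private static final String SECRET_KEY = System.getenv("AWS_SECRET_ACCESS_KEY");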
Code Walkthrough
- Beginning the execution, an AmazonS3FileSystem instance is initialized with the basic credentials ACCESS_KEY and SECRET_KEY.
- A connection to the Amazon S3 file system is established with the open() method.
- The readFile() method is invoked with the bucket "datapipeline-test-01" and the file path "output/trades.csv". The result is stored in an InputStream instance.
- A CSV-type DataReader wrapping that stream is then passed to the Job.run() method, which transfers records to a NullWriter. This writer can be of any desired type for accessing the records, as shown in the sketch after this list.
- Finally, the count of records in the provided file is printed to the console.
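For instance, to print each record to standard output instead of discarding it, the NullWriter can be replaced with the StreamWriter that the listing above leaves commented out (StreamWriter is imported from com.northconcepts.datapipeline.core); everything else stays the same:

    // Write each record to System.out instead of discarding it
    DataWriter writer = StreamWriter.newSystemOutWriter();

    Job.run(reader, writer);
    System.out.println("Records read: " + writer.getRecordCount());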
Console Output
The number of records read from the provided file is printed to the console.
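With the NullWriter in place, the run produces a single line whose count depends on the contents of the file; for a file containing, say, 1000 records (an illustrative figure), it would read:

    Records read: 1000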