Write an Avro File
This example shows how to write records to an Avro file using Data Pipeline's AvroWriter.
It can easily be modified to read an Avro file instead.
Avro is an open source project that provides data serialization and data exchange services for Apache Hadoop.
Java Code listing
package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;
import java.util.Arrays;

import com.northconcepts.datapipeline.avro.AvroWriter;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.Record;
import com.northconcepts.datapipeline.core.RecordList;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.memory.MemoryReader;

public class WriteAnAvroFile {

    public static void main(String[] args) {
        DataReader reader = new MemoryReader(new RecordList(
                createRecord("One"),
                createRecord("Two"),
                createRecord("Three")));

        DataWriter writer = new AvroWriter(new File("example/data/output/bird-iq.avro"), null);

        Job.run(reader, writer);
    }

    private static Record createRecord(String title) {
        Record record = new Record();
        record.setField("Title", title);
        record.setField("Number", 3.14);
        record.setField("BirdIQ", Arrays.asList("b", "i", "r", "d", "i", "q"));
        return record;
    }

}
Code walkthrough
- The createRecord() method creates a Record, which holds data in key-value fields.
- The Record.setField() method adds a new field with the specified field name and value. This example has three fields, Title, Number, and BirdIQ, and a value is assigned to each one, for example record.setField("Number", 3.14).
- A MemoryReader is created to obtain records from an in-memory RecordList.
- An AvroWriter is created to write to the output file bird-iq.avro. Null is passed as the second parameter because no records are expected to be discarded in this example.
- Data is transferred from the MemoryReader to the AvroWriter via the Job.run() method. See how to compile and run data pipeline jobs.
Record
The Record class holds persistent data in key-value fields as it flows through the pipeline. Its setField() method creates a new field as a key-value pair, taking the field name and a value as parameters.
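To make the key-value idea concrete, here is a minimal sketch that uses only the Java standard library. SimpleRecord and RecordSketch are hypothetical stand-ins written for illustration, not Data Pipeline's Record class. Overwriting an existing field and preserving insertion order are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordSketch {

    // Hypothetical stand-in for a record: field names mapped to values,
    // kept in the order they were added. Not the library's Record class.
    static final class SimpleRecord {
        private final Map<String, Object> fields = new LinkedHashMap<>();

        // Adds the field if it is new; otherwise overwrites its value (an assumption of this sketch).
        SimpleRecord setField(String name, Object value) {
            fields.put(name, value);
            return this;
        }

        Object getField(String name) {
            return fields.get(name);
        }

        int getFieldCount() {
            return fields.size();
        }

        @Override
        public String toString() {
            return fields.toString();
        }
    }

    // Mirrors the shape of createRecord() from the listing, using the stand-in type.
    static SimpleRecord createRecord(String title) {
        return new SimpleRecord()
                .setField("Title", title)
                .setField("Number", 3.14)
                .setField("BirdIQ", Arrays.asList("b", "i", "r", "d", "i", "q"));
    }

    public static void main(String[] args) {
        // Prints the three fields in insertion order.
        System.out.println(createRecord("One"));
    }
}
```

Each field can hold a value of a different type (a string, a number, a list), which is the property the example's three fields rely on.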
MemoryReader
Obtains records from an in-memory RecordList. You can retrieve all records added to this object via the getRecordList() method.
RecordList
As the name suggests, it is used to store a list of Record objects in memory. It implements Java's Iterable interface so you can perform operations similar to Java Collections classes on this object.
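As a rough illustration of what implementing Iterable provides, the sketch below defines a small list type that can be used directly in a for-each loop. SimpleRecordList and IterableListSketch are hypothetical names using plain strings as items; this is not the library's RecordList.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class IterableListSketch {

    // Hypothetical stand-in for an in-memory record list; not the library's RecordList.
    static final class SimpleRecordList implements Iterable<String> {
        private final List<String> records = new ArrayList<>();

        SimpleRecordList add(String record) {
            records.add(record);
            return this;
        }

        // Iterable requires only iterator(); here it delegates to the backing list.
        @Override
        public Iterator<String> iterator() {
            return records.iterator();
        }
    }

    // Walks the list with a for-each loop, which works because the type is Iterable.
    static List<String> collect(SimpleRecordList list) {
        List<String> out = new ArrayList<>();
        for (String record : list) {
            out.add(record);
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleRecordList list = new SimpleRecordList().add("One").add("Two").add("Three");
        System.out.println(collect(list));
    }
}
```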
AvroWriter
Writes records to an Apache Avro file. It extends IntegrationWriter and can be created from either a File and a DataWriter or an OutputStream and a DataWriter. If any records don't conform to the Avro schema, you can discard them by having them written to a DataWriter that you pass to the AvroWriter constructor as the second parameter.
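The discard idea can be sketched without the library: each record goes either to the main destination or to a separate discard destination. In the hypothetical code below, route() and DiscardRoutingSketch are illustrative names, and the "conforms" check is an arbitrary predicate standing in for Avro schema validation. Silently dropping rejected records when the discard list is null is an assumption of this sketch, not documented AvroWriter behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class DiscardRoutingSketch {

    // Sends each record to 'accepted' when it passes the check, otherwise to 'discarded'.
    // If 'discarded' is null, rejected records are dropped (an assumption of this sketch).
    static <T> void route(Iterable<T> records, Predicate<T> conforms,
                          List<T> accepted, List<T> discarded) {
        for (T record : records) {
            if (conforms.test(record)) {
                accepted.add(record);
            } else if (discarded != null) {
                discarded.add(record);
            }
        }
    }

    public static void main(String[] args) {
        List<Object> input = Arrays.<Object>asList("One", 2, "Three");
        List<Object> accepted = new ArrayList<>();
        List<Object> discarded = new ArrayList<>();

        // Stand-in rule: only String values "conform".
        route(input, value -> value instanceof String, accepted, discarded);

        System.out.println("accepted=" + accepted + " discarded=" + discarded);
    }
}
```

Keeping rejected records in a second destination, rather than failing the whole job, lets the remaining records be written while the rejected ones stay available for inspection.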
Output
The output is written to the Avro file bird-iq.avro in the specified location, i.e. example/data/output.