Write an Avro File

Updated: Jun 4, 2023

This example shows how to write records to an Avro file using Data Pipeline's AvroWriter.

It can be easily modified to read an Avro file instead.

Avro is an open source project that provides data serialization and data exchange services for Apache Hadoop.

Java Code listing

/*
 * Copyright (c) 2006-2022 North Concepts Inc.  All rights reserved.
 * Proprietary and Confidential.  Use is subject to license terms.
 * 
 * https://northconcepts.com/data-pipeline/licensing/
 */
package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;
import java.util.Arrays;

import com.northconcepts.datapipeline.avro.AvroWriter;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.Record;
import com.northconcepts.datapipeline.core.RecordList;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.memory.MemoryReader;

public class WriteAnAvroFile {
    
    public static void main(String[] args) {
        DataReader reader = new MemoryReader(new RecordList(createRecord("One"), createRecord("Two"),
                createRecord("Three")));
        DataWriter writer = new AvroWriter(new File("example/data/output/bird-iq.avro"), null);
        
        Job.run(reader, writer);
    }
    
    private static Record createRecord(String title) {
        Record record = new Record();
        record.setField("Title", title);
        record.setField("Number", 3.14);
        record.setField("BirdIQ", Arrays.asList("b","i","r","d","i","q"));
        return record;
    }

}

Code walkthrough

  1. The createRecord() method creates a Record, which holds data as named key-value fields.
  2. The Record.setField() method adds a field with the specified name and value. This example sets three fields, Title, Number, and BirdIQ; for instance, record.setField("Number", 3.14).
  3. A MemoryReader is created to read records from an in-memory RecordList.
  4. An AvroWriter is created to write to the output file bird-iq.avro. null is passed as the second parameter because no records need to be discarded in this example.
  5. Data is transferred from the MemoryReader to the AvroWriter via the Job.run() method. See how to compile and run data pipeline jobs.

Record

The Record class holds data in key-value fields as it flows through the pipeline. Its setField() method creates a new field as a key-value pair from a field name and a value.

MemoryReader

Reads records from an in-memory RecordList. You can get all records added to this object via the getRecordList() method.

RecordList

As the name suggests, it stores a list of Record objects in memory. It implements Java's Iterable interface, so you can loop over its records much as you would over a Java collection.

AvroWriter

Writes records to an Apache Avro file. It extends IntegrationWriter and can be created from a File and a DataWriter, or from an OutputStream and a DataWriter. If any records don't conform to the Avro schema, you can discard them by passing a DataWriter as the second constructor parameter; the discarded records are written to that DataWriter.
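
The discard mechanism described above can be pictured with a minimal plain-Java sketch. This is not Data Pipeline code: DiscardRoutingSketch, writeAll(), and the conformance check are hypothetical names made up for illustration. The sketch simply skips a non-conforming record when no discard sink is supplied; that choice is an assumption of the sketch, and what the real AvroWriter does in that case is not covered here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical illustration of a "main sink plus discard sink" writer.
// It does NOT use or reproduce Data Pipeline's AvroWriter.
class DiscardRoutingSketch {

    // Sends each record to 'main' if it conforms, otherwise to 'discard'.
    // A null 'discard' means non-conforming records are skipped (sketch-only assumption).
    static <T> void writeAll(List<T> records, Predicate<T> conforms,
                             Consumer<T> main, Consumer<T> discard) {
        for (T record : records) {
            if (conforms.test(record)) {
                main.accept(record);
            } else if (discard != null) {
                discard.accept(record);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> good = new LinkedHashMap<>();
        good.put("Title", "One");
        good.put("Number", 3.14);

        Map<String, Object> bad = new LinkedHashMap<>();
        bad.put("Title", "Two");
        bad.put("Number", "not a number"); // violates the made-up rule below

        List<Map<String, Object>> written = new ArrayList<>();
        List<Map<String, Object>> discarded = new ArrayList<>();

        // Made-up conformance rule: the Number field must hold a Double.
        writeAll(Arrays.asList(good, bad),
                r -> r.get("Number") instanceof Double,
                written::add, discarded::add);

        System.out.println("written:   " + written);
        System.out.println("discarded: " + discarded);
    }
}
```

Keeping the discard sink as a separate parameter means rejected records can be inspected or written elsewhere without interrupting the main output.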

Output

The output is written to the Avro file bird-iq.avro in the specified location, i.e., example/data/output.
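
One quick way to sanity-check the result without any Avro library is to look at the first four bytes of the file. The sketch below assumes AvroWriter produces a standard Avro object container file, which begins with the ASCII bytes "Obj" followed by the byte 1. The class name AvroMagicCheck and its methods are hypothetical helpers, not part of Data Pipeline, and a matching header only suggests (not proves) that the rest of the file is valid.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: checks only the 4-byte header, not the file's contents.
class AvroMagicCheck {

    // Avro object container files start with 'O', 'b', 'j', then the byte 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // Returns true if the given bytes begin with the Avro magic sequence.
    static boolean hasAvroMagic(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads up to four bytes from the file and compares them with the magic sequence.
    static boolean isAvroContainerFile(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[MAGIC.length];
            int total = 0;
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) {
                    break; // file is shorter than the header
                }
                total += n;
            }
            return total == header.length && hasAvroMagic(header);
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the path used in the example above.
        Path file = Paths.get(args.length > 0 ? args[0] : "example/data/output/bird-iq.avro");
        if (!Files.isRegularFile(file)) {
            System.out.println("File not found: " + file);
            return;
        }
        System.out.println(file + (isAvroContainerFile(file)
                ? " starts with the Avro magic bytes"
                : " does not start with the Avro magic bytes"));
    }
}
```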
