Read an Avro File

Updated: Jun 4, 2023

In this example, you will learn how to use AvroReader from DataPipeline to read an Avro file.

Avro is an open source project that provides data serialization and data exchange services for Apache Hadoop.

This example can be easily modified to write an Avro file instead.

Java Code listing

package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import com.northconcepts.datapipeline.avro.AvroReader;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.job.Job;

public class ReadAnAvroFile {
    
    public static void main(String[] args) {
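        // Open the Avro file; its contents are exposed as DataPipeline records
        DataReader reader = new AvroReader(new File("example/data/input/twitter.avro"));
        
        // Copy every record from the reader to the console
        Job.run(reader, new StreamWriter(System.out));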
    }
/* output
-----------------------------------------------
0 - Record (MODIFIED) {
    0:[username]:STRING=[miguno]:String
    1:[tweet]:STRING=[Rock: Nerf paper, scissors is fine.]:String
    2:[timestamp]:LONG=[1366150681]:Long
}

-----------------------------------------------
1 - Record (MODIFIED) {
    0:[username]:STRING=[BlizzardCS]:String
    1:[tweet]:STRING=[Works as intended.  Terran is IMBA.]:String
    2:[timestamp]:LONG=[1366154481]:Long
}

-----------------------------------------------
2 records
*/
}

Code walkthrough

  1. An AvroReader is created for the input file twitter.avro. It reads the Avro file and converts its contents into records that can be accessed through reader.
  2. Job.run(reader, new StreamWriter(System.out)) then copies the data from the reader to the StreamWriter.
  3. StreamWriter writes each record in a human-readable format. In this case, the output is written to the console.
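
The reader-to-writer copy in step 2 can be pictured as a simple pull-and-push loop. The sketch below is not DataPipeline's implementation and does not use its API: SimpleReader, SimpleWriter, copy, tweet and demo are hypothetical stand-ins built only on the JDK, and records are modeled as plain maps carrying the three fields from the sample file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CopyLoopSketch {

    /** Hypothetical stand-in for a record source; returns null when exhausted. */
    public interface SimpleReader {
        Map<String, Object> read();
    }

    /** Hypothetical stand-in for a record sink. */
    public interface SimpleWriter {
        void write(Map<String, Object> record);
    }

    /** Pulls records from the reader until it is exhausted and pushes each one to the writer. */
    public static int copy(SimpleReader reader, SimpleWriter writer) {
        int count = 0;
        Map<String, Object> record;
        while ((record = reader.read()) != null) {
            writer.write(record);
            count++;
        }
        return count;
    }

    /** Builds one record with the same three fields as the twitter.avro sample. */
    public static Map<String, Object> tweet(String username, String text, long timestamp) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("username", username);
        record.put("tweet", text);
        record.put("timestamp", timestamp);
        return record;
    }

    /** Copies the two sample records into a list and returns what the writer received. */
    public static List<Map<String, Object>> demo() {
        Iterator<Map<String, Object>> source = Arrays.asList(
                tweet("miguno", "Rock: Nerf paper, scissors is fine.", 1366150681L),
                tweet("BlizzardCS", "Works as intended.  Terran is IMBA.", 1366154481L)
        ).iterator();

        List<Map<String, Object>> received = new ArrayList<>();
        SimpleReader reader = () -> source.hasNext() ? source.next() : null;
        SimpleWriter writer = received::add;

        copy(reader, writer);
        return received;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> received = demo();
        for (Map<String, Object> record : received) {
            System.out.println(record);
        }
        System.out.println(received.size() + " records");
    }
}
```

Separating the source and the sink behind two small interfaces is what lets the same loop serve any pairing of input and output, which is the role the walkthrough assigns to Job.run.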

AvroReader

AvroReader reads an Apache Avro file and converts its contents into records.

To read an Avro file from a stream, use the AvroReader(InputStream inputStream) constructor.
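
When the input arrives as a stream, it can be useful to confirm that it looks like an Avro file before handing it to a reader. The sketch below uses only the JDK and relies on one fact from the Avro specification: object container files start with the four bytes 'O', 'b', 'j', 1. The class AvroStreamCheck and its methods are hypothetical helpers, not part of DataPipeline, and the sample bytes are an in-memory stand-in rather than a real file.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class AvroStreamCheck {

    // An Avro object container file starts with the bytes 'O', 'b', 'j', 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Returns true if the given bytes begin with the Avro container magic. */
    public static boolean startsWithMagic(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Peeks at the first four bytes of the stream, then rewinds it so the
     * same stream can still be handed to a reader afterwards.
     */
    public static boolean looksLikeAvro(BufferedInputStream in) throws IOException {
        in.mark(MAGIC.length);
        byte[] header = new byte[MAGIC.length];
        int total = 0;
        while (total < header.length) {
            int n = in.read(header, total, header.length - total);
            if (n < 0) {
                break; // stream ended before four bytes were available
            }
            total += n;
        }
        in.reset();
        return total == header.length && startsWithMagic(header);
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample: the magic bytes followed by arbitrary padding.
        byte[] sample = {'O', 'b', 'j', 1, 0, 0};
        try (BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(sample))) {
            System.out.println(looksLikeAvro(in));
            // With DataPipeline on the classpath, `in` could now be passed to
            // the AvroReader(InputStream) constructor mentioned above.
        }
    }
}
```

Because the check marks and resets the buffered stream, no bytes are consumed, so the reader still sees the file from its first byte.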

Console Output

-----------------------------------------------
0 - Record (MODIFIED) {
    0:[username]:STRING=[miguno]:String
    1:[tweet]:STRING=[Rock: Nerf paper, scissors is fine.]:String
    2:[timestamp]:LONG=[1366150681]:Long
}

-----------------------------------------------
1 - Record (MODIFIED) {
    0:[username]:STRING=[BlizzardCS]:String
    1:[tweet]:STRING=[Works as intended.  Terran is IMBA.]:String
    2:[timestamp]:LONG=[1366154481]:Long
}

-----------------------------------------------
2 records
