Convert JSON to Parquet

Updated: Aug 28, 2023

This example shows how to convert JSON data into the Parquet format. Converting JSON to Parquet can improve query performance, reduce storage space, and increase compatibility with data analytics platforms, which makes it useful in big data processing, data warehousing, and business intelligence applications.

If your JSON objects do not all follow the same pattern with the same fields, see the related example on converting that kind of data to Parquet.
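For JSON whose records are nested below the document root rather than sitting in a flat top-level array, the library also provides a JsonRecordReader that emits one record per matching element. The sketch below is a minimal illustration only: the input file, the record-break path, and the exact path syntax passed to addRecordBreak are assumptions modeled on the library's other record-reader examples, so verify them against your DataPipeline version.

package com.northconcepts.datapipeline.examples.cookbook;

import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.json.JsonRecordReader;
import com.northconcepts.datapipeline.parquet.ParquetDataWriter;

import java.io.File;

public class ConvertNestedJsonToParquet {

    public static void main(String[] args) {
        // Hypothetical input of the form {"users": [ {...}, {...} ]}.
        JsonRecordReader reader = new JsonRecordReader(new File("example/data/input/nested.json"));

        // Emit one record per element of the nested array; the path syntax
        // here is an assumption, so adjust it to match your actual document.
        reader.addRecordBreak("//users/object");

        ParquetDataWriter writer = new ParquetDataWriter(new File("example/data/output/nested.parquet"));

        Job.run(reader, writer);
    }
}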


Input JSON file

[
  {
    "id": 1,
    "first_name": "Sarette",
    "last_name": "Sharkey",
    "email": "ssharkey0@vkontakte.ru",
    "ip_address": "236.5.24.150"
  },
  {
    "id": 2,
    "first_name": "Wylie",
    "last_name": "Greeno",
    "email": "wgreeno1@cornell.edu",
    "ip_address": "39.67.151.237"
  },
...


Java Code Listing

package com.northconcepts.datapipeline.examples.cookbook;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.json.SimpleJsonReader;
import com.northconcepts.datapipeline.parquet.ParquetDataReader;
import com.northconcepts.datapipeline.parquet.ParquetDataWriter;

import java.io.File;

public class ConvertJsonToParquet {
    private static final File PARQUET_FILE = new File("example/data/output/WriteAParquetFileFromJson.parquet");

    public static void main(String[] args) {
        File inputFile = new File("example/data/input/pipeline/MOCK_DATA.json");
        DataReader reader = new SimpleJsonReader(inputFile);

        ParquetDataWriter writer = new ParquetDataWriter(PARQUET_FILE);

        Job.run(reader, writer);

        System.out.println("=======================Reading Parquet File============================================");
        reader = new ParquetDataReader(PARQUET_FILE);
        Job.run(reader, new StreamWriter(System.out));
    }
}
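The listing above writes uncompressed Parquet. If you want smaller output files, the writer can typically be configured with a compression codec before the job runs. The sketch below is modeled on the library's other Parquet examples; the ConvertJsonToCompressedParquet class name is invented for this illustration, and the setCompressionCodecName setter should be verified against your DataPipeline version.

package com.northconcepts.datapipeline.examples.cookbook;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.json.SimpleJsonReader;
import com.northconcepts.datapipeline.parquet.ParquetDataWriter;

import org.apache.parquet.hadoop.metadata.CompressionCodecName;

import java.io.File;

public class ConvertJsonToCompressedParquet {

    public static void main(String[] args) {
        DataReader reader = new SimpleJsonReader(new File("example/data/input/pipeline/MOCK_DATA.json"));

        ParquetDataWriter writer = new ParquetDataWriter(new File("example/data/output/WriteAParquetFileFromJson.parquet"));

        // SNAPPY trades a little CPU for a noticeably smaller file; the setter
        // name follows the library's Parquet examples, so confirm it exists
        // in the version you are using.
        writer.setCompressionCodecName(CompressionCodecName.SNAPPY);

        Job.run(reader, writer);
    }
}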


Code Walkthrough

  1. A SimpleJsonReader is created to read records from the input file MOCK_DATA.json.
  2. A ParquetDataWriter is created to write those records to the target Parquet file WriteAParquetFileFromJson.parquet.
  3. Job.run(reader, writer) transfers the data from the JSON file to the Parquet file. See how to compile and run data pipeline jobs.
  4. A ParquetDataReader is created to read the data back from WriteAParquetFileFromJson.parquet.
  5. The records are then printed to the console via a second Job.run() call with a StreamWriter, which takes System.out as its constructor argument (a manual alternative that reads individual field values is sketched after this list).
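If you need individual field values rather than a dump of whole records, a DataReader can also be driven by hand. The sketch below assumes the reader's standard open/read/close contract and the Record.getField(...).getValueAsString() accessors; treat the exact accessor names as assumptions to verify against your DataPipeline version.

package com.northconcepts.datapipeline.examples.cookbook;

import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.Record;
import com.northconcepts.datapipeline.parquet.ParquetDataReader;

import java.io.File;

public class PrintParquetEmails {

    public static void main(String[] args) throws Throwable {
        DataReader reader = new ParquetDataReader(new File("example/data/output/WriteAParquetFileFromJson.parquet"));
        reader.open();
        try {
            Record record;
            // read() returns null once the file is exhausted.
            while ((record = reader.read()) != null) {
                // Field accessors are assumptions based on the Record API.
                System.out.println(record.getField("email").getValueAsString());
            }
        } finally {
            reader.close();
        }
    }
}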


Console Output

=======================Reading Parquet File============================================
-----------------------------------------------
0 - Record (MODIFIED) {
    0:[id]:LONG=[1]:Long
    1:[first_name]:STRING=[Sarette]:String
    2:[last_name]:STRING=[Sharkey]:String
    3:[email]:STRING=[ssharkey0@vkontakte.ru]:String
    4:[ip_address]:STRING=[236.5.24.150]:String
}

-----------------------------------------------
1 - Record (MODIFIED) {
    0:[id]:LONG=[2]:Long
    1:[first_name]:STRING=[Wylie]:String
    2:[last_name]:STRING=[Greeno]:String
    3:[email]:STRING=[wgreeno1@cornell.edu]:String
    4:[ip_address]:STRING=[39.67.151.237]:String
}
...