Query Parquet Using JSONata

Updated: May 13, 2026

This example shows how to read an Parquet file (or any other supported format) and query or transform it using JSONata expressions with DataPipeline.

 

Java code listing

package com.northconcepts.datapipeline.examples.jsonata;

import com.northconcepts.datapipeline.core.RecordList;
import com.northconcepts.datapipeline.core.ValueNode;
import com.northconcepts.datapipeline.parquet.ParquetDataReader;

import java.io.File;

import static com.northconcepts.datapipeline.jsonata.Jsonata.jsonata;

public class QueryParquetUsingJSONata {

    public static void main(String[] args) throws Throwable {
        ValueNode data = readParquetAsValueNode("data/input/nested-data.parquet");

        printSection("Basic Selection");
        runQuery("Query 1 - Donut Names", "records.name", data);
        runQuery("Query 2 - Batter Types",
                "records.batterTypes[]",
                data);

        printSection("Filtering");
        runQuery("Query 3 - Donut Named Raised",
                "records[name='Raised']",
                data);
        runQuery("Query 4 - Donut with Chocolate with Sprinkles Topping",
                "records[$count(toppingTypes[$='Chocolate with Sprinkles']) > 0].name",
                data);

        printSection("Reshaping");
        runQuery("Query 5 - Name With Batter And Topping Counts",
                "records.{'name': name, 'batterCount': $count(batterIds), 'toppingCount': $count(toppingIds)}",
                data);
        runQuery("Query 6 - Donut Summary",
                "records.{'id': id, 'name': name, 'pricePerUnit': ppu}",
                data);

        printSection("Metrics");
        runQuery("Query 7 - Total Donuts", "$count(records)", data);
        runQuery("Query 8 - Average Price Per Unit", "$average(records.ppu.$number())", data);
    }

    private static ValueNode readParquetAsValueNode(String parquetFilePath) throws Throwable {
        RecordList rows = new RecordList(new ParquetDataReader(new File(parquetFilePath)));
        return rows.toRecord();
    }


    private static void runQuery(String title, String expression, ValueNode data) {
        System.out.println(title);
        System.out.println("Expression: " + expression);
        System.out.println("Result: ");
        System.out.println(jsonata(expression).evaluate(data));
    }

    private static void printSection(String title) {
        System.out.println("==============================================================");
        System.out.println(title);
    }
}


 

Code walkthrough

  1. Read Parquet into memory
    • readParquetAsValueNode(parquetFilePath) uses a ParquetDataReader to stream records from an input Parquet file and returns a ValueNode tree. All queries run against that tree.
  2. Execute JSONata expressions
    • runQuery prints the query name, the expression, and the evaluated result from jsonata(expression).evaluate(data)
  3. Query patterns used
    • Basic Selection: These expressions navigate fields and nested arrays.
      • records.name
      • records.batterTypes[]
    • Filtering: These expressions keep only records that match conditions.
      • records[name='Raised']
      • records[$count(toppingTypes[$='Chocolate with Sprinkles']) > 0].name
    • Reshaping: These expressions build new objects with only the fields you want.
      • records.{'name': name, 'batterCount': $count(batterIds), 'toppingCount': $count(toppingIds)}
      • records.{'id': id, 'name': name, 'pricePerUnit': ppu}
    • Metrics: These expressions compute aggregate values.
      • $count(records)
      • $average(records.ppu.$number())

 

Console Output

==============================================================
Basic Selection
Query 1 - Donut Names
Expression: records.name
Result: 
[Cake, Raised]
Query 2 - Batter Types
Expression: records.batterTypes[]
Result: 
[Regular, Chocolate, Blueberry, Devil's Food, Regular]
==============================================================
Filtering
Query 3 - Donut Named Raised
Expression: records[name='Raised']
Result: 

        Record (MODIFIED) (is child record) {
            0:[id]:STRING=[0002]:String
            1:[type]:STRING=[donut]:String
            2:[name]:STRING=[Raised]:String
            3:[ppu]:DOUBLE=[0.55]:Double
            4:[batterIds]:ARRAY of STRING=[[1001]]:ArrayValue
            5:[batterTypes]:ARRAY of STRING=[[Regular]]:ArrayValue
            6:[toppingIds]:ARRAY of STRING=[[5001, 5002, 5005, 5003, 5004]]:ArrayValue
            7:[toppingTypes]:ARRAY of STRING=[[None, Glazed, Sugar, Chocolate, Maple]]:ArrayValue
        }
Query 4 - Donut with Chocolate with Sprinkles Topping
Expression: records[$count(toppingTypes[$='Chocolate with Sprinkles']) > 0].name
Result: 
Cake
==============================================================
Reshaping
Query 5 - Name With Batter And Topping Counts
Expression: records.{'name': name, 'batterCount': $count(batterIds), 'toppingCount': $count(toppingIds)}
Result: 
[Record (MODIFIED) (is child record) {
    0:[name]:STRING=[Cake]:String
    1:[batterCount]:INT=[4]:Integer
    2:[toppingCount]:INT=[7]:Integer
}
, Record (MODIFIED) (is child record) {
    0:[name]:STRING=[Raised]:String
    1:[batterCount]:INT=[1]:Integer
    2:[toppingCount]:INT=[5]:Integer
}
]
Query 6 - Donut Summary
Expression: records.{'id': id, 'name': name, 'pricePerUnit': ppu}
Result: 
[Record (MODIFIED) (is child record) {
    0:[id]:STRING=[0001]:String
    1:[name]:STRING=[Cake]:String
    2:[pricePerUnit]:DOUBLE=[0.55]:Double
}
, Record (MODIFIED) (is child record) {
    0:[id]:STRING=[0002]:String
    1:[name]:STRING=[Raised]:String
    2:[pricePerUnit]:DOUBLE=[0.55]:Double
}
]
==============================================================
Metrics
Query 7 - Total Donuts
Expression: $count(records)
Result: 
2
Query 8 - Average Price Per Unit
Expression: $average(records.ppu.$number())
Result: 
0.55
Mobile Analytics