Read an XML File

Updated: Feb 21, 2022

XmlReader can be used to parse huge XML files (or input streams) into records. It uses a subset of XPath to assign field values and mark record breaks.

This example reads the following XML file and prints the resulting records to a logger. It can easily be modified to write to a database or other target.

<?xml version="1.0" encoding="ISO-8859-1"?>
    <title lang="eng">Harry Potter</title>
    <title lang="eng">Learning XML</title>

The code creates an XmlReader to parse bookstore.xml and populate the title, language, and price fields using the specified XPath expressions. The addRecordBreak("//book") call tells the reader to emit a new record, built from whatever fields have been assigned so far, each time a book element ends.

package com.northconcepts.datapipeline.examples.cookbook;


import java.io.File;

import org.apache.log4j.Logger;

import com.northconcepts.datapipeline.core.DataEndpoint;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.Record;
import com.northconcepts.datapipeline.xml.XmlReader;

public class ReadAnXmlFile {
    public static final Logger log = DataEndpoint.log; 

    public static void main(String[] args) {
        DataReader reader = new XmlReader(new File("example/data/input/bookstore.xml"))
        	.addField("title", "//book/title/text()")
        	.addField("language", "//book/title/@lang")
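        	.addField("price", "//book/price/text()")
        	.addRecordBreak("//book");  // emit one record per book element

        reader.open();
        try {
            Record record;
            // read() returns null once the input is exhausted
            while ((record = reader.read()) != null) {
                log.info(record);
            }
        } finally {
            reader.close();
        }
    }
}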


Running this program produces output similar to the following.

19:18:38,492  INFO [main] datapipeline:38 - Record (MODIFIED) {
    0:[title]:STRING=[Harry Potter]:String

19:18:38,498  INFO [main] datapipeline:38 - Record (MODIFIED) {
    0:[title]:STRING=[Learning XML]:String
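
To make the record-break idea concrete, here is a minimal sketch of the same pattern using only the JDK's StAX parser. It is not how XmlReader is implemented. The class name RecordBreakSketch, the parse method, the XML layout inside main, and the prices are all illustrative assumptions rather than the contents of bookstore.xml. Field values are collected as elements stream past, and a record is emitted each time a book element ends.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration of "assign fields, break on </book>"; not part of DataPipeline.
public class RecordBreakSketch {

    /** Streams the XML and returns one field map per completed book element. */
    public static List<Map<String, String>> parse(String xml) {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> fields = new LinkedHashMap<>();
        boolean inBook = false;

        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        try {
            XMLStreamReader in = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (in.hasNext()) {
                    int event = in.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        String name = in.getLocalName();
                        if (name.equals("book")) {
                            inBook = true;
                        } else if (inBook && name.equals("title")) {
                            // Read the attribute first: getElementText() moves the cursor
                            // to the closing tag, where attributes are no longer available.
                            String lang = in.getAttributeValue(null, "lang");
                            fields.put("title", in.getElementText());
                            fields.put("language", lang);
                        } else if (inBook && name.equals("price")) {
                            fields.put("price", in.getElementText());
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT
                            && in.getLocalName().equals("book")) {
                        // Record break: hand off the fields gathered so far and start fresh.
                        records.add(fields);
                        fields = new LinkedHashMap<>();
                        inBook = false;
                    }
                }
            } finally {
                in.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return records;
    }

    public static void main(String[] args) {
        // Titles come from the example above; the structure and prices are made up.
        String xml = "<bookstore>"
                + "<book><title lang=\"eng\">Harry Potter</title><price>10.00</price></book>"
                + "<book><title lang=\"eng\">Learning XML</title><price>20.00</price></book>"
                + "</bookstore>";
        for (Map<String, String> record : parse(xml)) {
            System.out.println(record);
        }
    }
}
```

The sketch simplifies path matching to "any title or price inside a book", whereas the XPath expressions passed to addField are matched by the library itself. Because only the current record's fields are held in memory, this streaming approach suits very large inputs.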