Read an XML File

Updated: Feb 21, 2022

XmlReader parses huge XML files (or input streams) into records. It uses a subset of XPath both to assign field values and to mark where one record ends and the next begins.

This example reads the following XML file and prints the resulting records to a logger. It can easily be modified to write to a database or other target.

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>

The code creates an XmlReader to parse bookstore.xml and populates the title, language, and price fields using the specified XPath expressions. The addRecordBreak("//book") call tells the reader to return a new record, containing whatever fields have been assigned so far, each time a book element ends.

package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import org.apache.log4j.Logger;

import com.northconcepts.datapipeline.core.DataEndpoint;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.Record;
import com.northconcepts.datapipeline.xml.XmlReader;

public class ReadAnXmlFile {
    
    public static final Logger log = DataEndpoint.log; 

    public static void main(String[] args) {
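        // Map each field to an XPath expression; text() selects element content, @lang an attribute
        DataReader reader = new XmlReader(new File("example/data/input/bookstore.xml"))
            .addField("title", "//book/title/text()")
            .addField("language", "//book/title/@lang")
            .addField("price", "//book/price/text()")
            // Emit one record each time a <book> element ends
            .addRecordBreak("//book");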

        reader.open();
        try {
            Record record;
            while ((record = reader.read()) != null) {
                log.info(record);
            }
        } finally {
            reader.close();
        }
    }

}

Running this program produces output similar to the following.

19:18:38,492  INFO [main] datapipeline:38 - Record (MODIFIED) {
    0:[title]:STRING=[Harry Potter]:String
    1:[language]:STRING=[eng]:String
    2:[price]:STRING=[29.99]:String
}

19:18:38,498  INFO [main] datapipeline:38 - Record (MODIFIED) {
    0:[title]:STRING=[Learning XML]:String
    1:[language]:STRING=[eng]:String
    2:[price]:STRING=[39.95]:String
}
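
To make the record-break idea concrete, the following is a minimal sketch of the same behavior written against the JDK's StAX API only. It is not how XmlReader is implemented; the class name RecordBreakSketch and the helper readRecords are hypothetical, and the element names are hard-coded rather than driven by XPath expressions. It collects field values as elements are parsed and emits a record each time a book element closes.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class RecordBreakSketch {

    // Hypothetical helper: streams the XML and returns one map per closed <book> element.
    static List<Map<String, String>> readRecords(Reader xml) throws XMLStreamException {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        StringBuilder text = new StringBuilder();
        String lang = null;

        XMLStreamReader parser = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        try {
            while (parser.hasNext()) {
                switch (parser.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        text.setLength(0); // start collecting text for this element
                        if ("title".equals(parser.getLocalName())) {
                            lang = parser.getAttributeValue(null, "lang");
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        text.append(parser.getText()); // text may arrive in several chunks
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        String name = parser.getLocalName();
                        if ("title".equals(name)) {
                            current.put("title", text.toString());
                            current.put("language", lang);
                        } else if ("price".equals(name)) {
                            current.put("price", text.toString());
                        } else if ("book".equals(name)) {
                            // Record break: emit the fields assigned so far and start a new record
                            records.add(current);
                            current = new LinkedHashMap<>();
                            lang = null;
                        }
                        break;
                    default:
                        break;
                }
            }
        } finally {
            parser.close();
        }
        return records;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Same bookstore content as above, inlined so the sketch runs without a file
        String xml = "<bookstore>"
                + "<book><title lang=\"eng\">Harry Potter</title><price>29.99</price></book>"
                + "<book><title lang=\"eng\">Learning XML</title><price>39.95</price></book>"
                + "</bookstore>";
        for (Map<String, String> record : readRecords(new StringReader(xml))) {
            System.out.println(record);
        }
    }
}
```

Because the parser is pull-based, only the current record is held while an element is being processed; the sketch gathers the records into a list purely for convenience, and a caller handling very large files would process each record at the record break instead.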