Read an XML File
Updated: Feb 21, 2022
XmlReader can be used to parse huge XML files (or input streams) into records. It uses a subset of XPath to assign field values and mark record breaks.
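To see what XPath field assignment means, the same kind of expressions can be tried with nothing but the JDK. The following is a minimal, hypothetical sketch (the class `XPathFieldSketch` and method `readBooks` are our own names, not part of DataPipeline). It uses the standard DOM and `javax.xml.xpath` APIs, which load the whole document into memory. That is exactly what XmlReader is meant to avoid for huge files, so treat this only as an illustration of the expressions. In full XPath, an absolute path such as `//book/title/text()` always starts from the document root, so its string value would always be the first title. The sketch therefore selects each book with `//book` and evaluates relative paths against it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical, DOM-based illustration; NOT DataPipeline's streaming XmlReader.
public class XPathFieldSketch {

    // Returns one title/language/price map per <book> element.
    static List<Map<String, String>> readBooks(String xml) {
        try {
            // The entire document is parsed into memory first.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Each <book> node becomes one record (compare addRecordBreak("//book")).
            NodeList books = (NodeList) xpath.evaluate("//book", doc, XPathConstants.NODESET);
            List<Map<String, String>> records = new ArrayList<>();
            for (int i = 0; i < books.getLength(); i++) {
                Node book = books.item(i);
                Map<String, String> record = new LinkedHashMap<>();
                // Relative paths are evaluated against the current <book> node only.
                record.put("title", xpath.evaluate("title/text()", book));
                record.put("language", xpath.evaluate("title/@lang", book));
                record.put("price", xpath.evaluate("price/text()", book));
                records.add(record);
            }
            return records;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read bookstore XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<bookstore>"
                + "<book><title lang=\"eng\">Harry Potter</title><price>29.99</price></book>"
                + "<book><title lang=\"eng\">Learning XML</title><price>39.95</price></book>"
                + "</bookstore>";
        for (Map<String, String> record : readBooks(xml)) {
            // Prints {title=Harry Potter, language=eng, price=29.99}, then the second book
            System.out.println(record);
        }
    }
}
```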
This example reads the following XML file and logs each resulting record. It can easily be modified to write the records to a database or another target instead.
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>
The code creates an XmlReader to parse bookstore.xml and populate the title, language, and price fields using the specified XPath expressions. The addRecordBreak("//book") call tells the reader to return a new record, containing whatever fields have been assigned, each time a book element ends.
package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import org.apache.log4j.Logger;

import com.northconcepts.datapipeline.core.DataEndpoint;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.Record;
import com.northconcepts.datapipeline.xml.XmlReader;

public class ReadAnXmlFile {

    public static final Logger log = DataEndpoint.log;

    public static void main(String[] args) {
        // Map each field to an XPath expression and end a record at every book element
        DataReader reader = new XmlReader(new File("example/data/input/bookstore.xml"))
            .addField("title", "//book/title/text()")
            .addField("language", "//book/title/@lang")
            .addField("price", "//book/price/text()")
            .addRecordBreak("//book");

        reader.open();
        try {
            // read() returns null once the input is exhausted
            Record record;
            while ((record = reader.read()) != null) {
                log.info(record);
            }
        } finally {
            // Always release the underlying file, even if reading fails
            reader.close();
        }
    }
}
Running this program produces output similar to the following.
19:18:38,492 INFO [main] datapipeline:38 - Record (MODIFIED) {
    0:[title]:STRING=[Harry Potter]:String
    1:[language]:STRING=[eng]:String
    2:[price]:STRING=[29.99]:String
}

19:18:38,498 INFO [main] datapipeline:38 - Record (MODIFIED) {
    0:[title]:STRING=[Learning XML]:String
    1:[language]:STRING=[eng]:String
    2:[price]:STRING=[39.95]:String
}
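The record-break idea can be illustrated without any library. The following is a minimal, hypothetical sketch using the JDK's streaming StAX API (`javax.xml.stream`). It is not how XmlReader is implemented. The element and field names are hard-coded for bookstore.xml, and `RecordBreakSketch` and `readBooks` are our own names. Like `addRecordBreak("//book")`, it hands over a record containing whatever fields have been assigned each time a book element ends. Because it processes events one at a time and passes each record to a callback, it only ever holds the current record in memory.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical streaming sketch; NOT DataPipeline's XmlReader implementation.
public class RecordBreakSketch {

    // Streams the XML and passes one record to `sink` each time </book> is seen.
    static void readBooks(Reader xml, Consumer<Map<String, String>> sink) {
        try {
            XMLStreamReader in = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            try {
                Map<String, String> record = new LinkedHashMap<>();
                StringBuilder text = new StringBuilder();
                String lang = null;
                while (in.hasNext()) {
                    int event = in.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        text.setLength(0); // collect only the text of the element just opened
                        if ("title".equals(in.getLocalName())) {
                            lang = in.getAttributeValue(null, "lang");
                        }
                    } else if (event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA) {
                        text.append(in.getText()); // text may arrive in several chunks
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        String name = in.getLocalName();
                        if ("title".equals(name)) {
                            record.put("title", text.toString());
                            record.put("language", lang);
                        } else if ("price".equals(name)) {
                            record.put("price", text.toString());
                        } else if ("book".equals(name)) {
                            // Record break: emit the assigned fields, then start a fresh record
                            sink.accept(record);
                            record = new LinkedHashMap<>();
                            lang = null;
                        }
                    }
                }
            } finally {
                in.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read bookstore XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<bookstore>"
                + "<book><title lang=\"eng\">Harry Potter</title><price>29.99</price></book>"
                + "<book><title lang=\"eng\">Learning XML</title><price>39.95</price></book>"
                + "</bookstore>";
        // Prints {title=Harry Potter, language=eng, price=29.99}, then the second book
        readBooks(new StringReader(xml), System.out::println);
    }
}
```

Passing records to a callback rather than collecting them in a list keeps the sketch in the same spirit as the `reader.read()` loop in the cookbook example: each record can be logged or written out as soon as it is complete.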