Read an XML File
Updated: Feb 21, 2022
XmlReader can be used to parse huge XML files (or input streams) into records. It uses a subset of XPath to assign field values and mark record breaks.
This example reads the following XML file and prints the resulting records to a logger. It can easily be modified to write to a database or other target.
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
    <book>
        <title lang="eng">Harry Potter</title>
        <price>29.99</price>
    </book>
    <book>
        <title lang="eng">Learning XML</title>
        <price>39.95</price>
    </book>
</bookstore>
The code creates an XmlReader to parse bookstore.xml and populate the title, language, and price fields using the specified XPath expressions. The addRecordBreak("//book") call tells the reader to return a new record, containing whatever fields have been assigned so far, each time a book element ends.
package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import org.apache.log4j.Logger;

import com.northconcepts.datapipeline.core.DataEndpoint;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.Record;
import com.northconcepts.datapipeline.xml.XmlReader;

public class ReadAnXmlFile {

    public static final Logger log = DataEndpoint.log;

    public static void main(String[] args) {
        // Map each field to an XPath expression and end a record at every book element
        DataReader reader = new XmlReader(new File("example/data/input/bookstore.xml"))
            .addField("title", "//book/title/text()")
            .addField("language", "//book/title/@lang")
            .addField("price", "//book/price/text()")
            .addRecordBreak("//book");

        reader.open();
        try {
            // The loop ends when read() returns null
            Record record;
            while ((record = reader.read()) != null) {
                log.info(record);
            }
        } finally {
            // Close the reader even if reading fails
            reader.close();
        }
    }
}
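To make the record-break idea concrete, here is a minimal sketch of the same pattern written against the JDK's StAX API rather than the library. It is not how XmlReader is implemented; the class name RecordBreakSketch and the method readBooks are hypothetical, and the element names are hard-coded instead of being driven by XPath. Fields are assigned as elements stream past, and a record is emitted each time a book element ends.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; not part of the DataPipeline library.
public class RecordBreakSketch {

    // Streams the XML once and returns one map of field values per <book> element.
    public static List<Map<String, String>> readBooks(String xml) {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        try {
            XMLStreamReader in = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (in.hasNext()) {
                int event = in.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = in.getLocalName();
                    if (name.equals("title")) {
                        // Read the attribute first: getElementText() moves the cursor
                        // to the end tag, where attributes are no longer available.
                        String language = in.getAttributeValue(null, "lang");
                        current.put("title", in.getElementText());
                        current.put("language", language);
                    } else if (name.equals("price")) {
                        current.put("price", in.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && in.getLocalName().equals("book")) {
                    // Record break: emit the fields assigned so far and start a new record.
                    records.add(current);
                    current = new LinkedHashMap<>();
                }
            }
            in.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<bookstore>"
                + "<book><title lang=\"eng\">Harry Potter</title><price>29.99</price></book>"
                + "<book><title lang=\"eng\">Learning XML</title><price>39.95</price></book>"
                + "</bookstore>";
        for (Map<String, String> record : readBooks(xml)) {
            System.out.println(record);
        }
    }
}
```

Because the sketch matches on element names alone, it would also pick up title or price elements outside a book. The XPath expressions in ReadAnXmlFile state the full path, which is one reason to prefer the declarative form.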
Running ReadAnXmlFile produces output similar to the following.
19:18:38,492 INFO [main] datapipeline:38 - Record (MODIFIED) {
    0:[title]:STRING=[Harry Potter]:String
    1:[language]:STRING=[eng]:String
    2:[price]:STRING=[29.99]:String
}
19:18:38,498 INFO [main] datapipeline:38 - Record (MODIFIED) {
    0:[title]:STRING=[Learning XML]:String
    1:[language]:STRING=[eng]:String
    2:[price]:STRING=[39.95]:String
}
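The field values in this output can be cross-checked without the library by evaluating the same three XPath expressions with the JDK's javax.xml.xpath API. This is only a sketch for experimenting with expressions: the class name XPathCheck and the method evaluate are hypothetical, and the JDK loads the whole document into memory, which XmlReader is meant to avoid on very large inputs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for trying out XPath expressions; not part of the DataPipeline library.
public class XPathCheck {

    // Returns the string value of every text or attribute node the expression selects.
    public static List<String> evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<bookstore>"
                + "<book><title lang=\"eng\">Harry Potter</title><price>29.99</price></book>"
                + "<book><title lang=\"eng\">Learning XML</title><price>39.95</price></book>"
                + "</bookstore>";
        // The same expressions that ReadAnXmlFile passes to addField
        System.out.println(evaluate(xml, "//book/title/text()"));
        System.out.println(evaluate(xml, "//book/title/@lang"));
        System.out.println(evaluate(xml, "//book/price/text()"));
    }
}
```

Each call returns the values for all books at once, whereas the reader groups them into one record per book through its record break.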
