Read an XML File (2)

This example shows you how to read an XML file in Java using the XmlReader class.

This example also shows how the input data can be modified on the fly using the TransformingReader and BasicFieldTransformer classes before being saved to a database.

For demonstration purposes, this example reads an XML file and writes its contents to a database table. However, the input XML data can also be written to other output targets.

Other examples show how to write to an XML file, write to an XML file programmatically, or write to an XML file using FreeMarker templates.

Input XML file

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>

Java code listing

package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;
import java.sql.Connection;

import org.apache.log4j.Logger;

import com.northconcepts.datapipeline.core.DataEndpoint;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.jdbc.JdbcWriter;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.transform.BasicFieldTransformer;
import com.northconcepts.datapipeline.transform.TransformingReader;
import com.northconcepts.datapipeline.xml.XmlReader;

public class ReadAnXmlFile2 {
    
    public static final Logger log = DataEndpoint.log;
    // Placeholder: must be assigned a live JDBC connection before this example is run.
    private static Connection connection = null;

    public static void main(String[] args) {
        String inputFile = "example/data/input/bookstore.xml";
        String targetTable = "book";
        
        DataReader reader = new XmlReader(new File(inputFile))
            .addField("title", "//book/title/text()")
            .addField("language", "//book/title/@lang")
            .addField("price", "//book/price/text()")
            .addRecordBreak("//book");
        
        reader = new TransformingReader(reader)
            .add(new BasicFieldTransformer("price")
                .nullToValue("0")
                .replaceString("$", "")
                .trim()
                .stringToDouble());

        DataWriter writer = new JdbcWriter(connection, targetTable)
            .setAutoCloseConnection(true)
            .setBatchSize(100);
   
        Job.run(reader, writer);
    }

}

Code walkthrough

  1. First, an XmlReader is created for the input file, bookstore.xml.
  2. The title, language, and price fields are populated via the XmlReader.addField method.
  3. addRecordBreak("//book") is invoked to return a new record whenever a book element ends.
  4. Transformations are applied to the price field via a TransformingReader.
  5. A JdbcWriter is created to write the output to the database.
  6. Data is transferred from the XML file to the database via Job.run.

XmlReader

XmlReader is an input reader that can be used to read XML files. The main method in this class is XmlReader.addField, which adds a field to be read from the XML stream. It takes the field name and a location path, written in a subset of the XPath 1.0 notation, that identifies the field's value. You can add only the fields you want read from the input XML stream. Another important method is addRecordBreak, which tells the reader to return a new record using whatever fields have been assigned so far. In other words, it marks the boundary between records.
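
To see which values the three location paths in this example select, the paths can be evaluated against the same bookstore data with the JDK's own javax.xml.xpath API. This is only an illustrative sketch: it does not use XmlReader, the class name XPathFieldDemo and the select helper are made up for this illustration, and the JDK implements full XPath 1.0 rather than the subset XmlReader supports.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; it is not part of the Data Pipeline library.
public class XPathFieldDemo {

    // Same content as the bookstore.xml input file shown above.
    static final String XML =
        "<bookstore>"
        + "<book><title lang=\"eng\">Harry Potter</title><price>29.99</price></book>"
        + "<book><title lang=\"eng\">Learning XML</title><price>39.95</price></book>"
        + "</bookstore>";

    // Evaluates one location path and returns the value of every matching node.
    static List<String> select(String path) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                path, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // For text and attribute nodes, getNodeValue() returns the text content.
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + path, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("title:    " + select("//book/title/text()"));
        System.out.println("language: " + select("//book/title/@lang"));
        System.out.println("price:    " + select("//book/price/text()"));
    }
}
```

Each path matches one node per book element, which is why a record break on //book yields one record per book with a title, language, and price.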

Input Data Transformation

A TransformingReader is used to modify the input data. In this example, it wraps the XmlReader and modifies the data read from the input XML file. However, it can wrap other readers and modify other types of input data as well. A TransformingReader modifies the records coming from the wrapped reader using the Transformer objects added to it via the add method.

Transformer

Transformer is an abstract class and does not define much functionality. It has an abstract method called transform which is invoked by the TransformingReader. FieldTransformer is a sub-class of Transformer and overrides the transform method. FieldTransformer has a method called transformField, which is overridden by the BasicFieldTransformer.

BasicFieldTransformer

BasicFieldTransformer, which is used in this code, is a sub-class of FieldTransformer. It has several methods such as stringToDouble, nullToValue, etc. Each of these specialized methods adds a new Operation to the transformer's collection of operations, then returns the "this" pointer (itself), which is what allows the calls to be chained.

In order to perform the data transformation, the TransformingReader invokes the Transformer.transform method. This in turn invokes the transformField method. Thus, the collection of operations is run when the transform->transformField methods are called by the TransformingReader.
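
The "add an operation, then return this" pattern described above can be sketched in plain Java. This is a minimal sketch for illustration only, not the library's implementation: the class FluentTransformerSketch is hypothetical, and operations are modelled as simple UnaryOperator functions rather than the library's Operation type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration of a fluent transformer; not BasicFieldTransformer itself.
public class FluentTransformerSketch {

    // Operations are stored in the order in which they were added.
    private final List<UnaryOperator<Object>> operations = new ArrayList<>();

    public FluentTransformerSketch nullToValue(Object replacement) {
        operations.add(value -> value == null ? replacement : value);
        return this; // returning this is what makes chaining possible
    }

    public FluentTransformerSketch trim() {
        operations.add(value -> value instanceof String ? ((String) value).trim() : value);
        return this;
    }

    public FluentTransformerSketch stringToDouble() {
        operations.add(value -> value instanceof String ? Double.valueOf((String) value) : value);
        return this;
    }

    // Runs every collected operation, in order, on a single field value.
    public Object transformField(Object value) {
        Object result = value;
        for (UnaryOperator<Object> operation : operations) {
            result = operation.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        FluentTransformerSketch transformer = new FluentTransformerSketch()
            .nullToValue("0")
            .trim()
            .stringToDouble();
        System.out.println(transformer.transformField(" 12.5 "));
        System.out.println(transformer.transformField(null));
    }
}
```

Nothing is transformed while the chain is being built; the calls only record what to do. The work happens later, once per field value, when transformField is invoked.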

Data conversions

The following data conversions are performed in this code:

  • nullToValue - Specifies the value to use when a null is encountered. In this case, if the price field is absent or null, it is assigned "0".
  • replaceString - Replaces every occurrence of the specified string with another string. In this case, any "$" symbol in the price string is replaced by an empty string.
  • trim - Removes leading and trailing whitespace from the field value, if any.
  • stringToDouble - Converts a String value to a Double value. In this case, the price field is stored as a Double in the resulting records.
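
The combined effect of these four conversions on a single price value can be traced with plain JDK string operations. This is a rough equivalent written for illustration, assuming the conversions behave as described in the list above; PriceConversionSketch and convertPrice are hypothetical names, and the sample inputs other than "29.99" are made up.

```java
// Hypothetical stand-alone trace of the price conversions; it does not use the library.
public class PriceConversionSketch {

    static Double convertPrice(String raw) {
        String value = (raw == null) ? "0" : raw; // nullToValue("0")
        value = value.replace("$", "");           // replaceString("$", ""), a literal replacement
        value = value.trim();                     // trim()
        return Double.valueOf(value);             // stringToDouble()
    }

    public static void main(String[] args) {
        System.out.println(convertPrice("29.99"));    // value as it appears in bookstore.xml
        System.out.println(convertPrice(" $39.95 ")); // made-up input with a "$" and whitespace
        System.out.println(convertPrice(null));       // missing price
    }
}
```

The order matters: the null check has to come first because the later steps operate on a String, and the parse has to come last because it expects a clean numeric string.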

Database output

The output of this program is data inserted into the book database table.
