How to Query Java Objects with XPath

Data Pipeline’s query engine allows you to use XPath to query XML, JSON, and Java objects. This walkthrough will show you how to query Java objects using XPath and save the results to a CSV file. While the reading and writing will be done with the JavaBeanReader and CSVWriter classes, you can swap out the CSVWriter for any other endpoint or transformation that Data Pipeline supports.

Java object model

The object model used as input is an ArrayList of Java beans called Signal.  The Signal class has a variety of private fields, along with their corresponding getter and setter methods.
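The original Signal class isn’t reproduced here, so the following is a hypothetical stand-in. Its fields (id, source, strength) and the sampleSignals helper are assumptions made for illustration. It has the shape described above: private fields with public getters and setters and nothing else, collected into an ArrayList.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the article's Signal bean; the field names are assumptions.
public class Signal {
    private long id;
    private String source;
    private int strength;

    // A public no-arg constructor plus getters/setters is all a plain bean needs:
    // no interfaces to implement and no annotations.
    public Signal() {}

    public Signal(long id, String source, int strength) {
        this.id = id;
        this.source = source;
        this.strength = strength;
    }

    public long getId() { return id; }
    public void setId(long id) { this.id = id; }

    public String getSource() { return source; }
    public void setSource(String source) { this.source = source; }

    public int getStrength() { return strength; }
    public void setStrength(int strength) { this.strength = strength; }

    // Builds the kind of input described above: an ArrayList of Signal beans.
    public static List<Signal> sampleSignals() {
        List<Signal> signals = new ArrayList<>();
        signals.add(new Signal(1, "GPS", 72));
        signals.add(new Signal(2, "Radar", 45));
        return signals;
    }
}
```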

The XPath engine doesn’t require any changes to read your classes: there are no interfaces to implement and no special annotations. The only restriction is that your model needs to be some combination of container types such as arrays, collections, and Java beans.

All other types, including strings, dates, primitives, boxed types, etc., are treated as single-valued types (leaves, instead of branch nodes).
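To make the branch/leaf distinction concrete, here is a minimal sketch that assumes a plain-JDK walk over the object graph; it is not Data Pipeline’s actual implementation. It uses java.beans.Introspector to find getters, treats collections and arrays as repeated children, and emits a path for every leaf value. The Signal fields, the fixed list of leaf types, and the rule that names collection elements after their class (so Signal becomes signal) are all hypothetical choices for this sketch; the real reader’s naming rules may differ.

```java
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Array;
import java.time.LocalDate;
import java.time.temporal.Temporal;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Date;
import java.util.List;

public class ObjectGraphWalker {

    // Hypothetical bean used only for this sketch.
    public static class Signal {
        private String source;
        private LocalDate received;
        private List<String> tags = new ArrayList<>();

        public Signal() {}
        public Signal(String source, LocalDate received, List<String> tags) {
            this.source = source;
            this.received = received;
            this.tags = tags;
        }
        public String getSource() { return source; }
        public void setSource(String source) { this.source = source; }
        public LocalDate getReceived() { return received; }
        public void setReceived(LocalDate received) { this.received = received; }
        public List<String> getTags() { return tags; }
        public void setTags(List<String> tags) { this.tags = tags; }
    }

    // Single-valued (leaf) types in this sketch: strings, numbers, booleans, chars, enums, dates.
    static boolean isLeaf(Object value) {
        return value == null
                || value instanceof CharSequence
                || value instanceof Number
                || value instanceof Boolean
                || value instanceof Character
                || value instanceof Enum<?>
                || value instanceof Date
                || value instanceof Temporal;
    }

    // Hypothetical naming rule: a collection/array element is named after its class.
    static String elementName(Object item) {
        return item == null ? "item" : Introspector.decapitalize(item.getClass().getSimpleName());
    }

    static void walk(String path, Object value, List<String> out) throws Exception {
        if (isLeaf(value)) {
            out.add(path + " = " + value);                     // leaf: emit its value
        } else if (value instanceof Collection<?>) {
            for (Object item : (Collection<?>) value) {        // branch: one child per element
                walk(path + "/" + elementName(item), item, out);
            }
        } else if (value.getClass().isArray()) {
            for (int i = 0; i < Array.getLength(value); i++) {
                Object item = Array.get(value, i);
                walk(path + "/" + elementName(item), item, out);
            }
        } else {
            // Anything else is treated as a bean: one child per readable property (getter).
            for (PropertyDescriptor pd :
                    Introspector.getBeanInfo(value.getClass(), Object.class).getPropertyDescriptors()) {
                if (pd.getReadMethod() != null) {
                    walk(path + "/" + pd.getName(), pd.getReadMethod().invoke(value), out);
                }
            }
        }
    }

    // Walks a whole model under a caller-chosen root name such as "messages".
    public static List<String> paths(String rootName, Object model) {
        List<String> out = new ArrayList<>();
        try {
            walk("/" + rootName, model, out);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Signal> signals = new ArrayList<>();
        signals.add(new Signal("GPS", LocalDate.of(2024, 1, 15), List.of("primary")));
        paths("messages", signals).forEach(System.out::println);
    }
}
```

With a one-element list under the root name messages, the sketch produces paths such as /messages/signal/source = GPS, the same kind of path the XPath expressions later in this walkthrough select.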

Resulting CSV file

The transfer job writes its results to a CSV file named output.csv.


Data Pipeline code

Once you have your object model and have decided on the output format (CSV in this case), it’s time to write the Data Pipeline job code.

The job code has only three parts.

1. Create your JavaBeanReader

Instantiate a new JavaBeanReader, passing in a document (root) name of your choosing and your object model.

Think of the "messages" param as the root node in an XML document.  It can be used as part of your XPath query if needed: /messages//source.
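If it helps to picture the root name, the following sketch runs the same kind of query with the JDK’s standard javax.xml.xpath API (not Data Pipeline) against a hypothetical XML rendering of two Signal beans; the signal, id, and source element names are assumptions. Because messages is the document’s root element, /messages//source matches both source elements, while a path that starts with a different root name matches nothing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RootNameXPathDemo {

    // Hypothetical XML view of two Signal beans under the "messages" root.
    public static final String XML =
            "<messages>"
          + "<signal><id>1</id><source>GPS</source></signal>"
          + "<signal><id>2</id><source>Radar</source></signal>"
          + "</messages>";

    // Evaluates an XPath 1.0 expression with the JDK's built-in engine
    // and returns the text content of every matching node.
    public static List<String> query(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(query(XML, "/messages//source")); // [GPS, Radar]
        System.out.println(query(XML, "/other//source"));    // []
    }
}
```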

The second parameter, your input object model, can be any of the types mentioned earlier: array, collection, Java bean, etc.

2. Specify your fields and record breaks

Use the addField and addRecordBreak methods to tell the JavaBeanReader how to create records from your object model.

The addField method takes the name of the new target field and an XPath 1.0 location path identifying where that field’s value comes from in each record.

The addRecordBreak method takes another XPath 1.0 location path to identify each record’s boundary.  The reader returns a new record whenever this pattern is matched.  The returned record contains all fields (from addField matches) that have been captured up to that point.
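The following is a simplified sketch of the field-capture and record-break behavior described above, not the library’s implementation. Its assumptions: the object model has already been flattened into a stream of events (a leaf value at a path, or the end of a branch node); patterns are exact absolute paths rather than full XPath; the break fires when the matching node ends; and each new record starts empty. The Event type and read method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecordBreakSketch {

    // One step of a streaming walk: a leaf value at a path, or the end of a branch node.
    public static final class Event {
        final String path;
        final String value;      // null for end-of-node events
        final boolean endOfNode;

        private Event(String path, String value, boolean endOfNode) {
            this.path = path;
            this.value = value;
            this.endOfNode = endOfNode;
        }

        public static Event leaf(String path, String value) { return new Event(path, value, false); }
        public static Event end(String path) { return new Event(path, null, true); }
    }

    // fields: target field name -> exact path (a stand-in for addField)
    // recordBreak: exact path of the node that closes a record (a stand-in for addRecordBreak)
    public static List<Map<String, String>> read(List<Event> events,
                                                 Map<String, String> fields,
                                                 String recordBreak) {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> captured = new LinkedHashMap<>();
        for (Event e : events) {
            if (e.endOfNode) {
                if (e.path.equals(recordBreak)) {
                    records.add(captured);             // emit every field captured so far
                    captured = new LinkedHashMap<>();  // assumption: the next record starts empty
                }
                continue;
            }
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (e.path.equals(field.getValue())) {
                    captured.put(field.getKey(), e.value);
                }
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                Event.leaf("/messages/signal/id", "1"),
                Event.leaf("/messages/signal/source", "GPS"),
                Event.end("/messages/signal"),
                Event.leaf("/messages/signal/id", "2"),
                Event.leaf("/messages/signal/source", "Radar"),
                Event.end("/messages/signal"),
                Event.end("/messages"));

        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "/messages/signal/id");
        fields.put("source", "/messages/signal/source");

        System.out.println(read(events, fields, "/messages/signal"));
        // [{id=1, source=GPS}, {id=2, source=Radar}]
    }
}
```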

3. Run the job

Create the desired target DataWriter and run the reader-to-writer transfer.
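Data Pipeline’s DataWriter and CSVWriter classes aren’t reproduced here. Instead, this hypothetical sketch (RecordWriter, CsvRecordWriter, transfer, and toCsv are made-up names) shows the shape of a reader-to-writer transfer: a loop that pulls each record from the reader and hands it to a writer. The CSV writer takes its header from the first record’s keys, quotes values that contain commas, quotes, or line breaks in the RFC 4180 style, and ends lines with \n for simplicity.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class TransferSketch {

    // Hypothetical writer contract: any output format could implement it.
    public interface RecordWriter {
        void write(Map<String, String> record) throws IOException;
        void close() throws IOException;
    }

    // Minimal CSV writer: header from the first record's keys, RFC 4180-style quoting.
    public static final class CsvRecordWriter implements RecordWriter {
        private final Writer out;
        private List<String> header;

        public CsvRecordWriter(Writer out) { this.out = out; }

        @Override
        public void write(Map<String, String> record) throws IOException {
            if (header == null) {
                header = new ArrayList<>(record.keySet());
                writeRow(header);
            }
            List<String> row = new ArrayList<>();
            for (String name : header) {
                String value = record.get(name);
                row.add(value == null ? "" : value);  // missing fields become empty cells
            }
            writeRow(row);
        }

        private void writeRow(List<String> values) throws IOException {
            for (int i = 0; i < values.size(); i++) {
                if (i > 0) out.write(',');
                out.write(quote(values.get(i)));
            }
            out.write('\n');
        }

        // Quote only when needed, doubling any embedded quote characters.
        static String quote(String value) {
            if (value.contains(",") || value.contains("\"") || value.contains("\n") || value.contains("\r")) {
                return "\"" + value.replace("\"", "\"\"") + "\"";
            }
            return value;
        }

        @Override
        public void close() throws IOException { out.close(); }
    }

    // The "run the job" step: move every record from the reader to the writer.
    public static void transfer(Iterator<Map<String, String>> reader, RecordWriter writer) throws IOException {
        try {
            while (reader.hasNext()) {
                writer.write(reader.next());
            }
        } finally {
            writer.close();
        }
    }

    // Convenience wrapper that captures the CSV output in memory.
    public static String toCsv(List<Map<String, String>> records) {
        StringWriter buffer = new StringWriter();
        try {
            transfer(records.iterator(), new CsvRecordWriter(buffer));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }
}
```

Because transfer depends only on the RecordWriter interface, producing a different format means passing a different implementation, which is the same idea behind swapping the CSVWriter for another DataWriter. Records should use an insertion-ordered map such as LinkedHashMap so the column order is predictable.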

At this point, you can replace the CSVWriter with another DataWriter to produce a different output format.

XPath queries

The XPath 1.0 location paths used in the addField and addRecordBreak methods support only a subset of the full specification.

This limitation exists primarily because Data Pipeline is built to stream data. To keep the XML and JSON parsers running with low memory overhead, only forward-matching XPath expressions are supported. You can find the list of supported expressions in JavaBeanReader’s Javadoc.
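To illustrate why forward-only paths suit streaming, here is a hypothetical matcher (not the library’s code) that decides whether a node matches using only the stack of node names currently open, which is all a streaming parser knows when it reaches a node. It supports just absolute paths made of / (child) and // (descendant) steps with names or *, and it rejects .. and any explicit axis syntax (::) to keep the sketch small. Consult the JavaBeanReader Javadoc for the expressions Data Pipeline actually supports.

```java
import java.util.ArrayList;
import java.util.List;

public class ForwardPathMatcher {

    // One location step: a node name (or "*") reached via "/" (child) or "//" (descendant).
    static final class Step {
        final boolean descendant;
        final String name;
        Step(boolean descendant, String name) { this.descendant = descendant; this.name = name; }
    }

    // Parses absolute paths built from "/" and "//" steps; ".." (which looks
    // backward) and explicit axes such as "parent::" are rejected.
    static List<Step> parse(String pattern) {
        List<Step> steps = new ArrayList<>();
        int i = 0;
        while (i < pattern.length()) {
            boolean descendant = pattern.startsWith("//", i);
            if (descendant) {
                i += 2;
            } else if (pattern.charAt(i) == '/') {
                i += 1;
            } else {
                throw new IllegalArgumentException("Only absolute paths are supported: " + pattern);
            }
            int end = pattern.indexOf('/', i);
            if (end < 0) end = pattern.length();
            String name = pattern.substring(i, end);
            if (name.isEmpty() || name.equals("..") || name.contains("::")) {
                throw new IllegalArgumentException("Unsupported step '" + name + "' in " + pattern);
            }
            steps.add(new Step(descendant, name));
            i = end;
        }
        return steps;
    }

    // Decides a match from the currently open node names alone (root first).
    public static boolean matches(String pattern, List<String> openNodes) {
        return matches(parse(pattern), 0, openNodes, 0);
    }

    private static boolean matches(List<Step> steps, int s, List<String> nodes, int d) {
        if (s == steps.size()) return d == nodes.size();  // every step used, every node consumed
        Step step = steps.get(s);
        if (!step.descendant) {
            return d < nodes.size() && nameMatches(step.name, nodes.get(d))
                    && matches(steps, s + 1, nodes, d + 1);
        }
        for (int k = d; k < nodes.size(); k++) {         // "//" may skip any number of levels
            if (nameMatches(step.name, nodes.get(k)) && matches(steps, s + 1, nodes, k + 1)) {
                return true;
            }
        }
        return false;
    }

    private static boolean nameMatches(String stepName, String nodeName) {
        return stepName.equals("*") || stepName.equals(nodeName);
    }
}
```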

Download Data Pipeline

The Data Pipeline library, including the online examples, is available for immediate download. Once you have it, see the getting started guide to start running the examples.


About Dele Taylor

We make Data Pipeline — a lightweight ETL framework for Java. Use it to filter, transform, and aggregate data on-the-fly in your web, mobile, and desktop apps. Learn more about it at northconcepts.com.
