How to convert XML to Excel

Data Pipeline makes it easy to read, transform, and write XML and Excel files. This post demonstrates how to load data from an on-disk XML file, apply transformations on-the-fly, and save the result to an Excel file.

I recommend you to take a look at the How to Read a JSON Stream post if you need to read data directly from a URL and skip the disk altogether. If you prefer to write your data to multiple Excel sheets, you can to read the post How to create multiple sheets in a single Excel file.

Input XML file

In this example we will process stock quotes available at Yahoo Finance. We saved this XML data on disk in a file named “yahoo_stock_quote_feed.xml”.

Java code

 

1. Define a reader from an XML file

XmlReader lets you obtain records from any XML stream.  The XML stream is broken into separate records by calling addRecordBreak(String locationPath) on your XmlReader instance.   Fields are then added to each record using the addField(String name, String locationPath) method. The locationPath expressions used in the above methods are a subset of the XPath 1.0 location paths notation. Below we give an example of location paths for selecting a node and an attribute:

  • "//quote/Company" – selects all Company nodes that are children of a quote node no matter where they are in the document;
  • "//quote/@symbol" – selects the symbol attribute in all quote nodes no matter where they are in a document.

2. Transform the input data on-the-fly (optional)

When converting data between formats, we often need to perform transformations. For example, we may read numbers as text from an XML file, but prefer to save them as a number in an Excel file. Or, we may need to adjust date and number formatting, and calculate aggregated values.

TransformingReader is a proxy that applies transformations to records being passed through it. On-the-fly record transformations are added to a TransformingReader using the method add(Transformer... transformers). Data Pipeline makes it easy to:

The following posts contain additional on-the-fly data-transformation examples: Lookup Data in any Data Reader, Search Twitter for Tweets, Rename a Field and Set a Field.

3. Define a writer to a Microsoft Excel file

ExcelWriter lets you write records to a Microsoft Excel document. It is used in combination with ExcelDocument, a class that encapsulates an Excel workbook in memory. Data Pipeline supports both the XLS and XSLX file format. By default, the ExcelDocument objects represent an XLSX file. You ca use ExcelWriter’s setStyleFormat(FieldType fieldType, String pattern) method to set the desired cell format per data type.

4. Transfer the data from the XML file to the Excel file and save the result

JobTemplate is a base type for classes that transfer records from a DataReader to a DataWriter. Data Pipeline provides a default JobTemplate implementation accessible at JobTemplate.DEFAULT. A call to the method JobTemplate.DEFAULT.transfer(DataReader reader, DataWriter writer) will do the data transfer.

Finally, a call to the ExcelDocument’s save(File excelFile) method is needed to write the generated workbook data to an Excel file on disk.

Output file

Excel output

All examples can be found in the examples/src folder of the download.

About The DataPipeline Team

We make Data Pipeline — a lightweight ETL framework for Java. Use it to filter, transform, and aggregate data on-the-fly in your web, mobile, and desktop apps. Learn more about it at northconcepts.com.

2 thoughts on “How to convert XML to Excel

Leave a Reply

Your email address will not be published. Required fields are marked *
You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">