How to convert XML to Excel (2023)

Data Pipeline makes it easy to read, transform, and write XML and Excel files. This post shows you how you too can load data from an on-disk XML file, apply transformations on the fly, and save the result to an Excel file.

If you need to read data directly from a URL and skip the disk altogether, take a look at the How to Read a JSON Stream example. If you prefer to write your data to multiple Excel sheets, you can read the post How to create multiple sheets in a single Excel file.

Input XML file

For this example, we will process stock quotes available at Yahoo Finance. We saved this XML data on disk in a file named “yahoo_stock_quote_feed.xml”.

Java code

 

1. Define a reader from an XML file

XmlReader lets you get records from an XML stream.  The XML stream is broken into separate records by calling addRecordBreak(String locationPath) on your XmlReader instance.   Fields are then added to each record using the addField(String name, String locationPath) method. The locationPath expressions used in the above methods are a subset of the XPath 1.0 location paths notation. Below we give an example of location paths for selecting a node and an attribute:

  • "//quote/Company" – selects all Company nodes that are children of a quote node no matter where they are in the document;
  • "//quote/@symbol" – selects the symbol attribute in all quote nodes no matter where they are in a document.

2. Transform the input data on the fly (optional)

When converting data between formats, we often need to perform transformations. For example, we may read numbers as text from an XML file, but prefer to save them as a number in an Excel file. Or, we may need to adjust the date and number formatting, and calculate aggregated values.

TransformingReader is a proxy that applies transformations to records being passed through it. Record transformations are then added to a TransformingReader using the method add(Transformer... transformers). Data Pipeline makes it easy to:

The following posts contain additional on-the-fly data-transformation examples: Lookup Data in any Data Reader, Search Twitter for Tweets, Rename a Field and Set a Field.

3. Define a writer to a Microsoft Excel file

ExcelWriter lets you write records to a Microsoft Excel document. We use it in combination with ExcelDocument, a class that encapsulates an Excel workbook in memory. Data Pipeline supports both the XLS and XSLX file format. By default, the ExcelDocument objects represent an XLSX file. You can use ExcelWriter’s setStyleFormat(FieldType fieldType, String pattern) method to set the desired cell format per data type.

4. Transfer the data from the XML file to the Excel file and save the result

JobTemplate is a base type for classes that transfer records from a DataReader to a DataWriter. Data Pipeline provides a default JobTemplate implementation which you can access at JobTemplate.DEFAULT. A call to the method JobTemplate.DEFAULT.transfer(DataReader reader, DataWriter writer) is responsible for doing the data transfer.

Finally, a call to the ExcelDocument’s save(File excelFile) method is needed to write the generated workbook data to an Excel file on disk.

Output file

Excel output

All examples can be found in the examples/src folder of the download.

About The DataPipeline Team

We make Data Pipeline — a lightweight ETL framework for Java. Use it to filter, transform, and aggregate data on-the-fly in your web, mobile, and desktop apps. Learn more about it at northconcepts.com.

2 thoughts on “How to convert XML to Excel (2023)

Leave a Reply

Your email address will not be published. Required fields are marked *
You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">