Read a PDF File
This example shows how to read data from a PDF file using DataPipeline.
The demo code uses PdfReader to read records from an input PDF file.
This example can easily be modified to show how to generate a PDF file using DataPipeline's PdfWriter.
Java Code listing
package com.northconcepts.datapipeline.examples.pdf;

import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.pdf.PdfDocument;
import com.northconcepts.datapipeline.pdf.PdfReader;

import java.io.File;

public class ReadAPdfFile {

    public static void main(String[] args) {
        File file = new File("example/data/input/addresses.pdf");

        PdfReader reader = new PdfReader(new PdfDocument(file))
                .setFieldNamesInFirstRow(true);

        Job.run(reader, new StreamWriter(System.out));
    }

}
Code Walkthrough
- A PdfDocument is created to detect tables in the PDF file addresses.pdf. This class can also be instantiated with the password of a password-protected PDF file, e.g. new PdfDocument(file, "password").
- The PdfDocument is then passed to a PdfReader to stream records from the detected tables.
- Data are transferred from the PdfReader to the console via the Job.run() method. See how to compile and run data pipeline jobs.
PdfReader and PdfDocument
PdfReader is an input reader that reads records from a PDF file. The PdfReader.setFieldNamesInFirstRow(true) method specifies that the values in the first row of the input data should be used as field names. If this method is not invoked, the fields are named A, B, C, and so on. If the document has more than one table, the PdfReader.setTableIndex(tableIndex) method can be used to specify which table to read.
PdfDocument is a class that encapsulates a PDF document.
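To picture the default field names mentioned above (A, B, C, etc.), here is a minimal sketch in plain Java. It is not DataPipeline code: DefaultFieldNames, columnName and columnNames are hypothetical helpers written for illustration, and the spreadsheet-style continuation past Z (AA, AB, ...) is an assumption that the text above does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

public class DefaultFieldNames {

    // Hypothetical helper, not part of DataPipeline: converts a zero-based
    // column index to a spreadsheet-style name (0 -> A, 25 -> Z, 26 -> AA).
    // The behaviour past Z is an assumption made for this sketch.
    static String columnName(int index) {
        StringBuilder name = new StringBuilder();
        for (int i = index; i >= 0; i = i / 26 - 1) {
            name.insert(0, (char) ('A' + i % 26));
        }
        return name.toString();
    }

    // Builds the names for the first `count` columns of a table.
    static List<String> columnNames(int count) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add(columnName(i));
        }
        return names;
    }

    public static void main(String[] args) {
        // The addresses table in this example has seven columns.
        System.out.println(columnNames(7)); // prints [A, B, C, D, E, F, G]
    }
}
```

With setFieldNamesInFirstRow(true), as in the listing above, the seven columns are instead named after the header row (ID Number, Last Name, and so on), which is usually easier to work with downstream.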
Console output
-----------------------------------------------
0 - Record (MODIFIED) {
    0:[ID Number]:STRING=[1]:String
    1:[Fist Name]:STRING=[John]:String
    2:[Last Name]:STRING=[Doe]:String
    3:[Address]:STRING=[120 jefferson st.]:String
    4:[City]:STRING=[Riverside]:String
    5:[Code]:STRING=[NJ]:String
    6:[Size]:STRING=[8075]:String
}
-----------------------------------------------
1 - Record (MODIFIED) {
    0:[ID Number]:STRING=[2]:String
    1:[Fist Name]:STRING=[Jack]:String
    2:[Last Name]:STRING=[McGinnis]:String
    3:[Address]:STRING=[220 hobo Av.]:String
    4:[City]:STRING=[Phila]:String
    5:[Code]:STRING=[PA]:String
    6:[Size]:STRING=[9119]:String
}
-----------------------------------------------
2 - Record (MODIFIED) {
    0:[ID Number]:STRING=[3]:String
    1:[Fist Name]:STRING=[John "Da Man"]:String
    2:[Last Name]:STRING=[Repici]:String
    3:[Address]:STRING=[120 Jefferson St.]:String
    4:[City]:STRING=[Riverside]:String
    5:[Code]:STRING=[NJ]:String
    6:[Size]:STRING=[8075]:String
}
-----------------------------------------------
3 - Record (MODIFIED) {
    0:[ID Number]:STRING=[4]:String
    1:[Fist Name]:STRING=[Stephen]:String
    2:[Last Name]:STRING=[Tyler]:String
    3:[Address]:STRING=[7452 Terrace "At the Plaza" road]:String
    4:[City]:STRING=[SomeTown]:String
    5:[Code]:STRING=[SD]:String
    6:[Size]:STRING=[91234]:String
}
-----------------------------------------------
4 - Record (MODIFIED) {
    0:[ID Number]:STRING=[5]:String
    1:[Fist Name]:STRING=[]:String
    2:[Last Name]:STRING=[Blankman]:String
    3:[Address]:STRING=[]:String
    4:[City]:STRING=[SomeTown]:String
    5:[Code]:STRING=[SD]:String
    6:[Size]:STRING=[298]:String
}
-----------------------------------------------
5 - Record (MODIFIED) {
    0:[ID Number]:STRING=[6]:String
    1:[Fist Name]:STRING=[Joan "the bone" Anne]:String
    2:[Last Name]:STRING=[Jet]:String
    3:[Address]:STRING=[9th, at Terrace plc]:String
    4:[City]:STRING=[Desert City]:String
    5:[Code]:STRING=[CO]:String
    6:[Size]:STRING=[123]:String
}
-----------------------------------------------
6 - Record (MODIFIED) {
    0:[ID Number]:STRING=[7]:String
    1:[Fist Name]:STRING=[John]:String
    2:[Last Name]:STRING=[Doe]:String
    3:[Address]:STRING=[120 jefferson st.]:String
    4:[City]:STRING=[Riverside]:String
    5:[Code]:STRING=[NJ]:String
    6:[Size]:STRING=[8075]:String
}
-----------------------------------------------
7 - Record (MODIFIED) {
    0:[ID Number]:STRING=[8]:String
    1:[Fist Name]:STRING=[Jack]:String
    2:[Last Name]:STRING=[McGinnis]:String
    3:[Address]:STRING=[220 hobo Av.]:String
    4:[City]:STRING=[Phila]:String
    5:[Code]:STRING=[PA]:String
    6:[Size]:STRING=[9119]:String
}
-----------------------------------------------
8 - Record (MODIFIED) {
    0:[ID Number]:STRING=[9]:String
    1:[Fist Name]:STRING=[John "Da Man"]:String
    2:[Last Name]:STRING=[Repici]:String
    3:[Address]:STRING=[120 Jefferson St.]:String
    4:[City]:STRING=[Riverside]:String
    5:[Code]:STRING=[NJ]:String
    6:[Size]:STRING=[8075]:String
}
-----------------------------------------------
9 - Record (MODIFIED) {
    0:[ID Number]:STRING=[10]:String
    1:[Fist Name]:STRING=[Stephen]:String
    2:[Last Name]:STRING=[Tyler]:String
    3:[Address]:STRING=[7452 Terrace "At the Plaza" road]:String
    4:[City]:STRING=[SomeTown]:String
    5:[Code]:STRING=[SD]:String
    6:[Size]:STRING=[91234]:String
}
-----------------------------------------------
10 - Record (MODIFIED) {
    0:[ID Number]:STRING=[11]:String
    1:[Fist Name]:STRING=[]:String
    2:[Last Name]:STRING=[Blankman]:String
    3:[Address]:STRING=[]:String
    4:[City]:STRING=[SomeTown]:String
    5:[Code]:STRING=[SD]:String
    6:[Size]:STRING=[298]:String
}
-----------------------------------------------
11 - Record (MODIFIED) {
    0:[ID Number]:STRING=[12]:String
    1:[Fist Name]:STRING=[Joan "the bone" Anne]:String
    2:[Last Name]:STRING=[Jet]:String
    3:[Address]:STRING=[9th, at Terrace plc]:String
    4:[City]:STRING=[Desert City]:String
    5:[Code]:STRING=[CO]:String
    6:[Size]:STRING=[123]:String
}
-----------------------------------------------
12 - Record (MODIFIED) {
    0:[ID Number]:STRING=[13]:String
    1:[Fist Name]:STRING=[John]:String
    2:[Last Name]:STRING=[Doe]:String
    3:[Address]:STRING=[120 jefferson st.]:String
    4:[City]:STRING=[Riverside]:String
    5:[Code]:STRING=[NJ]:String
    6:[Size]:STRING=[8075]:String
}
-----------------------------------------------
13 - Record (MODIFIED) {
    0:[ID Number]:STRING=[14]:String
    1:[Fist Name]:STRING=[Jack]:String
    2:[Last Name]:STRING=[McGinnis]:String
    3:[Address]:STRING=[220 hobo Av.]:String
    4:[City]:STRING=[Phila]:String
    5:[Code]:STRING=[PA]:String
    6:[Size]:STRING=[9119]:String
}
-----------------------------------------------
14 - Record (MODIFIED) {
    0:[ID Number]:STRING=[15]:String
    1:[Fist Name]:STRING=[John "Da Man"]:String
    2:[Last Name]:STRING=[Repici]:String
    3:[Address]:STRING=[120 Jefferson St.]:String
    4:[City]:STRING=[Riverside]:String
    5:[Code]:STRING=[NJ]:String
    6:[Size]:STRING=[8075]:String
}
-----------------------------------------------
15 - Record (MODIFIED) {
    0:[ID Number]:STRING=[16]:String
    1:[Fist Name]:STRING=[Stephen]:String
    2:[Last Name]:STRING=[Tyler]:String
    3:[Address]:STRING=[7452 Terrace "At the Plaza" road]:String
    4:[City]:STRING=[SomeTown]:String
    5:[Code]:STRING=[SD]:String
    6:[Size]:STRING=[91234]:String
}
-----------------------------------------------
16 - Record (MODIFIED) {
    0:[ID Number]:STRING=[17]:String
    1:[Fist Name]:STRING=[]:String
    2:[Last Name]:STRING=[Blankman]:String
    3:[Address]:STRING=[]:String
    4:[City]:STRING=[SomeTown]:String
    5:[Code]:STRING=[SD]:String
    6:[Size]:STRING=[298]:String
}
-----------------------------------------------
17 - Record (MODIFIED) {
    0:[ID Number]:STRING=[18]:String
    1:[Fist Name]:STRING=[Joan "the bone" Anne]:String
    2:[Last Name]:STRING=[Jet]:String
    3:[Address]:STRING=[9th, at Terrace plc]:String
    4:[City]:STRING=[Desert City]:String
    5:[Code]:STRING=[CO]:String
    6:[Size]:STRING=[123]:String
}
-----------------------------------------------
18 - Record (MODIFIED) {
    0:[ID Number]:STRING=[1]:String
    1:[Fist Name]:STRING=[John]:String
    2:[Last Name]:STRING=[Doe]:String
    3:[Address]:STRING=[120 jefferson st.]:String
    4:[City]:STRING=[Riverside]:String
    5:[Code]:STRING=[NJ]:String
    6:[Size]:STRING=[8075]:String
}
-----------------------------------------------
19 - Record (MODIFIED) {
    0:[ID Number]:STRING=[2]:String
    1:[Fist Name]:STRING=[Jack]:String
    2:[Last Name]:STRING=[McGinnis]:String
    3:[Address]:STRING=[220 hobo Av.]:String
    4:[City]:STRING=[Phila]:String
    5:[Code]:STRING=[PA]:String
    6:[Size]:STRING=[9119]:String
}
-----------------------------------------------
20 - Record (MODIFIED) {
    0:[ID Number]:STRING=[3]:String
    1:[Fist Name]:STRING=[John "Da Man"]:String
    2:[Last Name]:STRING=[Repici]:String
    3:[Address]:STRING=[120 Jefferson St.]:String
    4:[City]:STRING=[Riverside]:String
    5:[Code]:STRING=[NJ]:String
    6:[Size]:STRING=[8075]:String
}
-----------------------------------------------
21 - Record (MODIFIED) {
    0:[ID Number]:STRING=[4]:String
    1:[Fist Name]:STRING=[Stephen]:String
    2:[Last Name]:STRING=[Tyler]:String
    3:[Address]:STRING=[7452 Terrace "At the Plaza" road]:String
    4:[City]:STRING=[SomeTown]:String
    5:[Code]:STRING=[SD]:String
    6:[Size]:STRING=[91234]:String
}
-----------------------------------------------
22 - Record (MODIFIED) {
    0:[ID Number]:STRING=[5]:String
    1:[Fist Name]:STRING=[]:String
    2:[Last Name]:STRING=[Blankman]:String
    3:[Address]:STRING=[]:String
    4:[City]:STRING=[SomeTown]:String
    5:[Code]:STRING=[SD]:String
    6:[Size]:STRING=[298]:String
}
-----------------------------------------------
23 - Record (MODIFIED) {
    0:[ID Number]:STRING=[6]:String
    1:[Fist Name]:STRING=[Joan "the bone" Anne]:String
    2:[Last Name]:STRING=[Jet]:String
    3:[Address]:STRING=[9th, at Terrace plc]:String
    4:[City]:STRING=[Desert City]:String
    5:[Code]:STRING=[CO]:String
    6:[Size]:STRING=[123]:String
}
-----------------------------------------------
24 - Record (MODIFIED) {
    0:[ID Number]:STRING=[7]:String
    1:[Fist Name]:STRING=[John]:String
    2:[Last Name]:STRING=[Doe]:String
    3:[Address]:STRING=[120 jefferson st.]:String
    4:[City]:STRING=[Riverside]:String
    5:[Code]:STRING=[NJ]:String
    6:[Size]:STRING=[8075]:String
}
-----------------------------------------------
25 - Record (MODIFIED) {
    0:[ID Number]:STRING=[8]:String
    1:[Fist Name]:STRING=[Jack]:String
    2:[Last Name]:STRING=[McGinnis]:String
    3:[Address]:STRING=[220 hobo Av.]:String
    4:[City]:STRING=[Phila]:String
    5:[Code]:STRING=[PA]:String
    6:[Size]:STRING=[9119]:String
}
-----------------------------------------------
26 - Record (MODIFIED) {
    0:[ID Number]:STRING=[9]:String
    1:[Fist Name]:STRING=[John "Da Man"]:String
    2:[Last Name]:STRING=[Repici]:String
    3:[Address]:STRING=[120 Jefferson St.]:String
    4:[City]:STRING=[Riverside]:String
    5:[Code]:STRING=[NJ]:String
    6:[Size]:STRING=[8075]:String
}
-----------------------------------------------
27 records