Programmatically Filling Out PDFs in Java

I just got done dealing with an interesting problem. I had one of those PDFs that’s a form you can fill out and was tasked with filling it out programmatically. So, I busted out my Google-fu and came across PDFBox. It’s a handy and fairly no-nonsense little utility not just for filling out forms, but for manipulating PDFs in general. I had no idea something like this existed (mainly because I’d never really thought about it).

I downloaded the jar for PDFBox and wrote a simple class to test out my theory. In setting up the class and poking around randomly in the documentation, I saw that the main object of interest was a PDDocument. So, I set about instantiating one and discovered that you needed to use something called a COSDocument, which took something called a RandomAccess (not the standard version of the file, but a special version from PDFBox), and then my eyes started to cross and I pulled back and discovered that this is really what I want:
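The snippet that originally followed is not reproduced here. As a minimal sketch of that simpler route, assuming the PDFBox 1.x API (the `org.apache.pdfbox.exceptions` package that shows up in the comments points to that line of releases), something like this should do it. The class name and file names are placeholders, not anything from the original code.

```java
import java.io.IOException;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;

final class PdfCopy {

    // Loads the PDF at inputPath and writes it back out to outputPath.
    static void copy(String inputPath, String outputPath)
            throws IOException, COSVisitorException {
        // The static load method hides the COSDocument/RandomAccess plumbing.
        PDDocument document = PDDocument.load(inputPath);
        try {
            document.save(outputPath);
        } finally {
            // Always release the underlying file resources.
            document.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder file names; substitute your own.
        copy("form.pdf", "form-copy.pdf");
    }
}
```

Newer PDFBox releases changed the loading and saving signatures, so treat this as specific to the 1.x jars.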

Much easier. Now, as I got down to the business of trying this out, I discovered via runtime exception that I needed two external dependencies. One was Apache Commons Logging and the other was something called FontBox, which was right there alongside the PDFBox download but which I had ignored in the beginning. With this code alone you probably wouldn’t hit both of those problems, but you will eventually, so it’s better to add those jars right up front.

So far I had successfully opened a PDF and saved it as another file, which isn’t exactly a new capability for any programming language with file I/O, so I added something a little more concrete to the mix:
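The original listing is missing at this point. Judging from the next paragraph, the "more concrete" step was reading the page count, which might be sketched as follows, again assuming PDFBox 1.x. The class name and file name are placeholders.

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;

final class PdfPageCount {

    // Returns the number of pages in the PDF at the given path.
    static int countPages(String path) throws IOException {
        PDDocument document = PDDocument.load(path);
        try {
            return document.getNumberOfPages();
        } finally {
            document.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder file name; substitute your own.
        System.out.println("Pages: " + countPages("form.pdf"));
    }
}
```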

Lo and behold, liftoff. I actually got the right number of pages in the document. Now I was getting somewhere. Time to get down to the real business. I did this by going to the “Cookbook” section of the project and seeing what was under form generation. Seeing that this just took me to the javadoc for examples, I went and downloaded the example code and pasted it into my project (modifying it to conform to Egyptian-style braces). In this fashion, I had a method that would print out all of the fields in the PDF as well as a method that would let me set fields by name. When I ran the one that printed out all of the fields, I got a runtime exception about some deprecated method, and I discovered that, in the source code for that method, it just threw an exception. Presumably, the written examples predated some change that had deprecated that method, and deprecated it with extreme prejudice!

Well, I’d like to say that I fought the good fight, but I didn’t. I just deleted the offending call since it was just writing to console. So here is the end result of that effort:
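The final listing did not survive either, so what follows is only a sketch of the two methods described above (one to print the fields, one to set a field by name), not the original code. It assumes PDFBox 1.x; the class name, file names, and the `firstName` field are hypothetical.

```java
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

final class PdfFormFiller {

    // Prints the name of every top-level form field (child fields are not walked).
    static void printFields(PDDocument document) throws IOException {
        PDAcroForm form = document.getDocumentCatalog().getAcroForm();
        if (form == null) {
            System.out.println("This document has no form.");
            return;
        }
        List<?> fields = form.getFields();
        for (Object item : fields) {
            if (item instanceof PDField) {
                System.out.println(((PDField) item).getFullyQualifiedName());
            }
        }
    }

    // Sets the named field; returns false if the form or the field is missing.
    static boolean setField(PDDocument document, String name, String value)
            throws IOException {
        PDAcroForm form = document.getDocumentCatalog().getAcroForm();
        PDField field = (form == null) ? null : form.getField(name);
        if (field == null) {
            System.err.println("No field found with name: " + name);
            return false;
        }
        field.setValue(value);
        return true;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder file and field names; substitute your own.
        PDDocument document = PDDocument.load("form.pdf");
        try {
            printFields(document);
            setField(document, "firstName", "Erik");
            document.save("form-filled.pdf");
        } finally {
            document.close();
        }
    }
}
```

Returning a boolean from `setField` is a design choice made here so that a mistyped field name shows up as a warning instead of a NullPointerException.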

Going forward, I’ll certainly factor this into a new class and probably extract some methods and clean up the compiler warnings, but that’s the gist of it. This didn’t exactly take a long time, but it probably could have gone quicker if I’d known a little more up front and had all of the example code in one place. Hopefully it helps you in that capacity.

  • Timothy Boyce

    If anyone is looking for a PDF library like this for .NET that is well supported, I recommend PDFKit.NET.

  • http://www.daedtech.com/blog Erik Dietrich

    Thanks for the recommendation — that’s handy to know.

  • Joshua

    I’m getting a NullPointerException when trying to use the .setValue….
    SEVERE: java.lang.NullPointerException
    at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:558)
    at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.setAppearanceValue(PDAppearance.java:303)
    at org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.setValue(PDVariableText.java:131)
    at org.apache.jsp.index_jsp._jspService(index_jsp.java:91)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:111)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
    ….

  • LostKatana

    Have you checked whether the field you tried to set the value on is null?
    If not, add the check…

    if (field != null) {
        field.setValue(value);
    } else {
        System.err.println("No field found with name: " + name);
    }

  • frrug

    Thanks Erik!! This helped me very much!!

  • http://www.daedtech.com/blog Erik Dietrich

    Glad if it helped! It’s always nice to hear that posting my solution to something I was working on helps someone else.

  • cedric25

    Thanks a lot for sharing that! Very useful as I just started using PDFBox with forms.

  • http://www.daedtech.com/blog Erik Dietrich

    Glad to hear that people are still finding this useful, and thanks for the feedback. Always nice to hear when a post helped someone.

  • PN

    in printFields, I get an error at line: “Iterator fieldsIter = fields.iterator();”

    java.lang.ClassCastException: org.apache.pdfbox.pdmodel.common.COSArrayList incompatible with com.sun.xml.internal.bind.v2.schemagen.xmlschema.List

    do you know what is causing this? the quick fix tells me to add Cast to fields, however that does not fix the error

  • PN

    I resolved the cast issue by modifying my imports. I am now using:

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.pdfbox.exceptions.COSVisitorException;
    import org.apache.pdfbox.pdmodel.common.COSArrayList;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
    import org.apache.pdfbox.pdmodel.common.COSObjectable;
    import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
    import org.apache.pdfbox.pdmodel.interactive.form.PDField;
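
The ClassCastException above names `com.sun.xml.internal.bind.v2.schemagen.xmlschema.List`, which suggests that an unrelated `List` type had been imported in place of `java.util.List`. Here is a minimal, PDFBox-free sketch of the same iteration with the standard collection imports. The class name, method name, and sample field names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class FieldNameWalker {

    // Walks any java.util.List with an Iterator and joins the elements with ", ".
    static String joinNames(List<?> fields) {
        StringBuilder out = new StringBuilder();
        Iterator<?> fieldsIter = fields.iterator();
        while (fieldsIter.hasNext()) {
            out.append(fieldsIter.next());
            if (fieldsIter.hasNext()) {
                out.append(", ");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample names standing in for real form fields.
        List<String> names = Arrays.asList("firstName", "lastName");
        System.out.println(joinNames(names));
    }
}
```

If an IDE offers several `List` types during auto-import, `java.util.List` is the one this loop needs.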