DaedTech

Stories about Software

Programmatically Filling Out PDFs in Java

I just got done dealing with an interesting problem. I had one of those PDFs that’s a form you can fill out, and I was tasked with filling it out programmatically. So, I busted out my Google-fu and came across PDFBox. It’s a handy and fairly no-nonsense little utility, not just for filling out forms but for manipulating PDFs in general. I had no idea something like this existed (mainly because I’d never really thought about it).

I downloaded the jar for PDFBox and wrote a simple class to test out my theory. In setting up the class and poking around randomly in the documentation, I saw that the main object of interest was a PDDocument. So, I set about instantiating one and discovered that you needed to use something called a COSDocument, which took something called a RandomAccess (not a standard Java file class, but PDFBox’s own type), and then my eyes started to cross. I pulled back and discovered that this is really what I wanted:

private static PDDocument _pdfDocument;

private static void populateAndCopy(String originalPdf, String targetPdf) throws IOException, COSVisitorException {
	_pdfDocument = PDDocument.load(originalPdf);
	
	_pdfDocument.save(targetPdf);
	_pdfDocument.close();
}

Much easier. Now, as I got down to the business of trying this out, I discovered via runtime exception that I needed two external dependencies. One was Apache Commons Logging, and the other was something called FontBox, which was right there alongside the PDFBox download but which I had ignored at first. With this code alone you probably won’t hit both of those problems, but you will eventually, so it’s better to add those jars right up front.
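
A quick way to confirm that all three jars are visible before touching a PDF is to probe for one class from each. This is only a sketch: the class names below are my assumption about what each jar contains (they match the PDFBox 1.x-era package layout), and `ClasspathCheck` and `isOnClasspath` are names made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ClasspathCheck {

    // Returns true if the named class can be located on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // One representative class per jar (assumed names, adjust for your versions).
        Map<String, String> probes = new LinkedHashMap<>();
        probes.put("pdfbox", "org.apache.pdfbox.pdmodel.PDDocument");
        probes.put("fontbox", "org.apache.fontbox.cmap.CMap");
        probes.put("commons-logging", "org.apache.commons.logging.LogFactory");

        for (Map.Entry<String, String> probe : probes.entrySet()) {
            String status = isOnClasspath(probe.getValue()) ? "found" : "MISSING";
            System.out.println(probe.getKey() + ": " + status);
        }
    }
}
```

Running this with the same classpath as the real program shows up front which jar is missing, instead of finding out one runtime exception at a time.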

So far I was able to successfully open a PDF and save it as another file, which isn’t exactly a new capability for any programming language with file I/O, so I added something a little more concrete to the mix:

private static void populateAndCopy(String originalPdf, String targetPdf) throws IOException, COSVisitorException {
	_pdfDocument = PDDocument.load(originalPdf);
	
	System.out.println("Pages: " + _pdfDocument.getNumberOfPages());
	
	setField("SomeFieldName", "SomeFieldValue");
	_pdfDocument.save(targetPdf);
	_pdfDocument.close();
}

Lo and behold, liftoff. I actually got the right number of pages in the document. Now I was getting somewhere. Time to get down to the real business. I did this by going to the “Cookbook” section of the project and seeing what was under form generation. Seeing that this just took me to the javadoc for examples, I went and downloaded the example code and pasted it into my project (modifying it to conform to Egyptian-style braces). In this fashion, I had a method that would print out all of the fields in the PDF as well as a method that would let me set fields by name. When I ran the one that printed out all of the fields, I got a runtime exception about some deprecated method, and I discovered that the source code for that method did nothing but throw an exception. Presumably, the written examples predated some change that had deprecated that method: deprecated it with extreme prejudice!

Well, I’d like to say that I fought the good fight, but I didn’t. I just deleted the offending call since it was just writing to console. So here is the end result of that effort:

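import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

public class Populater {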

	private static PDDocument _pdfDocument;
	
	public static void main(String[] args) {
		
		String originalPdf = "C:\\blah\\blah\\input.PDF";
		String targetPdf = "C:\\blah\\blah\\output.PDF";
		
		try {
			populateAndCopy(originalPdf, targetPdf);
		} catch (IOException | COSVisitorException e) {
			e.printStackTrace();
		}
		
		System.out.println("Complete");
	}

	private static void populateAndCopy(String originalPdf, String targetPdf) throws IOException, COSVisitorException {
		_pdfDocument = PDDocument.load(originalPdf);
		
		_pdfDocument.getNumberOfPages();
		//printFields();  //Uncomment to see the fields in this document in console
		
		setField("SomeFieldName", "SomeFieldValue");
		_pdfDocument.save(targetPdf);
		_pdfDocument.close();
	}
	
    public static void setField(String name, String value ) throws IOException {
        PDDocumentCatalog docCatalog = _pdfDocument.getDocumentCatalog();
        PDAcroForm acroForm = docCatalog.getAcroForm();
        PDField field = acroForm.getField( name );
        if( field != null ) {
            field.setValue(value);
        }
        else {
            System.err.println( "No field found with name:" + name );
        }
    }

    @SuppressWarnings("rawtypes")
	public static void printFields() throws IOException {
        PDDocumentCatalog docCatalog = _pdfDocument.getDocumentCatalog();
        PDAcroForm acroForm = docCatalog.getAcroForm();
        List fields = acroForm.getFields();
        Iterator fieldsIter = fields.iterator();

        System.out.println(fields.size() + " top-level fields were found on the form");

        while( fieldsIter.hasNext()) {
            PDField field = (PDField)fieldsIter.next();
            processField(field, "|--", field.getPartialName());
        }
    }
    
    @SuppressWarnings("rawtypes")
	private static void processField(PDField field, String sLevel, String sParent) throws IOException
    {
        List kids = field.getKids();
        if(kids != null) {
            Iterator kidsIter = kids.iterator();
            if(!sParent.equals(field.getPartialName())) {
               sParent = sParent + "." + field.getPartialName();
            }
            
            System.out.println(sLevel + sParent);
            
            while(kidsIter.hasNext()) {
               Object pdfObj = kidsIter.next();
               if(pdfObj instanceof PDField) {
                   PDField kid = (PDField)pdfObj;
                   processField(kid, "|  " + sLevel, sParent);
               }
            }
         }
         else {
             String outputString = sLevel + sParent + "." + field.getPartialName() + ",  type=" + field.getClass().getName();
             System.out.println(outputString);
         }
    }
}
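
The names that `printFields` walks are built by joining each field’s partial name to its parents’ names with dots, and, as far as I can tell, that dotted name is what you pass to `getField` for a nested field. The sketch below reproduces just that naming logic without PDFBox, so treat it as an illustration rather than the library’s behavior: `Node` is a hypothetical stand-in for `PDField`, and the sample field names are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class QualifiedNames {

    // Hypothetical stand-in for PDField: a partial name plus optional child fields.
    static final class Node {
        final String partialName;
        final List<Node> kids;

        Node(String partialName, Node... kids) {
            this.partialName = partialName;
            this.kids = Arrays.asList(kids);
        }
    }

    // Adds the dotted, fully qualified name of every leaf under the given node to out.
    static void collect(Node node, String parentName, List<String> out) {
        String fullName = parentName.isEmpty()
                ? node.partialName
                : parentName + "." + node.partialName;
        if (node.kids.isEmpty()) {
            out.add(fullName);
        } else {
            for (Node kid : node.kids) {
                collect(kid, fullName, out);
            }
        }
    }

    // Returns the fully qualified names of all leaf fields, in document order.
    static List<String> leafNames(List<Node> topLevel) {
        List<String> out = new ArrayList<>();
        for (Node node : topLevel) {
            collect(node, "", out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Node> form = Arrays.asList(
                new Node("applicant", new Node("name"), new Node("dob")),
                new Node("signatureDate"));
        System.out.println(leafNames(form));
        // prints [applicant.name, applicant.dob, signatureDate]
    }
}
```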

Going forward, I’ll certainly factor this into a new class and probably extract some methods and clean up the suppressed warnings, but that’s the gist of it. This didn’t exactly take a long time, but it probably could have gone quicker if I’d known a little more up front and had all of the example code in one place. Hopefully it helps you in that capacity.
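
One shape that refactoring could take is to keep the document out of static state and hand the filling logic anything that can set a named field. This is a sketch under my own assumptions, not a finished design: `FieldSink` and `FormFiller` are hypothetical names, and a PDFBox-backed `FieldSink` would wrap the `acroForm.getField(name)` lookup and `setValue` call from `setField`, while a test can use a plain map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FormFiller {

    // Hypothetical abstraction over "set a form field by name".
    // Implementations return false when no field with that name exists.
    interface FieldSink {
        boolean set(String name, String value);
    }

    // Applies every entry to the sink and returns the names that were not found.
    static List<String> fill(Map<String, String> values, FieldSink sink) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> entry : values.entrySet()) {
            if (!sink.set(entry.getKey(), entry.getValue())) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a form that has exactly two fields.
        Map<String, String> fakeForm = new HashMap<>();
        fakeForm.put("FirstName", "");
        fakeForm.put("LastName", "");
        FieldSink sink = (name, value) -> {
            if (!fakeForm.containsKey(name)) {
                return false;
            }
            fakeForm.put(name, value);
            return true;
        };

        Map<String, String> values = new LinkedHashMap<>();
        values.put("FirstName", "Ada");
        values.put("MiddleName", "K");

        System.out.println(fill(values, sink));        // prints [MiddleName]
        System.out.println(fakeForm.get("FirstName")); // prints Ada
    }
}
```

Returning the list of missing names, rather than writing to `System.err` per field, lets the caller decide whether an unknown field is a warning or a failure.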

28 Comments
Timothy Boyce
11 years ago

If anyone is looking for a PDF library like this for .NET that is well supported, I recommend PDFKit.NET.

Erik Dietrich
11 years ago
Reply to  Timothy Boyce

Thanks for the recommendation — that’s handy to know.

Joshua
10 years ago

I’m getting a NullPointerException when trying to use the .setValue….
SEVERE: java.lang.NullPointerException
at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.calculateFontSize(PDAppearance.java:558)
at org.apache.pdfbox.pdmodel.interactive.form.PDAppearance.setAppearanceValue(PDAppearance.java:303)
at org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.setValue(PDVariableText.java:131)
at org.apache.jsp.index_jsp._jspService(index_jsp.java:91)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:111)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
….

LostKatana
10 years ago
Reply to  Joshua

Have you checked whether the field you are trying to set is null?
If not, add the check…

if (field != null) {
    field.setValue(value);
} else {
    System.err.println("No field found with name:" + name);
}

frrug
10 years ago

Thanks Erik!! This helped me very much!!

Erik Dietrich
10 years ago
Reply to  frrug

Glad if it helped! It’s always nice to hear that posting my solution to something I was working on helps someone else.

cedric25
9 years ago

Thanks a lot for sharing that! Very useful as I just started using PDFBox with forms.

Erik Dietrich
9 years ago
Reply to  cedric25

Glad to hear that people are still finding this useful, and thanks for the feedback. Always nice to hear when a post helped someone.

PN
9 years ago

In printFields, I get an error at the line "Iterator fieldsIter = fields.iterator();":

java.lang.ClassCastException: org.apache.pdfbox.pdmodel.common.COSArrayList incompatible with com.sun.xml.internal.bind.v2.schemagen.xmlschema.List

Do you know what is causing this? The quick fix tells me to add a cast to fields, but that does not fix the error.

PN
9 years ago
Reply to  PN

I resolved the cast issue by modifying my imports. I am now using:

import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.common.COSArrayList;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.common.COSObjectable;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

Ben
9 years ago

Excellent. thank you.

kumar
9 years ago

Erik, thanks for this article. Because of your article, I did not waste any time in experimenting and could use your code.

Erik Dietrich
9 years ago
Reply to  kumar

Excellent — glad it helped!

PN
9 years ago

Thank you for the example code. I have some fields that are not getting updated. I have a PDF form that has some PDCheckBox fields with only one radio button choice (they are not part of a group of fields of the same name), and setField(field, value) will not update those fields. If the PDF form has another check box of the same name with a different default radio button choice, then setField will update the field. Any ideas how I can get around this issue without having to modify the forms?

Erik Dietrich
9 years ago
Reply to  PN

I honestly haven’t looked at that code or utility in over 2 years. I wish I could be more help, but it’d probably take me a good long time just to get to where I understood as much as you.

sam
9 years ago

thanks for this

Thebrowser Ccs
8 years ago

I tried this, but it returns all null:

List fields = acroForm.getFields();
if( fields != null && fields.size()>0) {
    for(Object field : fields){
        System.out.println(field.toString());
    }
}

Thebrowser Ccs
8 years ago

System.out.println(field + " |-- " + field.getPartialName());

// null |-- a age 1

// null |-- dob 1

processField(field, "|--", field.getPartialName());

What should the value of “name” be here?

PDField field = acroForm.getField( name );

Thebrowser Ccs
8 years ago

This returns the error: Don’t know how to calculate the position for non-simple fonts

Erik Dietrich
8 years ago
Reply to  Thebrowser Ccs

Wish I could be of more help, but it’s been years since I’ve even had access to that code, much less looked at it.

Malcolm Greaves
8 years ago

Why did you write this code to use global, mutable state?

Erik Dietrich
8 years ago

I’m not sure what kind of answer you’re looking for. I wrote this post almost 3 years ago, and the code in question is code I took from an example in their API doco and tweaked to work for my purposes.

This post was just, “here’s a thing I got working that might help you.” It wasn’t intended to be any kind of lesson on encapsulation.

Malcolm Greaves
8 years ago
Reply to  Erik Dietrich

It was an open-ended question. In retrospect, I can see how you interpreted it as a hostile comment from me. My apologies.

Your explanation is great though: just a one-off kind of thing. I thought your post was more along the lines of “this is how you programmatically fill out PDF forms in Java,” not “here’s a hacked-together recipe with PDFBox.”

Erik Dietrich
8 years ago

No worries on my end 🙂

For a lot of years, I tended to reply with snark if I didn’t understand the context or tone of a question. Now, I just try to ask.

Any static state that I use, generally, would either be the result of me having something foisted on me by existing code or an API, or it would be the result of me not knowing how to eliminate it. I test drive any production-targeted code that I write, so global state tends to be anathema to me because of its effect on testability.

bat_chile
7 years ago

An easy way to iterate over the fields in version 2.0:
Iterator iterator = acroForm.getFieldIterator();
while(iterator.hasNext()) {
log.info(iterator.next());
}

chrylis
7 years ago

Appreciate the snippet! I’ve been banging my head against the wall with an XFA form but convinced the client to re-render it as FDF, and this has gotten me well along my way.

Erik Dietrich
7 years ago
Reply to  chrylis

Glad if it helped!

Jamiro
7 years ago

Very helpful. Thanks. Is there a way to read the contents of PDF fields with Apache PDFBox?