PDF Forms: Peeling Back the Onion

Sample of the Week:

Joel Geraci

For many web developers, PDF Forms are a bit of a mystery and best avoided outside of their Defence Against the Dark Arts classes. While PDF Forms are incredibly powerful, the beauty and elegance of the format only reveal themselves once you’ve peeled back the layers and gotten to the core… so let’s get started.

As I’ve mentioned in the past, PDF forms come in two basic varieties: the ones from before Adobe’s acquisition of Accelio (formerly JetForm), and the ones that came after. In more common terms, that’s AcroForms and XFA, respectively. This article is limited to the classic AcroForm type of PDF form.

As I mentioned above, if you’re a developer with experience in HTML forms, PDF Forms can be somewhat mysterious. At the time of this writing, HTML5 has 22 different types of input elements; PDF has 4. This might lead one to believe that PDF is impoverished compared to HTML, but that’s not the case; you can build a lot of truly elegant stuff using just 4 building blocks.

There are three key aspects of the PDF specification that developers need to understand to fully appreciate PDF Forms and work with them effectively: fields vs. widgets, values vs. appearances, and JavaScript. The viewer or API that the developer is using must also respect these three aspects when it processes the PDF file, or users may not see what they should when they open the form. Luckily, the Datalogics PDF Java Toolkit makes working with these aspects of the format straightforward. To a certain degree, it also cuts through some of the confusion that the Acrobat form authoring user interface can cause.

Fields vs Widgets:

Unlike HTML, a PDF Field isn’t actually the thing that you type your data into on the page. A field is an object in a dictionary that belongs to the entire document, not to an individual page. Each field has a name and generally occurs exactly once. Field names can also be hierarchical, with a period as the separator. The thing on the page that you type into is a “Widget Annotation” or just “widget.” Widgets that refer to the same field can appear on any number of pages throughout a document, and even multiple times on the same page, but they always show the same value. This allows a form developer to place a field called “name” on every page of the document while the user only needs to enter their name once to have it appear everywhere. Here comes the fun part… the value entered is stored in the field, but the various widgets can have different appearances. The value of a field is completely separate from how that value is presented on the page; this lets you do some pretty amazing things.
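To make the field/widget split concrete, here is a minimal sketch in plain Java. It is a conceptual model only: the `Field` and `Widget` classes, the `style` string, and the sample names are all hypothetical and are not part of the Datalogics PDF Java Toolkit or of the PDF format itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Conceptual model only; these classes are hypothetical, not a PDF library API. */
class FieldWidgetSketch {

    /** A document-level field: one fully qualified name, one value. */
    static final class Field {
        final String qualifiedName;   // e.g. "applicant.name"; the period separates hierarchy levels
        String value = "";
        final List<Widget> widgets = new ArrayList<>();

        Field(String qualifiedName) {
            this.qualifiedName = qualifiedName;
        }

        /** Places another widget for this field on the given page. */
        Widget addWidget(int pageNumber, String style) {
            Widget widget = new Widget(this, pageNumber, style);
            widgets.add(widget);
            return widget;
        }
    }

    /** A widget annotation: lives on a page, points back at its field, has its own look. */
    static final class Widget {
        final Field field;
        final int pageNumber;
        final String style;

        Widget(Field field, int pageNumber, String style) {
            this.field = field;
            this.pageNumber = pageNumber;
            this.style = style;
        }

        /** What this widget would show: the shared field value in this widget's own style. */
        String render() {
            return "[page " + pageNumber + ", " + style + "] " + field.value;
        }
    }

    public static void main(String[] args) {
        Field name = new Field("applicant.name");
        Widget first = name.addWidget(1, "12pt black");
        Widget second = name.addWidget(7, "8pt gray");

        name.value = "Ada Lovelace";          // entered once, stored on the field
        System.out.println(first.render());   // both widgets read the same value
        System.out.println(second.render());
    }
}
```

The point of the design is that the value lives in exactly one place, so every widget picks up a change without any copying, while each widget keeps its own presentation.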

Values vs. Appearances:

While Adobe Acrobat allows a user to place 8 different types of form fields on a page, there are really only four: Text fields, Button fields, Choice fields, and Signature fields. Acrobat uses appearances to differentiate these four into the 8 that people are most familiar with. When a user creates a new checkbox, Acrobat automatically generates a button field, sets the button type for the widget, creates the appearances necessary for the on state and the off state, and then assigns the proper appearance based on the field value. The FormFieldManager interface in the Datalogics PDF Java Toolkit does the same thing for developers; a single line of code will create the field, create a default set of appearances, and set the value. The code below is from the FormFieldServiceSample, which demonstrates how to add new AcroForm fields to a PDF document via the FormFieldService API, which makes creating forms easier.

Code Snippet:

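// Adds a checkbox field named "chk1" with an initial value of true (checked),
// along with its widget on the given page inside the given rectangle.
formFieldManager.addCheckBox("chk1", true,
     PDFRectangle.newInstance(pdfDoc, 400, 400, 450, 450), page,
     null);
// Same call for "chk2", which starts unchecked and uses a different rectangle.
formFieldManager.addCheckBox("chk2", false,
     PDFRectangle.newInstance(pdfDoc, 500, 400, 550, 450), page,
     null);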

The best example of the difference between a value and an appearance is the Barcode field. The value of a barcode on an AcroForm can be presented as one of three different barcode types, but the underlying value is the same regardless of how it’s presented on the page. This is because barcode fields are really just text fields with a special appearance that Acrobat creates automatically using its built-in barcode generator. The Datalogics PDF Java Toolkit also has a built-in barcode generator and can regenerate a barcode on a PDF form based on new data that may have been added programmatically.

The other aspect of barcode fields that distinguishes them from regular text fields is that they are always calculated, even though you won’t see a “Calculate” tab when you look at the properties of a barcode field in Acrobat. What you will see is an option to encode the data as tab-delimited or XML. When a user creates a barcode field in Acrobat, under the hood, a small bit of JavaScript is added to the calculation script for the field. The calculation formats the value of the field to be either tab-delimited or XML, with or without field names, based on the fields selected in the interface. Because properly interpreting this JavaScript is critical to generating the appearance for a barcode, PDF developer tools that can’t execute the calculations in PDF form fields don’t have a very good chance of generating the barcode properly.
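As a rough illustration of what such a calculation has to produce, the sketch below builds a tab-delimited string from a set of field values, with or without a row of field names. It is plain Java rather than the JavaScript Acrobat generates, the `tabDelimited` helper and the sample field names are hypothetical, and the exact layout (names on one line, values on the next) is an assumption on my part, not a reproduction of Acrobat’s output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Conceptual sketch only; the layout below is an assumption, not Acrobat's exact format. */
class BarcodeValueSketch {

    /**
     * Joins the selected field values with tabs. When includeNames is true,
     * a first line with the field names is written before the values.
     */
    static String tabDelimited(Map<String, String> fields, boolean includeNames) {
        StringBuilder out = new StringBuilder();
        if (includeNames) {
            out.append(String.join("\t", fields.keySet())).append('\n');
        }
        out.append(String.join("\t", fields.values()));
        return out.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in the order they were selected.
        Map<String, String> selected = new LinkedHashMap<>();
        selected.put("name", "Ada Lovelace");
        selected.put("city", "London");

        // This string is the barcode field's value; the barcode drawn on the
        // page is only an appearance generated from it.
        System.out.println(tabDelimited(selected, true));
    }
}
```

Whenever one of the source fields changes, the string has to be recomputed before the barcode appearance is regenerated, which is why a tool that skips the calculation ends up drawing a stale barcode.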

JavaScript:

The final aspect of the PDF Forms puzzle is JavaScript. JavaScript in PDF can be extremely powerful. It can be used to validate the values entered into a field, calculate the value of a field, and format the field. It’s the format script that’s most interesting for the purposes of this article. The format script is what tells Acrobat and the Datalogics PDF Java Toolkit what to base the appearance of the widget on. The format script is what adds currency symbols to numbers and limits the number of decimal places to show. The format script can be used to change the appearance of negative currency values to be shown in red while positive ones are black. The underlying value is just a number but the format script generates a string that is used by the appearance generator to display the value in the format requested.
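The sketch below mimics that last step in plain Java: a number goes in, and a display string plus a color come out for the appearance generator to draw. It is only an analogy for what a format script does; real format scripts are JavaScript stored in the PDF, and the `formatCurrency` helper and `Appearance` class here are hypothetical names, not Acrobat or Datalogics APIs.

```java
import java.util.Locale;

/** Conceptual sketch only; none of these names come from Acrobat or any PDF library. */
class FormatScriptSketch {

    /** What an appearance generator would need: the text to draw and a color for it. */
    static final class Appearance {
        final String text;
        final String color;

        Appearance(String text, String color) {
            this.text = text;
            this.color = color;
        }
    }

    /** Plays the role of a format script: numeric value in, display string and color out. */
    static Appearance formatCurrency(double value) {
        // Two decimal places with thousands separators, e.g. 1234.5 becomes "1,234.50".
        String digits = String.format(Locale.US, "%,.2f", Math.abs(value));
        boolean negative = value < 0;
        return new Appearance((negative ? "-$" : "$") + digits, negative ? "red" : "black");
    }

    public static void main(String[] args) {
        double stored = -1234.5;                     // the field value stays a plain number
        Appearance shown = formatCurrency(stored);   // only the presentation changes
        System.out.println(stored + " is drawn as " + shown.text + " in " + shown.color);
    }
}
```

Note that the stored value is never modified; anything that reads the field later still gets the plain number.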

Why?

At first, this architecture can appear to be overly complicated. It isn’t. The “P” in PDF stands for “Portable,” and portability is exactly what this architecture provides. Because field values are kept separate from their appearances, a PDF viewer that doesn’t understand what interactive fields are can still display the form properly. This is particularly important for PDF viewers on mobile devices, which are often less capable than their desktop counterparts, and for high-speed printers that can accept PDF files directly. As long as a fully capable PDF tool like Acrobat or the Datalogics PDF Java Toolkit was the last tool to modify the form, virtually any PDF viewer… even the worst of them… can display the form as the author intended.

View and download the FormFieldServiceSample, or get all the samples and documentation by requesting an evaluation of the Datalogics PDF Java Toolkit.
