Finding Barcode Fields using the Datalogics PDF Java Toolkit

Sample of the Week:

Last week I wrote an article describing how to read barcodes stored as images in PDF files; the technique is perfect for reading barcodes in forms that were flattened by some other process. This week’s sample describes an approach to locating barcode fields that haven’t been flattened; rather than decoding the barcode, you can just reach into the field and pull out its value. The barcode image itself is just the appearance of the field; the field value is the string that the barcode encodes.

The value of a barcode on an AcroForm can be presented as one of three different barcode types, but the underlying value is the same regardless of its appearance on the page. This is because barcode fields are really just text fields with a special appearance that Acrobat creates automatically using its built-in barcode generator. The Datalogics PDF Java Toolkit also has a built-in barcode generator and can generate a barcode on a PDF form based on new data that may have been added programmatically.

The other aspect of barcode fields that distinguishes them from regular text fields is that they are always calculated, even though you won’t see a “Calculate” tab when you look at the properties of a barcode field in Acrobat. What you will see is an option to encode the data as tab-delimited or XML. When a user creates a barcode field in Acrobat, a small bit of JavaScript is added under the hood to the calculation script for the field; this JavaScript isn’t generally accessible through the Acrobat UI. The calculation sets the value of the field to be either tab-delimited or XML, with or without field names, based on the fields selected in the interface. Because interpreting this JavaScript correctly is critical to generating the appearance of a barcode, developer tools that can’t execute the calculations in PDF forms don’t have a very good chance of generating the barcode properly, if at all.
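Once you have the field value as a string, splitting it back into individual entries is straightforward. The sketch below is plain Java and does not use the Toolkit. It assumes the tab-delimited layout is a single line of tab-separated values, optionally preceded by one line of tab-separated field names; that layout is an assumption, so check it against the values in your own forms. The class and method names (`BarcodeValueParser`, `parseWithFieldNames`, `parseValuesOnly`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Datalogics PDF Java Toolkit.
final class BarcodeValueParser {

    // Assumed layout: "name1\tname2\nvalue1\tvalue2" (one names line, one values line).
    static Map<String, String> parseWithFieldNames(String fieldValue) {
        // Split into at most two parts so everything after the first line break is the values line.
        String[] lines = fieldValue.split("\r?\n", 2);
        if (lines.length != 2) {
            throw new IllegalArgumentException("Expected a names line and a values line");
        }
        // The -1 limit keeps trailing empty columns instead of dropping them.
        String[] names = lines[0].split("\t", -1);
        String[] values = lines[1].split("\t", -1);
        if (names.length != values.length) {
            throw new IllegalArgumentException("Names and values have different column counts");
        }
        // LinkedHashMap preserves the column order of the barcode value.
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            result.put(names[i], values[i]);
        }
        return result;
    }

    // Assumed layout when field names are not included: "value1\tvalue2".
    static List<String> parseValuesOnly(String fieldValue) {
        return new ArrayList<>(Arrays.asList(fieldValue.split("\t", -1)));
    }

    public static void main(String[] args) {
        System.out.println(parseWithFieldNames("first\tlast\nAda\tLovelace"));
        System.out.println(parseValuesOnly("Ada\t\tLovelace"));
    }
}
```

An XML-encoded value would need an XML parser instead; this sketch only covers the tab-delimited case.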

With this in mind, when we find ourselves with a PDF file that contains barcode fields, we can bypass the step of decoding the barcode by just reading the value of the field. As the accompanying Gist shows, the Datalogics PDF Java Toolkit makes it easy to iterate through the fields in a PDF form, find the text fields, locate the annotations associated with a field, and then determine whether that field value is represented as a barcode. The key property that distinguishes a barcode field from a regular text field is the “Paper Metadata” or PMD Dictionary. If the text field has a PMD Dictionary, it’s a barcode field (see the highlighted line in the Gist).
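To make the detection rule concrete without reproducing the Gist, here is a minimal, self-contained sketch of the same logic over a simplified model. It is not the Toolkit API: `FormField`, `isBarcodeField`, and `findBarcodeFields` are hypothetical names, and each widget annotation is modelled as a plain map of its dictionary keys. The sketch assumes text fields are identified by the field type `Tx` and that the PMD entry is found on one of the field’s widget annotations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Simplified model of the detection rule; these types are NOT Toolkit classes.
final class BarcodeFieldFinder {

    // Hypothetical stand-in for a form field and its widget annotations.
    static final class FormField {
        final String name;
        final String fieldType;                  // "Tx" marks a text field
        final String value;                      // the string the barcode encodes
        final List<Map<String, Object>> widgets; // dictionary entries of each widget annotation

        FormField(String name, String fieldType, String value,
                  List<Map<String, Object>> widgets) {
            this.name = name;
            this.fieldType = fieldType;
            this.value = value;
            this.widgets = widgets;
        }
    }

    // A barcode field is a text field with a PMD dictionary on one of its widgets (assumed location).
    static boolean isBarcodeField(FormField field) {
        if (!"Tx".equals(field.fieldType)) {
            return false;
        }
        for (Map<String, Object> widget : field.widgets) {
            if (widget.containsKey("PMD")) {
                return true;
            }
        }
        return false;
    }

    // Walks every field and keeps only the ones that pass the barcode test.
    static List<FormField> findBarcodeFields(List<FormField> fields) {
        List<FormField> barcodes = new ArrayList<>();
        for (FormField field : fields) {
            if (isBarcodeField(field)) {
                barcodes.add(field);
            }
        }
        return barcodes;
    }

    public static void main(String[] args) {
        Map<String, Object> plainWidget = Collections.<String, Object>singletonMap("Subtype", "Widget");
        Map<String, Object> barcodeWidget = Collections.<String, Object>singletonMap("PMD", new Object());
        List<FormField> fields = new ArrayList<>();
        fields.add(new FormField("name", "Tx", "Ada", Collections.singletonList(plainWidget)));
        fields.add(new FormField("barcode", "Tx", "Ada\tLovelace", Collections.singletonList(barcodeWidget)));
        for (FormField f : findBarcodeFields(fields)) {
            System.out.println(f.name + " = " + f.value);
        }
    }
}
```

With the Toolkit, the model classes would be replaced by the real form, field, and annotation objects; the Gist shows the actual calls.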

To get started with reading barcode fields in PDF files, download this Gist and request an evaluation copy of The Datalogics PDF Java Toolkit.
