Converting PDF Forms to HTML Forms with Datalogics PDF Java Toolkit

Not all PDF viewers are created equal, and support for PDF Forms varies greatly from one viewer to the next. It gets worse when you add in web browsers that can display PDFs. Support for PDF Forms in both PDF viewers and web browsers should improve over time, but who has time to wait for that? Your business has real customers today, and you need to collect information from them now. Recreating the forms your business relies on by hand in HTML would take too much time. With Datalogics PDF Java Toolkit, though, you can automate the conversion to get a basic HTML form quickly and then adjust its appearance. With an HTML form in hand, you can have your customers fill out the form using a web browser on any device, merge their input into the PDF, and then send them the resulting PDF as a record (or store it for your own records).

A couple of PDF-specific items to watch out for

At a high level, converting a PDF Form to an HTML Form is relatively straightforward: create an HTML Form and add an HTML element to it for each field in the PDF's form. A few things might trip you up, though, if you are not familiar with the PDF Specification or are working with complex PDF Forms.

The first thing to watch out for is that there are at least two different types of forms found in PDF files, so it is important to know which one you are working with. We have previously covered how to determine the type of a PDF Form using PDF Java Toolkit (see Detecting PDF Form Types), so we won’t go into detail about that in this post. To keep the focus on how to use PDF Java Toolkit, let’s assume we are working with an AcroForm. AcroForms are the most straightforward and widely supported type of PDF Form, and they still offer powerful features.

The second thing to watch out for is that in a PDF Form, the fields are just the fields; the text surrounding a field could be defined anywhere in the PDF. In HTML, though, any text that helps the user fill in the correct information needs to be written in an HTML element before the form field, so that the user reads the content in an order that helps them fill out the form. Since PDF documents typically do not have a well-defined structure, it is a best practice to name the fields in a PDF Form appropriately. It is also important to rely on the fully qualified name of each field so that you have a unique name to work with. If a form contains multiple address fields (for example, a shipping and a billing address), the fully qualified name lets you distinguish between them without having to know that “address2” is supposed to be interpreted as the billing address.
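
To make the idea of a fully qualified name concrete: the PDF Specification builds it by joining a field's own (partial) name with the partial names of its ancestors, separated by periods. The following is a minimal plain-Java sketch of that rule. The FieldNode class and the field names are hypothetical stand-ins for illustration, not PDF Java Toolkit types; with the Toolkit you would call getQualifiedName() on a PDFField instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a node in a PDF field hierarchy;
// this is NOT a PDF Java Toolkit class.
final class FieldNode {
    final String partialName; // the field's own name
    final FieldNode parent;   // null for a top-level field

    FieldNode(String partialName, FieldNode parent) {
        this.partialName = partialName;
        this.parent = parent;
    }

    // Walks up to the root and joins the partial names with periods.
    String qualifiedName() {
        Deque<String> parts = new ArrayDeque<>();
        for (FieldNode node = this; node != null; node = node.parent) {
            parts.addFirst(node.partialName);
        }
        return String.join(".", parts);
    }
}

final class QualifiedNameDemo {
    public static void main(String[] args) {
        FieldNode shipping = new FieldNode("shipping", null);
        FieldNode billing = new FieldNode("billing", null);

        // Two fields share the partial name "street" but stay distinguishable.
        System.out.println(new FieldNode("street", shipping).qualifiedName()); // shipping.street
        System.out.println(new FieldNode("street", billing).qualifiedName());  // billing.street
    }
}
```

Using the qualified name as the HTML element's name also makes it easier to match submitted values back to the right PDF field later.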

Using Datalogics PDF Java Toolkit to create an HTML Form

Our ConvertAcroFormToHtml sample accepts two arguments: the first is the PDF file that contains a form, and the second is the location where the resulting HTML file should be written. Please note that this sample relies on the HtmlFlow project for creating and writing out the HTML, and that HtmlFlow is not part of Datalogics PDF Java Toolkit. The sample loops through the fields in a PDF Form using an Iterator of type PDFField and creates an HTML file that contains a representation of the form, without any other context from the original PDF.

The interesting code starts with acquiring the PDFInteractiveForm and an Iterator for the PDFFields in the form.

final PDFInteractiveForm form = pdfDocument.getInteractiveForm();
final Iterator<PDFField> fieldIterator = form.iterator();

With the iterator acquired, the sample sets up the HTML document, using the title of the PDF as the title of the HTML document, and constructs the HTML Form that will contain the fields defined in the PDF Form.

final HtmlView<?> taskView = new HtmlView<>();
taskView.head().title(pdfDocument.getDocumentInfo().getTitle());
final HtmlForm<?> htmlForm = taskView.body().form("Form");

Now that the HTML Form has been constructed, we can use our Iterator to inspect each PDFField defined in our PDF Form and determine what should be displayed in the HTML Form. In PDF Forms, there are four primary types of fields that can be displayed:

  • Button
  • Choice
  • Signature
  • Text

All other field types are really extensions of these types. This is important to know so that when you create a field in the HTML Form, you know what to look for on the PDFField object in PDF Java Toolkit. For example, a Checkbox in a PDF Form is really a button with a property that changes its appearance, so when adding it to an HTML Form you need to specify that the type of your input element is “checkbox”. Unless you specify the type on your HTML input elements, you will not get an HTML Form that resembles the PDF Form.
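
The checkbox example can be sketched in code. In the PDF Specification, the field flags of a button field mark it as a radio button (bit 16) or a push button (bit 17); a button with neither flag set is a checkbox. The sketch below maps those flags to an HTML input type. The ButtonTypeMapper class and its method are hypothetical and work on a raw flags integer; PDF Java Toolkit has its own accessors for this information, which are not shown here.

```java
// Illustrative only: maps a PDF button field's flags to an HTML input type.
// The class and method names are hypothetical, not part of PDF Java Toolkit.
final class ButtonTypeMapper {
    // Button field flag bits as numbered in the PDF Specification.
    // Bit positions there are 1-based, so bit 16 corresponds to 1 << 15.
    static final int FLAG_RADIO = 1 << 15;      // bit 16
    static final int FLAG_PUSHBUTTON = 1 << 16; // bit 17

    static String htmlInputType(int fieldFlags) {
        if ((fieldFlags & FLAG_PUSHBUTTON) != 0) {
            return "button";
        }
        if ((fieldFlags & FLAG_RADIO) != 0) {
            return "radio";
        }
        return "checkbox"; // neither flag set
    }

    public static void main(String[] args) {
        System.out.println(htmlInputType(0));               // checkbox
        System.out.println(htmlInputType(FLAG_RADIO));      // radio
        System.out.println(htmlInputType(FLAG_PUSHBUTTON)); // button
    }
}
```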

In the ConvertAcroFormToHtml sample, there is code to handle a few of the field types. It starts out by getting the field type from the PDFField:

final PDFFieldType fieldType = field.getFieldType();
if (fieldType == PDFFieldType.Text) {
    htmlForm.text(field.getQualifiedName()).inputText(field.getQualifiedName()).br();
}
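
For readers who want to see the overall shape of the loop around this check without the Toolkit or HtmlFlow on the classpath, here is a plain-Java sketch. SimpleField is a hypothetical stand-in for PDFField, and the HTML is assembled by hand rather than through HtmlFlow, so the markup will not match the sample's output exactly. Field types other than Text are skipped here.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for PDFField; not a PDF Java Toolkit class.
final class SimpleField {
    final String qualifiedName;
    final String type; // "Text", "Button", "Choice" or "Signature"

    SimpleField(String qualifiedName, String type) {
        this.qualifiedName = qualifiedName;
        this.type = type;
    }
}

final class FormLoopSketch {
    // Escapes characters that are unsafe in HTML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Emits one labeled text input per Text field, named by its qualified name.
    static String toHtmlForm(Iterator<SimpleField> fields) {
        StringBuilder html = new StringBuilder("<form>\n");
        while (fields.hasNext()) {
            SimpleField field = fields.next();
            if ("Text".equals(field.type)) {
                String name = escape(field.qualifiedName);
                html.append("  <label>").append(name)
                    .append(" <input type=\"text\" name=\"").append(name)
                    .append("\"></label><br>\n");
            }
            // Other field types are intentionally ignored in this sketch.
        }
        return html.append("</form>\n").toString();
    }

    public static void main(String[] args) {
        List<SimpleField> fields = Arrays.asList(
                new SimpleField("shipping.street", "Text"),
                new SimpleField("billing.street", "Text"),
                new SimpleField("approval", "Signature"));
        System.out.print(toHtmlForm(fields.iterator()));
    }
}
```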

For the sample, we have kept this pretty basic, as it is an area that can be highly customized. Some use cases might call for changing the field type in the HTML Form entirely, whereas others might want to make the HTML Form an exact copy of the PDF Form. One thing to note about choice fields in a PDF is that the options are not necessarily a List of Strings; they can instead be a List of Lists, where each inner list pairs an export value with the text displayed to the user.
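
The mixed shape of choice options can be handled along these lines. In the PDF Specification, each entry in a choice field's options array is either a single text string or a two-element array holding the export value followed by the display text. The sketch below turns such a list into an HTML select element. Representing the options as a List of Objects is an assumption made for illustration; the type that PDF Java Toolkit actually returns may differ, and ChoiceOptionsSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only; not part of PDF Java Toolkit.
final class ChoiceOptionsSketch {
    // Escapes characters that are unsafe in HTML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Each option is either a String or a two-element List: [exportValue, displayText].
    static String toSelect(String name, List<?> options) {
        StringBuilder html = new StringBuilder();
        html.append("<select name=\"").append(escape(name)).append("\">\n");
        for (Object option : options) {
            String exportValue;
            String displayText;
            if (option instanceof List) {
                List<?> pair = (List<?>) option;
                exportValue = String.valueOf(pair.get(0));
                displayText = String.valueOf(pair.get(1));
            } else {
                // A plain string serves as both the value and the label.
                exportValue = String.valueOf(option);
                displayText = exportValue;
            }
            html.append("  <option value=\"").append(escape(exportValue)).append("\">")
                .append(escape(displayText)).append("</option>\n");
        }
        return html.append("</select>\n").toString();
    }

    public static void main(String[] args) {
        List<Object> options = Arrays.<Object>asList(
                "Pickup",
                Arrays.asList("std", "Standard shipping"),
                Arrays.asList("exp", "Express shipping"));
        System.out.print(toSelect("shipping.method", options));
    }
}
```

Keeping the export value in the value attribute matters if you later merge the submitted data back into the PDF, since the export value is what the PDF field expects rather than the label shown to the user.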

After the sample has gone through all the fields in the form, it writes out the resulting HTML Form. Depending on how you intend to use the HTML Form, this may be all you need. For most use cases, though, you will want to include more context from the original PDF Form and add some styling to the HTML Form. Complex forms may also contain JavaScript that validates or formats field values, or that performs calculations when buttons are pressed or field values change; this JavaScript can be extracted from the PDF as well.

We hope this sample provides a base for you to start with if you are looking to convert your PDF Forms to HTML Forms and that you will leverage PDF Java Toolkit to automate this work to save time and prevent errors. Sign up for an evaluation of PDF Java Toolkit today by filling out the evaluation form.
