Converting PDF to PDF/A-1b with Datalogics PDF Java Toolkit

In the world of PDF specifications, there are multiple versions and conformance levels of PDF/A. PDF/A is a variant of the PDF specification that adds requirements on what information must be entirely contained in the PDF itself. These requirements are meant to help preserve a document for long-term archival so that when the PDF is viewed 50+ years in the future, it is presented exactly as it would appear today. Ensuring that documents can be viewed exactly as they were intended is critical for businesses that deal with long-term contracts or document management. When you are focused on closing a sales deal or helping someone purchase a house, you may not think about the long-term storage of the documents you are creating. This is why tools like Datalogics PDF Java Toolkit exist. Our PDF Java Toolkit helps businesses convert an existing PDF to one that meets the PDF/A specification; you just have to know which PDF/A variant suits your needs best. (Look out for an upcoming blog post by Vel Genov that will explain the differences between the various PDF/A standards and help you find the one that fits your business needs.)

In this article, we will focus on using Datalogics PDF Java Toolkit to convert an existing PDF to PDF/A-1b.

Converting PDF to PDF/A-1b

There are a couple of things you should know about PDF/A-1b and Datalogics PDF Java Toolkit before we dive into the sample for converting a PDF to PDF/A-1b. Let’s start with understanding a bit more about PDF/A-1b. PDF/A-1b is one of the conformance levels defined by the PDF/A-1 specification, which is built on top of PDF 1.4. A PDF that claims to be PDF/A-1b will always appear the same when rendered on screen or printed, but there is no guarantee that any of the information can be extracted from it. PDF/A-1 also prohibits the use of transparency and of JPEG2000 compressed data. When PDF Java Toolkit encounters either of these features, it throws an exception to let you know that it cannot convert the PDF to PDF/A-1b. In most business documents, you will not run into transparency or JPEG2000 compressed data, because business documents tend to be text heavy rather than graphic or image heavy. But if you need to convert PDFs that contain transparency or JPEG2000 compressed data to PDF/A-1, you will want to use the Adobe PDF Library instead of Datalogics PDF Java Toolkit; it is a better solution for those types of documents.
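Because JPEG2000 data causes the conversion to fail, it can be handy to screen files before handing them to the toolkit. The following is a minimal sketch of such a pre-check, not part of PDF Java Toolkit: the class PdfA1Preflight and its methods are hypothetical names, and the check simply searches the raw bytes for the /JPXDecode filter name that PDF uses for JPEG2000 streams. It is a heuristic only. It can miss files that write the name in an escaped form and can report false positives, and it does not look for transparency at all.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDF Java Toolkit: a rough byte-level pre-check.
public class PdfA1Preflight {

    /** Returns the version from a "%PDF-x.y" header (for example "1.4"), or null if absent. */
    public static String headerVersion(byte[] pdf) {
        String head = new String(pdf, 0, Math.min(pdf.length, 16), StandardCharsets.ISO_8859_1);
        if (!head.startsWith("%PDF-") || head.length() < 8) {
            return null;
        }
        return head.substring(5, 8);
    }

    /** Heuristic: true if the raw bytes mention the JPXDecode (JPEG2000) stream filter. */
    public static boolean mentionsJpeg2000(byte[] pdf) {
        // ISO-8859-1 maps every byte to one char, so binary data cannot break the decoding.
        return new String(pdf, StandardCharsets.ISO_8859_1).contains("/JPXDecode");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java PdfA1Preflight <file.pdf>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Header version: " + headerVersion(pdf));
        System.out.println("Mentions JPXDecode: " + mentionsJpeg2000(pdf));
    }
}
```

A file flagged by this check is a candidate for the Adobe PDF Library route described above; a file that passes may still fail conversion for other reasons, so the toolkit's own result remains the authority.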

The ConvertPdfDocument sample in PDF Java Toolkit demonstrates how to convert a PDF to PDF/A-1b. In this sample, all of the code responsible for the conversion is contained in the method convertToPdfA1B. Let’s dissect the portions of the convertToPdfA1B method that are specific to the PDF/A conversion process.

Before the PDFDocument object is constructed from the input PDF, the sample constructs a FontSet from the fonts that are available on the system where the sample is being run.

final PDFFontSet pdfaFontSet = FontSetLoader.newInstance().getFontSet();

Note: When converting to PDF/A, it is extremely important to have the fonts that are used in the PDF available, because PDF/A requires all fonts to be embedded in the PDF. This is one requirement that helps ensure that documents will be viewed in the future exactly as they are viewed today. With PDF Java Toolkit, the earlier you can supply a set of fonts for the PDFDocument to work with, the better. That’s why we recommend supplying the font set through the PDFOpenOptions object that is used when constructing the PDFDocument object.

final PDFOpenOptions openOptions = PDFOpenOptions.newInstance();
openOptions.setFontSet(pdfaFontSet);

Once the PDFDocument has been constructed, the sample creates the required conversion handler and conversion options objects:

final PDFA1bConfiguredConversionHandler handler = new PDFA1bConfiguredConversionHandler();
final PDFAConversionOptions options = PDFAConversionOptionsFactory.getConfiguredPdfA1bInstance(pdfDoc);

Read that first line carefully because it uses a very specific conversion handler, the PDFA1bConfiguredConversionHandler. If you look in the package, you will notice that there are a couple of different classes that contain “ConversionHandler” in their name, so make sure that you are using the correct one for your needs when you use PDF Java Toolkit in your application. The PDFA1bConfiguredConversionHandler is built specifically for converting to PDF/A-1b and should not be used when converting to any other PDF/A variant. The results of using the PDFA1bConfiguredConversionHandler for any other type of conversion are undefined.

After those objects have been set up, there is one API call to make to convert the PDFDocument object to a PDF/A-compliant PDF:

PDFAService.convert(pdfDoc, PDFAConformanceLevel.Level_1b, options, handler)

In the sample, this line is part of an if statement because the convert method on PDFAService returns a boolean that indicates whether the conversion to PDF/A succeeded. Conversion to PDF/A is not guaranteed to succeed, so always check the return value of PDFAService.convert. Because some government or regulatory institutions require PDF/A-conformant documents, we suggest going a step further and validating the output with a PDF/A validator such as the industry-supported validator from veraPDF.
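To make the "check the return value" advice concrete, here is a small sketch of the control flow, written without the toolkit so that it runs on its own. The class ConvertThenSave and its method are hypothetical names; the BooleanSupplier stands in for the PDFAService.convert call shown above, and the Runnable stands in for whatever code saves the converted document. The point is only that the save step runs when, and only when, the conversion reports success.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration of the pattern; the toolkit calls are replaced by stand-ins.
public class ConvertThenSave {

    /**
     * Runs the conversion step and runs the save step only if conversion reported success.
     * Returns the conversion result so the caller can log or escalate a failure.
     */
    public static boolean convertThenSave(BooleanSupplier convert, Runnable save) {
        if (convert.getAsBoolean()) {
            save.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for a conversion that fails: the save step is skipped.
        boolean converted = convertThenSave(
                () -> false,
                () -> System.out.println("Saving converted document"));
        System.out.println(converted
                ? "Converted to PDF/A-1b"
                : "Conversion failed; nothing was saved");
    }
}
```

In a real application, the first argument would wrap the PDFAService.convert call, and a false result would typically be logged along with the name of the input file so that the document can be handled another way.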

Don’t delay, preserve today

Preserving documents is vital for businesses to ensure that they have records that can be reliably used in the future. No one wants to be in a situation where their records cannot be used because the PDF can no longer be opened. PDF/A was specifically designed to preserve documents over long periods of time, surviving past the author of the document and potentially past the lifetime of the company that originated the document.

Now that you understand more about PDF/A and converting your existing PDFs to PDF/A-1b, sign up for an evaluation of Datalogics PDF Java Toolkit to start preserving your documents today.
