PDF File Variants: an Overview

Adobe’s Portable Document Format (PDF) has evolved significantly since its introduction in 1993. PDF was originally conceived as a successor to PostScript as a page description language, but it has grown as the meaning of a document has changed and as users’ expectations of what a document should be have shifted: from printed pieces of paper, to hypermedia (anyone remember that term?), to interactive tool, to generalized information holder. The PDF standard is now a rather complex file format meant to serve many different purposes, including as a standard document format in the form of ISO 32000.

Over the lifetime of the format, several technology industry niches have introduced subsets of PDF designed to reduce the complexity and variability the format allows, so that PDF files behave more reliably in specific situations. These “flavors” of PDF represent various restrictions on, and in some cases extensions to, the PDF standard as described in ISO 32000. Here I will introduce the most popular of them:

PDF/X (ISO 15930): PDF/X places various restrictions on PDF files such that PDF/X files can be reliably transmitted through a graphic arts workflow and printed with the appearance that the creator of the PDF/X file expected. PDF/X is a family of standards that has evolved since PDF/X-1a was first introduced in 2001. Presently PDF/X-1a:2003 and PDF/X-3 seem the most widely used PDF/X versions, with other variants extending PDF/X to handle specific functionality available in the PDF standard:

  • the expansion from allowing only CMYK and spot colors in PDF/X-1a, to also allowing calibrated RGB in PDF/X-3, to general ICC-defined colorspace use in PDF/X-5
  • the expansion from self-contained PDF files that can be processed without external font or other files in PDF/X-1a, to allowing external references to alternate graphic representations in PDF/X-3, to broader use of external graphic references in PDF/X-5
  • updates to the PDF version used as a base, from PDF 1.3 in the original PDF/X-1a:2001 and PDF/X-3:2002, to PDF 1.4 in their 2003 revisions, to PDF 1.6 in PDF/X-4 and PDF/X-5.

PDF/X has largely supplanted TIFF/IT and other pure raster-based file formats, as well as EPS files, in many color-critical graphic arts workflows such as catalog and magazine production and in print proof approval workflows.

PDF/A (ISO 19005): PDF/A places various restrictions on PDF files so that they can still be interpreted and understood over the long term. While PDF/A does not address archival questions relating to computing hardware or environments, it does aim to ensure that a conforming file can be properly read and understood many years from now, provided an interpreter for the file format exists. The first PDF/A revision, PDF/A-1, has two major conformance levels, and the later PDF/A-2 and PDF/A-3 revisions build on it:

  • PDF/A-1a seeks to preserve the ability for humans and for machines to understand the semantic and visual content of PDF/A files. PDF/A-1a files include structure and tagging information to preserve the semantic meaning of page contents, such as what words make up sentences and paragraphs; textual representation of characters and glyphs in a content stream for reliable text extraction; and alternate text for images and other visual items.
  • PDF/A-1b is a less stringent standard that seeks to preserve the ability for machines to display PDF/A files in a way that lets humans read and understand the pages. Unlike the ‘a’ conformance level, PDF/A-1b does not require that information about the meaning of the words and contents be present in the file.
  • PDF/A-2 updates the PDF/A-1 standard to be based on the ISO 32000 PDF standard, rather than the PDF Reference version 1.4 that PDF/A-1 was based on. While all PDF/A-1 files are also PDF/A-2 compliant, the change in PDF standard used means that PDF/A-2 documents may use some PDF constructs such as transparency and JPEG2000-format images that are not allowed in PDF/A-1. PDF/A-2 also allows the embedding of other PDF/A (-1 or -2) format PDF files inside of PDF/A-2 files.
  • PDF/A-3 extends PDF/A-2 to allow embedding of arbitrary format data files into PDF/A-3 files. This is supported primarily in order to archive the source data that a PDF/A file was created from for future use.
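
As a rough illustration of how these revisions and conformance levels show up in practice: a PDF/A file declares its part and conformance level in its XMP metadata, through the pdfaid:part and pdfaid:conformance properties. The Python sketch below assumes the XMP packet has already been pulled out of the file as text. The function name pdfa_flavor and the sample packet are made up for illustration. It only reports what the file claims; it does not check that the file actually conforms.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the PDF/A identification schema and by RDF.
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

PART = f"{{{PDFAID_NS}}}part"
CONFORMANCE = f"{{{PDFAID_NS}}}conformance"
DESCRIPTION = f"{{{RDF_NS}}}Description"


def pdfa_flavor(xmp_packet):
    """Return a label such as 'PDF/A-1b', or None if no PDF/A claim is found.

    Hypothetical helper: it reads the claim only and performs no validation.
    """
    root = ET.fromstring(xmp_packet)
    part = conformance = None
    for desc in root.iter(DESCRIPTION):
        # XMP properties may be written as attributes or as child elements.
        part = part or desc.get(PART) or desc.findtext(PART)
        conformance = (
            conformance or desc.get(CONFORMANCE) or desc.findtext(CONFORMANCE)
        )
    if not part:
        return None
    return f"PDF/A-{part.strip()}{(conformance or '').strip().lower()}"


# Made-up sample packet for illustration.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>1</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(pdfa_flavor(SAMPLE_XMP))  # PDF/A-1b
```

A real tool would still need a validator to confirm the claim, since a file can carry these properties without meeting the rest of the standard.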

Find more introductory information on PDF/A at http://www.crawfordtech.com/images/resources/PDF_A_101_An_Intro_Final.pdf.

PDF/UA (ISO 14289-1): PDF/UA places various restrictions on PDF files for the purposes of content accessibility. Content accessibility is important for users who have various physical restrictions, such as those who use screen magnifiers; the blind, who use screen reading software; and the deaf, who require descriptions of audio-based content. Because PDF/UA files require visual PDF content to be tagged with structural and semantic information (like PDF/A-1a files), software that reflows PDF pages can much more easily handle reflowing PDF/UA files to fit different screen sizes and aspect ratios. The structural and semantic information also enables much more reliable text-to-speech conversion for those who rely on this technology to work with PDF files.

Find more information on PDF/UA at http://www.commonlook.com/what-is-pdfua.

PDF/E (ISO 24517-1): PDF/E places various restrictions on PDF files for the purposes of document transfer and usability in 3D engineering and CAD workflows. These restrictions are aimed at making PDF/E a reliable format for transferring engineering documents between different software tools, and for creating engineering files that can be viewed (though not necessarily manipulated) by parties outside of these engineering software tools.

PDF/VT (ISO 16612-2): PDF/VT is a set of restrictions and enhancements for PDF files designed so that PDF files can be used as template data for fast rendering and printing of transactional and variable data. Variable data workflows typically have a pre-defined set of page content that is transformed or added to by data that varies on a per-item basis, such as a batch of postcards that each have a different address on the back. Transactional printing workflows also have pre-defined page content, but the amount of data added per item varies: each item can be a different length, and the number of items to print may not be known when the print job is started.

PDF/VT builds on the PDF/X-4 and PDF/X-5 standards to ensure print fidelity, and includes constructs to define reusable page contents and to enable better performance when printing varying data items. PDF/VT was derived from an earlier standard, PPML/VDX (the first version of ISO 16612). It differs from PPML/VDX in that it adds features for print fidelity, and it marks and uses reusable page content with different information than the PPML standard that PPML/VDX relies on.

Two flavors of PDF/VT exist: PDF/VT-1 and PDF/VT-2. The most important difference between the two is that a PDF/VT-1 file is self-contained and is always printable without external data. PDF/VT-2 files, on the other hand, may require external resource files or external resource data streams in order to print.

While this only scratches the surface of what each of these PDF standards is for, remember that a file conforming to any of them is still a valid PDF file and can be processed by any PDF workflow that is compatible with the underlying PDF standard version it is based on.
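
To make that last point a little more concrete: the base version a file claims appears in its header, at the very start of the file, as %PDF-M.N. A minimal Python sketch might read it like this, where pdf_header_version is a name of my own choosing. It assumes the header is authoritative, whereas since PDF 1.4 a Version entry in the document catalog may override it, so treat the result as a first guess only.

```python
import re


def pdf_header_version(data):
    """Return (major, minor) from a '%PDF-M.N' header in bytes, or None.

    Hypothetical helper: it looks only at the file header and ignores any
    Version entry in the document catalog.
    """
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# A workflow built for PDF 1.4 could compare versions as tuples.
header = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"
version = pdf_header_version(header)
print(version is not None and version <= (1, 4))  # True
```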
