Linearized PDF With The Datalogics PDF Java Toolkit

Sample of the Week:

Joel Geraci

Most people are familiar with streaming audio and video files to their browsers and apps. Some people, like me for instance, are streaming music all day long and then watching Netflix all evening; we’re always streaming. Because you can start listening or watching immediately, streaming is the best way to experience web content that tends to be stored as large files on the server. What most people don’t know is that you can stream PDF files too… sort of. Because PDF is random access, it can be byte-served; it’s not exactly the same thing as streaming but it is a way to retrieve the bytes that you need in order to see a particular page in a PDF document without having to download every other page. Today, this is commonly referred to as “Fast Web View” but was originally called “Linearization”. The name sort of stuck in developer circles.
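To make "byte-served" a little more concrete, here is a minimal sketch of the idea in plain Java, with no PDF library involved. The client names a byte range, and the server hands back only that slice of the file. The class and method names (ByteServingSketch, rangeHeader, serveRange) are made up for illustration, and the "server" is just an in-memory array; a real viewer and web server negotiate this over HTTP Range requests.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Toy model of byte serving: ask for a byte range, get back only that slice. */
final class ByteServingSketch {

    /** Builds an HTTP Range header value for `length` bytes starting at `offset`. */
    static String rangeHeader(long offset, long length) {
        if (offset < 0 || length <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and length > 0");
        }
        // HTTP byte ranges are inclusive on both ends.
        return "bytes=" + offset + "-" + (offset + length - 1);
    }

    /** Returns the slice of `file` named by a single "bytes=first-last" range. */
    static byte[] serveRange(byte[] file, String rangeHeader) {
        if (!rangeHeader.startsWith("bytes=")) {
            throw new IllegalArgumentException("only byte ranges are supported");
        }
        String[] bounds = rangeHeader.substring("bytes=".length()).split("-");
        int first = Integer.parseInt(bounds[0]);
        // Clamp the end of the range to the last byte of the file.
        int last = Math.min(Integer.parseInt(bounds[1]), file.length - 1);
        return Arrays.copyOfRange(file, first, last + 1);
    }

    public static void main(String[] args) {
        // Stand-in for a large file sitting on a server.
        byte[] file = "0123456789ABCDEFGHIJ".getBytes(StandardCharsets.US_ASCII);
        String header = rangeHeader(10, 5);
        byte[] slice = serveRange(file, header);
        // prints "bytes=10-14 -> ABCDE"
        System.out.println(header + " -> " + new String(slice, StandardCharsets.US_ASCII));
    }
}
```

The sketch handles only a single, well-formed range; it is meant to show the shape of the exchange, not to stand in for a server.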

A “Linearized” PDF is organized slightly differently than a regular PDF and allows an application to display the content of the first page as soon as those bytes are available rather than forcing the user to wait until the entire document has been downloaded. This can be particularly useful for short documents that have a lot of resources (meaning large files), documents with lots of pages, and any document with even a few embedded fonts. The idea is that for any given file, regardless of the total number of pages, the user shouldn’t see any difference when they move from page to page within that document.

The “first page” doesn’t even need to be page one; it could be any page in the file if you use the open parameters in the URL or set the initial view to an interior page. But there are other advantages to creating linearized files. For example, when the user navigates to another page, it will display as quickly as possible because the application knows which bytes to get. With the right viewer, even very large PDF files will perform well over slow connections because the page can display incrementally, showing the most useful data first. This is why you sometimes see a PDF page “snap” and suddenly look a lot cleaner; the embedded font arrived a few milliseconds after the text and images. And finally, for those of us who are impatient, like me, the viewer will accept user interaction, like clicking on a link, before the entire page has been displayed… or even been completely loaded.
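As a small illustration of the URL open parameters mentioned above, the sketch below appends a page parameter to a PDF link. The `#page=N` fragment is the commonly documented form of the PDF open parameters; the class name, method name, and example.com URL are placeholders of my own, and whether the parameter is honored depends on the viewer.

```java
/** Builds a link that asks the viewer to open a PDF at a given page. */
final class OpenParameters {

    /** Appends the "page" open parameter to a PDF URL that has no fragment yet. */
    static String withInitialPage(String pdfUrl, int page) {
        if (page < 1) {
            throw new IllegalArgumentException("page numbers start at 1");
        }
        // Open parameters travel in the URL fragment, after the '#'.
        return pdfUrl + "#page=" + page;
    }

    public static void main(String[] args) {
        // prints "https://example.com/report.pdf#page=12"
        System.out.println(withInitialPage("https://example.com/report.pdf", 12));
    }
}
```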

Most of the popular software that is marketed specifically for PDF creation, Adobe Acrobat for example, though there are others, will create linearized PDF automatically and by default. However, most software that isn’t engineered specifically to create PDF but only to export it, like Microsoft Word, Google Docs, and OpenOffice, doesn’t create Linearized PDF. That’s understandable. But…

Unfortunately, most PDF developer libraries and toolkits can’t create Linearized PDF files either… which brings me to my point.

The Datalogics PDF Java Toolkit sample, LinearizeDocument, demonstrates how to open a PDF file and save it as a Linearized file. Here’s the best part… the interesting section is one line of code.

PDFSaveOptions options = PDFSaveLinearOptions.newInstance();

Because creating a properly formatted Linearized PDF is non-trivial, Adobe made it simple for PDF developers. Like most classes in the Datalogics PDF Java Toolkit, the defaults do exactly what you need them to do. The PDFSaveOptions combined with the PDFSaveLinearOptions can be used on files that were created by the Datalogics PDF Java Toolkit as well as on PDF files that were created in other applications and libraries. In both cases, the “linear save” operation will rewrite the PDF document in a way that provides more efficient incremental access over a network.
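If you want a quick sanity check on a saved file, one rough approach is to look for the linearization marker near the start of the file. The sketch below uses only the standard Java library, not the Toolkit, and relies on the PDF specification's requirement that the linearization parameter dictionary, which carries a /Linearized key, sits within the first 1024 bytes. The class and method names are my own, and this is a heuristic rather than a validator: it does not check the hint tables, and a file that was incrementally updated after linearizing can still carry the marker.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Heuristic check for the linearization marker at the start of a PDF. */
final class LinearizedCheck {

    /** True if "/Linearized" appears within the first 1024 bytes of `head`. */
    static boolean looksLinearized(byte[] head) {
        int limit = Math.min(head.length, 1024);
        // ISO-8859-1 maps every byte to exactly one char, so binary data is safe to scan.
        String text = new String(head, 0, limit, StandardCharsets.ISO_8859_1);
        return text.contains("/Linearized");
    }

    /** Reads at most the first 1024 bytes of the file and applies the heuristic. */
    static boolean looksLinearized(Path pdf) throws IOException {
        try (InputStream in = Files.newInputStream(pdf)) {
            return looksLinearized(in.readNBytes(1024));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LinearizedCheck.java <file.pdf>");
            return;
        }
        System.out.println(looksLinearized(Path.of(args[0])));
    }
}
```

Running it against a file before and after the linear save should flip the answer from false to true; a proper confirmation still means opening the file in a viewer or preflight tool that reports Fast Web View.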

View and download the LinearizeDocument sample, or get all the samples and documentation by requesting an evaluation of the Datalogics PDF Java Toolkit.


5 thoughts on “Linearized PDF With The Datalogics PDF Java Toolkit”

  1. A BIG warning about linearizing (optimizing for fast web view): Do NOT linearize multi-page fillable forms!

    The advantage of byteserving simple multipage documents (transmitting just the objects needed, object by object) turns into a serious slowdown, because a form field can consist of a dozen objects, which means that for each object, a channel has to be opened, handshake established, the object sent, and the channel closed. Many network connections are limited to 8 open channels at the same time… go figure.

    1. That’s a good point, Max; thanks for the feedback. The characteristics of some PDF files make linearization a moot point, and linearization in these cases may actually hinder performance. My article explains what linearization is and how it can be accomplished using the Datalogics PDF Java Toolkit. This is a significant differentiator for our product. While your warning and assertions about linearization are true, I’ve heard arguments like these from other PDF tool developers used as excuses to not provide linearization at all. The Datalogics PDF Java Toolkit gives developers the option to linearize… or not… based on the type of PDF file they are working with.

      1. Your article is very good explaining linearization. And, of course, how to accomplish it using the Toolkit.

        My comment is not aimed at the Toolkit. It is aimed at the Toolkit operator. In my experience, I have seen so many times that linearization has been activated where it should not have been (OK, we can blame Adobe for that, because in the really olden days, one had the choice to linearize in the File Save dialog, but then it got taken out and turned into an application preference).

        So, it is the operator’s decision, as you say, and I do really like the idea of having the chance to make such a decision. The argument against including linearization that you mention in the comment is indeed pretty bogus; it sounds more like an excuse to avoid bothering with linearization.

        Making the decision does require some knowledge, but then, we should expect that knowledge be present among the operators… (or is that wishful thinking)?

    1. Jim:

      I’ll break that up into two questions: can attachments be linearized, and does it help performance to do so? Attachments to PDF files can be linearized, and the Datalogics Java Toolkit is certainly capable of processing non-linear attachments to make them linearized and then attaching them back to the root PDF. However, when byte-served, these attachments will be considered one big block of data to be retrieved. This is why I generally recommend that PDF Portfolios be served up from a directory that has its content disposition set to “attachment”. This will force all the bytes to be downloaded prior to the file being opened for viewing and generally provides a better experience.
