When you can’t wait for [PDF’s] future to arrive…

If you read any science fiction, you’ve probably heard of the grandfather paradox: if you were to travel back in time and change history such that you were never born, then you would never exist in the present day…and therefore could never travel back in time. The other day I realized that something similar happens when developing technology: when creating something new, our choices today both enable the future…and limit it.

As part of my job at Datalogics, I look for PDF and technology questions to answer on popular forums. Recently, I answered this question on Quora. In case you don’t feel like clicking through, the question in brief was, “Why do I have to download an entire PDF before I can start looking at it?” This makes sense to ask: we’re used to streaming video, where you don’t need all the data before you can start watching. The answer is straightforward enough; to quote my reply:

When the PDF format was first developed and released in the early 1990s, internet connectivity was much less common. Computers were relatively slow and memories were small, so the format was designed to work well in that environment: as something that could be read and written quickly with a slow CPU while using as little memory as possible.

A PDF basically consists of a series of data objects that describe the text and graphics on each page, followed by a cross-reference table that describes where each object is. Placing this table at the end of the file made it easy to write the file to disk quickly, because it didn’t have to calculate in advance where every object would be in the file before writing it. The disadvantage, of course, is that a PDF can’t be interpreted until your viewer has the cross-reference table—and in an ordinary PDF it’s the last thing to be downloaded because it’s at the end of the file.
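To make that layout concrete, here is a minimal Python sketch of the idea, not a real PDF library. The function names (`write_minimal_pdf`, `find_xref_offset`, `read_object_offsets`) and the two sample objects are my own illustrations, and the output follows the classic object/xref/trailer layout only loosely; it is not meant to open in a viewer. The writer emits objects one after another, noting where each one landed, and appends the table last. The reader has to start from the tail of the file to find out where anything is.

```python
import re


def write_minimal_pdf(bodies):
    """Write objects in order, recording each offset, then append the table."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(bodies, start=1):
        offsets.append(len(out))  # the offset is only known once we get here
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)

    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(bodies) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is the head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


def find_xref_offset(data, tail_size=1024):
    """Look in the last bytes of the data for the pointer to the table."""
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data[-tail_size:])
    if not matches:
        raise ValueError("no startxref marker found; is the file complete?")
    return int(matches[-1])


def read_object_offsets(data):
    """Return {object number: byte offset} for the in-use objects."""
    lines = data[find_xref_offset(data):].split(b"\n")
    if lines[0] != b"xref":
        raise ValueError("startxref does not point at a cross-reference table")
    first, count = map(int, lines[1].split())
    offsets = {}
    for i in range(count):
        offset, _generation, kind = lines[2 + i].split()
        if kind == b"n":  # "n" marks an in-use object, "f" a free one
            offsets[first + i] = int(offset)
    return offsets


pdf = write_minimal_pdf([b"<< /Type /Catalog >>", b"<< /Type /Pages /Count 0 >>"])
offsets = read_object_offsets(pdf)
```

Calling `find_xref_offset` on only the first half of `pdf` raises `ValueError`, which is roughly the position a viewer is in while an ordinary PDF is still downloading: the objects have arrived, but the map to them has not.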

First, I should clarify something. Everything I said is true, but since version 1.2 there has been an option in PDF called Linearization which places an abbreviated cross-reference table near the front of the file. This allows supporting readers to begin displaying the first part of the PDF immediately—in other words, in a manner more like a streaming file. This requires saving the PDF in a special way (often described as “optimized for web”) and is supported by most modern PDF authoring tools.
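A linearized file announces itself with a small parameter dictionary containing a `/Linearized` key, which the PDF specification requires to sit within the first 1024 bytes. A rough way to check for it, as a hedged Python sketch: the function name `looks_linearized` is my own, and this is a heuristic that only looks for the marker. It does not verify that the rest of the file is actually laid out for incremental display.

```python
import re


def looks_linearized(first_bytes):
    """Heuristic: is there a /Linearized entry in the first 1024 bytes?"""
    head = first_bytes[:1024]
    return re.search(rb"/Linearized\s+[0-9.]+", head) is not None


# Hand-written sample headers, not taken from real files.
linearized_head = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
ordinary_head = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

print(looks_linearized(linearized_head))
print(looks_linearized(ordinary_head))
```

The point of the design is that a viewer can make this decision after receiving only the very start of the file, instead of waiting for the end.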

However, while reading an article by my colleague Matt Kuznicki, I realized something about how innovation happens, and how technology evolves.

We forget what computing was like in 1993, when the PDF standard was first announced. Computers had arrived on every desk and in every home, but they were twenty-pound boxes that cost thousands of dollars and had a tenth of the power of a cheap smartphone of today. The Internet did exist…as a vaguely mysterious thing you connected to via a phone line and a modem, through which you might get email or IRC, or download a few files. In fact, the primary way home users connected to one another was not via the Internet itself, but through commercial services like America Online.

As for the World Wide Web, it had only just come into existence, and was still relatively nascent; and yet, we knew it was going to be a big thing. I can remember how quickly it seemed to become an everyday thing. Already, magazines and commercials were sporting the weird incantation of “www.something-something.com”, even as we didn’t fully understand what it was. There was talk of online shopping, online news, information at your fingertips, even if none of it had quite materialized.

PDF aspired to be a broadly used standard that would infiltrate myriad daily uses: the electronic equivalent of paper. Anything where paper was currently used, a PDF on a computer could replace it. That meant of course that it would be used on the web as well…even if the web as we know it didn’t exist yet. But first, it would have to establish itself.

So, here’s the paradox: in order to make PDF a viable standard at the time it was introduced, it had to be designed to run on the hardware that was common at that time. That meant optimizing it for the machines in use then. But that design choice meant it was ill-suited for use in a streaming environment: it would not work on the Internet as well as it could have, had it been designed for that purpose from the beginning.

But suppose the Adobe engineers had a crystal ball in 1993: suppose they knew the Internet was going to become big (as it indeed eventually did). They could have chosen to optimize the format for a streaming environment by putting the cross-reference table up front. But then they would have had a format that was slow and unwieldy on the machines people had at the time. It’s quite possible no one would have used the format; large amounts of data in PDF would never have been generated. It would not have become popular. And today, twenty years later, no one would care how well it worked on the Internet, because no one would need it to.

This is an example of an engineering tradeoff: optimizing for one condition at the expense of another, based on what seems the wisest choice at the moment. The paradox is that one never has perfect knowledge of the future. Decisions that seem rational at one time may be overturned by events you couldn’t know—or even events you can predict, but still can’t wait for.

‘Engineering for today’ and ‘engineering for the future’ are often contradictory. If you want to have a future, it behooves you to make sure you’re around for it.
