PDF rendering and coordinate systems

We get a number of questions from PDF Library users who render PDF pages to raster devices or to raster image files. One of the trickier concepts to grasp is the translation between the coordinate system used in PDF files and the one used for rasterization. In this article, I’ll briefly discuss the factors involved in rendering PDF files.

Coordinate Systems

The PDF file format uses a different coordinate system – a different means of specifying locations relative to each other – than most raster image formats. In the PDF file format, increasing X values specify the rightward direction and increasing Y values specify the upward direction. That is, the point (X + 1, Y + 1) is one PDF unit above and one PDF unit to the right of the point (X, Y). Most image formats use the opposite direction for Y values. In most image formats, including the way raster images are stored in the PDF file format, increasing Y values specify the downward direction: (X + 1, Y + 1) is one pixel to the right of and one pixel below the pixel at (X, Y). Also note the distinction between PDF points and raster pixels: pixels sit on an integer grid, whereas it is perfectly legitimate, and expected, for content in a PDF content stream to be placed at non-integer coordinates.
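As a rough illustration of the Y flip described above, the following Python sketch converts a PDF point to raster coordinates. It assumes the two "usual case" properties (origin at the bottom-left corner of the page, 72 points per inch) and an unrotated page; the function name and parameters are hypothetical and are not part of the PDF Library API.

```python
def pdf_point_to_pixel(x, y, page_height_pts, dpi=72.0):
    """Map a PDF point (Y up) to raster coordinates (Y down).

    Assumes the page origin is its bottom-left corner and that one
    PDF unit is 1/72 inch. The result may be fractional; a renderer
    decides how fractional positions land on the pixel grid.
    """
    px = x * dpi / 72.0
    # Measure Y from the top edge of the page instead of the bottom.
    py = (page_height_pts - y) * dpi / 72.0
    return px, py


# The top-left corner of a US Letter page (612 x 792 points)
# becomes the raster origin.
top_left = pdf_point_to_pixel(0, 792, page_height_pts=792)
```

Note that only the Y value is mirrored; X is simply scaled.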

The following hold true for PDF files in most cases but not always:

  • The origin of the PDF coordinate system (0, 0) represents the bottom-left corner of the PDF page
  • PDF files specify 72 points to 1 physical inch

It is very important to know that these are true most of the time, but not all of the time. When writing a program that renders a PDF page, you need to account for both exceptions: the visible area of the page may not start at (0, 0), and the size of a PDF unit may differ from 1/72 inch (for example, when the page has a UserUnit entry). You must also account for any page rotation specified by the Rotate key of the PDF page.
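To make these caveats concrete, here is a minimal Python sketch that computes the raster size needed for a page. It assumes the CropBox is given as (llx, lly, urx, ury), that Rotate is a multiple of 90, and that a non-default unit size comes from the page's UserUnit entry (a PDF 1.6 feature, mentioned here as an assumption about where the exception comes from). The function name is hypothetical and is not part of the PDF Library API.

```python
def raster_size(crop_box, dpi, rotate=0, user_unit=1.0):
    """Return (width_px, height_px) for the visible area of a page."""
    llx, lly, urx, ury = crop_box
    if rotate % 90 != 0:
        raise ValueError("Rotate must be a multiple of 90")
    # One user space unit is user_unit / 72 inch (1/72 inch by default).
    # Use the box extents rather than urx and ury alone, because the
    # box need not start at (0, 0).
    width = round(abs(urx - llx) * dpi * user_unit / 72.0)
    height = round(abs(ury - lly) * dpi * user_unit / 72.0)
    # A quarter turn swaps the width and height of the output.
    if rotate % 180 == 90:
        width, height = height, width
    return width, height


letter_300dpi = raster_size((0, 0, 612, 792), dpi=300)
```

Rounding to the nearest pixel is one possible policy; rounding up instead avoids cropping a fractional last row or column.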

The PDF API documentation refers to these two coordinate systems as user space and device space. User space is the PDF page coordinate system, where points are specified in PDF units. Device space is the coordinate system of the surface you are drawing to, where the units are those of your output type – typically pixels for a raster image.

Rendering parameters

There are three input parameters to the rendering APIs in the PDF Library / DLE that control how pages are rendered: the transformation matrix, the user space updateRect and the device space destRect. These parameters are used in the following way:

  • The updateRect is used to clip the PDF page to be rendered, restricting the drawing to a specific region of the PDF page. This is specified in user space (PDF) coordinates.
  • The matrix is used to scale, rotate and translate user space (PDF) coordinates into device space coordinates. It usually combines the following:
    • A scaling factor to transform user space coordinates into suitably sized device space coordinates. In a typical situation, someone who is rendering a PDF page to a 300dpi raster would specify scaling of 300/72 in the X and Y direction.
    • A rotation factor to account for rotated PDF pages. PDF pages with a Rotate key need a matching rotation in the transformation matrix so that they are rasterized in the intended orientation.
    • A reflection (a negative scale factor in the Y direction) to flip the Y coordinates of user space, to account for the different directions that increasing Y values go in between the two coordinate systems.
    • A translation factor in the Y direction to move the flipped page back into the visible part of device space. Without it, flipping the Y coordinates would place the page at negative device Y values.
    • A translation factor in the X and/or Y direction to normalize the start of the PDF page to draw in user space to the user space origin (0, 0). This accounts for PDF pages where the visual contents (the CropBox) do not start at the user space origin.
  • The destRect defines the boundaries of device space to draw into. This is specified in device coordinates; typically as the number of raster pixels in the device X and Y coordinates.
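The matrix components listed above can be combined as in the following Python sketch. It uses the six-number (a, b, c, d, e, f) matrix form from the PDF file format, where x' = a·x + c·y + e and y' = b·x + d·y + f. The helper names are hypothetical, the page is assumed to use 72 units per inch with Rotate a multiple of 90, and this is not the PDF Library's own API, only an illustration of how the pieces fit together.

```python
def concat(m1, m2):
    """Return the matrix that applies m1 first, then m2."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


def apply(m, x, y):
    """Transform the point (x, y) by the matrix m."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def page_to_device_matrix(crop_box, dpi, rotate=0):
    """Build a user space to device space matrix for one page."""
    llx, lly, urx, ury = crop_box
    w, h = urx - llx, ury - lly
    s = dpi / 72.0
    # 1. Move the CropBox's lower-left corner to the origin.
    to_origin = (1, 0, 0, 1, -llx, -lly)
    # 2. Rotate clockwise by the Rotate value, then shift the page
    #    back into the positive quadrant. Other values raise KeyError.
    rotations = {
        0: (1, 0, 0, 1, 0, 0),
        90: (0, -1, 1, 0, 0, w),
        180: (-1, 0, 0, -1, w, h),
        270: (0, 1, -1, 0, h, 0),
    }
    rot = rotations[rotate % 360]
    # 3. Scale to the target resolution, flip Y with a negative scale,
    #    and translate by the rotated page height so Y stays positive.
    out_h = w if rotate % 180 == 90 else h
    flip_and_scale = (s, 0, 0, -s, 0, s * out_h)
    return concat(concat(to_origin, rot), flip_and_scale)


# A page whose CropBox does not start at the origin, rendered at 144 dpi:
# the top-left corner of the CropBox maps to device (0, 0).
m = page_to_device_matrix((10, 20, 110, 220), dpi=144)
corner = apply(m, 10, 220)
```

Building the matrix from small named steps, rather than writing the six numbers out by hand, makes it easier to check each factor against the list above. The matching destRect for this example would span 200 by 400 pixels.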

Notes

  • Specification of the updateRect is optional; if it is not specified, no clipping of the PDF page will be carried out by the rendering call. The matrix and destRect are required.
  • PDF points that are transformed by the matrix to values outside of the destRect are not drawn; they are clipped.
  • The matrix is not required to transform user space fully into device space. It is legal to have a matrix that draws the PDF page only to part of the destRect. However, you are strongly advised to use the updateRect to restrict the drawing to the region of the PDF intended for imaging.
  • PDF pages can have content outside of their intended viewing region (the CropBox) and outside of their intended print region (the MediaBox). If you do not restrict your rendering region appropriately, then rasterizing PDF pages that have content outside of these regions will show this content. This may lead to unexpected results.
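One way to restrict the rendering region is to derive the updateRect from the page boxes. The Python sketch below clips the CropBox to the MediaBox and uses the result as the user space clip rectangle. Treating that intersection as the updateRect is an assumption made for illustration, and the function name is hypothetical rather than part of the PDF Library API.

```python
def intersect_rects(r1, r2):
    """Intersect two (llx, lly, urx, ury) rectangles in user space.

    Returns None when the rectangles do not overlap.
    """
    llx = max(r1[0], r2[0])
    lly = max(r1[1], r2[1])
    urx = min(r1[2], r2[2])
    ury = min(r1[3], r2[3])
    if llx >= urx or lly >= ury:
        return None
    return (llx, lly, urx, ury)


media_box = (0, 0, 612, 792)
crop_box = (-10, -10, 600, 800)  # deliberately extends past the MediaBox
update_rect = intersect_rects(crop_box, media_box)
```

Content that lies outside this rectangle, such as printer marks or stray objects in the margins, would then be excluded from the raster.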
