Splitting a PDF Document by File Size Using the Datalogics PDF Java Toolkit

Sample of the Week:

PDF is widely accepted for submitting judicial documents across the world. In most cases there are size restrictions, but the courts generally allow you to e-file a document in sections and suggest dividing it at logical places, such as between chapters or sections.

Acrobat makes this fairly easy by letting you split a document by number of pages, by file size, or by top-level bookmarks. But if you want to automate the process on a server, or integrate it into a document management system, Acrobat isn’t the right tool; its automation features are designed for end-user interaction, and it isn’t licensed for this type of server use. However, the Datalogics PDF Java Toolkit can be used to replicate the Acrobat functionality. This week's post is the second of a three-part series that discusses how to programmatically split a document in the same way that Acrobat does. Follow this link to read about how to split a document based on top-level bookmarks.

What You Need to Know First:

Part of what makes PDF files so flexible is that the entire file doesn’t need to be downloaded before a viewer can begin to display it. The necessary parts can be accessed separately, and a page can essentially be streamed to the browser. The feature of PDF that allows for this is the Cross-Reference Table, which contains information that permits random access to indirect objects within the file, so the entire file need not be read to locate any particular object. The table contains an entry for each indirect object, specifying the byte offset of that object within the body of the file. These indirect objects can also be compressed within the PDF file. Both the compression and the writing of the Cross-Reference Table happen when the file is saved. So, to determine the file size of a PDF document you are working on in memory, you need to save it to disk.
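To make the byte-offset idea concrete, here is a minimal sketch that reads one entry of a classic Cross-Reference Table: a 10-digit number, a 5-digit generation number, and the keyword n (in use) or f (free). The class and method names (XrefEntry, parse) are made up for illustration and are not part of the Datalogics PDF Java Toolkit; files that use cross-reference streams store the same information in a different, binary form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for one classic cross-reference table entry (hypothetical helper). */
final class XrefEntry {
    // Layout: 10-digit number, space, 5-digit generation, space, 'n' (in use) or 'f' (free).
    private static final Pattern ENTRY = Pattern.compile("(\\d{10}) (\\d{5}) ([nf])");

    // For in-use entries this is the byte offset of the object in the file.
    // For free entries the same field holds the next free object number instead.
    final long offset;
    final int generation;
    final boolean inUse;

    private XrefEntry(long offset, int generation, boolean inUse) {
        this.offset = offset;
        this.generation = generation;
        this.inUse = inUse;
    }

    /** Parses a single entry line, ignoring the trailing end-of-line characters. */
    static XrefEntry parse(String line) {
        Matcher m = ENTRY.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an xref entry: " + line);
        }
        return new XrefEntry(
                Long.parseLong(m.group(1)),
                Integer.parseInt(m.group(2)),
                m.group(3).equals("n"));
    }

    public static void main(String[] args) {
        XrefEntry e = parse("0000000017 00000 n");
        // prints: offset=17 gen=0 inUse=true
        System.out.println("offset=" + e.offset + " gen=" + e.generation + " inUse=" + e.inUse);
    }
}
```

Because every offset depends on the exact bytes written before it, these values only exist once the file has been serialized, which is why the size has to be measured on the saved file.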

The Process:

With the above in mind, the process of splitting a file based on a maximum file size is quite simple. You create a new one-page document from the first page of your source file and then append one page at a time, saving the file and checking its size after each append, until you reach the maximum file size you require. You then start the next output document with the following page and repeat until the source file runs out of pages.
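The loop described above can be sketched independently of any PDF library. In this minimal sketch the SavedSizeProbe interface is a hypothetical stand-in for "save these pages to disk and return the file length"; it is not a Datalogics PDF Java Toolkit API, and the Gist's actual code may be structured differently. The sketch also assumes you want each output file to stay at or under the limit, so it stops just before the page that would exceed it.

```java
import java.util.ArrayList;
import java.util.List;

/** Plans page ranges for a split by maximum file size (illustrative, library-agnostic). */
final class SizeSplitPlanner {

    /** Hypothetical stand-in: saves pages first..last (inclusive) and returns the size in bytes. */
    interface SavedSizeProbe {
        long savedSize(int first, int last);
    }

    /** Returns zero-based, inclusive {first, last} page ranges, one per output file. */
    static List<int[]> plan(int pageCount, long maxBytes, SavedSizeProbe probe) {
        List<int[]> chunks = new ArrayList<>();
        int first = 0;
        while (first < pageCount) {
            // Every output file holds at least one page, even if that page alone is too big.
            int last = first;
            // Append one page at a time; save and measure before accepting each page.
            while (last + 1 < pageCount && probe.savedSize(first, last + 1) <= maxBytes) {
                last++;
            }
            chunks.add(new int[] {first, last});
            first = last + 1;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Fake probe for demonstration: 1,000 bytes of overhead plus 40,000 bytes per page.
        SavedSizeProbe fake = (a, b) -> 1_000L + 40_000L * (b - a + 1);
        for (int[] c : plan(30, 500_000L, fake)) {
            System.out.println("pages " + c[0] + "-" + c[1]);
        }
    }
}
```

With the fake probe, a 30-page source and a 500,000-byte limit yield three ranges: pages 0-11, 12-23, and 24-29. In a real implementation the probe would save the candidate document and read the resulting file's length, which makes the save the dominant cost of each iteration.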

If the Gist runs correctly, you end up with 22 new files of about half a megabyte each. To get started with splitting PDF files, download this Gist and request an evaluation copy of the Datalogics PDF Java Toolkit.
