Splitting a PDF Document by Number of Pages using the Datalogics PDF Java Toolkit

Sample of the Week:

PDF is widely accepted by courts across the world for submitting documents. In most cases there are size restrictions, but the courts do allow you to e-file documents in sections, and they suggest you divide the document in logical places, for example between chapters or sections.

Acrobat makes this fairly easy by letting you split a document by number of pages, by file size, or by top-level bookmarks. But if you want to automate the process on a server, or integrate it into a document management system, Acrobat isn’t the right tool; its automation features are designed for end-user interaction and it isn’t licensed for this type of server use. However, the Datalogics PDF Java Toolkit can be used to replicate the Acrobat functionality. This week is the last of a three-part series that discusses how to programmatically split a document in the same way that Acrobat does.

The previous articles are at the links below:
Splitting a PDF Document by Top Level Bookmarks
Splitting a PDF Document by File Size

Continue reading to learn how to split a document based on a maximum number of pages.

What You Need to Know First:
There’s really not much background information to know with this one. Again, we’re going to use the PMMService to extract pages from our input document. It was engineered to make this exact process super-simple. You just pass the extract method a start PDFPage and the number of pages you want to extract, and it creates a new PDF file for you automatically. The only interesting detail is what happens when the number of pages you ask for, counting the start page, exceeds the number of pages left in the document: the Java Toolkit simply extracts the remaining pages and does not throw an exception. This makes splitting a document into files of a fixed number of pages extremely easy.

The Process:
It’s really just a matter of figuring out how many files you want to split the source file into, then looping through the document and extracting the pages you need to create each new file. Because the Java Toolkit will do the right thing even when you are at the end of the document, the last pass through the loop needs no special handling.
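The toolkit-specific code lives in the Gist, but the page arithmetic behind the loop can be sketched in plain Java without the toolkit on the classpath. This is only an illustration: the class `PageSplitPlanner` and its `plan` method are hypothetical names, not part of the Datalogics PDF Java Toolkit, and the actual extract call is represented by a comment rather than guessed at. The sketch computes the page count for each file explicitly, which mirrors what the toolkit is described as doing when you ask for more pages than remain.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Datalogics PDF Java Toolkit.
final class PageSplitPlanner {

    /**
     * Plans a split of totalPages into files of at most maxPagesPerFile pages.
     * Each entry is {startIndex, pageCount}, where startIndex is 0-based.
     */
    static List<int[]> plan(int totalPages, int maxPagesPerFile) {
        if (totalPages < 0 || maxPagesPerFile < 1) {
            throw new IllegalArgumentException(
                "totalPages must be >= 0 and maxPagesPerFile must be >= 1");
        }
        List<int[]> chunks = new ArrayList<>();
        for (int start = 0; start < totalPages; start += maxPagesPerFile) {
            // Per the article, the toolkit returns only the remaining pages when
            // asked for too many; here that clamping is done explicitly.
            int count = Math.min(maxPagesPerFile, totalPages - start);
            chunks.add(new int[] {start, count});
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<int[]> chunks = plan(180, 25);
        for (int i = 0; i < chunks.size(); i++) {
            int[] c = chunks.get(i);
            // In real code, this is where the page at index c[0] would be fetched
            // and handed to the PMMService extract method along with c[1].
            System.out.printf("file %d: pages %d-%d (%d pages)%n",
                i + 1, c[0] + 1, c[0] + c[1], c[1]);
        }
    }
}
```

For a 180-page document and a 25-page limit, the plan holds eight entries, the last one covering pages 176 to 180.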

If the Gist runs correctly with the supplied 180-page input file, you end up with 7 new 25-page files and an 8th file with just 5 pages. To get started with splitting PDF files, download this Gist and request an evaluation copy of the Datalogics PDF Java Toolkit.
