Extracting Images using the Datalogics PDF Java Toolkit

Sample of the Week:

Joel Geraci

Extracting images from PDF files was once thought impossible. In fact, there was a time when PDF was considered the “Roach Motel of file formats”: information went in, but it never came out. That was never actually true, but the phrase was so pithy that PDF’s reputation for being static and locked caught on. Many tools can extract text or convert PDF to other formats such as .DOCX or .SVG, and PDF pages can be placed into layout applications like InDesign. This article focuses on images.

If you need to convert an entire PDF document to a set of images, Datalogics provides the PDF to Image product, which converts PDF pages to a variety of image formats at extremely high resolutions by rendering fonts, vectors, and images to bitmaps. It uses the Adobe Color Engine so that the output looks the same as it does in Adobe Acrobat.

But if you only want to reuse the images embedded in a page, the Datalogics PDF Java Toolkit provides two samples that show how to extract just the images from a PDF file.

The ImageExtractionSample and TransformedImageExtractionSample samples show how to extract images from a PDF to a series of graphics files. The samples use the ImageManager class, which facilitates converting images between standard Java representations and those used inside PDF.

// Collect the image XObjects in the document, keyed by ASName.
PDFXObjectImageWithLocationMap pdfXObjectMap = ImageManager.getPDFXObjectMap(doc);
Iterator<ASName> keyIterator = pdfXObjectMap.keySet().iterator();
while (keyIterator.hasNext()) {
	ASName key = keyIterator.next();
	PDFXObjectImageWithLocation imageWithLocation = pdfXObjectMap.get(key);
	...
	...
	...
}
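
The excerpt above stops before each image is written out. As a rough sketch of that last step, the following uses only standard Java (`javax.imageio.ImageIO`) and assumes the images have already been converted to `BufferedImage`. The class `ImageExportSketch`, the method `exportAll`, the map of names to images, and the `<name>.png` naming scheme are hypothetical stand-ins for illustration; they are not part of the Toolkit or its samples.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.imageio.ImageIO;

// Hypothetical helper: not part of the Datalogics PDF Java Toolkit.
final class ImageExportSketch {

    // Writes each image to "<outputDir>/<name>.png" and returns the files written.
    static List<File> exportAll(Map<String, BufferedImage> images, File outputDir) {
        if (!outputDir.isDirectory() && !outputDir.mkdirs()) {
            throw new UncheckedIOException(new IOException("Cannot create " + outputDir));
        }
        List<File> written = new ArrayList<>();
        for (Map.Entry<String, BufferedImage> entry : images.entrySet()) {
            File target = new File(outputDir, entry.getKey() + ".png");
            try {
                // ImageIO.write returns false when no suitable writer is found.
                if (!ImageIO.write(entry.getValue(), "png", target)) {
                    throw new IOException("No PNG writer for " + entry.getKey());
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            written.add(target);
        }
        return written;
    }

    public static void main(String[] args) {
        // Placeholder images standing in for images extracted from a PDF.
        Map<String, BufferedImage> images = new LinkedHashMap<>();
        images.put("Im0", new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB));
        images.put("Im1", new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB));

        File dir = new File(System.getProperty("java.io.tmpdir"), "extracted-images");
        for (File file : exportAll(images, dir)) {
            System.out.println(file.getAbsolutePath());
        }
    }
}
```

Naming each file after its map key keeps the output traceable to the image it came from. A real workflow would likely also need to handle names that are not valid file names.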

The ImageExtractionSample exports images from the PDF exactly as they are stored in the file, regardless of any other PDF operators that affect how the image appears on the page. Images that are scaled or rotated on the page are therefore exported in their original, untransformed state.

The TransformedImageExtractionSample exports images that exactly match the size and orientation of the images as they appear on the PDF page. Large images that were reduced, cropped, or rotated when the PDF file was created will be scaled, cropped, and rotated the same way in the exported image.
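
To make the difference between the two samples concrete: in PDF, an image is painted into a 1 x 1 unit square that the current transformation matrix `[a b c d e f]` maps onto the page, so the matrix alone determines the image's on-page size and orientation, whatever its stored pixel dimensions. The sketch below computes that footprint with standard `java.awt.geom` classes. It does not use the Toolkit API, it ignores cropping (clipping paths), and the class `PlacedImageSketch`, the method `placedBounds`, and the matrix values are made up for illustration.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

// Hypothetical helper: not part of the Datalogics PDF Java Toolkit.
final class PlacedImageSketch {

    // Returns the page-space bounding box of an image placed with the
    // PDF matrix [a b c d e f]. AffineTransform's six-argument constructor
    // takes its coefficients in the same order as the PDF matrix.
    static Rectangle2D placedBounds(double a, double b, double c,
                                    double d, double e, double f) {
        AffineTransform ctm = new AffineTransform(a, b, c, d, e, f);
        Rectangle2D unitSquare = new Rectangle2D.Double(0, 0, 1, 1);
        return ctm.createTransformedShape(unitSquare).getBounds2D();
    }

    public static void main(String[] args) {
        // An image scaled to 100 x 50 points with its origin at (72, 600).
        Rectangle2D scaled = placedBounds(100, 0, 0, 50, 72, 600);
        System.out.println(scaled.getWidth() + " x " + scaled.getHeight());

        // The same image rotated 90 degrees counterclockwise: the
        // footprint's width and height swap.
        Rectangle2D rotated = placedBounds(0, 100, -50, 0, 200, 300);
        System.out.println(rotated.getWidth() + " x " + rotated.getHeight());
    }
}
```

An export of the stored image ignores this matrix entirely, while a transformed export has to apply it, which is why the two samples can produce differently sized and oriented files from the same image.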

The Datalogics PDF Java Toolkit also provides a RasterizationSample that rasterizes an entire page. It is worth considering when you have simple documents that are not color managed and you need a pure Java solution.

The Datalogics PDF Java Toolkit provides multiple avenues to extract images, text, form data, XMP metadata, and just about anything else from PDF files and then reuse those objects in other workflows. View and download the TransformedImageExtractionSample sample or get all the samples and documentation by requesting an evaluation of the Datalogics PDF Java Toolkit.
