PDF Java Toolkit: Solving the Case of the Missing Filters

Years ago, when Datalogics first acquired from Adobe the code that would become Datalogics PDF Java Toolkit (PDFJT), it was missing a couple of parts. Possibly more than two, actually, but these two have stayed missing because the use cases we imagined for PDFJT didn’t really depend on them. The two missing parts I’m referring to are the filters for JPEG2000 and JBIG2 decompression.

The scuttlebutt I heard from a former Adobe person was that they were missing because Adobe had been using a JNI bridge to its C++-based JPEG2000 compression library, while JBIG2 compression was handled by a Java-based package from a third party that had, uhm, perhaps over-leveraged its relationship with Adobe.

Many years later, we still haven’t replaced these two missing parts, but we have improved other areas of PDFJT for customers who pushed the technology beyond the use cases we originally imagined. For some of those expanded use cases, the lack of these two filters is more of a problem.

Fortunately, the tech world has evolved a bit since PDFJT’s conception. Thanks to Linus Torvalds, we have Git. From Git we have GitHub (now part of the Borg, I mean Microsoft). And on GitHub, we have a multitude of forks of JJ2000, a Java implementation of JPEG2000, to choose from. I went with Mike Bremford’s fork for my little experiment.

Implementation

Now, possibly because of PDFJT’s history with JPEG2000 and JBIG2, it actually has hooks for third-party filters.
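To make the idea of a filter hook concrete, here is a minimal sketch of how a PDF library might dispatch /Filter names such as JPXDecode and JBIG2Decode to third-party decoders. The DecodeHook interface and FilterRegistry class are hypothetical stand-ins, not PDFJT’s CustomDecodeFilter or PDFOpenOptions API; only the two filter names come from the PDF specification.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical hook shape: encoded bytes in, decoded bytes out.
// This is NOT PDFJT's CustomDecodeFilter interface.
interface DecodeHook {
    InputStream decode(InputStream encoded) throws IOException;
}

// Hypothetical registry mapping PDF /Filter names to third-party decoders.
class FilterRegistry {
    private final Map<String, DecodeHook> hooks = new HashMap<>();

    void register(String filterName, DecodeHook hook) {
        hooks.put(filterName, hook);
    }

    boolean supports(String filterName) {
        return hooks.containsKey(filterName);
    }

    // Delegates to the registered decoder, or fails loudly if none exists.
    InputStream decode(String filterName, InputStream encoded) throws IOException {
        DecodeHook hook = hooks.get(filterName);
        if (hook == null) {
            throw new IOException("No decoder registered for /" + filterName);
        }
        return hook.decode(encoded);
    }

    public static void main(String[] args) throws IOException {
        FilterRegistry registry = new FilterRegistry();
        // Pass-through placeholder; a real JPX hook would call a JPEG2000 decoder.
        registry.register("JPXDecode", in -> in);
        System.out.println(registry.supports("JPXDecode"));   // true
        System.out.println(registry.supports("JBIG2Decode")); // false
        InputStream out = registry.decode("JPXDecode", new ByteArrayInputStream(new byte[] {42}));
        System.out.println(out.read());                        // 42
    }
}
```

Keying on the filter name keeps the library core ignorant of any particular codec, which is what lets a decoder like JJ2000 be bolted on from outside.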

The hook works by instantiating a JPXCustomDecodeFilter object and passing it to PDFOpenOptions. That object is fairly lightweight: it basically just implements the CustomDecodeFilter interface.

Of course, that also means providing a decode method. But that decode method is actually a tiny bit simpler than the code example in the fork’s README.

Basically, PDFJT provides access to the compressed data via an InputStream, and the JPXIO class makes it accessible to the J2KReader via the J2KFile. The J2KReader decompresses it, and we stuff the decompressed data back into an InputStream for the next link down the chain.
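The data flow just described (drain the InputStream, decode, wrap the result in a fresh InputStream) can be sketched as follows. The RasterDecoder interface is hypothetical and stands in for the JPXIO, J2KFile and J2KReader calls, whose exact signatures are not reproduced here; the toy decoder in main exists only to exercise the plumbing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for "give JJ2000 the compressed bytes, get samples back".
interface RasterDecoder {
    byte[] decodeToSamples(byte[] compressed) throws IOException;
}

class StreamRoundTrip {
    // Drain the encoded stream, decode it, and return a new InputStream over
    // the decompressed samples for the next filter in the chain.
    static InputStream decode(InputStream encoded, RasterDecoder decoder) throws IOException {
        byte[] compressed = readAll(encoded);
        byte[] samples = decoder.decodeToSamples(compressed);
        return new ByteArrayInputStream(samples);
    }

    // Same effect as InputStream.readAllBytes() (Java 9+), written out so the
    // sketch also compiles on Java 8.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Toy "decoder" for demonstration only: duplicates every byte.
        RasterDecoder toy = compressed -> {
            byte[] out = new byte[compressed.length * 2];
            for (int i = 0; i < compressed.length; i++) {
                out[2 * i] = compressed[i];
                out[2 * i + 1] = compressed[i];
            }
            return out;
        };
        InputStream decoded = decode(new ByteArrayInputStream(new byte[] {1, 2}), toy);
        System.out.println(readAll(decoded).length); // 4
    }
}
```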

The JPXIO class implements the jj2000.j2k.io.RandomAccessIO interface. Its constructor simply reads the InputStream into a byte array.
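A byte-array-backed reader along the lines of JPXIO might look like the following sketch. To stay self-contained it does not actually implement jj2000.j2k.io.RandomAccessIO. The class name is hypothetical, and the method names (read, readFully, seek, getPos, length) are only chosen to resemble that interface’s read side, so check them against the JJ2000 sources before relying on them.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical byte-array-backed random-access reader in the spirit of JPXIO.
// It does NOT implement jj2000.j2k.io.RandomAccessIO.
class ByteArrayRandomAccess {
    private final byte[] data;
    private int pos;

    // Read the whole InputStream up front, as JPXIO's constructor does.
    ByteArrayRandomAccess(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        this.data = buffer.toByteArray();
    }

    int length() { return data.length; }

    int getPos() { return pos; }

    // Random access is just a bounds check plus an assignment.
    void seek(int off) throws IOException {
        if (off < 0 || off > data.length) {
            throw new EOFException("seek out of range: " + off);
        }
        pos = off;
    }

    // Next byte as 0..255; throws at end of data instead of returning -1.
    int read() throws IOException {
        if (pos >= data.length) {
            throw new EOFException();
        }
        return data[pos++] & 0xFF;
    }

    // Copy exactly len bytes or throw if not enough remain.
    void readFully(byte[] b, int off, int len) throws IOException {
        if (len > data.length - pos) {
            throw new EOFException();
        }
        System.arraycopy(data, pos, b, off, len);
        pos += len;
    }
}
```

Buffering everything trades memory for simplicity, which suits a decoder that may jump around in the codestream rather than read it strictly front to back.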

The rest is mostly boilerplate implementations of standard I/O methods.
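Much of that boilerplate is assembling multi-byte values from single bytes. JPEG 2000 codestreams store multi-byte fields big-endian, so the usual helpers look roughly like this sketch. BigEndianReader is a hypothetical name, and the method names mostly echo those of java.io.DataInput.

```java
// Hypothetical helper showing big-endian (most significant byte first) reads.
class BigEndianReader {
    private final byte[] data;
    private int pos;

    BigEndianReader(byte[] data) { this.data = data; }

    int readUnsignedByte() {
        if (pos >= data.length) {
            throw new IndexOutOfBoundsException("end of data");
        }
        return data[pos++] & 0xFF;
    }

    // Java evaluates operands left to right, so the high byte is read first.
    int readUnsignedShort() {
        return (readUnsignedByte() << 8) | readUnsignedByte();
    }

    short readShort() { return (short) readUnsignedShort(); }

    int readInt() {
        return (readUnsignedShort() << 16) | readUnsignedShort();
    }

    // Mask off sign extension to get the 0..2^32-1 value.
    long readUnsignedInt() { return readInt() & 0xFFFFFFFFL; }

    long readLong() {
        return ((long) readInt() << 32) | readUnsignedInt();
    }

    // Skip up to n bytes, returning how many were actually skipped.
    int skipBytes(int n) {
        int skipped = Math.max(0, Math.min(n, data.length - pos));
        pos += skipped;
        return skipped;
    }
}
```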

A lot of the code I’m skipping over actually came from the AbstractRandomAccessIO class. Looking back, JPXIO should perhaps have extended that class, which would have made it even simpler.
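The benefit of a shared base class can be sketched generically: the multi-byte readers are written once against an abstract single-byte read(), and a concrete source supplies only the primitives. AbstractByteSource and ArraySource are hypothetical names; this is not JJ2000’s AbstractRandomAccessIO, just the same design idea.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical base class: subclasses provide the primitives, and the
// derived readers are implemented once here.
abstract class AbstractByteSource {
    abstract int read() throws IOException;   // next byte as 0..255
    abstract int getPos();
    abstract void seek(int off) throws IOException;
    abstract int length();

    int readUnsignedShort() throws IOException {
        return (read() << 8) | read();         // big-endian, high byte first
    }

    int readInt() throws IOException {
        return (readUnsignedShort() << 16) | readUnsignedShort();
    }

    void readFully(byte[] b, int off, int len) throws IOException {
        for (int i = 0; i < len; i++) {
            b[off + i] = (byte) read();
        }
    }
}

// With the base class in place, a byte-array source needs only a few lines.
class ArraySource extends AbstractByteSource {
    private final byte[] data;
    private int pos;

    ArraySource(byte[] data) { this.data = data; }

    @Override int read() throws IOException {
        if (pos >= data.length) {
            throw new EOFException();
        }
        return data[pos++] & 0xFF;
    }

    @Override int getPos() { return pos; }

    @Override void seek(int off) { pos = off; }

    @Override int length() { return data.length; }
}
```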

Of course, once I got all of this scaffolding into place, it turned out there was a small bug in J2KReader that needed to be addressed.

Basically, if J2KReader’s scale field is left uninitialized, it defaults to 0, as unassigned numeric fields do in Java. That means fullscale and scale start off with the same value, and as a result a block of initialization code is skipped.

But the solution is trivial.
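The underlying Java pitfall is easy to demonstrate in isolation. In this hypothetical sketch (ScaleGuardDemo and its fields are illustrative names, not JJ2000 code), a setup block guarded by scale != fullScale never runs because both fields default to 0. Starting scale at a sentinel value is one generic way to avoid that; it is not necessarily the change made to J2KReader.

```java
// Hypothetical illustration, NOT the J2KReader source.
class ScaleGuardDemo {

    // Buggy pattern: both fields default to 0, so the guard is false on the
    // first call and the setup never runs.
    static class Buggy {
        int fullScale;          // defaults to 0
        int scale;              // also defaults to 0: the trap
        boolean initialized;

        void prepare() {
            if (scale != fullScale) {
                initialized = true;   // stand-in for the real setup work
                scale = fullScale;    // remember which scale was set up
            }
        }
    }

    // One possible fix: a sentinel that can never equal a real scale, so the
    // first prepare() always performs the setup.
    static class Fixed {
        static final int UNSET = -1;
        int fullScale;
        int scale = UNSET;
        boolean initialized;

        void prepare() {
            if (scale != fullScale) {
                initialized = true;
                scale = fullScale;
            }
        }
    }

    public static void main(String[] args) {
        Buggy buggy = new Buggy();
        buggy.prepare();
        System.out.println("buggy initialized: " + buggy.initialized); // false

        Fixed fixed = new Fixed();
        fixed.prepare();
        System.out.println("fixed initialized: " + fixed.initialized); // true
    }
}
```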

Recompile jj2000.jar with Ant, drop it into the right location so it gets picked up, and go.

In conclusion

I tested all of this with JPEG2000-encoded versions of our beloved Ducky.pdf, which I created using a modified version of the Adobe PDF Library’s RenderPage sample app.

That is hardly exhaustive testing, but the RGB, grayscale, and CMYK versions of the JPX-encoded ducky all printed as expected from my modified version of PrintPdf.

A JPEG2000-encoded, grayscale version of Ducky.pdf
