Redaction with Overlay Text using Datalogics PDF Java Toolkit

When you publish a document online, you have to assume that someone, somewhere, has made a copy of it and that the copy will exist forever. Because of that, we need to take extra care to remove sensitive data from files before they reach the Internet. This is why redaction (the removal of sensitive content) is such an important feature. In a PDF file, you create redaction annotations that are later applied to the document and take the place of the content that was redacted. When content is redacted, you can specify what should fill the space left behind; this is typically a black rectangle. In some cases, though, you may be required to provide additional information in the document to indicate what type of content was removed, or why it was removed.

Redaction annotations have a property designed for exactly those cases, and it is called Overlay Text. By setting the Overlay Text on a redaction annotation, you are saying “this text should be displayed on top of the redaction annotation”, which lets a reader understand what type of content was removed, and why, without seeing the content itself. If you are redacting content in PDFs that will be published in response to a Freedom of Information Act request, Overlay Text is extremely important to you. The Freedom of Information Act requires that redacted content be replaced by one of the defined redaction codes to indicate what was redacted. We recently updated the RedactAndSanitize sample for our PDF Java Toolkit to demonstrate setting the Overlay Text property so that those who need to comply with the Freedom of Information Act can do so with ease! Let’s take a look at the updates to the sample so you know how to update your application to specify overlay text.

Redaction with Datalogics PDF Java Toolkit

Specifying the text to be used as Overlay Text is as straightforward as it gets; here is the one line that does it, assuming you already have a PDFAnnotationRedaction object to work with:

        annot.setOverlayText("Redacted");

There is one issue with this, though. Earlier we said that the most common replacement for redacted content is a black rectangle, and if you only specify the text to be used as Overlay Text, you will end up with black text on a black rectangle! Nobody would be able to read the Overlay Text, so you still would not be meeting the requirements of the Freedom of Information Act. To change the appearance of the Overlay Text, we need to work with a few more objects so that our PDFAnnotationRedaction object does not use black text on a black rectangle. Let’s start by setting up the resources required to use Helvetica as the font for the Overlay Text:

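        // Resources of the document's interactive form; the procure* calls create the object if it is missing.
        final PDFResources resources = document.requireCatalog().procureInteractiveForm().procureResources();
        // Content wrapper through which the font is registered in those resources.
        final PDFContents contents = PDFContents.newInstance(document);
        final ModifiableContent content = ModifiableContent.newInstance(contents, resources);
        // Type 1 Helvetica; addResource returns the resource name that refers to the font.
        final PDFFontSimple font = PDFFontSimple.newInstance(document, ASName.k_Helvetica, ASName.k_Type1);
        final ASName fontName = content.addResource(font);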

Then create a new PDFDefaultAppearance using our resources:

        // Helvetica 8pt, color green (3 values == RGB, 4 == CMYK, 1 == grayscale)
        final PDFDefaultAppearance pdfDefaultAppearance = PDFDefaultAppearance.newInstance(document, fontName, 8.0,
                                                                                           new double[] { 0, 1, 0 });

Now, with our resources constructed and our PDFDefaultAppearance set up to use Helvetica at a point size of 8 with green text, we just need to set the appearance on our PDFAnnotationRedaction object using the PDFDefaultAppearance object we just constructed:

        annot.setDictionaryValue(ASName.k_DA, pdfDefaultAppearance);

With those changes made, when content is redacted, the Overlay Text will be visible as it will be green text on a black rectangle.
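The comment in the default appearance code notes that the length of the color array selects the color space. In the PDF format, a default appearance is stored as a short string of content stream operators: Tf sets the font and size, and g, rg, or k sets a grayscale, RGB, or CMYK fill color. The following is a minimal, standalone sketch of that mapping, not how PDF Java Toolkit builds the value internally. The class name DefaultAppearanceSketch and the resource name "Helv" are assumptions made for illustration; in the sample, the actual name is whatever content.addResource(font) returns.

```java
import java.math.BigDecimal;

// Illustrative helper, not part of PDF Java Toolkit.
final class DefaultAppearanceSketch {

    private DefaultAppearanceSketch() { }

    // Picks the non-stroking color operator from the number of components.
    static String colorOperator(int componentCount) {
        switch (componentCount) {
            case 1: return "g";   // grayscale
            case 3: return "rg";  // RGB
            case 4: return "k";   // CMYK
            default:
                throw new IllegalArgumentException("Expected 1, 3 or 4 color components, got " + componentCount);
        }
    }

    // Formats a number without exponent notation or trailing zeros.
    static String number(double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            throw new IllegalArgumentException("Not a finite number: " + value);
        }
        if (value == Math.rint(value)) {
            return Long.toString((long) value);
        }
        return BigDecimal.valueOf(value).stripTrailingZeros().toPlainString();
    }

    // Builds a string such as "/Helv 8 Tf 0 1 0 rg".
    static String build(String fontResourceName, double fontSize, double[] color) {
        String operator = colorOperator(color.length);
        StringBuilder sb = new StringBuilder();
        sb.append('/').append(fontResourceName).append(' ').append(number(fontSize)).append(" Tf");
        for (double component : color) {
            if (component < 0.0 || component > 1.0) {
                throw new IllegalArgumentException("Color components must be in [0, 1]: " + component);
            }
            sb.append(' ').append(number(component));
        }
        return sb.append(' ').append(operator).toString();
    }

    public static void main(String[] args) {
        // "Helv" is a hypothetical resource name; 8pt green as in the sample.
        System.out.println(build("Helv", 8.0, new double[] { 0, 1, 0 }));
    }
}
```

Passing a one-element array such as { 1 } would give white text through the g operator, which is also readable on a black rectangle.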

In the screenshot, you might notice that the text specified as Overlay Text is not fully displayed (“Redact” instead of “Redacted”). Since we are replacing existing content with a new piece of content, the new content may not be the same size as the old. This is why the Freedom of Information Act uses redaction codes (like “(b)(1)(a)”) instead of the words that correspond to what was redacted.
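To get a feel for the size mismatch, you can estimate how wide a piece of overlay text will be before choosing it. The sketch below is a rough, standalone estimate and does not use PDF Java Toolkit; the class name OverlayFitSketch, the 30-point box width, and the shorter code "(b)(1)" are assumptions chosen for illustration. The glyph widths are taken from the standard Helvetica metrics (in thousandths of the font size) and cover only the characters used here. How a toolkit or viewer actually lays out and clips the text may differ.

```java
// Illustrative helper, not part of PDF Java Toolkit.
final class OverlayFitSketch {

    private OverlayFitSketch() { }

    // Advance width in 1/1000 of the font size, for the few glyphs used in this example.
    static int helveticaWidth(char c) {
        switch (c) {
            case 'R': return 722;
            case 'c': return 500;
            case 't': return 278;
            case '(': case ')': return 333;
            case 'a': case 'b': case 'd': case 'e': case '1': return 556;
            default:
                throw new IllegalArgumentException("No width listed for '" + c + "'");
        }
    }

    // Estimated width of the text in points at the given font size.
    static double textWidth(String text, double fontSize) {
        int units = 0;
        for (int i = 0; i < text.length(); i++) {
            units += helveticaWidth(text.charAt(i));
        }
        return units * fontSize / 1000.0;
    }

    // True if the estimated text width does not exceed the box width.
    static boolean fits(String text, double fontSize, double boxWidth) {
        return textWidth(text, fontSize) <= boxWidth;
    }

    public static void main(String[] args) {
        double boxWidth = 30.0; // hypothetical width of the redacted area, in points
        System.out.println("Redacted fits: " + fits("Redacted", 8.0, boxWidth));
        System.out.println("(b)(1) fits: " + fits("(b)(1)", 8.0, boxWidth));
    }
}
```

Under these assumptions, “Redacted” at 8 points is a little over 34 points wide and overflows a 30-point box, while a short code of around 20 points fits comfortably.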

If you are currently redacting documents, or you have a need in the future to redact documents, try out Datalogics PDF Java Toolkit to automate this process.
