The PDF Java Toolkit includes the RichTextContentGenerator, a lightweight layout engine that accepts HTML-formatted rich text as input, formats it, and draws it on the page, handling word wrap and font styling automatically. This article discusses using the RichTextContentGenerator to create a form letter and covers some of the tool’s idiosyncrasies.
The RichTextContentGenerator generates formatted PDF content from rich text strings. It converts a rich text string (a text string with style information) into a PDF content stream (a stream of PDF instructions), producing output that looks the same as the input rich text string. Except for the numbered annotations, the letter looks like the example output file to the right. You can download the example output file and read the annotations in numeric order to learn how the page was constructed. The page contains four areas:
- A logo for the letterhead
- A date
- The text of a form letter
- A “Confidential” watermark
Each area demonstrates different capabilities of the RichTextContentGenerator.
The RichTextContentGenerator formats the text and creates an XObject. It requires two sets of options: one for how to format the text and one for how the XObject will be used in the document. Once the XObject is created, we can use XObjectApplyOptions to apply it to the page.
In this sample, I create a couple of rectangles to represent the company letterhead using the InstructionFactory and the techniques I discuss in my previous “Hello World Advanced” article. Creating these rectangles won’t be covered here, apart from two important details:
- Any content you create with the InstructionFactory must be added to the page before any XObjects you place.
- Colors in the PDF Java Toolkit are specified as RGB values between 0 and 1. If you normally specify RGB values from 0 to 255, divide each value by 255 to get the number you want.
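The division described in the second point can be wrapped in a small helper. This is a minimal sketch using only standard Java; the class and method names (`RgbScale`, `toUnit`, `toUnitRgb`) are hypothetical and are not part of the PDF Java Toolkit.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the PDF Java Toolkit.
final class RgbScale {

    /** Converts one 8-bit color channel (0 to 255) to the 0 to 1 range. */
    static double toUnit(int channel) {
        if (channel < 0 || channel > 255) {
            throw new IllegalArgumentException("channel out of range: " + channel);
        }
        return channel / 255.0; // floating-point division, not integer division
    }

    /** Converts a full 0-255 RGB triple to three values between 0 and 1. */
    static double[] toUnitRgb(int red, int green, int blue) {
        return new double[] { toUnit(red), toUnit(green), toUnit(blue) };
    }

    public static void main(String[] args) {
        // A letterhead blue of (0, 102, 204): 102 / 255 = 0.4 and 204 / 255 = 0.8.
        System.out.println(Arrays.toString(toUnitRgb(0, 102, 204)));
    }
}
```

Dividing by `255.0` rather than `255` matters: integer division would turn every channel below 255 into 0.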
Creating Content Using the RichTextContentGenerator
There are four steps to creating an XObject using the RichTextContentGenerator and adding it to a PDF page.
Create the RCGOptions: The first thing you need to do is create an RCGOptions object, which defines the options used to produce PDF content from the rich text. The RCGOptions define the bounding box, the default style, padding, overflow mode, text wrapping and vertical alignment. In most cases you will need to set most or even all of these properties, since the defaults were designed more for watermarks and other page backgrounds than for text layout.
It’s important to note that the RichTextContentGenerator has no hyphenation or justification engine, so text can only be aligned left, right or center, and lines will not break in the middle of a word.
The default style can be set using a string that is formatted similarly to standard CSS style sheet declarations; see “Rich Text Strings” in the PDF Reference. The same styles can also be used inside tags to specify local overrides to the default formatting.
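To make the two levels of styling concrete, here is a minimal sketch of what a default style string and an inline override might look like. It uses only CSS properties that “Rich Text Strings” in the PDF Reference lists (font-family, font-size, color, text-align, font-weight). The class and helper names are hypothetical, the particular values are illustrative, and handing the default style to RCGOptions is left out because the exact toolkit call is not shown in this article.

```java
// Hypothetical helper for assembling rich text markup; not a toolkit class.
final class RichTextStyles {

    /** Illustrative default style, written as CSS-like declarations. */
    static final String DEFAULT_STYLE =
            "font-family:Helvetica; font-size:11pt; color:#000000; text-align:left";

    /** Wraps already-escaped content in a paragraph tag. */
    static String paragraph(String innerMarkup) {
        return "<p>" + innerMarkup + "</p>";
    }

    /** Wraps already-escaped text in a span carrying a local style override. */
    static String styled(String style, String text) {
        return "<span style=\"" + style + "\">" + text + "</span>";
    }

    public static void main(String[] args) {
        // Everything uses DEFAULT_STYLE except the two words inside the span.
        String body = paragraph("Your account is "
                + styled("font-weight:bold; color:#CC0000", "past due") + ".");
        System.out.println(DEFAULT_STYLE);
        System.out.println(body);
    }
}
```

Keeping the default style in one constant and using spans only for exceptions keeps the letter text readable and makes a global font change a one-line edit.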
Also, the RCGOverflowMode setting is a bit misleading. “Auto” indicates that the text is to be laid out using the default style unless there are local overrides specified by inline styles. A setting of “ShrinkToFit” will automatically scale the content to fit the bounding box. ShrinkToFit is the default, so if you want strict control of the font size, you must set the RCGOverflowMode to “Auto”.
Create the XObjectUseOptions: The XObjectUseOptions define the attributes associated with the XObject and assign it to the foreground or background. These options also control whether it prints or is view-only, and can define the XObject ContentType as a header, footer, watermark, background, Bates number or general-purpose XObject.
Applications such as Acrobat can be used to edit or remove all content types except “general”. For this reason, the “general” content type should be used for any content that you want to be persistent and not modifiable using the Acrobat “Edit Page Design” features. However, this type can still be edited using the “Edit Text and Images” feature, just like any other page content. Securing the file can prevent editing of all content types.
Create the ByteArrayInputStream: The ByteArrayInputStream is created from the rich text string or an external ASCII file. Again, refer to “Rich Text Strings” in the PDF Reference for instructions on how to format these strings. Unicode characters and HTML entities need to be specified using the “\u” format. For example, use “\u00AE” rather than “®” for the registered trademark symbol.
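As a minimal sketch of this step using only the standard library: the helper below turns a rich text string into a ByteArrayInputStream. It assumes that the “\u” format refers to Java string escapes in source code and that UTF-8 is an acceptable encoding for the stream; neither is confirmed here, so check the toolkit documentation. The class name, method name and sample text are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the UTF-8 encoding is an assumption, not a documented requirement.
final class RichTextStream {

    /** Encodes the rich text string and wraps the bytes in a stream. */
    static ByteArrayInputStream fromRichText(String richText) {
        return new ByteArrayInputStream(richText.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // \u00AE is the registered trademark sign, written as an escape
        // instead of typing the character or using an HTML entity.
        String richText =
                "<body xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<p>Thank you for choosing Acme\u00AE Widgets.</p>"
                + "</body>";
        ByteArrayInputStream stream = fromRichText(richText);
        System.out.println("Bytes ready to read: " + stream.available());
    }
}
```

Naming the charset explicitly avoids depending on the platform default, which can differ between a developer machine and a server.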
Finally, XObjectApplyOptions is used both to set the placement options and to add the XObject to the page. The coordinates in setPosition represent the bottom-left corner of the bounding box. See the code sample for more details on how to use this object.
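Because the position refers to the bottom-left corner of the bounding box, layouts that are measured down from the top of the page need a small conversion. The sketch below assumes the default PDF coordinate space (origin at the bottom-left of the page, 72 points per inch) and a US Letter page; the class and method names are hypothetical, and the resulting value is only what you would then pass as the y coordinate.

```java
// Hypothetical layout arithmetic; assumes default PDF user space
// (origin at the bottom-left of the page, 72 points per inch).
final class BottomLeft {

    static final double POINTS_PER_INCH = 72.0;

    /**
     * Returns the y coordinate of a box's bottom edge, given the page height,
     * the gap between the top of the page and the top of the box, and the
     * box height, all in points.
     */
    static double yFromTop(double pageHeight, double topOffset, double boxHeight) {
        return pageHeight - topOffset - boxHeight;
    }

    public static void main(String[] args) {
        double pageHeight = 11 * POINTS_PER_INCH; // US Letter is 792 points tall
        double topOffset = 1 * POINTS_PER_INCH;   // box starts 1 inch below the top
        double boxHeight = 2 * POINTS_PER_INCH;   // box is 2 inches tall
        // 792 - 72 - 144 = 576
        System.out.println("y = " + yFromTop(pageHeight, topOffset, boxHeight));
    }
}
```

The x coordinate needs no conversion, since both conventions measure it from the left edge of the page.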
That’s really all there is to it. One more thing to point out, though: generally, when XObjects are added to the page, they stack so that ones added later in the code may obscure ones added earlier. However, the word “Confidential” was added as a watermark with a ContentType of background, so it appears behind the text of the letter even though it was added to the PDF page last.
The “CreateDocumentFromRichTextContent” sample assumes that the samples that come with the PDF Java Toolkit Version 2.0 have been installed, so be sure to do that first if you haven’t already. Download this sample and place the RichTextContent in the samples src folder. The sample also uses Apache Commons IO, so you’ll want to get hold of that as well.