Converting flat text to PDF with the PDF Library

Back to basics time!

I recently helped put together a project to convert legacy data to PDF and thought that this would be a great opportunity to review the Adobe PDF Library’s text creation features.

Legacy data can remain long after the system and software that generated the data have been retired to the dustbin of history. In this project, the legacy data was stored as flat text files mixed with PCL printer codes that were used to generate print reports in the distant past: hundreds of thousands of print reports. They needed to be converted to a more portable format that could then be reliably archived for another epoch, and PDF was the natural choice.

The flat files contain some basic text

    Value before Deduction (1.000)    | 123,456,789 
    Deduction (1.000 )                | 123,456,789 
    Value after deduction percentage  | 
    ...

intermixed with some PCL escape sequences to control paper orientation, font size and form feeds. So we did not need to build an entire PCL interpreter, just intercept a few specific codes.
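To give a feel for what "intercepting a few codes" can look like, here is a minimal Python sketch. It is not the project's parser and it does not use the PDF Library. It assumes the input only uses the common parameterized PCL 5 form (ESC, a category character, a group character, then value and letter pairs) and only reacts to orientation (`ESC&l#O`), font height (`ESC(s#V`) and pitch (`ESC(s#H`). The names `split_pcl` and `apply_commands` are made up for illustration, and two-character escapes such as a printer reset are simply left alone.

```python
import re

# One parameterized PCL sequence: ESC, a category character ('!'..'/'),
# a group character ('`'..'~'), then "value + letter" pairs. Lowercase
# letters chain further parameters; an uppercase letter ends the sequence.
PCL_SEQUENCE = re.compile(
    r"\x1b([!-/][`-~])((?:[-+]?[\d.]*[a-z])*[-+]?[\d.]*[A-Z])"
)
PCL_PARAMETER = re.compile(r"([-+]?[\d.]*)([A-Za-z])")


def split_pcl(line):
    """Return (plain_text, commands) for one input line.

    Each command is a (prefix, value, letter) tuple such as ("&l", "1", "O").
    """
    commands = []

    def collect(match):
        prefix, params = match.group(1), match.group(2)
        for value, letter in PCL_PARAMETER.findall(params):
            commands.append((prefix, value, letter.upper()))
        return ""  # drop the escape sequence from the visible text

    text = PCL_SEQUENCE.sub(collect, line)
    return text, commands


def apply_commands(commands, state):
    """Update a page-state dict from the few codes this sketch understands."""
    for prefix, value, letter in commands:
        if prefix == "&l" and letter == "O":
            state["landscape"] = value == "1"    # 0 = portrait, 1 = landscape
        elif prefix == "(s" and letter == "V":
            state["font_size"] = float(value)    # height in points
        elif prefix == "(s" and letter == "H":
            state["pitch"] = float(value)        # characters per inch
    return state
```

Anything the sketch does not recognize is stripped from the text and otherwise ignored, which matches the "just intercept a few specific codes" approach.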

All that we needed was a straightforward loop to read through each line of the text file and create the corresponding text objects in PDF format. Text creation in the PDF Library is easy: you create a basic graphic state, a Font to be used by the text, and a matrix for positioning. Then add the individual text runs to a Text object, and finally add the Text object to the page content. You’re done! Content can be added to an empty page or to an existing page that already has content. Let’s take a look.

The graphic state holds graphic control parameters such as the color model:
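As a library-independent illustration, the Python sketch below shows how a fill color in each of the three device color models ends up as a PDF content-stream operator. This is not the PDF Library's API; `fill_color_operator` is a hypothetical helper, and the mapping (`g` for DeviceGray, `rg` for DeviceRGB, `k` for DeviceCMYK) comes from the PDF specification.

```python
def fill_color_operator(components):
    """Return the content-stream operator that sets the fill color.

    The number of components selects the color model:
    1 -> DeviceGray (g), 3 -> DeviceRGB (rg), 4 -> DeviceCMYK (k).
    """
    operators = {1: "g", 3: "rg", 4: "k"}
    if len(components) not in operators:
        raise ValueError("expected 1, 3 or 4 color components")
    values = " ".join(format(c, "g") for c in components)
    return f"{values} {operators[len(components)]}"
```

For plain black report text, a single gray component of 0 is enough.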

For this project, only a monospaced font was needed:
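A monospaced font is what keeps the report columns lined up, because every character advances by the same amount. The Python sketch below illustrates the arithmetic, assuming a Courier-style font whose glyphs are all 600/1000 of the font size wide. The font actually used in the project may differ, and both helper names are invented for this example.

```python
COURIER_GLYPH_WIDTH = 600  # assumed advance width, in 1/1000 of the font size


def column_x(left_margin, column, font_size):
    """Horizontal position (in points) of a zero-based character column."""
    return left_margin + column * COURIER_GLYPH_WIDTH * font_size / 1000


def columns_per_line(usable_width, font_size):
    """How many whole characters fit across the usable page width."""
    return int(usable_width * 1000 // (COURIER_GLYPH_WIDTH * font_size))
```

With these assumptions, a 10 point font on a page with 540 points of usable width holds 90 characters per line, so checking the longest input line against `columns_per_line` is a quick way to pick a font size or orientation.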

Next come the text run and the matrix. Here we move (translate) to the initial x and y position and scale to the selected font size:
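To make the matrix concrete, here is a small Python sketch of the underlying math rather than the PDF Library's own Matrix class. A PDF matrix is six numbers `[a b c d e f]`; scaling by the font size and translating to (x, y) gives `[size 0 0 size x y]`. The function names are hypothetical.

```python
def text_matrix(x, y, font_size):
    """Matrix [a b c d e f] that scales by font_size and moves to (x, y)."""
    return (font_size, 0.0, 0.0, font_size, x, y)


def apply(matrix, px, py):
    """Map a point from text space to page space using the PDF convention."""
    a, b, c, d, e, f = matrix
    return (a * px + c * py + e, b * px + d * py + f)


# Start half an inch in from the left and near the top of a letter page.
start = text_matrix(36, 756, 10)
origin_on_page = apply(start, 0, 0)  # where the first glyph is placed
```

The origin of text space lands on the chosen x and y position, and one unit of text space becomes one font size on the page.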

After reading each line, create the new text run with the appropriate font, graphic state, text state and matrix, and add it to the Text object:
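Under the hood, a text run boils down to a handful of content-stream operators inside a `BT` ... `ET` text object. The Python sketch below shows that shape. It is an illustration of the PDF operators, not of the PDF Library calls, and it assumes a font resource named `F1` and a matrix that already carries the font size. Note that report text like `(1.000)` contains parentheses, which this sketch always escapes, along with backslashes, inside a PDF literal string.

```python
def escape_pdf_string(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def number(value):
    """Format a number compactly, without a trailing '.0'."""
    return f"{value:.4f}".rstrip("0").rstrip(".")


def text_run(text, matrix, font_name="F1"):
    """Operators for one run: select the font, set the matrix, show the text."""
    placement = " ".join(number(v) for v in matrix)
    return f"/{font_name} 1 Tf {placement} Tm ({escape_pdf_string(text)}) Tj"


def text_object(runs):
    """Wrap the collected runs in a single BT ... ET text object."""
    return "BT\n" + "\n".join(runs) + "\nET"
```

Collecting the runs in a list and wrapping them once mirrors the idea of adding many text runs to one Text object.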

The matrix changes between each line and text run:
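In practice this means keeping x and the scale fixed and moving y down by one line height each time. Here is a hedged Python sketch of that bookkeeping. The helper name and the idea of passing an explicit `leading` (line height in points) are assumptions for the example, not something taken from the project code.

```python
def line_matrices(line_count, x, top_y, font_size, leading):
    """One [a b c d e f] matrix per line, each `leading` points lower."""
    return [
        (font_size, 0, 0, font_size, x, top_y - index * leading)
        for index in range(line_count)
    ]


# Three lines of 10 point text, 12 points apart, starting at y = 756.
matrices = line_matrices(3, 36, 756, 10, 12)
```

Because PDF page coordinates grow upward, stepping down the page means subtracting from y.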

When you hit a form feed, add everything to the page content:
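The page-breaking logic can be sketched without any PDF code at all. The Python example below groups input lines into pages whenever it sees a form feed character (`\f`, 0x0C). It assumes a form feed may appear on its own line or in the middle of one, and `paginate` is a hypothetical name. In the real converter, each completed group is the point where the Text object would be added to the page content and a new page started.

```python
FORM_FEED = "\f"


def paginate(lines):
    """Group lines into pages, starting a new page at every form feed."""
    pages, current = [], []
    for line in lines:
        segments = line.rstrip("\r\n").split(FORM_FEED)
        if len(segments) == 1:
            current.append(segments[0])  # ordinary line, blank lines included
            continue
        for index, segment in enumerate(segments):
            if segment:
                current.append(segment)
            if index < len(segments) - 1:  # a form feed followed this segment
                pages.append(current)
                current = []
    if current:
        pages.append(current)  # flush the last page if no form feed ended it
    return pages
```

Blank lines are kept because they carry the vertical spacing of the original report.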

That’s about it! Of course, we need a bit of code to parse the input data for the PCL escape sequences and clean things up. The full code can be downloaded here.

Next time, we can look at additional types of content. What sounds interesting to you – transparencies, graphics, watermarks? Let me know!
