APDFL and DLE for Low-Level PDF Work

Note: This is the first in a series of four articles exploring low-level PDF manipulation using APDFL and DLE.

When implementing low-level PDF work using APDFL, we are essentially talking about using the API subset called the Cos Layer. The Cos Layer functions manipulate objects that correspond to the basic PDF object types specified in section 3.2 of the PDF v1.7 Reference (or section 7.3 of the PDF 32000-1:2008 specification). DLE provides an object-oriented interface to the Cos Layer, but its classes use the PDF prefix instead of Cos.

In general, you need to use Cos-level functions when you want to implement functionality discussed in the PDF spec that is not covered by specific API calls in APDFL. If you are considering making Cos-level modifications to PDFs using APDFL, you might want to prototype in DLE first. So let’s discuss how the two relate to each other and some gotchas to keep in mind.

Basic PDF object type      Cos equivalents                  DLE object equivalents
Boolean values             CosBoolean                       PDFBoolean
Integer and real numbers   CosInteger, CosFixed, CosReal    PDFInteger, PDFReal
Strings                    CosString                        PDFString
Names                      CosName                          PDFName
Arrays                     CosArray                         PDFArray
Dictionaries               CosDict                          PDFDict
Streams                    CosStream                        PDFStream
The null object            CosNull

Boolean values:

The simplest PDF Object – PDFBoolean – nonetheless comes with a good number of methods inherited from PDFObject. What is unique to PDFBoolean is its bool property called Value, and its constructors, which correspond to CosBooleanValue() and CosNewBoolean(), respectively. The rest of the methods roughly correspond to CosObj functions which apply to all Cos types.

Integer and real numbers:

PDFInteger corresponds to CosInteger, much as PDFBoolean corresponds to CosBoolean. In addition to the CosIntegerValue() and CosNewInteger() functions, there are also CosIntegerValue64() and CosNewInteger64() for cases where a 32-bit integer is not sufficient.
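
To make the 32-bit limit concrete, here is a minimal Python sketch that checks whether a value fits in a signed 32-bit integer. It is independent of APDFL and DLE, and the helper name fits_int32 is hypothetical; the point is only that values failing this check are the ones that call for the 64-bit variants.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def fits_int32(value: int) -> bool:
    """Return True if value is representable as a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


# A byte offset into a file larger than 2 GiB is a typical value
# that no longer fits in 32 bits.
offset_in_large_file = 3 * 1024**3

print(fits_int32(612))                   # an ordinary page dimension
print(fits_int32(offset_in_large_file))  # needs a 64-bit integer
```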

PDFReal likewise corresponds to CosReal, which is served by a number of different Cos functions; CosDoubleValue() and CosNewDouble() (or CosNewDoubleEx(), if you need to specify significant digits) are the ones you will want to use.

Avoid using CosFixed: it is a fixed-point type limited to 16 bits for the integer part and 16 bits for the fraction, and it is partly deprecated.
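
The limitation is easier to see with numbers. The sketch below is plain Python, not APDFL code. It assumes the fixed type is a signed 32-bit value with 16 integer bits and 16 fractional bits (16.16 format), and the helper names to_fixed and from_fixed are hypothetical.

```python
# Assumption: 16.16 fixed point stored in a signed 32-bit integer.
FIXED_ONE = 1 << 16  # the raw value that represents 1.0


def to_fixed(x: float) -> int:
    """Convert a float to a raw 16.16 value, rejecting out-of-range input."""
    raw = round(x * FIXED_ONE)
    if not -(2**31) <= raw <= 2**31 - 1:
        raise OverflowError(f"{x} does not fit in 16.16 fixed point")
    return raw


def from_fixed(raw: int) -> float:
    """Convert a raw 16.16 value back to a float."""
    return raw / FIXED_ONE


# Precision: the smallest step is 1/65536, so 0.00001 snaps to that step.
print(from_fixed(to_fixed(0.00001)))

# Range: values of 32768 and above cannot be represented at all.
try:
    to_fixed(40000.0)
except OverflowError as exc:
    print(exc)
```

A double-based real avoids both problems, which is why the double functions are the safer default.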

Strings:

PDF Strings are a bit more complicated than section 3.2.3 of the PDF Reference suggests; you also need to take into account the text string types described in section 3.8.1. Using APDFL, you might find some of the ASText functions helpful (e.g. ASTextFromSizedUnicode() and ASTextGetUnicodeCopy()). Using DLE, that logic is already in place under the hood.
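
As a rough illustration of why text strings need extra care: per the PDF Reference, a text string is either UTF-16BE prefixed with the byte order mark FE FF, or PDFDocEncoding. The Python sketch below is not APDFL or DLE code. It approximates PDFDocEncoding with Latin-1, which agrees with it for printable ASCII but differs for some other byte values, and both helper names are hypothetical.

```python
UTF16_BOM = b"\xfe\xff"  # marks a UTF-16BE text string


def decode_text_string(raw: bytes) -> str:
    """Decode the bytes of a PDF text string into a Python str."""
    if raw.startswith(UTF16_BOM):
        return raw[len(UTF16_BOM):].decode("utf-16-be")
    # Assumption: Latin-1 stands in for PDFDocEncoding in this sketch.
    return raw.decode("latin-1")


def encode_text_string(text: str) -> bytes:
    """Encode as plain ASCII when possible, else as UTF-16BE with a BOM."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return UTF16_BOM + text.encode("utf-16-be")


title = "日本語のタイトル"
raw = encode_text_string(title)
print(raw[:2] == UTF16_BOM)
print(decode_text_string(raw) == title)
```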

Names:

PDF Names represent tokens. The biggest pitfall is that Names are case sensitive. Otherwise, note the asymmetry: you can create a Name directly from a string with CosNewNameFromString(), but to get a string back out you have to take the ASAtom that CosNameValue() returns and pass it to ASAtomGetString().
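
The case-sensitivity pitfall can be shown without any PDF library. In this Python sketch a plain dict stands in for a PDF dictionary keyed by names; the lookup helper is hypothetical and simply performs an exact match.

```python
# A plain dict stands in for a PDF dictionary; the keys play the role of Names.
page = {"Type": "Page", "Rotate": 90}


def lookup(dictionary: dict, name: str):
    """Exact, case-sensitive lookup; returns None when the key is absent."""
    return dictionary.get(name)


print(lookup(page, "Rotate"))  # the key exactly as written
print(lookup(page, "rotate"))  # a different name, so nothing is found
```

A lookup that silently returns nothing is easy to mistake for a missing entry, so it is worth checking the spelling in the spec before assuming a key is absent.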

Arrays:

A PDF array is a one-dimensional collection of any and all types of PDF objects, including nested PDF Arrays. As such, it is relatively straightforward to use both in DLE and using the CosArray functions.
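
To illustrate the nesting, the following Python sketch writes a list out in PDF array syntax. It is a simplification that handles only numbers, booleans, null, and nested arrays; the function name write_array is hypothetical and unrelated to the CosArray or DLE APIs.

```python
def write_array(items: list) -> str:
    """Serialize a (possibly nested) list using PDF array syntax."""
    parts = []
    for item in items:
        if isinstance(item, list):
            parts.append(write_array(item))  # arrays may contain arrays
        elif isinstance(item, bool):
            # Checked before int because bool is a subclass of int in Python.
            parts.append("true" if item else "false")
        elif item is None:
            parts.append("null")
        elif isinstance(item, (int, float)):
            # str() is adequate for the simple values used here; it is not
            # a general PDF number formatter (PDF has no exponent notation).
            parts.append(str(item))
        else:
            raise TypeError(f"unsupported item type: {type(item).__name__}")
    return "[" + " ".join(parts) + "]"


media_box = [0, 0, 612, 792]
print(write_array(media_box))
print(write_array([1, [2.5, True], None]))
```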

Dictionaries:

PDF Dictionaries are the heart of the PDF format. A dictionary is an associative table of key-value pairs in which the keys are PDF Names and the values can be any PDF object. There are a number of CosDict functions with a KeyString suffix; these are helper functions that eliminate the need to create a separate PDF Name for each dictionary lookup. In DLE, the same effect is achieved using method overloading.
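
The idea behind the KeyString helpers can be sketched in a few lines of Python. ToyName and ToyDict below are hypothetical stand-ins, not the APDFL or DLE types; the sketch only shows how a string-keyed convenience method saves the caller from building a name object first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyName:
    """Hypothetical stand-in for a PDF Name object."""
    value: str


class ToyDict:
    """Hypothetical stand-in for a PDF dictionary keyed by ToyName."""

    def __init__(self) -> None:
        self._entries = {}

    def put(self, key: ToyName, value) -> None:
        self._entries[key] = value

    def get(self, key: ToyName):
        return self._entries.get(key)

    # Convenience variants: the caller passes a plain string and the
    # name object is created internally.
    def put_key_string(self, key: str, value) -> None:
        self.put(ToyName(key), value)

    def get_key_string(self, key: str):
        return self.get(ToyName(key))


catalog = ToyDict()
catalog.put(ToyName("Type"), ToyName("Catalog"))  # explicit name object
catalog.put_key_string("PageCount", 3)            # string shortcut
print(catalog.get_key_string("Type"))
```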

Streams:

A Stream is basically a block of raw data with a dictionary associated with it. The data may be compressed or encrypted.
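
The pairing of a dictionary with a block of data can be sketched using Python's zlib module, which produces the same format as PDF's FlateDecode filter. This is an illustration only: the helper names are hypothetical, a plain dict stands in for the stream dictionary, and encryption is not covered.

```python
import zlib


def make_stream(data: bytes) -> tuple:
    """Compress data and build the dictionary that describes the payload."""
    payload = zlib.compress(data)
    stream_dict = {"Length": len(payload), "Filter": "FlateDecode"}
    return stream_dict, payload


def read_stream(stream_dict: dict, payload: bytes) -> bytes:
    """Return the decoded stream data, honoring the Filter entry if present."""
    if stream_dict.get("Filter") == "FlateDecode":
        return zlib.decompress(payload)
    return payload  # no filter: the payload is already the raw data


content = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
stream_dict, payload = make_stream(content)
print(stream_dict)
print(read_stream(stream_dict, payload) == content)
```

Note that Length describes the encoded payload, not the original data, which is why the dictionary has to be built after compression.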

We’ll continue with the discussion of low-level PDF work in an upcoming article.
