Converting (Tiny) SVG to PDF

At our recent PDF Connect event, one of the attendees suggested that I blog about converting SVG to PDF. However, a group of our summer interns was potentially doing the same thing as part of their project, and I didn’t want to step on their toes, so I waited until they had finished. That said, I did suggest to the interns that converting SVG Tiny to PDF with our tools might be fairly straightforward, at least well enough for their purposes. That wasn’t quite hubris, but it came fairly close.

For my own project, I’d come across this Super Tiny Icons GitHub repository, which seemed ideal as input data. The first file I chose was straightforward, with only five line segments:

Bluetooth logo

So straightforward that I didn’t even notice that it was upside down:

Hacker News logo, upside down

Which I did straighten out going forward:

Stack Overflow logo

And then I got to the good stuff:

GitLab logo, SoundCloud logo, GitHub logo, Intel logo

So let’s look at this fairly straightforward code.

The Code

Let’s start in Main:

Here, we use the content returned by ParseXmlToContent to create a Form XObject. However, an SVG has its origin in the top-left corner with y increasing downward, like an image, instead of having its origin in the bottom-left corner with y increasing upward, like, oh, the Cartesian coordinate system (and PDF’s default user space). That is why the Hacker News (and Bluetooth) icons were initially upside down.

To correct this, I set the Form XObject’s Matrix entry to essentially the same vertical mirroring used in our DrawToBitmap sample. That way, I can set the Form’s Matrix in the page’s content stream without having to worry about mirroring again; I just center the Form on the page. The distinction matters: frm.Matrix is written into the content stream that the Form is added to, whereas the Matrix entry in the Form’s stream dictionary is part of the XObject itself. That entry is automatically concatenated with the current transformation matrix whenever the Form is painted, so the mirroring carries over if the Form XObject is reused in multiple content streams.
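Here is a minimal Python sketch of the matrix arithmetic involved; it is not the sample’s actual code. The helper names (`concat`, `apply`, `svg_flip`, `center_on_page`) are hypothetical, and the sketch assumes the SVG’s viewBox starts at 0,0. The Form’s Matrix holds the vertical mirror [1 0 0 -1 0 h], and a separate translation centers the Form on the page. When a Form XObject is painted, its Matrix is applied first and the current transformation matrix second, which is the order `concat(form_matrix, placement)` uses.

```python
# PDF matrices are six numbers [a b c d e f]; a point maps as
# x' = a*x + c*y + e,  y' = b*x + d*y + f.

def concat(first, then):
    """Return the matrix that applies `first`, then `then`."""
    a1, b1, c1, d1, e1, f1 = first
    a2, b2, c2, d2, e2, f2 = then
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)

def apply(m, x, y):
    """Transform one point by matrix m."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)

def svg_flip(svg_height):
    """Vertical mirror: SVG y=0 (the top) lands at PDF y=svg_height."""
    return (1, 0, 0, -1, 0, svg_height)

def center_on_page(svg_w, svg_h, page_w, page_h):
    """Plain translation that centers an svg_w x svg_h box on the page."""
    return (1, 0, 0, 1, (page_w - svg_w) / 2, (page_h - svg_h) / 2)

form_matrix = svg_flip(512)                       # would go in the Form's Matrix
placement = center_on_page(512, 512, 612, 792)    # would go in the page content
total = concat(form_matrix, placement)
print(apply(total, 0, 0))  # SVG top-left corner in PDF page space
```

For an assumed 512 × 512 icon on a 612 × 792 (US Letter) page, the SVG’s top-left corner lands at (50.0, 652.0), near the top of the page, so the icon is right side up. Keeping the mirror inside the Form means `placement` can stay a plain translation.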

Moving on to ParseXmlToContent:

Here, we iterate over the SVG elements. There are a number of elements that I’m not handling yet. Of the elements I am handling, the most interesting is the ‘g’ element. It is handled by recursively calling ParseXmlToContent and adding the returned content to a Group element, which can then be added to the current content object. Because of this recursion, and because graphics state properties can be inherited, I keep track of parent nodes and iterate from the root to the current node to collect all of the graphics state properties:
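The root-to-node walk can be sketched in Python with `xml.etree.ElementTree`, which has no parent pointers, so the recursion carries the chain of ancestors along explicitly. This is an illustration, not the app’s code: `walk` and `resolve_style` are hypothetical names, only a small assumed subset of inheritable presentation attributes is handled (seeded with SVG’s initial values, fill black and stroke none), and CSS in `style` attributes is ignored.

```python
import xml.etree.ElementTree as ET

# Assumed subset of SVG's inheritable presentation attributes.
INITIAL = {"fill": "black", "stroke": "none"}
INHERITED = ("fill", "stroke", "stroke-width", "fill-opacity",
             "stroke-opacity", "fill-rule")

def resolve_style(lineage):
    """Walk from the root to the current node; nearer values override."""
    style = dict(INITIAL)
    for element in lineage:
        for name in INHERITED:
            value = element.get(name)
            if value is not None:
                style[name] = value
    return style

def walk(node, parents=()):
    """Yield (element, style) for every non-group child, recursing into <g>."""
    lineage = (*parents, node)
    for child in node:
        tag = child.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
        if tag == "g":
            yield from walk(child, lineage)
        else:
            yield child, resolve_style((*lineage, child))

svg = ET.fromstring(
    '<svg xmlns="http://www.w3.org/2000/svg" fill="red">'
    '<g stroke="blue"><path d="M0 0" fill="green"/></g>'
    '<rect width="1" height="1"/></svg>')
for element, style in walk(svg):
    print(element.tag.rsplit("}", 1)[-1], style)
```

Here the path inherits the stroke from its `g` but overrides the fill from the root, while the rect only picks up the root’s fill.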

Again, there are some holes in coverage where I’m not handling everything in the spec, but what I am handling is a start. Ditto for parsing color:

The Super Tiny Icons favor the more succinct ways of specifying colors, so I only implemented the three-digit and six-digit hex notations.

The three-digit notation is shorthand for the more common six-digit notation: each hex digit n stands in for the two-digit value 0xnn, which is n * 0x11, or n * 17 in decimal. For example, #f80 means #ff8800.
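A minimal Python sketch of those two notations, assuming the result is wanted as DeviceRGB components, which PDF expresses in the range 0.0 to 1.0. The function name `parse_hex_color` is hypothetical; like the app, it deliberately ignores named colors and `rgb()` syntax.

```python
def parse_hex_color(text):
    """Parse '#rgb' or '#rrggbb' into DeviceRGB components in 0.0-1.0."""
    text = text.strip()
    if not text.startswith("#"):
        raise ValueError(f"not a hex color: {text!r}")
    digits = text[1:]
    if len(digits) == 3:
        # Each hex digit n stands in for 0xnn, i.e. n * 17.
        channels = [int(d, 16) * 17 for d in digits]
    elif len(digits) == 6:
        channels = [int(digits[i:i + 2], 16) for i in (0, 2, 4)]
    else:
        raise ValueError(f"unsupported hex color length: {text!r}")
    return tuple(c / 255 for c in channels)

print(parse_hex_color("#f80"), parse_hex_color("#ff8800"))
```

Both calls print the same triple, since 0xf * 17 = 0xff and 0x8 * 17 = 0x88.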

For the sake of expediency, I’m going to skip over ParseRect and ParseCircle, as those are conceptually variants of ParsePath, which is the true heart of this sample app.
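To illustrate why those are variants of path parsing, here is a small Python sketch (not the app’s ParseRect or ParseCircle) that reduces both shapes to path operations labeled with PDF’s operator names: `m` (moveto), `l` (lineto), `c` (curveto), and `h` (closepath). PDF has no circle operator, so the circle is approximated by four cubic Bézier quarter arcs using the usual control-point factor 4(√2 − 1)/3. The function names are hypothetical, and rounded rect corners (`rx`/`ry`) are not handled.

```python
import math

# Control-point distance (per unit radius) for a quarter circle as one cubic.
KAPPA = 4 * (math.sqrt(2) - 1) / 3  # about 0.5523

def rect_to_path(x, y, w, h):
    """A rect without rounded corners is a closed four-sided path."""
    return [("m", x, y), ("l", x + w, y), ("l", x + w, y + h),
            ("l", x, y + h), ("h",)]

def circle_to_path(cx, cy, r):
    """A circle is a moveto plus four cubic Bezier quarter arcs."""
    k = KAPPA * r
    return [
        ("m", cx + r, cy),
        ("c", cx + r, cy + k, cx + k, cy + r, cx, cy + r),
        ("c", cx - k, cy + r, cx - r, cy + k, cx - r, cy),
        ("c", cx - r, cy - k, cx - k, cy - r, cx, cy - r),
        ("c", cx + k, cy - r, cx + r, cy - k, cx + r, cy),
        ("h",),
    ]

for op in circle_to_path(0, 0, 1):
    print(op)
```

With this factor, the midpoint of each quarter arc lies exactly on the circle; elsewhere the error is tiny.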

Path Commands

The first part figures out what the PaintOp flags should be. To parse the ‘d’ attribute itself, we use three regular expressions.

The most horribly complicated of these recognizes numbers (and it was only half as complicated until I came across the Airbnb logo, which features coordinate numbers lacking a leading zero before the decimal point). Next is the regular expression for recognizing commands. Note that lowercase commands indicate coordinates relative to the current point, while uppercase commands take absolute coordinates. Finally, there is a regular expression for recognizing any separation between coordinate numbers.
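Here is a minimal Python sketch of the same three-regex approach. The patterns are my own approximations of the SVG path grammar, not the app’s actual expressions, and `tokenize_path` is a hypothetical name. The `\.\d+` alternative is what accepts numbers without a leading zero, which also lets a run like `.5.5` split into two numbers.

```python
import re

# Number: optional sign, digits with an optional fraction OR a bare
# fraction like ".5", then an optional exponent.
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")
# Command letters: lowercase is relative, uppercase is absolute.
COMMAND = re.compile(r"[MmZzLlHhVvCcSsQqTtAa]")
# Separators: whitespace and commas (may be empty, e.g. in "-1-2").
SEPARATOR = re.compile(r"[\s,]*")

def tokenize_path(d):
    """Split a 'd' attribute into command letters and floats."""
    tokens = []
    pos = 0
    while True:
        pos = SEPARATOR.match(d, pos).end()  # always matches, possibly empty
        if pos >= len(d):
            return tokens
        m = COMMAND.match(d, pos)
        if m:
            tokens.append(m.group())
        else:
            m = NUMBER.match(d, pos)
            if not m:
                raise ValueError(f"unexpected {d[pos]!r} at offset {pos}")
            tokens.append(float(m.group()))
        pos = m.end()

print(tokenize_path("M.5.5l-1-2z"))
```

This prints `['M', 0.5, 0.5, 'l', -1.0, -2.0, 'z']`: a minus sign or a second decimal point is enough to start a new number.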

The closePath command closes the current subpath, so it resets the current point to the subpath’s start point and invalidates any previous control point that a curve command may have left behind.

The MoveTo command is a bit of an odd duck. The initial pair of coordinates is exactly what you would expect, but any additional coordinate pairs are handled as if there were an implicit LineTo command in between. (So says the spec, which I guess makes it explicit.) So, we’ll skip over the plain LineTo command and move on to its variants.
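The closePath and MoveTo rules above can be sketched in Python as follows. `PathState`, `move_to`, and `close_path` are hypothetical names, and the output uses PDF’s operator names (`m`, `l`, `h`) purely as labels; it is an illustration, not the app’s code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class PathState:
    current: Point = (0.0, 0.0)
    start: Point = (0.0, 0.0)             # start of the current subpath
    last_control: Optional[Point] = None  # reused by smooth curve commands
    ops: List[tuple] = field(default_factory=list)

def move_to(state, args, relative):
    """M/m: the first pair is a moveto; later pairs are implicit linetos."""
    for i in range(0, len(args) - 1, 2):
        x, y = args[i], args[i + 1]
        if relative:  # relative to the point reached so far
            x, y = x + state.current[0], y + state.current[1]
        if i == 0:
            state.ops.append(("m", x, y))
            state.start = (x, y)
        else:
            state.ops.append(("l", x, y))
        state.current = (x, y)
    state.last_control = None

def close_path(state):
    """Z/z: close the subpath, return to its start, forget control points."""
    state.ops.append(("h",))
    state.current = state.start
    state.last_control = None

s = PathState()
move_to(s, [1, 1, 2, 2], relative=True)
close_path(s)
print(s.ops, s.current)
```

For `m 1 1 2 2` the second pair becomes a lineto to (3, 3), because each relative pair builds on the point reached by the previous one.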

The horizontal and vertical line commands mirror each other. In theory, I should be handling any additional coordinate-pair halves (extra single x or y values) that might be specified, just in case. But let’s move on to the variants of the curve commands. The CurveTo command itself isn’t particularly interesting, so we’ll skip it.
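For reference, here is a small Python sketch of the horizontal and vertical line commands that does handle every value given, not just the first. `expand_hv` is a hypothetical name; it returns the absolute endpoints of the implied lineto segments, the last of which becomes the new current point.

```python
def expand_hv(current, cmd, values):
    """Expand H/h/V/v into absolute lineto endpoints, one per value."""
    x, y = current
    points = []
    for v in values:
        if cmd == "H":
            x = v       # absolute x, y unchanged
        elif cmd == "h":
            x += v      # relative x, y unchanged
        elif cmd == "V":
            y = v       # absolute y, x unchanged
        elif cmd == "v":
            y += v      # relative y, x unchanged
        else:
            raise ValueError(f"not a horizontal/vertical command: {cmd!r}")
        points.append((x, y))
    return points

print(expand_hv((1, 2), "h", [3, 4]))
```

Starting from (1, 2), `h 3 4` yields two segments, ending at (4, 2) and then (8, 2).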

Curve Commands

I am grateful to Gabi on Codepen for his blog post explaining how to translate smooth cubic Bézier curves into cubic Bézier curves. Basically, instead of two control points being specified for the curve, there is an implicit first control point: the previous control point reflected to the other side of the current point. If no curve precedes the command, then the current point is used as the first control point.
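The reflection reduces to one line of arithmetic: the implicit control point is 2 * current − previous. Here is a minimal Python sketch, assuming the command’s coordinates have already been made absolute; `smooth_cubic_to_cubic` is a hypothetical name, and `prev_control` is `None` whenever the preceding command was not a cubic curve.

```python
def smooth_cubic_to_cubic(current, prev_control, c2, end):
    """Turn an absolute S command into the three points of a full cubic.

    prev_control is the second control point of the previous C/S curve,
    or None if the previous command was not a cubic curve.
    """
    cx, cy = current
    if prev_control is None:
        c1 = current  # no curve before: first control point = current point
    else:
        px, py = prev_control
        c1 = (2 * cx - px, 2 * cy - py)  # reflect through the current point
    return c1, c2, end

print(smooth_cubic_to_cubic((10, 10), (5, 0), (20, 20), (30, 10)))
```

With the previous control point at (5, 0) and the current point at (10, 10), the implicit first control point is (15, 20), so the curve leaves the current point in the same direction the previous curve arrived.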

Gabi also had another blog post explaining how to translate a quadratic Bézier curve into a cubic Bézier curve; a quadratic likewise specifies one fewer control point than a cubic. SVG also allows a smooth quadratic Bézier curve, but I haven’t implemented that yet.
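The quadratic case is the standard degree elevation: each cubic control point lies two thirds of the way from an endpoint toward the single quadratic control point. A minimal Python sketch, with `quadratic_to_cubic` as a hypothetical name and absolute coordinates assumed:

```python
def quadratic_to_cubic(p0, q, p2):
    """Return the two cubic control points and end point for a quadratic.

    p0 is the current point, q the quadratic control point, p2 the end.
    """
    c1 = (p0[0] + 2 / 3 * (q[0] - p0[0]), p0[1] + 2 / 3 * (q[1] - p0[1]))
    c2 = (p2[0] + 2 / 3 * (q[0] - p2[0]), p2[1] + 2 / 3 * (q[1] - p2[1]))
    return c1, c2, p2

print(quadratic_to_cubic((0, 0), (3, 3), (6, 0)))
```

For the quadratic from (0, 0) through control point (3, 3) to (6, 0), the cubic control points come out at (2, 2) and (4, 2) (up to floating-point rounding), and both curves pass through (3, 1.5) at their midpoint.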

 

In Conclusion

While I got pretty far, I didn’t add all the features needed to convert all of the icons to PDF. Notably, I haven’t yet implemented support for gradients or clipping. But the most important feature I still need to implement is arcs (which, unlike gradients, are not actually an SVGT feature), and I will need to cover it in Part 2, because it is the great white whale to my Captain Ahab. Not this one:

Docker logo

but this one:

Acrobat PDF logo.

If you have any questions about the code or this article, please comment below or contact us.
