Wexflow: Automating Datalogics PDF Tools

There are many workflow creation and automation tools available for download or purchase. Some are fairly simple, others quite complex. Some are open-source and freely available, while others are sold as part of larger commercial automation offerings. I’ve recently had the chance to work a bit with the Wexflow open-source workflow engine. Wexflow is easy to install and get started with, and comes with a number of built-in functions. In this article, we’ll take you through installing Wexflow and setting up a simple service that optimizes PDF files for mobile viewing, using Wexflow to drive the Datalogics PDF OPTIMIZER scriptable server tool.

Installation

Wexflow is a .NET-based workflow automation engine that runs on Windows and comes with server management GUIs for macOS, Linux and Android. In this article we’ll focus on Windows.

You can download the latest source from the main Wexflow GitHub page, or you can download an installer from the releases page. The Windows installer, as of version 2.7, may be installed via the instructions at https://github.com/aelassas/Wexflow/wiki/Installation. However, the installer is not signed by the author, so Windows Defender or your anti-virus software may flag it as potentially dangerous because of the unknown publisher. You may prefer to build from source and install from that build instead.

Datalogics PDF OPTIMIZER is a scriptable server tool designed for optimizing PDF files. For example, PDF OPTIMIZER can shrink PDF files for faster loading, improve PDF file compatibility across different viewers, and optimize PDF files for archiving. You may obtain an evaluation copy from the Datalogics website. We recommend you install into the default installation location, though this is not required.

Adding a Workflow

Wexflow has two main components: Wexflow Manager and the Wexflow Web Designer. The Wexflow Manager has a Windows app interface as well as a web interface. The Usage page has information about how to use each of these, though in this article we’ll add a workflow directly from an XML file rather than through the Web Designer.

Wexflow installs two key directories that we’ll use when adding a PDF optimization workflow:

  • C:\Wexflow\Workflows contains the workflow definition files, one XML file per workflow definition
  • C:\WexflowTesting contains the input and output folders that the workflows reference

Let’s go ahead and make three folders: one that we’ll monitor for incoming PDF files, a second that incoming files are moved to for processing, and a third to hold the optimized output PDFs. In this example, we’ll create:

  • C:\WexflowTesting\MobileOptimizePDF – this is the folder we’ll have Wexflow watch for incoming files
  • C:\WexflowTesting\PDFTemp – this is the location incoming files are moved to for processing
  • C:\WexflowTesting\OptimizedPDFOutput – this is the folder that optimized PDFs made by PDF OPTIMIZER are moved to
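If you’d rather script the folder setup than create the folders by hand, a minimal Python sketch might look like the following. The folder names come from this article; the create_workflow_folders helper and its root parameter are our own illustration and are not part of Wexflow.

```python
from pathlib import Path

# Folder names used in this article: watched, temporary, and output folders.
FOLDER_NAMES = ["MobileOptimizePDF", "PDFTemp", "OptimizedPDFOutput"]


def create_workflow_folders(root):
    """Create the three workflow folders under root and return their paths.

    The helper name and the root parameter are hypothetical; they are not
    part of Wexflow.
    """
    created = []
    for name in FOLDER_NAMES:
        folder = Path(root) / name
        # parents=True creates root if needed; exist_ok=True makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

On the machine running Wexflow, you would call create_workflow_folders(r"C:\WexflowTesting") once before enabling the workflow.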

Why move inputs to an intermediate location for processing? Optimizing large or complex PDF files can sometimes take more time than the polling interval on the watched folder. We don’t want our workflow to process input files more than once, which it may do if it sees the input in the watched folder in more than one polling interval. To prevent this, we first move the file to be optimized out of the watched folder and then process the file.
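To make the reasoning concrete, here is a toy model (not Wexflow code) of a folder polled on a fixed interval. It assumes, as a simplification, that every polling run that still finds the file in the watched folder starts processing it. The function name and its parameters are hypothetical.

```python
def pickups_before_done(job_seconds, poll_seconds, move_first):
    """Count the polling runs that find the file in the watched folder.

    Toy model only: polls happen at t = 0, poll_seconds, 2 * poll_seconds, ...
    and, without the move, the file stays in the watched folder until the
    job finishes at t = job_seconds.
    """
    if move_first:
        # The first run moves the file out, so later polls find nothing.
        return 1
    pickups = 0
    t = 0
    while t < job_seconds:
        pickups += 1
        t += poll_seconds
    return pickups
```

With a 12-second job and the 5-second period used in this article, the model reports three pickups without the move and a single pickup with it.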

Now we’ll add the workflow. In C:\Wexflow\Workflows create a file named Workflow_MobileOptimizePDF.xml with the following contents:

<Workflow xmlns="urn:wexflow-schema" id="100" name="Workflow_MobileOptimizePDF" description="Workflow_MobileOptimizePDF">

    <Settings>
        <Setting name="launchType" value="periodic" />
        <Setting name="period" value="00.00:00:05.00" />
        <Setting name="enabled" value="true" />
    </Settings>

    <Tasks>
        <Task id="1" name="FilesLoader" description="Loading PDF file" enabled="true">
            <Setting name="folder" value="C:\WexflowTesting\MobileOptimizePDF" />
        </Task>

        <Task id="2" name="FilesMover" description="Moving PDF file" enabled="true">
            <Setting name="selectFiles" value="1" />
            <Setting name="destFolder" value="C:\WexflowTesting\PDFTemp\" />
        </Task>
            
        <Task id="3" name="ProcessLauncher" description="PDF OPTIMIZER launch" enabled="true">
            <Setting name="selectFiles" value="2" />
            <!-- Install PDF OPTIMIZER and add executable location here -->
            <Setting name="processPath" value="C:\Program Files\Datalogics\PDFOptimizer\pdfoptimizer.exe" />
            <!-- variables: {$filePath},{$fileName} -->
            <Setting name="processCmd" value="{$filePath} {$output:$fileNameWithoutExtension.opt.pdf} mobile.json" />
            <Setting name="hideGui" value="false" />
            <Setting name="generatesFiles" value="true" /> 
        </Task>
        
        <Task id="4" name="FilesMover" description="Move PDF output files from temp folder" enabled="true">
            <Setting name="selectFiles" value="3" />
            <Setting name="destFolder" value="C:\WexflowTesting\OptimizedPDFOutput\" />
        </Task>

        <Task id="5" name="FilesRemover" description="Remove input PDF files from temp folder" enabled="true">
            <Setting name="selectFiles" value="2" />
        </Task>
    </Tasks>
</Workflow>

This workflow contains settings and five sequential tasks.

Setting Up Automated Watch

The Settings section controls how Wexflow launches the workflow action. Here, the launchType value of periodic indicates that we want Wexflow to run this workflow periodically, instead of requiring manual user execution of the workflow. The period value controls how often the workflow is run. In this example, the value for period corresponds to every 5 seconds. Finally, the enabled parameter value of true indicates that this workflow should be run by Wexflow.
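Before changing the period value, it can help to check what a given string means. The sketch below assumes the value is laid out as days.hours:minutes:seconds with an optional fractional part, which is consistent with the 5-second example above. The parse_period helper is our own illustration and is not part of Wexflow.

```python
import re
from datetime import timedelta

# Assumed layout: [days.]hours:minutes:seconds[.fraction]
PERIOD_RE = re.compile(r"^(?:(\d+)\.)?(\d{1,2}):(\d{1,2}):(\d{1,2})(?:\.(\d+))?$")


def parse_period(value):
    """Convert a period string such as '00.00:00:05.00' to a timedelta."""
    match = PERIOD_RE.match(value)
    if match is None:
        raise ValueError("unrecognized period value: " + repr(value))
    days, hours, minutes, seconds, fraction = match.groups()
    # The optional trailing digits are a fraction of a second.
    fractional = float("0." + fraction) if fraction else 0.0
    return timedelta(
        days=int(days or 0),
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(seconds) + fractional,
    )
```

Under that assumption, parse_period("00.00:00:05.00") gives a 5-second interval, and a value of "00.00:02:00" would mean every 2 minutes.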

The first task in the workflow is a file loading task. This task signals to Wexflow that it should load each file that it finds in the folder “C:\WexflowTesting\MobileOptimizePDF” into Wexflow for processing.

The combination of the periodic launch type and the file load task is how Wexflow implements its version of watched folders.

Moving Input and Optimizing Files

The second task is a file moving task. This task will take each of the files that are loaded in the previous task, and move the file into the “C:\WexflowTesting\PDFTemp” folder for processing. The file moving task references the files from the previous step with the selectFiles setting and the value of 1: this value corresponds to the id of the task whose outputs we want to use as inputs for this task. Note the id value of 1 for the earlier file load task, and note the id value of 2 for the current file move task; we’ll come back to this later on.

Files are moved out of the watched folder and into this directory so that the file loading task does not pick them up again the next time it runs.

The third task is the most complicated part of the workflow – the running of Datalogics PDF OPTIMIZER. There are several important pieces for this task:

  • The ProcessLauncher task name indicates that this task launches an external program, rather than using any of the actions built into Wexflow
  • The selectFiles value of 2 references the output files from the previous file move task that we gave the id value of 2
  • The processPath value states the actual program to launch. This should be an absolute path to an executable program.
  • The processCmd value gives the command line parameters for launching the program. This value includes variable references.
    • $filePath will be replaced with the full name and path of the input file being processed.
    • $output:$fileNameWithoutExtension will be replaced with an output file path built from the name of the input file being processed, with the file extension trimmed off; in this workflow, .opt.pdf is then appended. This is also used to store the names of output files so that these can be passed as input files to later tasks.
  • The generatesFiles value of true indicates that the program launched will generate output files.
  • The hideGui setting is passed to the Windows system call that handles process launching. For our example, the value doesn’t matter.
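The variable substitution described above can be illustrated with a short Python sketch. This is not Wexflow’s implementation: the expand_process_cmd helper, the output_dir parameter, and the choice to wrap expanded paths in double quotes are all assumptions made for illustration.

```python
import ntpath  # Windows path rules, so the sketch behaves the same on any OS
import re


def expand_process_cmd(template, input_path, output_dir):
    """Expand {$filePath} and {$output:...} in a processCmd-style template.

    Returns the expanded command line and the list of output paths.
    Hypothetical helper; output_dir stands in for wherever outputs are written.
    """
    stem = ntpath.splitext(ntpath.basename(input_path))[0]
    outputs = []

    def replace_output(match):
        # Build the output file name from the input name minus its extension.
        name = match.group(1).replace("$fileNameWithoutExtension", stem)
        path = ntpath.join(output_dir, name)
        outputs.append(path)
        return '"' + path + '"'

    cmd = re.sub(r"\{\$output:([^}]+)\}", replace_output, template)
    cmd = cmd.replace("{$filePath}", '"' + input_path + '"')
    return cmd, outputs
```

For an input named report.pdf, the template from the workflow above expands to a command line whose second argument ends in report.opt.pdf, and that same path is recorded in the returned list, mirroring how output names are kept for later tasks.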

Cleaning Up

Tasks 4 and 5 are cleanup tasks. The fourth task moves each of the output files generated by PDF OPTIMIZER into the output folder, “C:\WexflowTesting\OptimizedPDFOutput”. The fifth and final task removes the original input PDF files from the temporary folder that this workflow moved them into. The file removal task itself only takes one parameter, selectFiles, to specify the files to remove. Note the reference back to the second task with the value of 2 for the selectFiles parameter.
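Because each selectFiles value has to match the id of another task, a mistyped id is an easy mistake to make when editing the XML by hand. The following Python sketch checks a workflow file for selectFiles values that do not point at an earlier task. Treating "must refer to an earlier task" as the rule is our own conservative assumption, and the find_dangling_select_files helper is not part of Wexflow.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <Workflow> element in the workflow file.
NS = {"w": "urn:wexflow-schema"}


def find_dangling_select_files(xml_text):
    """Return (task_id, referenced_id) pairs whose target is not an earlier task."""
    root = ET.fromstring(xml_text)
    seen_ids = set()
    problems = []
    for task in root.findall("w:Tasks/w:Task", NS):
        task_id = task.get("id")
        for setting in task.findall("w:Setting", NS):
            if setting.get("name") != "selectFiles":
                continue
            referenced = setting.get("value")
            if referenced not in seen_ids:
                problems.append((task_id, referenced))
        # Only tasks that appear before the current one count as valid targets.
        seen_ids.add(task_id)
    return problems
```

Reading Workflow_MobileOptimizePDF.xml into a string and passing it to this function should return an empty list for the workflow shown in this article.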

Workflow Automation with Datalogics

This article just scratches the surface of what one can do with the Wexflow automation framework. There are many built-in tasks to take in files from various sources, transform files in various ways, and allow conditional processing logic. Likewise, Datalogics PDF tools and APIs bring a wide variety of robust, professional-grade PDF processing capabilities to your organization or programs.

We hope we’ve inspired you to think about how you can add PDF processing automation to your business workflows. Automating PDF processing with Datalogics PDF tools can enable greater PDF compatibility, faster downloads, and lights-out, high-volume PDF transformations, saving you and your company hassle and time, and ultimately enabling greater productivity.
