Datalogics is proud to sponsor the PDF Association’s PDF Day events in December: December 10th in scenic Washington, DC, and December 11th in exciting New York City. I’ll be giving an introduction to content extraction from PDF files in both locations and invite all who are interested to register for these day-long, education-filled events. Here is a brief overview of what I’ll be discussing:
- In the beginning there were image files: Images are easy for humans to interpret but do not contain extractable information
- Information must be guessed or approximated, for example with OCR
- So a better way was created: PDF
- PDF is not just a flat image format – it’s a container for a variety of content that can be extracted and re-purposed, and information about the content
- What can a PDF contain? A discussion of the many possibilities
- Demonstration: extracting various types of content with Adobe Acrobat
- Common pitfalls: a PDF is only as good as the way it was created…
- As with any document format, what we see isn’t always what’s inside – some examples of how what’s visible can be different from what’s actually there
Unlike image files such as TIFFs, PDF is not just an archival or final format. PDFs contain actual content that can be extracted and used to drive business processes. Come learn more about how to integrate PDF as an intelligent document format in your workflows.
With two different information-packed events going on – DC’s schedule focusing on government and enterprise IT organizations, and NYC focusing on finance and legal applications – the PDF Association has put together a great event for many different audiences. Register soon for the PDF Day on December 10th in Washington, DC and on December 11th in New York City. We look forward to seeing you there!