PDFs & Search
Many websites utilize PDFs as content, and with organic search becoming a significant channel for most businesses, it makes sense to understand the relationship between PDFs and search engines.
This post is for those who want to learn more about this relationship: how to leverage search to find PDF files, how to block sensitive PDF documents from being indexed, and how to optimize PDF files to perform and rank in search engines.
Does Google search PDF content?
Yes! Google can index PDF content as long as the PDF is not password protected or encrypted in any way. You’ll notice that some searches are extremely PDF heavy: try searching “us 1040 2019 schedule a” and see how many of the results are PDFs.
How to search a PDF on Google?
If you know your Google-Fu, you might already know how to do this. Google search helps users narrow down their search results with search operators. Some common ones are inurl: or quotation marks, but you’ll want to use the filetype: operator to restrict a Google search to PDF files or other specific file types.
Paste the following search query into the Google search box for an example:
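As a rough illustration of the filetype: operator, the Python sketch below builds a Google search URL that is restricted to PDF results. The function name pdf_search_url and the search terms are assumptions made up for this example; only the filetype:pdf operator itself comes from the text above.

```python
from urllib.parse import urlencode


def pdf_search_url(terms: str) -> str:
    """Build a Google search URL limited to PDF results (illustrative helper)."""
    # Appending the filetype: operator restricts results to one file type.
    query = f"{terms} filetype:pdf"
    # urlencode escapes spaces and the colon so the query is URL-safe.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(pdf_search_url("us 1040 2019 schedule a"))
```

Typing the same text (your terms followed by filetype:pdf) straight into the search box has the same effect; the URL form is just handy for bookmarks or scripts.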
Check out the complete list of Google search operators here.
How to block Google from indexing PDF files?
So now you know that Google does index PDF content, and you know how to search Google for PDF files only. What if you don’t want your PDFs showing up in search? There are a few ways to do this:
- Use the robots.txt file to block PDF files from search engine crawlers
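- Place rel="nofollow" on links pointing to your PDF files (this only discourages crawlers from following those links; it does not guarantee the PDF stays out of the index)
- Send X-Robots-Tag: noindex in the HTTP response header of the PDF to prevent crawlers from indexing it
- I advise against using the robots.txt and X-Robots-Tag methods together, because a crawler that is blocked by robots.txt never fetches the PDF and so never sees the X-Robots-Tag in the HTTP response header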
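To make the X-Robots-Tag option more concrete, here is a minimal Python sketch of the decision a web server or application layer would make: add the noindex header only when the requested path is a PDF. The function name robots_headers is hypothetical, and how you attach the returned headers depends on your own server or framework.

```python
from urllib.parse import urlsplit


def robots_headers(request_path: str) -> dict[str, str]:
    """Return extra response headers for a request path (illustrative helper)."""
    # Ignore any query string so "/report.pdf?download=1" is still matched.
    path = urlsplit(request_path).path
    if path.lower().endswith(".pdf"):
        # Tells crawlers that fetch the file not to index it.
        return {"X-Robots-Tag": "noindex"}
    return {}


if __name__ == "__main__":
    print(robots_headers("/files/price-list.pdf"))
    print(robots_headers("/blog/pdf-seo"))
```

Remember that the crawler has to be allowed to fetch the PDF for this header to be seen, which is why it should not be paired with a robots.txt block on the same files.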
How to optimize a PDF for web & SEO?
If you do want your PDF files to be indexed and perform in search, follow these best practice guidelines to make your PDF SEO friendly.
Choose an SEO friendly filename
Before saving your PDF file for web upload, consider what keywords best sum up the content of the PDF. The PDF filename will likely be shown in the search results pages, so make sure the filename is relevant and meaningful for potential visitors. Separate each word with a hyphen and maintain consistent capitalization.
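If you name a lot of files, the hyphen and capitalization rules are easy to automate. The sketch below is one possible approach, assuming all-lowercase as the consistent capitalization; the function name seo_filename and the sample title are made up for illustration.

```python
import re


def seo_filename(title: str) -> str:
    """Turn a working title into a hyphenated, lowercase PDF filename."""
    # Lowercase everything so capitalization is consistent (an assumed convention).
    lowered = title.lower()
    # Replace every run of non-alphanumeric characters with a single hyphen.
    hyphenated = re.sub(r"[^a-z0-9]+", "-", lowered)
    # Drop hyphens left over at either end, then add the extension.
    return hyphenated.strip("-") + ".pdf"


if __name__ == "__main__":
    print(seo_filename("2019 Annual Report: Q4 Results"))
```

Choosing which keywords go into the title is still a human decision; the helper only handles the formatting.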
Create an engaging title and description
Google will often use the title and description tags of a web page in the search results. This is also true for PDF files. These fields are accessible via the document properties within your PDF reader. The title field is equal to the meta title, and the subject field is the same as the meta description.
- An optimized title tag will contain one or two of your target keywords and be no longer than 60 characters
- An optimized meta description should be between 150 and 160 characters (more if targeting mobile) and contain engaging copy to convince a user to click through to your content
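The length guidelines above are simple enough to check before you save. This is a small sketch that takes the title and subject text as plain strings and reports anything outside the 60-character and 150 to 160-character targets. The function name metadata_warnings is hypothetical, and reading the fields out of an actual PDF is left to whatever PDF tool you already use.

```python
TITLE_MAX = 60                  # maximum title length suggested above
DESCRIPTION_RANGE = (150, 160)  # suggested description (subject) length


def metadata_warnings(title: str, subject: str) -> list[str]:
    """Return human-readable warnings for a PDF title and subject."""
    warnings = []
    if not title.strip():
        warnings.append("title is empty")
    elif len(title) > TITLE_MAX:
        warnings.append(
            f"title is {len(title)} characters; aim for {TITLE_MAX} or fewer"
        )
    low, high = DESCRIPTION_RANGE
    if not low <= len(subject) <= high:
        warnings.append(
            f"subject is {len(subject)} characters; aim for {low}-{high}"
        )
    return warnings


if __name__ == "__main__":
    for warning in metadata_warnings("PDF SEO Guide", "Too short."):
        print(warning)
```

An empty list means both fields fall inside the suggested ranges.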
Use headings to structure the page and break up content
HTML web pages and PDFs can both utilize headings to help structure your content. Headings and subheadings help both users and search engines quickly read and understand what your content is about. This becomes extremely important if you’re targeting mobile users.
Try to follow heading best-practices:
- Use one H1 heading per PDF
- Multiple H2-H6s are fine, but try to keep a consistent hierarchy
- Use headings to succinctly describe the upcoming section
- Optimize headings by using a target keyword (if relevant)
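The first two rules in this list can be checked mechanically once you have the heading levels in reading order. The sketch below assumes you have already extracted those levels as a list of integers (1 for H1, 2 for H2, and so on), and it treats "consistent hierarchy" as "never skip a level on the way down", which is my interpretation rather than a formal rule. The function name heading_issues is hypothetical.

```python
def heading_issues(levels: list[int]) -> list[str]:
    """Check a document's heading levels, given in reading order."""
    issues = []
    h1_count = levels.count(1)
    if h1_count != 1:
        issues.append(f"expected exactly one H1, found {h1_count}")
    # Compare each heading with the one before it to spot skipped levels,
    # for example an H2 followed directly by an H4.
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            issues.append(f"H{prev} is followed by H{cur}, skipping a level")
    return issues


if __name__ == "__main__":
    print(heading_issues([1, 2, 3, 2, 4]))
```

Moving back up the hierarchy (an H3 followed by an H2) is treated as fine, since that is how a new section normally starts.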
Include relevant links in the PDF content
Try to link back to your own web properties from PDF content. Google will follow the links on your PDF, so it’s a great opportunity to transfer page authority to relevant content across your site. Don’t be afraid of linking out to external websites as well; it’ll help Google give context to your brand and whom you associate with.
Link to the PDF internally with good anchor text
The internal linking structure of a site is very important for SEO. Site structure outlines the way Google and users will interact with your site, and determines the importance of certain pages and folders.
Avoid orphaned PDF files by creating contextual internal links with descriptive anchor texts. Google views contextual internal links as a signal that your PDF file is important, and should be indexed and ranked.
Orphaned PDF files are hard for crawlers to discover, so they are unlikely to be indexed or to rank for any keywords.
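Finding orphaned PDFs comes down to comparing two lists: the PDFs you host and the links your pages actually contain. The sketch below assumes you already have both, for example from a site crawl export; the function name orphaned_pdfs and the sample URLs are invented for illustration.

```python
def orphaned_pdfs(pdf_urls: set[str], links_by_page: dict[str, set[str]]) -> set[str]:
    """Return the PDFs that no page on the site links to."""
    # Collect every link target found on any page into one set.
    linked = set().union(*links_by_page.values())
    # Anything hosted but never linked is an orphan.
    return pdf_urls - linked


if __name__ == "__main__":
    pdfs = {"/files/guide.pdf", "/files/old-brochure.pdf"}
    links = {
        "/": {"/blog", "/files/guide.pdf"},
        "/blog": {"/", "/contact"},
    }
    print(orphaned_pdfs(pdfs, links))
```

Each URL it returns is a candidate for a contextual internal link with descriptive anchor text.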
Avoid duplication errors by canonicalizing PDF documents
It’s a common problem to find that two or more of your web pages are competing for the same keywords. PDFs can also compete with your web pages, so it’s important to prevent content duplication by setting up canonical links.
A canonical link is a special tag that tells Google to view the linked canonical page as the authority. Google then treats any links pointing to the PDF as pointing to the canonical page.
This mostly becomes an issue if you create a PDF version of web content for users to easily download. When this happens, set the canonical URL in the HTTP response header that your server sends with the PDF file, since a PDF has no HTML head to hold a canonical tag.
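For reference, the canonical signal in an HTTP response is a Link header with rel="canonical". The sketch below only formats that header; the function name canonical_link_header and the example.com URL are placeholders, and wiring the header into responses for your PDF files depends on your server setup.

```python
def canonical_link_header(canonical_url: str) -> tuple[str, str]:
    """Build the HTTP header that points a PDF at its canonical web page."""
    # The URL goes in angle brackets, followed by the rel parameter.
    return ("Link", f'<{canonical_url}>; rel="canonical"')


if __name__ == "__main__":
    name, value = canonical_link_header("https://www.example.com/guide")
    print(f"{name}: {value}")
```

The header should be sent with the PDF response and point at the web page you want treated as the authority.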
Don’t save PDF files as images; use plain text
Save your PDF content as plain text as opposed to images to help reduce the amount of work Google has to do to understand your PDF. While Google can read and index image-based PDFs, it doesn’t make sense to make it an arduous task. Plain text means Googlebot can read your content quickly and without difficulty.
Optimize images that live on the PDF
Make sure the images that do continue to live within the PDF document aren’t too large. Google sees load speed as an important ranking factor, so it helps to ensure your images are compressed (within reason).
Image alt-text is another useful tool in your SEO arsenal. Use alt-text to describe the contents of your image, and try to use any relevant keywords. Alt-text will help images to rank in Google Image Search.
Optimize PDF for mobile viewing
Mobile traffic accounts for more than 55% of all organic search traffic. Optimizing for mobile is no longer optional; it’s pretty much necessary. There’s no such thing as a responsive PDF, but the next best thing you can do is left-align the text in your files. This will help readers consume your content, as they won’t have to scroll horizontally as much.
Compress file size to reduce download time
This goes hand in hand with image compression: a smaller file means a shorter download time. If the PDF is larger than 5MB and you want it to rank, make sure you reduce its file size.
Use our PDF Optimizer tool to make this easier.
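A quick way to find candidates for compression is to flag every PDF over the 5MB mark. The sketch below assumes 5MB means 5 × 1024 × 1024 bytes; the function names and the file path in the docstring are hypothetical.

```python
import os

MAX_BYTES = 5 * 1024 * 1024  # 5MB threshold from the guideline above (assumed binary MB)


def is_too_large(size_bytes: int, limit: int = MAX_BYTES) -> bool:
    """Return True when a file size exceeds the limit."""
    return size_bytes > limit


def pdf_needs_compression(path: str) -> bool:
    """Check a PDF on disk, e.g. pdf_needs_compression("files/guide.pdf")."""
    return is_too_large(os.path.getsize(path))
```

Running pdf_needs_compression over a folder of PDFs gives you a short list to put through a compression tool.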
Optimize PDF loading with Fast Web View
Enable “Fast Web View” to optimize the PDF file’s load order. This option displays the first page as soon as it arrives instead of waiting for the whole document to download. Enable Fast Web View if users will be viewing one page at a time online instead of downloading directly to local storage.
Check out another Datalogics blog post about Fast Web View (or Linearized PDFs).
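If you want a rough way to tell whether a PDF was saved with Fast Web View, you can look for the linearization dictionary, which a linearized file places near the very start. The sketch below is a heuristic under that assumption, not a full validation: it only checks for the /Linearized key in the first 1024 bytes. The function name looks_linearized is hypothetical.

```python
def looks_linearized(data: bytes) -> bool:
    """Heuristically detect a linearized (Fast Web View) PDF from its bytes."""
    # A PDF starts with a "%PDF-" header, and a linearized file declares
    # its /Linearized dictionary close to the beginning of the file.
    return data.startswith(b"%PDF-") and b"/Linearized" in data[:1024]


if __name__ == "__main__":
    sample = b"%PDF-1.7\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
    print(looks_linearized(sample))
```

To check a real file, read its first kilobyte with open(path, "rb").read(1024) and pass the result in. A file that was edited after linearization may still carry the key, so treat a positive result as a hint rather than proof.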
Start tracking PDF performance in GA
This one isn’t really a PDF optimization tip, but it falls in the same arena. After you’ve done all that work to optimize your PDFs, you should make sure you have the correct tracking implemented.
Set up your analytics to track PDF downloads so you can track engagement of your users. Take it a step further and look into what other marketing channels were involved, or what pages they went through before downloading your PDF.
Optimizing PDFs for search engines seems like a daunting task, but if you implement these best practices in your workflow before you click save, you should see improvements in a matter of months.
Let me know if this was helpful, and feel free to ask questions in the comments.