Often during the web migration project we felt like we were drowning in a sea of PDFs, especially while working on PDF-heavy sites. The team has made a huge effort to drastically reduce the number of PDFs we have on the University website and, where possible, convert them into HTML pages.
People sometimes ask us why we’re doing this, and many assume it would be easier to upload a PDF than to create a web page. In the short term it often is easier and faster, but unfortunately quick and easy does not equate to a better user experience, and it does not help with SEO, content management or tracking in the long run.
Of course, we still have plenty of PDFs on our site – 1,020 at last count! We’re working to reduce this number so that only those best suited to PDF format are published this way (for reasons I’ll explain later in the blog post).
Why do we prefer HTML pages to PDFs?
Firstly, PDFs are mostly optimized for paper, usually A4 in the UK. That’s fine if it’s a document you expect your users to print, but in an era where we’re actively discouraging printing and moving towards a paperless office and carbon-neutral campus, it’s not ideal. A quick straw poll of our office also suggests that far fewer people even have printers at home these days.
Most of us working for the University will be working on a computer and be used to all the display space on a laptop or desktop, but the experience is very different on a device with a smaller screen. If we publish as HTML, the page is designed to be responsive and is optimized for a scrollable screen: for example, the text size automatically adjusts for a comfortable reading experience and the text reflows to fit the available space. If you’ve ever tried to read a PDF on a mobile device, you’ll know that it can be tricky to zoom, scan, and scroll, as you need to move the page in all directions to read the content.
Another consideration is that if you’re reading a PDF on a mobile device, you’ll need to use a separate app. This results in a worse user experience as it takes you away from the website, losing both navigation and context.
Once zoomed in, you can see that the text no longer fits the screen and you need to scroll both horizontally and vertically to read it. This makes for an extremely awkward and uncomfortable viewing experience as you are constantly moving text back and forth to read the lines.
Compare this with a document that has been converted to HTML, where the text has been scaled to fit the screen.
Text looks much clearer, and it’s easier to scroll and locate important information. It’s easy to see at a glance that this provides a far superior user experience for those accessing our content from a mobile device.
When converting PDFs to HTML we can also take the opportunity to improve the layout of the content: simple things like using bulleted lists, adding scannable headings and sometimes changing the text to align with the University’s style guide, ensuring consistency across the whole website.
Additionally, HTML pages use on-page headers to form contextual navigation: the in-page links you see at the side of the page on a computer or at the top of the page on a mobile device. Document navigation on a PDF is less straightforward since, while you can set an index, it doesn’t follow you through the page in the same way.
Since it’s much easier to use the HTML version, we’re likely to get more user engagement with the content. Research by the Nielsen Norman Group showed that task completion is higher when using HTML.
Under the Public sector bodies (websites and mobile applications) Accessibility Regulations 2018 we have a legal obligation to make our content as accessible as possible. If we fail to provide accessible content, we may be in breach of the Equality Act 2010.
While you can certainly make PDFs accessible, it can be a little tricky if you’ve never had any training or instruction. There’s actually a lot more work for the person creating the PDF, as you’d need to make sure you’ve properly included navigational aids such as bookmarks, tags to provide a logical reading order, and accessible form fields (among other things). Tables must also be labeled correctly. Simply converting a Word document to PDF is often not enough.
Most PDFs sent to us have not been made accessible, and if this has not been done in the source document, making changes can be time-consuming and may require access to the original document. We have even been provided with PDFs where the entire text has somehow been rendered as an image – obviously this is something that would be completely inaccessible to a screen reader!
Using HTML, however, is much easier. The styles and layout we use on the website have been created with both accessibility and usability in mind. Producing an HTML document will also give the end user more control over how they consume the information, as they can change the look of the page in their browser window: for example, they can change the background colour, font and text size.
PDFs can be large, in some cases extremely large. Not everyone who accesses your content will have a device with plenty of storage space or the data to spare on large files. If you don’t have a strong Wi-Fi signal, an HTML page, which uses less bandwidth, will be preferable.
We’ve also found that duplication issues can arise more easily with PDFs. Several people have uploaded the same content with slightly different filenames (we recently found seven instances of one particular PDF!). This is easier to track when we’re using HTML pages, because the CMS handles the URL structure for us and notifies us of duplication.
It’s also hard to see how content in PDF format performs. While we can track downloads, we can’t see how long someone spends reading the content or how far down they scroll, for example. Compared to a web page, where we can generate a heatmap and track metrics like time on page and where users click to go next, this is a big disadvantage.
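For anyone who wants to check a folder of uploads for this kind of duplication, here is a minimal sketch in Python. It is only an illustration, not the process our team or the CMS uses: the function name `find_duplicate_pdfs` and the idea of scanning a local folder are assumptions made for the example. It groups files by a hash of their contents, so copies with different filenames are still matched.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_pdfs(root):
    """Return groups of PDF files under `root` whose bytes are identical.

    Hypothetical helper for illustration only. Files are matched on a
    SHA-256 hash of their contents, so the filename does not matter.
    """
    groups = defaultdict(list)
    # rglob walks subfolders too; the "*.pdf" pattern may be case-sensitive
    # depending on the operating system, so "REPORT.PDF" could be missed.
    for path in sorted(Path(root).rglob("*.pdf")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only the hashes shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicate_pdfs("uploads")` (where `uploads` is a placeholder folder name) returns a list of groups, each containing the paths of files that are byte-for-byte the same. Note the limitation: a PDF that has been re-exported from the same source document will usually differ in its metadata, so this approach only catches exact copies.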
Links in PDFs
Links in PDFs can also be problematic. With the number of PDFs on our site, there will always be external links that change and end up out of date and therefore broken. It’s much easier for our support team to update an HTML page when they find a broken link – you can’t easily edit PDFs and often need to go back to the original source (which we don’t keep) to make changes and then regenerate the PDF.
It’s also worth considering that if your PDF has a number of links, it might actually work better as a web page – again, this is particularly useful on a mobile where you have to open a separate application to read a PDF document and you don’t want to keep switching from browser to app.
Search Engine Optimization
Historically, it was thought that Google found it easier to index content presented as HTML. This is not entirely true, as PDFs can be searched by Google and will also show up in search results. However, unless they are properly optimized, they may not rank as highly as a comparable web page. Links within PDFs don’t tend to have the same value for SEO as links on a web page, and Google also tends to prioritize mobile-friendly content.
If someone finds one of your PDFs on Google, they can download it without ever visiting the website. This means there is a risk of them reading it out of context, without navigating to the other relevant parts of your site. Maybe the information provided in the PDF isn’t quite the answer they needed, but had they landed on part of your site they might have found something else that was related and answered their questions, or seen other links of interest and learned something new.
Old PDFs can also remain in search results even when the pages linking to them have disappeared – this can lead to outdated and misleading information.
Acceptable reasons for PDFs
Despite these compelling arguments for making your content HTML, there are still a few cases where PDFs are acceptable:
- Downloadable forms to print and fill out offline (although you might want to think about whether an online form would be faster and easier for users)
- Detailed multi-page legal documents and reports
- Flyers or brochures to print and use offline – like the University prospectus
- Where there is a legal obligation to publish a formal and signed document, such as the University statute
- Material that people are likely to print and annotate themselves
If you’re uploading a PDF for any of these reasons, you need to make sure you comply with our ‘Publication of PDF on the University website’ guidelines.
I won’t repeat them here, but in short, you need to make your documents fully accessible and we will add them to a company information page to put the document in context and provide a summary. We never link directly to a PDF within the main body of a page; instead, we link to it from a separate download section at the bottom. Adding PDFs in this way prevents them from getting lost in the site and makes it easier to check them: we can see at a glance when they were last updated. Providing the summary also helps with search visibility.
It is clear that there are many reasons to ditch PDFs online. We did this gradually as part of our migration, but for practical reasons we had to leave some of the longer documents in PDF format. In due course we hope to correct this and publish more documents in HTML format so that our content is as accessible as possible.