PDF vs HTML
Choosing between HTML and PDF on the web.
Web sites often provide information as PDF when that format is inappropriate.
Here are some of the difficulties that causes. Good,
standards-compliant HTML is almost always better for use on the web.
First, though, I'll note that PDF does have some benefits:
-
PDF is good when the file is destined for printing, and the precise
printed page layout is important, or when there are images which
should be rendered at high resolution on the printed page.
-
PDF is useful for taking fancy newsletters which are designed for the
printed page, and making them available online without much hassle.
-
PDF is currently useful for math and special symbols, until
MathML and better internationalization are widely supported in HTML.
Beyond those few cases, HTML is generally much better for providing
information via the web.
-
The aspect ratio (which is usually not 4:3) and size of the pages in
a PDF document are a poor match for a computer screen. So reading PDF
online is generally a worse experience than reading HTML. With good
HTML, the user can choose a font size and a window width so the text is
easy to read, and the paragraphs are laid out to match the user's
preference. But with PDF the user has no choice of line length. At
a nice font size the page often doesn't fit on the screen. Reading
2-column output requires either a small font or lots of up-and-down
scrolling. Headers and footers and page breaks get in the way.
-
Your audience is limited with PDF because it doesn't work on all
platforms - e.g. handheld browsers. It requires extra software and
takes more memory and CPU power, which is a problem for older
computers.
-
PDF information is less accessible than HTML, e.g. for those with
vision impairments.
-
PDF is designed for printing, not browsing or spreading information.
When the user does a copy-and-paste of text, it all ends up in one
blob with paragraphs, lists, etc. all mushed together into one
unformatted paragraph. Words that contain ligatures (like when "fi"
is represented by a single joined printing symbol) will often be
garbled. Often not all of the selection shows up in the
clipboard. Problems with hyphenation, multiple columns,
etc. often crop up.
-
Images are embedded, so they aren't easy to pull out as a .jpg or
.gif file for reuse.
-
Hyperlinks work badly. Relative hyperlinks are generally preferable
but don't work on some platforms. Putting the whole document in one
file to avoid this handicap makes it slower and harder to read just
one part of the document.
-
PDF files are usually larger than a simple HTML version.
-
PDF documents are harder to reuse since they are not an editable
source format and the formatting instructions are gone.
-
Programs that create PDF are less widely available, or cost more
money, than programs that produce HTML.
-
PDF "Security" is sometimes viewed by producers as a useful feature,
but it is mostly useful "for keeping honest people honest". Anything
that someone can view on the screen can be captured for use
elsewhere, so it won't protect you from plagiarizers, etc. Features
like printing or saving to a file can be made more difficult, but
simply can't be prevented.
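To make the copy-and-paste problems above concrete, here is a minimal Python sketch of the cleanup a reader typically ends up doing on text copied out of a PDF. The function name `clean_pdf_text` is hypothetical, and the rules are heuristics under stated assumptions (hyphens at line ends are treated as soft hyphens, single newlines as soft wraps), not a reliable recovery of the original structure. Note that Unicode NFKC normalization also rewrites other compatibility characters, not only ligatures.

```python
import re
import unicodedata


def clean_pdf_text(text: str) -> str:
    """Best-effort cleanup of text pasted from a PDF viewer (illustrative only)."""
    # NFKC normalization replaces compatibility characters, such as the
    # single-codepoint "fi" ligature (U+FB01), with their plain-letter forms.
    text = unicodedata.normalize("NFKC", text)
    # Rejoin words split by a hyphen at a line end ("for-\nmat" -> "format").
    # Heuristic: a genuinely hyphenated compound that happened to break at a
    # line end loses its hyphen too.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Treat a lone newline as a soft wrap; keep blank lines as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    return text


print(clean_pdf_text("The \ufb01le for-\nmat is \ufb02at."))  # The file format is flat.
```

None of this is needed with HTML, where the text is stored as ordinary characters and the paragraph and list structure is explicit in the markup.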
Neal McBurnett
Last modified: Fri Sep 16 18:13:27 MST 2008