Creating PDFs

Sharing documents such as papers, reports, and specifications is made easy with Adobe’s Portable Document Format (PDF). Here’s a primer on how to create PDFs on Linux.
Exchanging documents such as papers, reports, and technical specifications can pose a challenge: how do you deliver a file to a recipient in a form that preserves fonts, graphics, and layout, and is readable in any operating system? Plain ASCII text is highly portable, but you can’t embed graphics or use special fonts in such a text file. Word processing formats enable you to embed graphics and use whatever fonts you like, but they aren’t universally readable, and even when they are, fonts and formatting can change when the file moves between machines. HTML is a good candidate for embedding graphics and using fonts, but formatting is likely to change, and you’ll need to deliver multiple files for the complete document to be readable.
The format that’s best for handling text or mixed text and graphics documents with careful page formatting is Adobe’s Portable Document Format (PDF). Adobe’s Acrobat Reader software is available on most platforms, including Linux, and alternative programs, such as Xpdf, can also handle the PDF format. These PDF-reading programs are readily available, so even if a recipient has somehow managed to escape getting one, correcting the problem is quick and simple.
The question then becomes: How do you create PDF documents using Linux tools? Fortunately, this task is not too onerous. Ghostscript supports creating PDFs from PostScript files, which can be generated by most Linux programs that can print. In fact, an increasing number of Linux word processors and other print-centric tools provide direct PDF-generation options. Still, knowing how to use Ghostscript to do the job can be helpful at times, because you can use that tool’s options to help optimize the PDF files you create for specific purposes.

PDF Principles

PDF documents (sometimes called Acrobat documents, particularly by Macintosh users) are designed as an electronic variant of printed documents. The contents of PDF files are precisely defined, in the sense that the position of every character on the virtual page can be fixed.
PDF goes beyond a mere electronic page, though; PDF files can contain indices, links, and other features that aren’t available in printed documents. The methods of creating PDFs described here don’t automatically create such features, but if your source document contains them, they may be preserved in the PDF-creation process.
PDF is essentially an outgrowth of PostScript. Where PostScript was designed as a printer language, though, PDF is a format for on-screen viewing. PDF also employs compression to help keep file sizes manageable — it’s not uncommon for a PDF file to be a half to a tenth the size of the PostScript file from which it was created.
Like PostScript, PDF can embed fonts within the document. This feature alone makes it a great format for distributing documents — you can create the PDF file using whatever fonts you like and not worry about whether or not the recipient has the same fonts. Embedded fonts can cause problems, though, particularly if you’re using an older version of Ghostscript. Under certain circumstances, fonts are converted from outline to bitmap formats, which results in slow and ugly displays at screen resolution, although such documents usually print reasonably well.

Creating a PostScript Document

The traditional way to create a PDF file under Linux is to run a PostScript file through Ghostscript or some other tool. Thus, to begin the process, you must have a PostScript file. Fortunately, these are easy enough to generate from most applications via a print option.
For instance, Figure One shows the “Print” dialog box in OpenOffice.org. To create a PostScript file, check the “Print to File” option. When you click OK, the program prompts you for a filename and saves PostScript to that filename rather than sending it to the printer.
Figure One: Most GUI programs allow you to print PostScript to a file

Some programs allow you to create PostScript through a file export option instead of or in addition to a print option. Thus, if you can’t figure out how to create a PostScript file via the print option, check the file save and export options.
An increasing number of programs, including OpenOffice.org, provide direct PDF-creation tools. These frequently call Ghostscript, automating the (otherwise manual) process described below. Other times they generate the PDFs without Ghostscript’s help. You can certainly avail yourself of such tools, and their output may suit your needs just fine. Sometimes, though, creating PDF files from PostScript files with the help of Ghostscript produces better results or gives you better control over PDF creation options.

Basic PDF Creation with Ghostscript and ps2pdf

Ghostscript’s gs command can create a wide variety of output formats, which you select with its -sDEVICE= option; the pdfwrite device is the one that produces PDF. You can call gs with this option and half a dozen or so others to create a PDF file from a PostScript file; however, in most cases it’s easier to use the ps2pdf program. This program is actually a script that calls gs with the appropriate options to create a PDF.
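For the curious, a direct gs call might look something like the following sketch. The function name ps_to_pdf is made up for illustration, and the exact set of switches that ps2pdf passes to gs varies from one Ghostscript version to another; the ones shown here are simply a common minimal set.

```shell
# Hypothetical helper (not part of Ghostscript): convert one PostScript file
# to PDF by calling gs directly.
# Usage: ps_to_pdf input.ps output.pdf
ps_to_pdf() {
    if [ "$#" -ne 2 ]; then
        echo "usage: ps_to_pdf input.ps output.pdf" >&2
        return 2
    fi
    # -q         suppress startup messages
    # -dNOPAUSE  don't pause after each page
    # -dBATCH    exit once the input has been processed
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="$2" "$1"
}
```

Once the function is defined, ps_to_pdf document.ps document.pdf should do roughly what the plain ps2pdf command does.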
In fact, several variants of ps2pdf exist, named after the PDF versions they produce: ps2pdf12, ps2pdf13, and ps2pdf14 create PDF files conforming to versions 1.2, 1.3, and 1.4 of the standard, respectively. If you call ps2pdf (with no version number), the script actually calls ps2pdf12, at least as of Ghostscript 7.07; this could change with future versions of Ghostscript. (As the magazine goes to press, Ghostscript 8.50 is the latest version of Aladdin Ghostscript, but version 7.07 is the latest version of GNU Ghostscript, which is shipped with most Linux distributions.)
To create a PDF, you need only call the ps2pdf program with the input filename. It generates an output file with the same name as the input file, but with a .pdf extension:
$ ps2pdf document.ps
This command creates a PDF file called document.pdf, which you should be able to view with Adobe Acrobat Reader, Xpdf, gv, or other programs that can read PDF files.
Of course, you can do a lot more than just create a PDF file in this simple way. In particular, you can change the output filename by adding it to the end of the command line, and you can add various options. Consult the ps2pdf documentation on the web (http://www.cs.wisc.edu/~ghost/doc/AFPL/7.07/Ps2pdf.htm) for details. (Change 7.07 in that URL to your Ghostscript version number, or one close to it.)
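Naming the output file explicitly is handy when you want to convert a whole batch of documents. The following is a minimal sketch; ps_dir_to_pdf and its two directory arguments are invented for this example, not something Ghostscript provides.

```shell
# Hypothetical helper: convert every .ps file in one directory, writing the
# PDFs into a separate output directory.
# Usage: ps_dir_to_pdf source_dir output_dir
ps_dir_to_pdf() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for ps in "$src"/*.ps; do
        [ -e "$ps" ] || continue        # no matches: the glob stays literal
        name=$(basename "$ps" .ps)
        # The output filename goes last on the ps2pdf command line.
        ps2pdf "$ps" "$dest/$name.pdf" || return 1
    done
}
```

For instance, ps_dir_to_pdf ~/reports ~/reports/pdf would leave the PostScript originals where they are and collect the PDFs in a pdf subdirectory.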
Some options that might interest you include:
*Compatibility level. You can set the level of the PDF standard used by ps2pdf with the -dCompatibilityLevel=level switch, as in -dCompatibilityLevel=1.4 to use the 1.4 compatibility level. This option is essentially redundant with the different ps2pdf script variants.
*Resolution. You can set the resolution that’s used for graphics and fonts that must be converted to bitmaps with the -r option, followed immediately by the resolution. For instance, -r300 will create 300 dpi bitmaps, which will produce good results when printed, whereas -r72 will create 72 dpi bitmaps, which will be smaller and might be more suitable for documents that you expect to be viewed on the screen.
*Graphics resolution. You can set the resolution of embedded graphics independently of text with the -dColorImageResolution=resolution option. For instance, to include 100 dpi images, you’d use -dColorImageResolution=100. (The default value is 72, but that goes up to 300 for printer or prepress settings, as described shortly.)
*Page rotation. Sometimes, documents with varying page orientation can cause Ghostscript problems. You can exercise some control with the -dAutoRotatePages=value option, which you can set to /None (no page rotation), /All (rotate all pages to match the first one), or /PageByPage (set each page independently based on its predominant text orientation).
*Embedding fonts. You can tell Ghostscript to embed all the fonts with the -dEmbedAllFonts=boolean option, as in -dEmbedAllFonts=true. This is generally desirable for font legibility, but it can produce larger files. (Note that Ghostscript often embeds fonts even if you set this value to false, but this varies from one document to another.) This option does not apply to the standard thirteen PDF fonts (Symbol and four variants each of Times, Helvetica, and Courier).
*Subsetting fonts. Normally, Ghostscript subsets its fonts, meaning that it only copies over the characters that are actually used from a font that’s embedded in the source file. You can change this behavior by specifying -dSubsetFonts=false. The main result is likely to be larger files, but if you believe the subsetting is causing problems, this can be a good way to work around them. Note that some programs subset fonts when creating PostScript output, so Ghostscript options might have little or no real effect on this feature.
*Compression. The -dCompressPages=boolean option enables or disables compression. In theory, Ghostscript supports two forms of PDF compression: LZW and flate. Because of patent issues, though, LZW compression is not actually supported, and requests in the source document or via the -dLZWEncodePages=true option are silently turned into flate compression requests (-dUseFlateCompression=true).
*CMYK conversion. Most documents specify colors using red/green/blue (RGB) encoding; however, some use cyan/magenta/yellow/black (CMYK) encoding. You can control whether Ghostscript tries to convert CMYK graphics to RGB form with the -dConvertCMYKImagesToRGB=boolean option. Typically, you’ll want to leave this alone (that is, set to true), but in some prepress environments, you might want to leave CMYK graphics encoded as such.
*PDF settings. The -dPDFSETTINGS=value option provides a set of shortcuts for other options. It takes four possible values: /screen, /printer, /prepress, and /default. The first three values are designed to optimize a document for screen display, for printing on a printer, and for prepress operations, respectively. The /default value creates a general-purpose file. These values mimic the effects of options with similar names in Adobe’s Acrobat Distiller program. To use them, add the value to the option, as in -dPDFSETTINGS=/printer.
Many of these options have different default values depending on the value of the -dPDFSETTINGS option. Consult the ps2pdf web page for details.
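To see how these options combine in practice, here’s a rough sketch that builds two PDFs from one PostScript file: a compact one for on-screen reading and a higher-quality one for printing. The function name and the -screen and -print filename suffixes are arbitrary choices for this example.

```shell
# Hypothetical helper: build a small on-screen PDF and a print-quality PDF
# from the same PostScript file.
# Usage: make_pdf_variants document.ps
#        (creates document-screen.pdf and document-print.pdf)
make_pdf_variants() {
    base=${1%.ps}
    # Options come before the filenames on the ps2pdf command line.
    ps2pdf -dPDFSETTINGS=/screen "$1" "$base-screen.pdf" || return 1
    ps2pdf -dPDFSETTINGS=/printer -dEmbedAllFonts=true "$1" "$base-print.pdf"
}
```

Comparing the sizes of the two output files (with ls -l, say) is a quick way to judge whether the extra options are worth it for a given document.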

Optimizing PDFs

Many of the features just described are present in Acrobat Distiller under the same names; in fact, these options can be set in the source PostScript file.
Ghostscript lacks a few options, though. Perhaps the most important of these is the ability to optimize a document — that is, to lay it out in such a way that it loads very quickly when you open it in a PDF reader. Although ps2pdf includes an option to do this, the option actually does nothing, at least as of GNU Ghostscript 7.07 and Aladdin Ghostscript 8.50.
Fortunately, you can optimize your PDFs using a separate program: pdfopt. To use this command, type it followed by input and output filenames:
$ pdfopt input.pdf output.pdf
Note that you must specify both input and output filenames, and they must not be the same; if they’re the same, the file will be destroyed! Optimized files may be bigger than their non-optimized counterparts, so if your goal is minimizing file size, you shouldn’t blindly optimize the files. Optimization can make sense if the file displays slowly or if readers will be downloading and reading it off of the network (as in files linked from a web page).
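If you’d like to replace a file with its optimized version, one cautious approach is to have pdfopt write to a temporary file and swap it in only when the command succeeds. The sketch below assumes the mktemp utility is available (it is on most Linux systems); pdfopt_in_place is a name invented for this example.

```shell
# Hypothetical helper: optimize a PDF "in place" without ever handing pdfopt
# the same name for input and output.
# Usage: pdfopt_in_place file.pdf
pdfopt_in_place() {
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
    # Create the temporary file next to the original so mv stays on one filesystem.
    tmp=$(mktemp "${1%.pdf}.XXXXXX") || return 1
    if pdfopt "$1" "$tmp"; then
        mv "$tmp" "$1"      # replace the original only on success
    else
        rm -f "$tmp"        # leave the original untouched on failure
        return 1
    fi
}
```

Note that the replacement file inherits the restrictive permissions mktemp assigns, so you may need to run chmod afterward if others must read it.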

Maximizing PDF Text Quality

Traditionally, one of the problems with generating PDFs with Ghostscript has been poor text quality, particularly when viewed on the screen. The reason is that old versions of Ghostscript were unable to embed fonts in PDF documents except as bitmaps. That problem is slowly changing. Currently, Ghostscript can embed Type 1 fonts in outline format, which produces good displays on both screen and paper. Other font types are a bit iffier. Type 3 fonts are converted to bitmaps and Type 0 fonts might or might not be converted to bitmaps, depending on their precise features. Some programs that print TrueType fonts convert them to Type 3 format, so they end up being converted to bitmaps. Even Type 1 fonts can be so converted; for instance, WordPerfect for Linux prints Type 1 fonts by converting them to bitmaps itself, so Ghostscript can’t recover an outline font.
The bottom line is that you should evaluate the appearance of your fonts when you create a PDF file. If you’re uncertain about what’s happening with your fonts, you can use Adobe Acrobat Reader to get a few clues. Load your PDF file into the program and select “File, Document Properties, Fonts.” You may want to click “List All Fonts,” which causes all fonts to be summarized, not just those used on the first page. The result should resemble Figure Two, which shows the fonts used in a test document created with OpenOffice.org.
This document includes three fonts: Times-Roman, A, and Helvetica. The first and last are Type 1 fonts. In fact, Times-Roman was substituted by OpenOffice.org for Times New Roman, but OpenOffice.org didn’t go the distance and use the standard PostScript font; instead, it embedded a Type 1 version that was installed on the system. The mysteriously-named A font descends from Nimbus Roman. Helvetica was unchanged, and is one of the thirteen embedded PostScript/PDF fonts.
Figure Two: Adobe Acrobat Reader lets you see a list of fonts used by a PDF file
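If you’d rather check from the command line, the pdffonts utility that ships with Xpdf (and with Poppler) prints a similar font table. This tool isn’t covered elsewhere in this column, so treat the following as a sketch that assumes pdffonts is installed; the function name list_type3_fonts is made up for illustration.

```shell
# Hypothetical helper: show the fonts in a PDF that are stored as Type 3
# fonts, which tend to look chunky on screen.
# Assumes the pdffonts utility (from Xpdf or Poppler) is installed.
# Usage: list_type3_fonts file.pdf
list_type3_fonts() {
    # pdffonts prints two header lines, then one line per font; the "type"
    # column reads "Type 3" for Type 3 fonts.
    pdffonts "$1" | awk 'NR > 2 && /Type 3/'
}
```

If the function prints nothing, pdffonts found no Type 3 fonts in the file.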

In any event, the key is to recognize that the Type 3 fonts are likely to look chunky on screen. If you see such fonts, you should try to identify them in your source document and replace them with something more palatable. You can do this in several ways:
*You can change the font in the source document to one of the core thirteen PostScript fonts. Assuming the software doesn’t try to embed the same font in the PostScript output, the result should look better, and the PDF file will be smaller as well.
*You can change the font in the source document to a Type 1 font rather than a PostScript font. This won’t work with all programs, though; some, such as WordPerfect for Linux, convert their fonts to bitmaps themselves.
*You can replace the font in the application (perhaps in Linux’s Xft font subsystem) with a Type 1 version. If you’ve only got a TrueType version of the font, consider converting it to Type 1 format using FontForge (available online at http://www.fontforge.sf.net, and described previously in the December 2004 “Guru Guidance” column, available at http://www.linuxmagazine.com/2004-12/guru_01.html).
Another approach is to try an alternative PDF creation tool. For instance, OpenOffice.org includes its own PDF export feature. This feature preserves TrueType fonts as such rather than converting them to Type 3 format, so the results look much better if you’re using TrueType fonts. Of course, this means you’ll give up any Ghostscript PDF creation options.
If you’re mainly concerned about the printed appearance of fonts, you may be able to get by with ensuring that a good graphics resolution is set via the -r option, as described earlier. Typically, -r300 produces acceptable printouts, but for the best results you should set the resolution based on the printer’s resolution. Using this feature is likely to increase the file size, though, and the on-screen display is still likely to look pretty bad.
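As a small illustration of matching the resolution to a printer, the following sketch wraps ps2pdf so that the resolution is passed as a plain number. The name ps2pdf_at_dpi and its argument order are assumptions made for this example.

```shell
# Hypothetical helper: create a PDF whose bitmapped fonts and graphics use a
# given resolution, such as that of the target printer.
# Usage: ps2pdf_at_dpi 600 document.ps document-600dpi.pdf
ps2pdf_at_dpi() {
    case $1 in
        ''|*[!0-9]*)
            echo "resolution must be a whole number" >&2
            return 2
            ;;
    esac
    # -r takes the resolution with no intervening space, as in -r600.
    ps2pdf -r"$1" "$2" "$3"
}
```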

Roderick W. Smith is the author or co-author of over a dozen books, including Linux in a Windows World and Linux Power Tools. He can be reached at rodsmith@rodsbooks.com.
