Being a magazine editor isn’t easy. In addition to fretting over deadlines and the regular care and feeding of authors, editors spend a lot of time and effort reading, rewriting, rereading, and tweaking articles. Editing is really more of an art than a science, and that’s why editors haven’t been replaced by computers. Yet.
Editors also have to keep track of page counts. After all, each issue of the magazine has a limited number of editorial pages (subtract the number of “ad pages” from the total number of pages to find the number of content or “editorial” pages). Every time an author sits down to write, he or she needs to know how much to write. “How many words?” is a fundamental question. It’s the editor’s job to allocate pages to each article (a column or feature), compute the word count and communicate the word quota to the author. Luckily, that’s a simple math problem, right? Just plug a few numbers into a formula and get the answer.
The trouble is that, until very recently, none of the magazine’s staff really had a good feel for the formula. Everyone had some intuitive feel for computing word counts, but intuition didn’t prove to be very accurate. So, the editors of Linux Magazine recently put their heads together with the art director to come up with something a little more scientific.
Formatting and Layout Elements
The amount of “normal” text (like the text in this paragraph) needed to fill a page depends largely on how many other elements appear in the article. For example, a subhead (such as “Formatting and Layout Elements” immediately above) takes up a certain amount of space, where space is measured in column lines. So, if a subhead (including the white space above it) takes up four column lines, that’s four fewer column lines for normal text.
As you’ll see, if you account for the size (in column lines) of all of the elements in an article — subheads, code listings, in-line code, sidebars, figures, pull quotes, lists, and screen shots — you can create an equation, plug the numbers in, and get a reliable word count. Just find the typical number of words and lines on a page (this depends on the number of columns on the page) and subtract the space consumed by all of the elements.
Knowing the exact formula isn’t all that pertinent. It’s just a bunch of variables that ultimately produce a number. That sounds a lot like a programming task, doesn’t it?
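As a rough illustration of that arithmetic, here is a minimal PHP sketch. The page geometry (two columns of 52 lines, about 1,000 words per page) and the per-element line costs are the values that appear in Listing One. The function name estimate_words and its parameters are hypothetical, and only subheads and screen shots are modeled.

```php
<?php
// Hypothetical helper: estimate the words needed to fill $pages pages.
// Constants mirror Listing One; only two element types are modeled here.
function estimate_words(float $pages, int $subheads, int $screenshots): int
{
    $words_per_page   = 1000;
    $columns_per_page = 2;
    $lines_per_page   = 52;   // lines per column

    // Total column lines available in the allotted pages.
    $lines_available = $lines_per_page * $columns_per_page * $pages;

    // Column lines consumed by everything that is not body text.
    $lines_used  = 10;                 // headline
    $lines_used += 4;                  // author bio
    $lines_used += 3 * $subheads;      // 3 lines per subhead
    $lines_used += 18 * $screenshots;  // 18 lines per screen shot

    // Convert the remaining lines back to pages, then to words.
    $text_lines = $lines_available - $lines_used;
    $text_pages = $text_lines / ($lines_per_page * $columns_per_page);

    return intval($words_per_page * $text_pages);
}

echo estimate_words(2, 3, 1), "\n";   // prints 1605
```

Two pages with three subheads and one screen shot leave 167 of the 208 available column lines for body text, which works out to roughly 1,605 words.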
LAMP to the Rescue
Let’s build a simple web-based calculator that Linux Magazine authors and editors can use to figure out how much text is needed for a particular article (like this one). As time goes on, the formula can be refined and tweaked, but with everyone using the same tool, everyone will benefit right away.
And since the calculator is web-based, even potential authors can use it to help write a more accurate article proposal:
“I will write an 847-word article about Microsoft Active Directory Replication in LDAP on Linux. It will have two figures and one 200-line code listing. This translates to roughly 2.5 pages in your fine publication.”
Or something like that.
PHP Makes It Easy
Figure One: The Linux Magazine word count calculator
To keep things easy, we’ll build the word count calculator as a single PHP script that displays a list of the elements that affect the length and layout of an article. The user will enter some numbers (e.g., desired number of pages, number of figures, etc.) and click a Calculate button. The data will be submitted back to the same PHP script to compute the approximate number of words needed to fill the space. On the resulting page, the results will be prominently displayed and the same form will contain all the data as the user entered it. Repeating the form as it was submitted makes it very easy to ask “what if” — just modify a couple of values and re-calculate the results.
Figure One shows the calculator in action. The values you see were the numbers used during the development of this very article. The screen shot and code listing are all accounted for. The code used to generate Figure One appears in Listing One.
To try out the calculator for yourself or download the latest version, head over to the Linux Magazine web site (http://www.linux-mag.com/article-calc.php).
Listing One: article-length.php – Part 1
1 <? /* Calculate the amount of text required to fill a
2 given set of pages after accounting for headline,
3 subheads, screen shots, listings, text figures,
4 pull quotes, bulleted lists, in-line code, bio,
5 and so on. */
9 Estimating Linux Magazine Article Lengths</title></head>
12 <h1>Linux Magazine Article Length Estimator</h1>
14 Estimate how much text is needed to fill out an article.
15 We assume 1 headline and a standard 4-line author bio.
16 Please provide estimates for the following items:
20 if ($HTTP_GET_VARS["submit"] == "Calculate")
22 $pages = $HTTP_GET_VARS["pages"];
24 $words_per_page = 1000;
25 $columns_per_page = 2;
26 $lines_per_page = 52;
27 $lines_needed = $lines_per_page *
28 $columns_per_page * $pages;
29 $lines_not_needed = 0; # we'll be adding to this
31 // screenshots (18 lines)
32 $lines_not_needed += 18 * $HTTP_GET_VARS["ss"];
34 // subheads (3 lines)
35 $lines_not_needed += 3 * $HTTP_GET_VARS["sh"];
37 // listing figures (5 lines per, plus number of lines)
38 $lines_not_needed += (5 * $HTTP_GET_VARS["cl"]) +
39 $HTTP_GET_VARS["cls"];
41 // sidebars (5 lines + 10 words per line)
42 $lines_not_needed += (5 * $HTTP_GET_VARS["tf"]) +
43 ($HTTP_GET_VARS["tfw"] / 10);
45 // pull quotes (5 lines each)
46 $lines_not_needed += 5 * $HTTP_GET_VARS["pq"];
48 // bullet lists (1 line, plus 2 times the number of bullets)
49 $lines_not_needed += (1 * $HTTP_GET_VARS["bl"]) +
50 (2 * $HTTP_GET_VARS["bli"]);
52 // author bio
53 $lines_not_needed += 4;
55 // headline
56 $lines_not_needed += 10;
58 // inline code (2 lines each, plus number of lines)
59 $lines_not_needed += (2 * $HTTP_GET_VARS["ics"]) +
60 (2 * $HTTP_GET_VARS["icl"]);
62 // final calculations
63 $lines_really_needed =
64 $lines_needed - $lines_not_needed;
65 $pages_really_needed =
66 $lines_really_needed /
67 ($lines_per_page * $columns_per_page);
68 $words_really_needed =
69 intval($words_per_page * $pages_really_needed);
71 echo "<!-- pages really: $pages_really_needed -->\n";
73 echo "<table border=1 cellpadding=8 cellspacing=0>";
74 echo "<tr><td>You need roughly $lines_really_needed ";
75 echo "lines or $words_really_needed words to fill ";
76 echo "$pages pages using the numbers below.";
77 echo "</td></tr></table>";
81 <form method="get">
84 <td>Pages to fill (decimal numbers okay)</td>
85 <td><input type="text" name="pages"
86 value="<? echo $HTTP_GET_VARS["pages"] ?>"></td>
90 <td><input type="text" name="sh"
91 value="<? echo $HTTP_GET_VARS["sh"] ?>"></td></tr>
93 <tr><td>Bulleted lists</td>
94 <td><input type="text" name="bl"
95 value="<? echo $HTTP_GET_VARS["bl"] ?>"></td></tr>
97 <tr><td>Total bulleted list items</td>
98 <td><input type="text" name="bli"
99 value="<? echo $HTTP_GET_VARS["bli"] ?>"></td></tr>
101 <tr><td>Inline code samples</td>
102 <td><input type="text" name="ics"
103 value="<? echo $HTTP_GET_VARS["ics"] ?>"></td></tr>
105 <tr><td>Total lines of inline code</td>
106 <td><input type="text" name="icl"
107 value="<? echo $HTTP_GET_VARS["icl"] ?>"></td></tr>
110 <td><input type="text" name="ss"
111 value="<? echo $HTTP_GET_VARS["ss"] ?>"></td></tr>
113 <tr><td>Pull quotes</td>
114 <td><input type="text" name="pq"
115 value="<? echo $HTTP_GET_VARS["pq"] ?>"></td></tr>
117 <tr><td>Code listings</td>
118 <td><input type="text" name="cl"
119 value="<? echo $HTTP_GET_VARS["cl"] ?>"></td></tr>
121 <tr><td>Total lines of code listings</td>
122 <td><input type="text" name="cls"
123 value="<? echo $HTTP_GET_VARS["cls"] ?>"></td></tr>
125 <tr><td>Text figures (sidebars)</td>
126 <td><input type="text" name="tf"
127 value="<? echo $HTTP_GET_VARS["tf"] ?>"></td></tr>
130 <td>Total <b>words</b> in text figures (sidebars)</td>
131 <td><input type="text" name="tfw"
132 value="<? echo $HTTP_GET_VARS["tfw"] ?>"></td></tr>
134 <tr><td><input type=submit name="submit"
136 <td> </td></tr>
Step By Step
Let’s glance over the code and see what’s happening. The first few lines simply identify what the script does — always a good idea for your PHP code. Then we have some basic HTML to construct the page header.
At line 20, we check to see if the current request is the result of a click on the Calculate button. Since the $HTTP_GET_VARS array contains all of the variables that the script receives, we can simply check to see if submit is Calculate.
If submit is set, the script performs all the computation necessary to figure out how many lines and words must be written to fill the page(s). The computation is simply some addition, subtraction, multiplication, and division. No rocket science here. Note the use of intval() at line 69 to yield an integer value, preventing silliness like 867.3333333 words.
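To see what intval() does to the final figure, consider the small sketch below. The input values are illustrative only. Note that intval() truncates rather than rounds, so the estimate never overshoots the available space.

```php
<?php
// Illustrative values: 167 usable lines out of 104 lines per page,
// at 1000 words per page.
$words_per_page      = 1000;
$pages_really_needed = 167 / 104;

$raw       = $words_per_page * $pages_really_needed; // fractional word count
$truncated = intval($raw);                            // drops the fraction
$rounded   = (int) round($raw);                       // nearest whole number

echo "truncated: $truncated, rounded: $rounded\n";
// prints "truncated: 1605, rounded: 1606"
```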
If submit is not set (usually meaning the script was called for the very first time), control passes to line 81 where the form output begins.
Once the values have been computed, lines 73-77 produce a table that contains the word and line counts.
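The overall control flow can be sketched as follows. This is a simplified outline, not the listing itself: it reads from $_GET, the superglobal that replaced $HTTP_GET_VARS in later PHP releases, the calculation is reduced to the page count alone, and the function name render_result is hypothetical.

```php
<?php
// Hypothetical outline of the request flow; $query stands in for $_GET.
function render_result(array $query): string
{
    // First visit, or the form was not submitted: nothing to report yet.
    if (($query['submit'] ?? '') !== 'Calculate') {
        return '';
    }

    // Submitted: compute (simplified to pages only) and build the result.
    $pages = (float) ($query['pages'] ?? 0);
    $words = intval(1000 * $pages);

    return "<p>You need roughly $words words to fill $pages pages.</p>\n";
}

// The result (if any) is printed first; the form always follows it.
echo render_result($_GET);
echo '<form method="get"><input type="text" name="pages">',
     '<input type="submit" name="submit" value="Calculate"></form>', "\n";
```

Because the form is emitted on every request, the same script serves both the first visit and every recalculation.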
The remainder of the code is responsible for producing the HTML form and finishing up the page. Notice that we use the $HTTP_GET_VARS array to set each field’s value attribute to the most recently submitted value. On a user’s first visit to the page, all the fields will be empty, and that’s just what we want to happen.
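One caution when echoing submitted values back into the page: the listing prints them unescaped, so a value containing a double quote or an angle bracket can break the markup. A minimal sketch of an escaped version follows, assuming a modern PHP where $_GET is available. The helper name field_value is hypothetical.

```php
<?php
// Hypothetical helper: return a submitted value, escaped for use inside
// a double-quoted HTML attribute. Missing or non-scalar fields yield ''.
function field_value(array $query, string $name): string
{
    $value = $query[$name] ?? '';
    if (!is_scalar($value)) {
        $value = '';
    }
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}

// Re-display the "sh" (subheads) field with whatever was last submitted.
echo '<td><input type="text" name="sh" value="',
     field_value($_GET, 'sh'), '"></td>', "\n";
```

On a first visit the field is still empty, just as in the original, because a missing key falls back to an empty string.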
This simple calculator solves a mildly interesting problem for a small group of people. It formalizes a part of the magazine editing process that was previously informal and highly variable. The implementation is relatively simple, but that’s a good thing. It’s easy to change.
Next month we’ll turn the PHP application into a Web service so it can be reused in other applications or called from a command-line tool.
Jeremy Zawodny uses Open Source tools in Yahoo! Finance by day and is writing a MySQL book for O’Reilly & Associates by night. Reach him at: Jeremy@Zawodny.com.