Processing Web Forms Carefully

CGI applications are often used to search through some database. For example, a catalog might let you look for an item by color, or an on-line dating service might let you pick people by gender, location, age, and interests.

CGI applications are often used to search through some database. For example, a catalog might let you look for an item by color, or an on-line dating service might let you pick people by gender, location, age, and interests.

When the CGI application is in Perl, the database query is frequently performed using the DBI interface. This amazing product, the result of man-years of effort (coordinated by Tim Bunce), allows a Perl program to interact nearly identically with over a few dozen types of databases, including both commercial and open-source databases, and even “non-database” databases like a comma-separated-values (CSV) file. With DBI, the interaction between script and database is primarily in the form of a series of industry-standard SQL statements. And while common, I frequently see the transition from a CGI form element to a SQL query statement as a security risk.

For example, let’s say that the form field firstname accepts an SQL LIKE pattern (e.g., “Tom%”) for the first name of a person I’m searching for in my department. The Perl code to construct an SQL query using firstname as a parameter might look something like Figure One.

Figure One: Typical Perl code to construct an SQL query

my $department = 123; # determined by some login process
my $firstname = param(‘firstname’); # from the field
my $sql = “SELECT id FROM employees WHERE department = $department” .
” AND firstname LIKE \”$firstname\”";

At first glance, the code in Figure One appears to be fine because the WHERE clause restricts matches to people in my department. For example, if I enter FR% as the SQL pattern in the form, I get the SQL statement:

SELECT id FROM employees WHERE
department = 123
AND firstname LIKE “FR%”

However, suppose I know, or can guess, the syntax of this generated statement, and I enter:

%” OR “X” = “X

in the form in the same firstname. Yes, that’s odd-looking, but notice how it expands:

SELECT id FROM employees WHERE
department = 123
AND firstname LIKE “%” OR “X” = “X”

What have I done? I’ve introduced an OR condition that is true for every employee, and I can now dump out far more than the application originally intended! Of course, when this happens, someone usually screams, “How do I make sure that the field doesn’t contain anything that’ll let the bad guys do this?” This is the wrong approach.

Instead, the DBI interface includes this nice tool called placeholders. The Perl query code should have been written like Figure Two.

Figure Two: A safer approach to SQL queries

my $department = 123; # determined by some login process
my $firstname = param(‘firstname’); # from the field
my $sql = “SELECT id FROM employees WHERE department = ?” .
” AND firstname LIKE ?”;
my $sth = $dbh->prepare($sql);
$sth->execute($department, $firstname);

Each question-mark is a placeholder. For every question-mark in the SQL, the DBI interface will take the corresponding value from the execute method, and place it into the SQL in such a way that the value is properly quoted and protected. And that solves the security problem!

Of course, a search form also might have form fields that are left blank, meaning “any value is good enough here”. What’s an easy way to generate the corresponding SQL and get the execute list correct? Well, you just need a little method to your madness, as illustrated in Listing One, a simple application showing a phonebook lookup form.

Listing One: A more secure way to lookup names in a database – Part 1

1 #!/usr/bin/perl -Tw
2 use strict;
3 $|++;
5 ## constants
6 my @STATES = qw(
9 );
11 ## configuration
13 (
14 ['firstname', 'person.firstname is ?'],
15 ['lastname', 'person.lastname is ?'],
16 ['street', 'person.street is ?'],
17 ['city', 'person.city is ?'],
18 ['state', 'person.state is ?',
19 sub {
20 my $v = shift;
21 return undef unless $v =~ /\S/;
22 return undef if $v eq "n/a";
23 $v;
24 }],
25 ['zip', 'person.zip is ?'],
26 );
27 ## end configuration
29 use CGI qw(:all);
31 print header, start_html, h1(‘Query’);
33 print hr, start_form;
35 print table({-bgcolor => ‘#ffff88′},
36 Tr(td(“First:”), td(textfield(‘firstname’)),
37 td(“Last:”), td(textfield(‘lastname’))),
38 Tr(td(“Street:”), td(textfield(‘street’)),
39 td(“City:”), td(textfield(‘city’))),
40 Tr(td(“State:”),
41 td(popup_menu(‘state’, ['n/a', @STATES], ‘n/a’)),
42 td(“Zip:”), td(textfield(‘zip’, ”, 10)),
43 td(submit)));
45 print end_form, hr;
47 {
48 my @sqls;
49 my @vals;
51 for (@QUERY_FIELDS) {
52 next unless defined(my $val = param($_->[0]));
53 if ($_->[2]) {
54 next unless defined($val = $_->[2]->($val));
55 } else {
56 $val =~ s/^\s+//;
57 $val =~ s/\s+$//;
58 next unless length $val;
59 }
60 push @sqls, $_->[1];
61 push @vals, $val;
62 }
63 if (@sqls) {
64 my $sql = “select id from person where “.join (” and “, @sqls);
65 print h2(“Resulting query”);
66 print pre(escapeHTML($sql));
67 print pre(escapeHTML(join “, “, map { qq{“$_”}} @vals));
69 ## typical use:
70 ## my $sth = $dbh->prepare($sql);
71 ## $sth->execute(@vals);
72 ## while (my ($id) = $sth->fetchrow_array) { … }
73 }
74 }
76 print end_html;

Lines 1 through 3 start nearly every CGI program I write. The -T switch turns on “taint mode”, so that input values are not accidentally used to alter the external behavior of the program. The -w switch enables warnings. use strict turns on the three common compiler restrictions for larger programs: disabling symbolic references, demanding that variables be properly introduced, and disabling bare words as subroutine invocations. And finally, standard output is unbuffered (not strictly necessary for this application, but handy if we ever fork a child that might need to include things into the output stream).

Lines 6 through 9 establish a list of abbreviations for the states. I’ll use this in a pop-up box definition.

Lines 12 to 26 define the query fields of my form, including the SQL snippets that should be used if the fields are present in the response. Each element of @QUERY_FIELDS is an arrayref. The first element is the form field name. The second element is an SQL snippet, using the question-mark placeholder. And a third element, if present, is a coderef for the routine to validate the data and alter it if necessary. We’ll see precisely how these fields are used later.

Line 29 pulls in the standard CGI module, along with importing all the form access and HTML generation shortcuts.

Line 31 prints the HTTP/CGI header, the beginnings of the HTML header, and a first-order head of Query. As this is not a complete application, I didn’t spend much time adorning the output.

Line 33 prints the tag for a horizontal rule, and the beginning of the form. The form action parameter defaults to the same URL, bringing a form submission right back to this same program, so this CGI script both generates the original form, and processes the results. This is a typical strategy for CGI.pm-based programs.

Lines 35 to 43 print the form itself, with the layout aided by a simple table. A background color of a light-yellow defines the area of the form, designated in line 35. The first row picks up the first and last name, with their appropriate labels. The second row similarly holds the street address and the city.

The third row contains a popup field named state. The list of states comes from the array defined earlier. An additional item of n/a is added to the front of the list, and also selected as the initial default. To keep this type of searching form easy to use, all fields should default to their “insignificant” state. The third row also contains a zip code field. Of course, a zip code would also uniquely define a city and state, but that’s beyond the scope of this example, so we’ll pretend these are all independent fields. The third row also contains the submit button.

For all three rows, the individual table cells are enclosed in a td HTML shortcut, defined by CGI.pm. And each of these cells are further nested into a table row shortcut, defined as Tr. This shortcut has an unusual name because as an entirely lowercase word, it would collide with the built-in tr operator. Finally, the three rows are enclosed inside a table shortcut, with an additional hashref parameter which becomes the attributes of the table tag for the output.

Line 44 closes out the form with a horizontal rule.

When the form is initially displayed in the browser, I have a chance to fill out one or more of the fields of my choice, and press submit. The same program is reinvoked, but with access to the form parameters using the param function. The code beginning in line 47 handles this response.

Lines 48 and 49 hold the pieces of SQL and their corresponding values. Initially, they’ll start out empty. If they’re still empty after we’ve processed all the known fields, then we didn’t get a valid query, and there’s nothing further to do.

Lines 51 to 62 process each of the known form fields. $_ holds the arrayref of each field in turn. Line 52 pulls out the value for that particular parameter into the scalar $val. (Note that this won’t work very well for a multi-value parameter, but that’s merely a minor change.) If the parameter is not present, we skip on to the next item.

Lines 53 to 59 clean up the data from that form element. If there’s a value in the third arrayref element, it’s a coderef, which we call, passing the form value as its first argument. We expect the called subroutine to return either the cleaned-up data, or undef, again indicating that this field is “not present”. If there’s no special handling subroutine, a simple editing pass is performed instead: line 56 deletes any leading whitespace, while line 57 deletes any trailing whitespace.

Looking back at lines 18 to 23, we can see that the cleanup routine for the popup menu takes that value and returns undef unless it contains a non-whitespace character but is not our default value of n/a.

Now, if we’ve made it to line 60, we have a good value in $val, and the corresponding SQL as the second element of the arrayref. So, we’ll push each one onto the end of the their corresponding arrays to be gathered and processed later.

Once the individual form elements have been examined, it’s time to determine if we have a valid query. Line 63 examines the size of the @sqls array. If this is non-empty, then we’ve got to proceed to building an SQL statement, in line 64.

Each snippet in @sqls is a clause to be AND-ed with the other clauses. So we’ll take the snippets and join them with an “and” keyword in line 64. In front of that, we’ll select the id field from the resulting record for further processing.

In a real program, we’d then execute code similar to lines 69 to 72 (also shown immediately below):

my $sth = $dbh->prepare($sql);
while (my ($id) = $sth->fetchrow_array) {


The SQL statement would be prepared from the joined snippets, the placeholders would be bound to their corresponding values, and finally we’d process the matched records.

However, since this is just a demo harness, we’ll dump out the SQL in line 66, and the list of values for the binding in line 67.

So, we’ve seen how to walk through a form, pick out the fields, and construct the corresponding SQL for them. We’ve also seen the importance of placeholders. With placeholders, you’ll have no more worries about malicious or broken input values spoiling your day. Until next time, enjoy!

Randal L. Schwartz is the chief Perl guru at Stonehenge Consulting and can be reached at merlyn@stonehenge.com. Code listings for this column can be found at http://www.stonehenge.com/merlyn/LinuxMag/.

Comments are closed.