Live Logs

Last month, we configured Apache with mod_log_sql to log all Web traffic to a MySQL database, using one table for each virtual domain.

Last month, we configured Apache with mod_log_sql to log all Web traffic to a MySQL database, using one table for each virtual domain.

This month, we’ll build a basic Web interface to query and summarize the growing mass of data that MySQL is now collecting.

Organizing Things

Rather than create one large PHP script, let’s build the project so that it’s manageable even after it grows beyond what’s introduced in this article. Here’s the approach:


  • Let’s store all of the common functions in a library file named lib.php.

  • Let’s use an external style sheet (style.css) to control the look and feel of the interface. In fact, let’s break the page headers and footers into two separate files — that way the individual PHP files are rather small and easy to understand — and use include() to pull the header and footer into other files as needed. We’ll use the PHP function require_once(“lib.php”) in the page header to make sure all of the functions are available in the page.

  • Let’s keep all of the files in a single directory called logs.

And to get something working as quickly as possible, let’s keep the interface quite simple:


  • Point any Web browser to http://yoursite.com/logs to see a list of domains and the total number of hits each domain has received. Then click on the name of any domain to see that domain’s most recently requested URIs and the most popular URIs, including how many requests each URI has had.

Head and Feet

To get started, let’s create the header and footer that all of the pages will use.

header.html, shown in Listing One, sets up the initial HTML for the page, including the style sheet reference, and then checks for the existence of a $title variable. If $title is set, $title is used as part of the page title. Otherwise, “Apache Logs” is used. (This enables each page to set a custom page title by setting the $title variable before require()‘ing the header.) The header then uses require_once() to include lib.php if it hasn’t been included already, and sets up the page body.




Listing One: header.html


<!DOCTYPE html PUBLIC “-//W3C//DTD XHTML 1.0 Transitional//EN”
http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd“>

<html><head>
<meta http-equiv=”Content-Type” content=”text/html; charset=iso-8859-1″ />
<link title=”Style” rel=”STYLESHEET” href=”style.css” type=”text/css”>

<?
if (! isset($title))
{
$title = “Apache Logs”;
}
else
{
$title = “Apache Logs: $title”;
}
?>

<title><?= $title ?></title></head><body>

<? require_once(“lib.php”) ?>

<div class=”topbar”><p><?= $title ?></p></div>
<div class=”pagebody”>

The footer.html file in Listing Two simply closes off the page body and prints out a basic HTML footer. You’ll certainly want to customize footer.html to include your email address and other contact information.




Listing Two: footer.html


</div>
<div class=”footer”>
<address>
Copyright 1998-2002, <a
href=”http://jeremy.zawodny.com/“>Jeremy D. Zawodny</a>
<<a
href=”mailto:Jeremy@Zawodny.com“>Jeremy@Zawodny.com</a>>
</address>
</div>
</body>
</html>
<? disconnect(); ?>

Page One

As described above, the first page of our interface presents a list of all the domains for which there is data. We’ll call this page index.php. Listing Three shows the code for it.




Listing Three: index.php


<? include “header.html”
$domains = get_domain_list();
$count = count($domains);
?>

<p>Select one of the <?= $count ?> domains.</p>

<ul>
<?
for ($i = 0; $i < $count; $i++)
{
$table = $domains[$i];
$domain = table_to_domain($table);
$total = total_in_table($table);
echo “<li><a href=\”domain.php?table=$table\
“>$domain</a> ($total)</li>”;
}
?>
</ul>

<? include(“footer.html”) ?>

As you can see, there’s not a lot there. index.php calls get_domain_list() to fetch the list of domain names, counts them, and presents a message telling the user to select one. Then, it produces a bulleted list. For each domain (actually a table name from the database), it calls table_ to_domain() to convert the table name to the actual domain name. It then calls total_in_table() to get the hit count for the domain, and finally, prints the information to the user. The domain name is a hyperlink to domain.php, or page two.

Page Two

The next page isn’t much more complex than the first. Listing Four contains the code.




Listing Four: domain.php


<?
include_once “lib.php”;

$table = $_REQUEST['table'];
$domain = table_to_domain($table);
$title = “Domain View ($domain)”;
$num = $_REQUEST['num'];

if (! isset($num))
{
$num = 20;
}

include “header.html”;
?>

<h1><?= $num ?> Most Recent Requests</h1>

<ul><? print_uri_list(most_recent_requests
..($num, $table)); ?></ul>

<h1><?= $num ?> Most Popular Requests</h1>

<ul><? print_uri_list(most_popular_requests
($num, $table)); ?></ul>

<? include(“footer.html”) ?>

First, we extract any arguments passed to the script. The table name is extracted from $_REQUEST['table'] and converted to the domain name (for display purposes) by calling table_to_domain(). An optional num parameter controls how many entries are displayed on the page.

With that data in hand, we can then present two lists on the page: a list of the $num most recent requests logged for the domain, and a list of the $num most popular requests. By calling most_recent_requests() and passing the data straight to print_uri_list(), we can easily generate the first list. Similarly, for the list of most popular requests, we call most_popular_requests(), again passing the data to print_uri_list() to be printed. That’s all there is to it.

An important security note: because this script accepts input from the outside world that is then used to build an SQL query, you should exercise care before putting this application on a public Web server. The input really needs to be scrubbed for malicious data (for more information on validating form input, see the “Hackproofing” article in the September 2002 issue, available online at http://www.linux-mag.com/2002-09/hackproof_01.htm).

The Library

So far, the code we’ve seen has been very simple. The meat of the code is in lib.php, shown in Listing Five. lib.php is included by each page, and contains the functions used to connect to the database and perform various queries. Indeed, the code in the individual pages has no mention of database connectivity at all.




Listing Five: lib.php – Part 1


<?
$db_host = “localhost”;
$db_user = “apache”;
$db_pass = “XXXXXXX”;
$db_name = “apache”;
$db_conn = 0;
$rs = 0;

/*
* Collect a list of the domains and return an array.
*/

function get_domain_list ()
{
$sql = “show tables”;
$rs = run_query($sql);
$cnt = 0;
$domains;

while ($row = mysql_fetch_row($rs))
{
$domains[$cnt++] = $row[0];
}

return $domains;
}

/*
* Convert table name like access_foo_bar_com to foo.bar.com
*/

function table_to_domain ($table)
{
$domain = preg_replace(‘/^access_/’, ”, $table);
$domain = preg_replace(‘/_/’, ‘.’, $domain);
return $domain;
}

/*
* Fetch the $num most recent successful requests for a table/domain.
*/

function most_recent_requests($num, $table)
{
$sql = “select request_uri from `$table` order by time_stamp desc limit $num”;
$rs = run_query($sql);
return $rs;
}

/*
* Given a table/domain and number, find the $num most popular requests
*/

function most_popular_requests($num, $table)
{
$sql = “select count(*) as cnt, request_uri from `$table`
group by request_uri
order by cnt desc limit $num”;
$rs = run_query($sql);
return $rs;
}

/*
* Find the total number of requests in a given table/domain.
*/

function total_in_table ($table)
{
$sql = “select count(*) from `$table`”;
$rs = run_query($sql);
$row = mysql_fetch_row($rs);
return $row[0];
}

/*
* Given a record set, print the list of URIs (linked), and optional count.
*/

function print_uri_list ($rs)
{
global $table;
$domain = table_to_domain($table);
while ($row = mysql_fetch_assoc($rs))
{
echo “<li><a href=\”http://$domain$row[request_uri]\“>$row[request_uri]</a>”
;
if ($row[cnt])
{
$url = “uri.php?table=$table&uri=$row[request_uri]“;
echo ” (<a href=\”$url\”>$row[cnt]</a>)”;
}
echo “</li>”;
}
}

/*
* Run the query and return the handle.
*/

function run_query ($sql)
{
global $db_conn;
$rs = mysql_query($sql, $db_conn);
return $rs;
}

/*
* Connect to the database if we’re not already connected
*/

function connect ()
{
global $db_name, $db_conn, $db_host, $db_user, $db_pass;

if ($db_conn)
{
return;
}

if ($db_conn = mysql_pconnect($db_host, $db_user, $db_pass))
{
if (mysql_select_db($db_name, $db_conn))
{
return 1;
}
else
{
return 0;
}
}
else
{
return 0;
}
}

/*
* Disconnect from MySQL if we’re connected.
*/

function disconnect ()
{
global $db_conn;

if ($db_conn)
{
mysql_close();
}
}

/*
* Make sure we’re connected when this file is included.
*/

if (! connect())
{
echo “Can’t connect.”;
exit;
}

?>

Most of the functions in lib.php are used for performing particular database queries — they’re rather self-explanatory. Instead, let’s look at a few of the other functions.


  • connect() and disconnect() do just what you’d expect: they connect and disconnect from MySQL.

  • The table_to_domain() function converts a table name like access_jeremy_zawodny_com to jeremy.zawodny.com by stripping off the access_ prefix and changing the underscores back to dots.

  • print_uri_list() is used on pages that need to print a hyperlinked list of URIs with optional counts.

  • run_query() is a simple wrapper around PHP’s mysql_ query() function. By keeping mysql_query() separate, you can easily add debugging code to all of your SQL queries in a single place.

And notice that the very end of the file contains:


if (! connect())
{
echo “Can’t connect.”;
exit;
}

This little snippet ensures that every page that uses lib.php gets a valid connection to the database. If there’s no connection, the script exits. Of course, you really ought to have some more graceful error handling in your code.

Going Deeper

What we’ve covered here is only the beginning. With a basic framework in place and some working examples, you can customize and extend this to dig up whatever information you may want from your logs. Here are some ideas for other pages:


  • google.php: Lists the most common search terms that send Google users to your Web site.

  • browsers.php: Presents a summary of the most popular browsers that visit your site.

  • country.php: Using the GeoIP PHP API (http://maxmind.com/geoip/api/php.shtml), this page determines which countries visit your Web site most often.

You’re sure to come up with other interesting statistics to track. There are virtually no limits with PHP and MySQL.



Jeremy Zawodny uses Open Source tools in Yahoo! Finance by day and is writing a MySQL book for O’Reilly & Associates by night. Reach him at: Jeremy@Zawodny.com. You can download the PHP files used in this column at http://www.linux-mag.com/downloads/2002-11/lamp.

Fatal error: Call to undefined function aa_author_bios() in /opt/apache/dms/b2b/linux-mag.com/site/www/htdocs/wp-content/themes/linuxmag/single.php on line 62