Scale Out with Hypertable

Introducing Hypertable, an open source, distributed database that's modeled after Google's Bigtable, but available for anyone to use. Scale databases to meet any need.

Traditional database technology such as MySQL works well so long as your data fits on a single machine. As soon as your data exceeds the capacity of a single machine, the problem becomes dramatically more difficult. Hypertable is an open source, distributed database specifically designed to overcome this scaling barrier. It is modeled after Google’s Bigtable, which has been successfully deployed at Google for several years and underpins many of their major services.

This technology was designed to be massively scalable, and to that end certain traditional database features were sacrificed, most notably transactions and table joins. Though the lack of support for these features makes Hypertable unsuitable for certain classes of applications, such as financial applications, there is a large set of Web applications to which this technology is well suited.

System Overview

The Hypertable data model consists of a multi-dimensional table of information that can be queried using a single primary key. The first dimension of the table is the row key. The row key is the primary key and defines the order in which the table data is physically stored. The second dimension is the column family. This dimension is somewhat analogous to a traditional database column. The third dimension is the column qualifier.

Within each column family, there can be a theoretically infinite number of qualified instances. For example, if we were building a URL tagging service, we might define the column families content, url, and tag. Within the “tag” column family there could be an infinite number of qualified instances, such as tag:science, tag:theater, tag:good, etc.

The fourth and final dimension is the time dimension, which consists of a timestamp that is usually auto-assigned by the system and represents the insertion time of the cell in nanoseconds since the epoch. Conceptually, a table in Hypertable can be thought of as a three-dimensional Excel spreadsheet with timestamped versions of each cell. This data model is more versatile than that used by Distributed Hash Table (DHT) technology in that it supports efficient traversal of elements in primary key order.
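To make this concrete, the URL tagging table described above could be declared in HQL, the SQL-like language covered later in this article. The following is only a sketch (the table name is our invention); note that only column families appear in the schema, while qualifiers such as tag:science are supplied at insert time:

CREATE TABLE UrlTags (
  content,
  url,
  tag
);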

The Hypertable system is made up of several components: some number of RangeServers, a Master, the Client Library, and Hyperspace. RangeServers are responsible for managing ranges of physical table data. In general, there will be a RangeServer running on each machine in the cluster. There is a single Master that is responsible for meta operations such as table creation/deletion and range assignment. Client data does not move through the Master, so temporary Master failures do not affect typical client operations such as scanning and updating. Though there is a single Master, the system has been designed to support hot standbys.

The client library is what gets linked into an application to give it access to the Hypertable system and provides APIs for creating, updating, scanning, and deleting tables. Hyperspace is somewhat analogous to Google’s Chubby service (see labs.google.com/papers/chubby.html). It is a distributed lock manager and provides a global filesystem for storing small amounts of metadata. At present, Hyperspace is implemented as a single server but will be distributed and highly available in a future release.

Scaling: How It Is Achieved

The key to Hypertable’s ability to scale is the way it manages table data. Tables are broken into a set of contiguous row ranges, each of which is managed by a RangeServer. Initially each table consists of a single range that spans the entire row key space. As the table fills with data, the range will eventually exceed a size threshold (default is 200MB) and will split into two ranges using the middle row key as a split point.

One of the ranges will stay on the same RangeServer that held the original range and the other will get reassigned to another RangeServer by the Master. This splitting process continues for all of the ranges as they continue to grow. Active ranges will consume some amount of system resource (e.g. memory and CPU) on the RangeServer machine. As load increases on the cluster as a whole, new machines can get added to provide more capacity.
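The following fragment is a conceptual sketch of this split rule, written in C++ for illustration only; the Range structure and function names are invented here and are not Hypertable’s actual internals:

#include <cstdint>
#include <string>
#include <utility>

// Conceptual sketch only; not Hypertable's real data structures
struct Range {
  std::string start_row;  // lower bound of the row interval
  std::string end_row;    // upper bound of the row interval
  uint64_t disk_usage;    // approximate bytes of table data in this range
};

const uint64_t SPLIT_THRESHOLD = 200ULL * 1024 * 1024;  // default 200MB

// A range becomes a split candidate once it outgrows the threshold
bool needs_split(const Range &range) {
  return range.disk_usage > SPLIT_THRESHOLD;
}

// Split at the middle row key: one half stays on the original
// RangeServer, and the Master reassigns the other half elsewhere
std::pair<Range, Range> split(const Range &range,
                              const std::string &middle_row) {
  Range lower = { range.start_row, middle_row, range.disk_usage / 2 };
  Range upper = { middle_row, range.end_row, range.disk_usage / 2 };
  return std::make_pair(lower, upper);
}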

To illustrate how this works, Figure 1 shows a three node Hypertable cluster. There are two tables in this example (one green, one yellow) and the tables are split into ranges which are evenly distributed across the three nodes, filling each of them to capacity.

FIGURE 1: Three Node Cluster at Maximum Capacity

At this point, the system is stretched to the limit and cannot handle any additional load. Expanding the capacity of the cluster is simply a matter of adding some number of new machines and starting RangeServers on them. The RangeServers will register themselves with the Master, at which point the Master will begin to migrate ranges from the overloaded machines onto the new machines that have plenty of spare capacity. Figure 2 shows the cluster after two new machines have been added.

FIGURE 2: Balanced Load with Two Additional Machines

As can be seen in the figure, the addition of the two machines, along with the automatic migration of ranges, has balanced load across the entire cluster. Once these five machines hit capacity, more machines can again be added to rebalance the load. This process can continue indefinitely to meet the growing load demands of our application.

API Walkthrough

The following example assumes that Hypertable has been downloaded, built, and installed. There is a README file in the top-level directory of the distribution that contains instructions on how to do this. Hypertable has been designed to run on top of an existing distributed filesystem, such as Hadoop’s HDFS, that provides high data availability via inter-machine data replication. Hypertable writes all of its data files and commit logs into the underlying filesystem and depends on it for fault tolerance.

Hypertable assumes that these data files and log files will be available and uses checksums to verify data integrity. To get things running on top of HDFS, install Hadoop and get the distributed filesystem up and running by following the directions at the Hadoop site. To configure Hypertable to work with your HDFS instance, read the document entitled “Up and Running with Hadoop,” which can be found on the Documentation page of the main project site (www.hypertable.org). You can also just run Hypertable on top of your local filesystem on a single machine by using the default configuration and executing the following shell commands:

$ cd $HYPERTABLE_INSTALL_DIR
$ bin/start-all-servers.sh local

The code in the following walkthrough is taken from the Apache log example, which can be found in the examples/apache_log/ subdirectory of the distribution. In this section we’ll walk you through the code in apache_log_load.cc, which illustrates how to load an Apache log into Hypertable. The other source file, apache_log_query.cc, illustrates how to issue queries against tables in Hypertable. We don’t have space to cover the query code in this article, but we encourage you to take a look at it to get a feel for the query/scanner APIs.
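To give a flavor of the query side in the meantime, here is a rough sketch of a scan in the style of the era’s C++ client API. The ScanSpec member names and the end-row trick below are assumptions on our part; consult apache_log_query.cc for the authoritative code.

#include <iostream>

#include "Hypertable/Lib/Client.h"

using namespace Hypertable;
using namespace std;

int main(int argc, char **argv) {
  ClientPtr client_ptr;
  TablePtr table_ptr;
  TableScannerPtr scanner_ptr;
  ScanSpec scan_spec;
  Cell cell;

  try {
    client_ptr = new Client(argv[0]);
    table_ptr = client_ptr->open_table("LogDb");

    // Restrict the scan to rows that begin with "/index.html"
    // (member names are an assumption and may differ by version)
    scan_spec.start_row = "/index.html";
    scan_spec.end_row = "/index.html\xff";

    scanner_ptr = table_ptr->create_scanner(scan_spec);

    // Print each matching cell's row key, column, and value
    while (scanner_ptr->next(cell))
      cout << cell.row_key << "\t" << cell.column_family << "\t"
           << string((const char *)cell.value, cell.value_len) << endl;
  }
  catch (Exception &e) {
    cerr << "Error: " << e.what() << endl;
    return 1;
  }
  return 0;
}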

Listing 1 shows the main() function in apache_log_load.cc. At the top of the function is a block of variable declarations. The first two, ApacheLogParser and ApacheLogEntry, are classes that are included in libHypertable for convenience and handle the logic of parsing an Apache web server log. The next three declarations are smart pointers to the Hypertable client, a table, and a table mutator. Next comes a declaration for a variable named “key” of type KeySpec, which is used to hold the key specification for each cell inserted into the table. The remaining declarations are for some local state variables, including a boolean called “time_order”.

By default, each row key is constructed as the page followed by the timestamp (e.g. “/index.html 2008-01-26 00:21:27”), which allows an individual page to be efficiently queried for its click history in chronological order. However, if the user passes --time-order on the command line, the row keys are constructed as the timestamp followed by the page, which allows contiguous portions of the log history to be queried efficiently.

The “if” statement following the block of variable declarations handles parsing of the command line. It determines the name of the input file and checks for the --time-order switch, setting the time_order variable to true if it is found. Next comes a try block that is wrapped around three statements.

The first statement constructs a Hypertable client object. The next statement uses the client object to open the “LogDb” table, setting the table smart pointer. The third statement creates a mutator object on the table object, setting the mutator smart pointer.

Listing 1: Loading an Apache Log

int main(int argc, char **argv) {
  ApacheLogParser parser;
  ApacheLogEntry entry;
  ClientPtr client_ptr;
  TablePtr table_ptr;
  TableMutatorPtr mutator_ptr;
  KeySpec key;
  const char *inputfile;
  bool time_order = false;
  String row;

  if (argc == 2)
    inputfile = argv[1];
  else if (argc == 3 && !strcmp(argv[1], "--time-order")) {
    time_order = true;
    inputfile = argv[2];
  }
  else {
    cout << usage << endl;
    return 0;
  }                                                         

  try {
    client_ptr = new Client(argv[0]);
    table_ptr = client_ptr->open_table("LogDb");
    mutator_ptr = table_ptr->create_mutator();
  }
  catch (Exception &e) {
    report_error(e);
    return 1;
  }                                                         

  memset(&key, 0, sizeof(key));                             

  parser.load(inputfile);                                   

  while (parser.next(entry)) {                              

    if (time_order) {
      row = format_timestamp(entry.tm);
      row += " ";
      row += extract_page(entry.request);
    }
    else {
      row = extract_page(entry.request);
      row += " ";
      row += format_timestamp(entry.tm);
    }                                                       

    key.row = row.c_str();
    key.row_len = row.length();                             

    try {
      key.column_family = "ClientIpAddress";
      mutator_ptr->set(key, entry.ip_address);
      key.column_family = "UserId";
      mutator_ptr->set(key, entry.userid);
      key.column_family = "Request";
      mutator_ptr->set(key, entry.request);
      key.column_family = "ResponseCode";
      mutator_ptr->set(key, entry.response_code);
      key.column_family = "ObjectSize";
      mutator_ptr->set(key, entry.object_size);
      key.column_family = "Referer";
      mutator_ptr->set(key, entry.referer);
      key.column_family = "UserAgent";
      mutator_ptr->set(key, entry.user_agent);
    }
    catch (Exception &e) {
      report_error(e);
      return 1;
    }
  }                                                         

  mutator_ptr->flush();                                     

  return 0;
}

After creating the mutator, the key object is cleared and then the load method of the parser object is called to initialize it with the name of the Apache log file. Then the code drops into a while loop that iterates over each entry (line) in the log file. The first thing that happens inside the while loop is that the row key gets constructed. The format of the key depends on the value of the time_order variable as described earlier.

Note that the definitions for the functions extract_page, format_timestamp, and report_error (and the usage string) are not displayed in the listing, but can be seen in apache_log_load.cc. After the row key is constructed, each field in the log line gets inserted into a corresponding column with a call to the TableMutator::set() method. After the loop, a call is made to the TableMutator::flush() method, which flushes the mutator’s internal send buffers.
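For illustration, here are plausible sketches of two of those helpers, assuming entry.tm is a struct tm and that String is Hypertable’s std::string typedef; the real definitions in apache_log_load.cc may differ:

#include <ctime>

#include "Common/String.h"

using namespace Hypertable;

// Sketch: pull the page out of a request line such as
// "GET /index.html HTTP/1.1"
String extract_page(const String &request) {
  size_t first = request.find(' ');
  size_t last = request.rfind(' ');
  if (first == String::npos || last == String::npos || last <= first)
    return request;  // malformed request; fall back to the raw string
  return request.substr(first + 1, last - first - 1);
}

// Sketch: render a struct tm as "YYYY-MM-DD HH:MM:SS", matching the
// row keys visible in Listing 2
String format_timestamp(struct tm &tm) {
  char buf[32];
  strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", &tm);
  return String(buf);
}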

To run the example, we first need to create the LogDb table. To do this, we’ll execute the HQL commands in the file create-table.hql by running the hypertable command interpreter as follows (modified to reflect your installation directory):

$ ~/hypertable/0.9.0.5/bin/hypertable --batch < create-table.hql
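The contents of create-table.hql aren’t reproduced here, but judging from the column families used in Listing 1, the table definition presumably looks something like this:

CREATE TABLE LogDb (
  ClientIpAddress,
  UserId,
  Request,
  ResponseCode,
  ObjectSize,
  Referer,
  UserAgent
);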

Then we need to compile the example by first modifying the variables at the top of the Makefile to point to our installation and then typing ‘make’ to build the executables. Once the example has been built successfully, we can run the load program as follows:

$ ./apache_log_load access.log.gz

At this point, the Apache log has been loaded into the LogDb table and we’re ready to try out some queries.

HQL and the Hypertable Command Interpreter

The Hypertable distribution comes with a program called “hypertable” which is an interpreter for a language we call HQL. This language is modeled after SQL and provides commands for creating and manipulating tables. Run with no arguments, the program will enter interactive mode. If you have ever used the mysql command interpreter, this program should be familiar. The first query we’ll issue will return the history, with all of the click information, for the page “/index.html”.

hypertable> SELECT * FROM LogDb WHERE
         -> ROW STARTS WITH "/index.html";

This command produces the output shown in Listing 2.

Listing 2: Selecting all fields for /index.html

/index.html 2008-01-26 00:21:27 ClientIpAddress 81.52.143.15
/index.html 2008-01-26 00:21:27 UserId
/index.html 2008-01-26 00:21:27 Request GET /index.html HTTP/1.1
/index.html 2008-01-26 00:21:27 UserAgent       Mozilla/5.0 ...
/index.html 2008-01-26 00:21:27 ResponseCode    304
/index.html 2008-01-26 00:21:27 ObjectSize
/index.html 2008-01-26 00:21:27 Referer
/index.html 2008-01-26 07:57:15 ClientIpAddress 218.111.214.141
/index.html 2008-01-26 07:57:15 UserId
/index.html 2008-01-26 07:57:15 Request GET /index.html HTTP/1.1
/index.html 2008-01-26 07:57:15 UserAgent       Mozilla/4.0 ...
/index.html 2008-01-26 07:57:15 ResponseCode    200
/index.html 2008-01-26 07:57:15 ObjectSize      12310
/index.html 2008-01-26 07:57:15 Referer http://www.battlephase.com/heavysteel/
/index.html 2008-01-26 08:29:33 ClientIpAddress 69.208.248.144
/index.html 2008-01-26 08:29:33 UserId
/index.html 2008-01-26 08:29:33 Request GET /index.html HTTP/1.1
/index.html 2008-01-26 08:29:33 UserAgent       Mozilla/4.0 ...
/index.html 2008-01-26 08:29:33 ResponseCode    200
/index.html 2008-01-26 08:29:33 ObjectSize      12310
/index.html 2008-01-26 08:29:33 Referer
...

The next query we’ll issue will return just the client IP addresses for all of the clicks for the page “/index.html”.

hypertable> SELECT ClientIpAddress
         -> FROM LogDb WHERE
         -> ROW STARTS WITH "/index.html";

This command produces the output shown in Listing 3.

Listing 3: Selecting the client IP addresses for /index.html

/index.html 2008-01-26 00:21:27 ClientIpAddress 81.52.143.15
/index.html 2008-01-26 07:57:15 ClientIpAddress 218.111.214.141
/index.html 2008-01-26 08:29:33 ClientIpAddress 69.208.248.144
/index.html 2008-01-26 08:32:56 ClientIpAddress 69.208.248.144
/index.html 2008-01-26 11:09:42 ClientIpAddress 62.24.251.240
/index.html 2008-01-26 12:46:45 ClientIpAddress 88.75.239.225

Summary

Hypertable is an open source, high-performance database designed for modern Web applications that need to scale beyond a single machine. If you are interested in using or contributing to Hypertable, please join one of the mailing lists. You can find pointers to them on the main project Web site.
