Embed SQL with SQLite

Suppose that you want to contact a company that you've found on the Internet, but you don't know where it's located, and the only contact information provided on the "About the Company" web page contains a phone number with the unfamiliar area code of 323. You could call them up, hope to reach a real person, and ask where the company is located. Or, you could look up what geographical region uses that area code. Using everyone's favorite search engine, and within a few clicks, you see that 323 is a new area code for Los Angeles. Problem solved, and time to move on.

Suppose that you want to contact a company that you’ve found on the Internet, but you don’t know where it’s located, and the only contact information provided on the “About the Company” web page contains a phone number with the unfamiliar area code of 323. You could call them up, hope to reach a real person, and ask where the company is located. Or, you could look up what geographical region uses that area code. Using everyone’s favorite search engine, and within a few clicks, you see that 323 is a new area code for Los Angeles. Problem solved, and time to move on.

A few days later, you hear a factoid on the evening news that proclaims Afghanistan is slightly smaller than the state of Texas. OK, but if you’re from, say, Pennsylvania, and don’t know how large Texas is, the factoid is rather meaningless. For you, a comparison of the size of Afghanistan relative to the size of Pennsylvania would be a much better comparison. So, you hop on over to the CIA World Fact Book (http://www.cia.gov/cia/publications/factbook) and the United States Census Bureau’s Quickfacts page (http://quickfacts.census.gov). According to the Census Bureau, Texas is roughly 5.8 times as large as Pennsylvania. Checking back at the CIA, Afghanistan is about 5.5 times as large as Pennsylvania. Aha! Another problem solved by the Internet!

Back in the old days (you know, like 1989), if you wanted to look up factoids, you’d reach to your bookshelf and pick up an almanac. What do you do today? You probably visit Google. But what do you do if you want to keep all of your precious factoids handy? Or what happens if you’re not connected to the Internet?

One handy solution is to write a program that collects and manages factoids. There are simpler solutions, like storing a file of area codes in your home directory or some place like /usr/local/share, but large, unstructured, flat files are not necessarily the best choice. You could store all of these factoids in a set of XML files, but writing a lot of code to manage and query XML data is entirely too much work.

You could also use a relational database, but relational databases have a downside, too: database management. Using MySQL, PostgreSQL, Firebird, or any other relational database management system (RDBMS) means administering a database. Is the database running? Are your credentials with the database up-to-date? Do you have adequate permissions to add or view the almanac data? If you want to run an almanac program locally, do you have to install the database server on every machine before you use the almanac program? For a simple little utility or a stand-alone application, a server database like MySQL or PostgreSQL is overkill.

But a relational database is the right solution to this problem — if only you could do away with the “M” part of RDBMS. And that’s exactly where SQLite, written by D. Richard Hipp and available from http://www.sqlite.org, comes in.

SQLite: All the Taste, None of the Fat

SQLite is a full featured, embeddable, relational database engine. It supports a good chunk of SQL 92, including subqueries, views and transactions, and sports a very powerful, yet minimalist C interface, consisting largely of three API functions: sqlite_open(), sqlite_exec(), and sqlite_close(). The full API is much richer, and includes the ability to add user-defined SQL functions into the database engine.

Compared to MySQL, SQLite offers many of the same features with only 10% as much code. On top of that, SQLite is twice as fast as MySQL or PostgreSQL for many common operations. SQLite bindings are available for Tcl, Perl, Python, PHP, Ruby, and other programming languages. The C binding is described at http://www.sqlite.org/c_interface.html.

Of course, there’s no such thing as a free lunch. While SQLite supports multiuser access, it’s light on multiuser features. For example, it doesn’t support fine-grained access rights within the database. Instead, it uses standard file permissions to control read/write access to the entire database. If you have read/write access to the file containing a complete SQLite database, you can make any changes you want to that database, including modifying the schema, selecting or deleting rows, or even dropping tables. The upside is that creating a read-only database is as simple as chmod -w some.db.

SQLite also requires that any directory containing a database be writable so SQLite can create its journaling file during transactions. Also, when one connection is in the middle of a transaction, the entire database is locked — so don’t run out to replace MySQL with SQLite in the name of speed.

Other key differences between SQLite and most other databases are data typing and data integrity. In SQLite databases, all columns are unbounded variable length strings, except for the row identifier, which is always an integer. Numeric values in numeric expressions (e.g. SUM(), MAX(), etc.) are interpreted intelligently, but SQLite will not stop you from changing the numeric value in a TOTAL_COST field from “325″ to “Four Hundred.” This may or may not be a problem in your application.

Even with all of these caveats, SQLite provides an excellent relational database backend for small, low-end applications. Not every application needs to support multiple writers or even frequent database updates. In many cases, the value of a relational database engine comes is its ability to make ad hoc queries of data stored across multiple tables. SQLite provides these features without any of the overhead required for more advanced applications.

Building the Almanac

Although a simple database lookup program can be written in practically any programming language, there are benefits to writing it in C. The biggest benefit? Coding in C avoids external dependencies that can complicate program maintenance.

For example, Matt Sergeant has written the DBD::SQLite interface for Perl, but writing an almanac program in Perl means maintaining dependencies on both DBD::SQLite and DBI. While both of these modules are excellent pieces of software, they’re two pre-requisites that need to be installed on every computer where you want to use the almanac program. Also, updating or removing either of these two Perl modules could cause the almanac program to stop working when you least expect it.

Writing the program in C avoids all dependencies, since the almanac program and the SQLite database engine are statically linked in a single executable. SQLite is tiny — the sample application and the SQLite engine builds from scratch in less than a 30 seconds. (Really.)

Building the Database

The almanac program needs to do exactly two things: store new data and query data in the database. Loading a set of factoids creates a new table and adds a query for that data to a separate queries table. Let’s look at each step.

The easiest way to load a new database is to issue a bunch of SQL statements in one batch. Fortunately, the sqlite_ exec() function takes a parameter containing one or more SQL statements. Therefore, you can convert some factoids into a file of SQL statements (using a Perl script or something), and then load them up all at once with a single sqlite_exec() call. This is shown in Listing One.

Listing One: Bulk-loading data into an SQLite database

1 char *read_file(char *filename) {

2 FILE *input = NULL;

3 struct stat statbuf;

4 char *buffer = NULL;

5 int bytes;


7 if(stat(filename, &statbuf) == -1)

8 return NULL;


10 bytes = statbuf.st_size + 1;

11 buffer = (char *) malloc(bytes);


13 if (input = fopen(filename, “r”)) {

14 bytes = fread(buffer, 1, statbuf.st_size, input);

15 if (bytes == statbuf.st_size) {

16 buffer[bytes] = 0;

17 } else {

18 free(buffer);

19 buffer = NULL;

20 }

21 fclose(input);

22 }


24 return buffer;

25 }


27 void load_data(sqlite *db, char *filename) {

28 char *error = NULL;

29 char *buffer = read_file(filename);


31 int i = sqlite_exec(db, buffer, NULL, NULL, &error);


33 if (error) {

34 fprintf(stderr, “Database error: %s\n”, error);

35 free(error);

36 } else {

37 printf(“%s: %d rows changed\n”, filename, sqlite_changes(db));

38 }


40 free(buffer);

41 }

The code couldn’t be any simpler. The read_file() function takes a file name as a parameter, uses malloc() to grab a chunk of memory to hold the entire contents of the file, and returns that buffer. The load_data() function slurps up the file of SQL statements, and passes them to sqlite_ exec().

The sqlite_exec() function takes two mandatory parameters and three optional parameters. The first two parameters are a SQLite database handle and the buffer of SQL statement(s) to execute, respectively. The third parameter is a pointer to a callback function that’s called once for each and every row returned. The fourth parameter is an optional piece of user data that’s passed to the callback function. The last parameter is a pointer that receives an error message if an error occurs.

Many SQLite API functions create buffers dynamically, but it’s the caller’s responsibility to free a buffer when it’s longer needed. This is why the error buffer is free‘d at the very end.

Registering Queries

By design, because each set of factoids is stored in a different table, it’s necessary to persist some bookkeeping information in the almanac database. Two tables track this data: the registry table keeps track of what sets of factoids have been loaded into the almanac, and the queries table stores ad hoc queries on factoids loaded in the database. These two tables need to be created when the database is initialized.

Listing Two details the two functions that initialize this bookkeeping information. First, the code checks to see if these tables exist. If they do not, then the database is new, and these two tables need to be added.

Listing Two: Initializing the database

1 void init_table(sqlite *db, char *check_sql, char *create_sql) {

2 char **result = NULL;

3 char *error = NULL;

4 int rows, cols;


6 int i = sqlite_get_table(db, check_sql, &result, &rows, &cols, &error);


8 if (i == SQLITE_ERROR) {

9 free(error);

10 i = sqlite_exec(db, create_sql, NULL, NULL, &error);

11 if (error) {

12 fprintf(stderr, “Database error: %s\n”, error);

13 free(error);

14 }

15 }


17 sqlite_free_table(result);

18 }


20 void init_database(sqlite *db) {

21 init_table(db, “SELECT COUNT(*) FROM registry”,

22 “CREATE TABLE registry (package, desc)”);


24 init_table(db, “SELECT COUNT(*) FROM queries”,

25 “CREATE TABLE queries (package, name, desc, sql)”);

26 }

The init_database() function in Listing Two handles the high-level initialization work, ensuring that the queries and registry tables exist in the database, creating them if necessary. This is done by specifying both a query that fails if a table is missing, and an SQL statement that creates the table if it’s missing.

The init_table() function in Listing Two uses a few new SQLite API calls. The sqlite_exec() function works well if you want to specify a callback function for each row found, but that can be difficult to coordinate, so the sqlite_ get_table() function yields all of the data at once. This function takes six parameters: the standard database structure; a SQL statement and error buffer; a buffer to receive the data; and pointers to integers to receive the number of rows and columns.

Data returned by sqlite_get_table() is a one-dimensional array of strings. Here, the simplicity of the SQLite architecture shows through: because everything is a string, no special case code is necessary to interrogate a field to determine its type and what kind of data it stores.

For the first query executed by init_table(), the result array contains two elements: a column name, followed by a single row containing a single column of data. If this query contained two rows and three columns, result would contain nine elements: three column names, followed by three data elements from the first row, followed by three data elements from the second row. (NULL field values are represented by null pointers.)

If this first query succeeds, the database was previously initialized. If not, then a call to sqlite_exec() is necessary to execute the CREATE TABLE statement that adds the table to the database.

When using sqlite_get_table(), SQLite will allocate the memory for the result parameter. This is one of the parameters that the caller needs to free when no longer necessary. However, the result parameter won’t be used if the query returns no results. Therefore, the sqlite_free_table() call exists to free this data, but only when it’s been used.

Querying Data

Once data has been loaded into the database, querying it is a little tricky. To use the almanac program, a query name and query parameter should be specified on the command line. The first step is to get the SQL statement that corresponds to a query name. Queries are stored in the queries table, which associates a query name with an SQL statement. Therefore, the first query will search the queries table to find the precise SQL statement to use next.

The second step is to use the query found from the first step and execute it with the query parameter from the command line. Executing a query with user specified parameter should return data from the almanac.

Listing Three shows the two functions that handle these tasks. The first, query_ database() uses sqlite_get_table_ printf(), a variant of sqlite_get_ table() that processes printf() style escape codes within an SQL statement. The first call, on line 6, selects the query that’s being used. If there’s no match, the code prints an error message. Otherwise, use that query string along with the parameter to query the database on line 16.

Listing Three: Finding data in the almanac

1 void query_database(sqlite *db, char *query_name, char *arg) {

2 char **result = NULL;

3 char *error = NULL;

4 int rows, cols, i, j;


6 i = sqlite_get_table_printf(db,

7 “SELECT sql FROM queries WHERE name=’%s’”,

8 &result, &rows, &cols, &error,

9 query_name);


11 if (rows == 0) {

12 fprintf(stderr, “Error: no such query: %s\n”, query_name);

13 } else {

14 if (i == SQLITE_OK) {

15 rows = 0;

16 sqlite_exec_printf(db, result[1], callback,

17 &rows, &error, arg);


19 if (error) {

20 fprintf(stderr, “Database error: %s\n”, error);

21 free(error);

22 }


24 if (rows == 0) {

25 printf(“Sorry, no matches for ;%s’\n”, arg);

26 }

27 } else {

28 fprintf(stderr, “Database error: %s\n”, error);

29 free(error);

30 }

31 }


33 sqlite_free_table(result);

34 }


36 int callback(void *pArg, int argc, char **argv, char **columns){

37 int i, *rowcount = (int *)pArg;


39 *rowcount += 1;


41 for(i=0; i<argc; i++) {

42 printf(“%s: %s\n”, columns[i], argv[i]);

43 }


45 printf(“\n”);


47 return 0;

48 }

The second query uses sqlite_exec_ printf(), which behaves like sqlite_ exec(), and takes a callback function. Recall that sqlite_exec() can take a pointer to a user-defined data structure. Here, the pointer is a pointer to the rows variable.

Each time a row is found, the callback() function is called, which prints out a sequence of name:value pairs on standard output. The pArg parameter is actually a pointer to row in query_ database(), so each time the callback() function is called, the row variable in query_database() is incremented.

When all rows have been processed, control returns to query_database(). If any rows have been processed, the row variable will be nonzero. If it’s still zero, then a warning message is sent to stdout, noting that no rows were found.

Pulling it All Together

Finally, there’s main(): it’s responsible for opening and closing the database, identifying the operation to perform (a load or a query), and performing that operation. At this level, init_database() is always called, but it only modifies the database when the registry and query tables are missing.

The only big decision here is determining whether to call load_data() and add data to the database, or call query_ database() and attempt to perform a query. When the -load argument is specified on the command line, the next parameter is a file containing SQL statements to load into the database. At least one of those SQL statements will add a row to the queries table, registering a new kind of query that can be performed with the almanac program.

If the first parameter is not -load, then it’s a name of a query, and the second parameter is a parameter for that query. For brevity, parameter validation has been omitted

Listing Four: Pulling it all together

1 int main (int argc, char *argv[]) {

2 sqlite *db;

3 char *error = NULL;


5 if (argc == 1) {

6 fprintf(stderr, “Usage: almanac ?load filename [filename....]\n”);

7 fprintf(stderr, ” <query name> <parameter>\n”);

8 exit(-1);

9 }


11 db = sqlite_open(“almanac.db”, 0, &error);

12 if (error) {

13 fprintf(stderr, “Database error: %s\n”, error);

14 free(error);

15 exit(-2);

16 }


18 init_database(db);


20 if (!strcmp(argv[1], “-load”)) {

21 int i;

22 for (i=2; i<argc; i++)

23 load_data(db, argv[i]);

24 } else {

25 query_database(db, argv[1], argv[2]);

26 }


28 sqlite_close(db);

29 return 0;

30 }

Keep it Simply Simple

Writing this program was an exercise in simplicity. Although this program could have been written using any database engine or any programming language, using SQLite and C provided some significant advantages. This combination eliminates all unnecessary dependencies that could possibly break in the future.

While SQLite may lack some of the features of databases like MySQL or PostgreSQL, it is certainly powerful enough for many small, interesting projects.

MySQL vs. SQLite: Which is really faster?

According to some benchmarks found on the SQLite web site (http://www.sqlite.org/speed.html), SQLite is faster than open source databases MySQL and PostgreSQL at some common tasks, and significantly faster at batch INSERT, UPDATE, and DELETE operations.

David Axmark, co-founder of MySQL AB, believes that SQLite can’t really be twice as fast MySQL. After all, the MySQL team prides themselves on optimizing every last inefficiency out of their database engine.

David attributes the performance difference to the fact that the SQLite database engine is embedded in the benchmark programs, while MySQL and PostgreSQL are using TCP/ IP-based client libraries to communicate with a database server. Therefore, all of the extra time MySQL and PostgreSQL spent in the benchmarks was primarily caused by network communication overhead.

MySQL AB is working on an embedded version of their database engine, libmysqld, callable from C++, that David suspects will perform comparably to SQLite.

Adam Turoff is an independent consultant based in the Washington, DC area. He can be reached at ziggy@panix.com. You can download the code used in this article from http://www.linux-mag.com/downloads/2003-10/compile.

Comments are closed.