SQLAlchemy

Object relational mappers are popular, because the software disguises the impedance mismatch between objects and databases. There are many ORM variants to choose from, but many simply get in the way when you need to do something the ORM wasn’t expressly designed to do. SQLALchemy is different. It aims to provide you with objects that look and act like real objects, and data sets that provide the full power and flexibility of relational algebra. Here’s a tour.

Objects and databases have been “shacking up” for a long time. At times, everything goes swimmingly, and the relationship between the two seems like a lot of fun. Other times, the couple’s fundamental differences are just plain obvious, and it takes real effort to keep the pair talking.

Object-relationship mappers (ORMs) can dispense with some of the bickering — a kind of marriage counseling for troubled software — and developers have many implementations to choose from. The trouble is, you can’t always trust your ORM’s advice. Choose the wrong ORM and your software’s back on the rocks.

SQLAlchemy (http://sqlalchemy.org) is different. It’s like Dr. Phil, and your objects and databases will live happily ever after.

Object relational mappers are popular, because the software disguises the impedance mismatch between objects and databases. There are many ORM variants to choose from, each with its own strengths and weaknesses, and any number of solutions might suit your needs.

However, many of the people who’ve built open source object relational mappers have only gone part way. The end-result is crippled objects or crippled relational data, and often, the limitations are overlooked because the software works well enough. David Heinemeier Hansson described the Active Record project this way: “I’d like to think, though, that Active Record gives you 80 percent of the features for 20 percent of the effort.”

For particular projects, where you don’t need any of that last 20%, Active Record certainly can make you life easier. But what happens if Active Record or your chosen ORM doesn’t provide the right 80 percent? You end up hard-coding SQL strings into your application, and lose most of the things your ORM solution was providing.

SQLALchemy is different. It aims to provide you with objects that look and act like real objects, and data sets that provide the full power and flexibility of relational algebra. While it almost certainly took Michael Bayer (fouder of the SQLAlchemy project) and the rest of the SQLAlchemy team 80 percent longer to write than most open source ORMs, it’s definitely not 80 percent harder to use.

In fact, once you understand the patterns and practices that SQLAlchemy is built on, it’s power and flexibility is easiy to apply in a large percentage of real world projects. This is particularly true whenever you have legacy data, large complex data sets, or have multiple applications hitting the same database.

A Front-To-Back Toolkit

One of the common failings of object relational mappers is the inability to easily handle unexpected situations. In fact, Ted Neward has called object/relational mapping “the Vietnam of Computer Science.” (http://blogs.tedneward.com/2006/06/26/The+ Vietnam+ Of+ Computer+ Science.aspx). He argues that choosing an object relational mapper “represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear[ victory] conditions, and no clear exit strategy.”

SQLAlchemy is a clear winner, because Michael Bayer, the founder of the SQLAlchemy project, decided up front to build a full database toolkit that is useful even when you don’t want or need the ORM. Using SQLAlchemy, you can drop down to the the base level of database tables, queries, and result sets. You don’t need any of the fancy ORM features to get real work done.

Let’s take a look as a simple database for user authentication and permissions. Generally, you need three entities to build this system: users, groups, and permissions. Users are related to groups, and groups to permissions. (This particular table structure and all the related code are taken directly from the SQLAlchemy support folder in the TurboGears project.)

The first thing to do is create the table structures in your database. SQLAlchemy makes this easy — just define a table object like this:

users_table = Table(’tg_user’, metadata,
Column(’user_id’, Integer, primary_key=True),
Column(’user_name’, Unicode(16), unique=True),
Column(’password’, Unicode(40)),
Column(’created’, DateTime, default=datetime.now))

The first parameter to the Table constructor is the database table name, or in this case, tg_user. The second parameter is a special metadata object that knows how what kind of database you are using and how to get a connection to it.

Once you’ve got a table object you can use it to create database tables programatically from a setup script (or your application for that matter) with a simple create() method call:

users_table.create()

If on the other hand you’ve got a legacy database, and you just want to use that without repeating the table definition information in your Python program, just write:

users = Table(’users’, metadata, autoload=True)

Of course, this all presupposes that you’ve created a metadata object to contain information about what database engine you are using. Fortunately SQLAlchemy makes that easy too, all you need to do is initialize the database engine…

my_engine = create_engine(’sqlite:////linuxmagazine-example.db’)

… and then create a new metadata object bound to that engine:

metadata = BoundMetaData(my_engine)

The URI string that you pass to create_engine() will vary slightly based on the type of database you are using, but it’s always easy to set up.

Here’s a flavor for doing basic database operations directly on a table object, and this is important, because it’s exactly what’s going on underneath the ORM layer. To add a new user to the database, create an insert object and pass it’s execute() method some values for each of the columns, by column name.

i = users_table.insert()
i.execute(name="Max", password=’headroom’)

If you want to insert multiple rows of data with a single SQL statement, pass the execute() method a series of dictionaries, one for each row in the database, like this:

i.execute(
{’name’: ’Theora Jones’, ’password’: ’carter’},
{’name’: ’Eugene’, ’password’: ’truth’})

You may have noticed that the previous inserts skipped the primary key value. SQLAlchemy automatically ensures that the primary key is assigned some unique value. Different database engines handle this kind of thing differently, but you don’t have to worry about any of that.

In the same way you create an insert object, you can also create other “SQL statement” objects. For example, you can create a new select object like this:

s = user_table.select()
returned_records = s.execute()

You can iterate over the returned_records object to extract one row at a time. You can access the data for each column via dictionary lookups, object attribute lookups, or as positional elements in a tuple.

for user in returned_records:
print user.name, ’has password’, user.password

SQLAlchemy has statements for pretty much every SQL statement you can think of including joins, ordering, grouping, functions, correlated subqueries, unions, and more. Detailed documentation on each is available online at http://www.sqlalchemy.org/docs. Remember you don’t have to work at this level all the time. But, you can whenever you want to.

Before continuing, take a look at closer look at the code in Listing One. You may notice that Python objects like datetime are automatically transformed into database-appropriate DateTime objects on the way into the database, and back to Python objects on the way out.

LISTING ONE: SQLAlchemy automatically converts objects to database-specific formats and vice versa

users_table = Table(’tg_user’, metadata,
Column(’user_id’, Integer, primary_key=True),
Column(’user_name’, Unicode(16), unique=True),
Column(’password’, Unicode(40)),
Column(’created’, DateTime, default=datetime.now)
)
groups_table = Table(’tg_group’, metadata,
Column(’group_id’, Integer, primary_key=True),
Column(’group_name’, Unicode(16), unique=True),
Column(’display_name’, Unicode(255)),
Column(’created’, DateTime, default=datetime.now)
)

permissions_table = Table(’permission’, metadata,
Column(’permission_id’, Integer, primary_key=True),
Column(’permission_name’, Unicode(16), unique=True),
Column(’description’, Unicode(255))
)

user_group_table = Table(’user_group’, metadata,
Column(’user_id’, Integer, ForeignKey(’tg_user.user_id’)),
Column(’group_id’, Integer, ForeignKey(’tg_group.group_id’))
)

group_permission_table = Table(’group_permission’, metadata,
Column(’group_id’, Integer, ForeignKey(’tg_group.group_id’)),
Column(’permission_id’, Integer, ForeignKey(’permission.permission_id’))
)

Now that you know a little bit about the bottom layers of SQLAlchemy, let’s take a higher level look at some of the industrial strength patterns it provides to make your life easier when working with large complex datasets.

If you’re familiar with Patterns of Enterprise Application Architecture by Martin Fowler, you’ve probably already heard terms like Active Record, Data Mapper, Domain Model, Unit of Work, and Identity Map. These are the concepts behind SQLAlchemy’s power, flexibility, and ease of use. Understanding how to use the data access patterns helps you write better code, even if you can’t use SQLAlchemy.

The Data Mapper Pattern

The Data Mapper pattern decouples database load and save logic from your domain objects (as seen in Figure One). This allows domain objects to act like normal everyday objects — but more important, it allows you to map objects to arbitrary relations in the database. Your database is free to be a relational algebra engine, and your objects are free to be plain old data objects.

This is the key element that makes SQLAlchemy more powerful than Active Record-based ORMs, which impose significant constraints on your domain model. With the Data Mapper pattern, you can create whatever objects you want, and once created, add a “mapper” that knows how to save the data in those objects to the database.

Hibernate and other Data Mapper based systems do this, but SQLAlchemy provides a remarkably simple mechanism for setting up your mappers. And because Python is a dynamic language, your mappers can automatically add properties to your domain objects. This keeps things easy, and makes it almost as simple to write a Data Mapper-based domain model as it would be to use the Active Record pattern.

For example you could create an ultra-simple Group class to go with the group_table found in Listing One:

class Group(object):
pass

The Group class can define any behaviors and attributes you need, and is just a normal Python object. But in this case, assume all you want is a few attributes based on the data from the database; these can be automatically added by our mapper. So the class definition can be remarkably simple. Assigning a mapper is simple:

assign_mapper(session.context, Group, groups_table)

This maps Group objects to the groups_table, adding properties for each of the columns in the database. But in this case, there’s more required. You must be able to easily get a list of users who are members of that group. You can use this mapper instead:

assign_mapper(session.context, Group, groups_table,
properties=dict(users=relation(User,
secondary=user_group_table,
backref=’groups’)))

The permissions keyword argument allows you to create special properties that map across relations. In this case, you’re mapping to users through the user_group_table, so a many-to-many join is required, and the property being added is a list. This adds getters and setters to the Groups class for this property too, so now you can add users to a group by simply appending them to the attached user list. Of course if you have a one-to-many relationship, SQLAlchemy can get a single item back rather than a list.

The backref=’groups’ argument to the relation setup is a convenience method that lets you create both sides of a relationship at once. So, in this case, Users objects will have a groups property, which is a list of the groups mapped through the user_group_table.

SQLAlchemy’s syntax for querying to instantiate objects from the database mirrors the syntax you saw earlier. But, in this case, you create a query object, and run the statements against that object. So, you can now write things like…

specific_user = session.query(User).get(user_id)

… to retrieve a paritcular user given a user_id. But you can also get back a collection of users, with code like this:

users_list = session.query(User).select()

This code uses queries attached to a session object. Sessions are a fundamental building concept in SQLAlchemy (more in a moment).

It’s worth mentioning that you can create Mapped Domain objects based on an arbitrary joins, or any select result set. This means that you could create a single user object that contains group and permission information. That means, you are data that spans 5 different tables to produce a single easy to use User object object.

The full Domain Object code from the TurboGears permission system, can be seen in Listing Two, and it should be pretty simple to understand. The creation of a permissions property in that code is done with a combination of SQLAlchemy syntax, and plain pPthon, but it demonstrates the way that TurboGears code uses the groups property on a user object to transparently access relational data in object-oriented style.

LISTING TWO: The full domain object code from the TurboGears permission system

class Group(object):
pass

class User(object):
def permissions(self):
perms = set()
for g in self.groups:
perms = perms | set(g.permissions)
return perms
permissions = property(permissions)

class Permission(object):
pass

mapper(User, users_table)
mapper(Group, groups_table,
properties=dict(users=relation(User, secondary=user_group_table, backref=’groups’)))
mapper(Permission, permissions_table,
properties=dict(groups=relation(Group, secondary=group_permission_table, backref=’permissions’)))

The Unit of Work and Identity Map Patterns

The final two patterns critical to understanding the power and flexibility of SQLAlchemy are the Unit of Work and the Identity Map. Fundamentally, the Unit of Work pattern provides a way to store up database changes and send them to the database all at once. This can make a huge performance difference, because a single query that updates a thousand user records will be significantly faster than 1000 queries that update a single record each.

Many ORMs that follow the Active Record pattern are forced to do SQL UPDATE commands immediately and do them one at a time. This is because the Active Record pattern is built on records knowing how to save themselves to the database.

In SQLAlchemy, the mapper knows how to save updated data from the session to the database via the session.flush() command. This is where the Identity Map comes in. All data that came from the database is stored in a special Identity Map object, which keeps track of where it came from and whether or not it has been changed.

So, session.flush() looks for all updated (or added) objects and saves them to the database all at once. But, of course the session object could have updates for various tables, with various foreign key constraints which mean that some updates have to be done before others can happen.

Unfortunately, for larger Units of Work, saving everything in the right order is a non-trivial task. But fortunately, SQLAlchemy does all that work for you, and all you really need to know is that you should flush() your sessions when you want to save to the database.

Pure Magic

When database libraries constrain what you can do because “your data structures shouldn’t be that complicated” something has clearly gone wrong.

If your data is simple, your structures can be simple, but when you have complex data and you short change your underlying relational data model to serve the needs of your object relational mapper, chances are your application will grow to the point where you come to regret parts of that decision.

With SQLAlchemy, on the other hand, you can be sure that whatever crazy thing you need to do tomorrow, the tools will be there to support it.

If you’re looking for more information on transaction management, or how to use stored procedures with SQLAlchemy, the online docs at sqlalchemy.org are amazing, and cover all of that and much more. And if you’re just looking for a way into SQLAlchemy, there’s an easy tutorial on Robin Munn’s blog at http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html.

Fatal error: Call to undefined function aa_author_bios() in /opt/apache/dms/b2b/linux-mag.com/site/www/htdocs/wp-content/themes/linuxmag/single.php on line 62