The best ORMs for database-powered Python apps
Posted by Richy George on 15 November, 2023
was originally published on this site
When you want to work with a relational database in Python, or most any other programming language, it’s common to write database queries “by hand,” using the SQL syntax supported by most databases.
This approach has its downsides, however. Hand-authored SQL queries can be clumsy to use, since databases and software applications tend to live in separate conceptual worlds. It’s hard to model how your app and your data work together.
Another approach is to use a library called an ORM, or object-relational mapping tool. ORMs let you describe how your database works through your application’s code—what tables look like, how queries work, and how to maintain the database across its lifetime. The ORM handles all the heavy lifting for your database, and you can concentrate on how your application uses the data.
This article introduces six ORMs for the Python ecosystem. All provide programmatic ways to create, access, and manage databases in your applications, and each one embodies a slightly different philosophy of how an ORM should work. Additionally, all of the ORMs profiled here will let you manually issue SQL statements if you so choose, for those times when you need to make a query without the ORM’s help.
6 of the best ORMs for Python
- Django ORM
- Tortoise ORM
The Django web framework comes with most everything you need to build professional-grade websites, including its own ORM and database management tools. Most people will only use Django’s ORM with Django, but it is possible to use the ORM on its own. Also, Django’s ORM has massively influenced the design of other Python ORMs, so it’s a good starting point for understanding Python ORMs generally.
Models for a Django-managed database follow a pattern similar to other ORMs in Python. Tables are described with Python classes, and Django’s custom types are used to describe the fields and their behaviors. This includes things like one-to-many or many-to-many references with other tables, but also types commonly found in web applications like uploaded files. It’s also possible to create custom field types by subclassing existing ones and using Django’s library of generic field class methods to alter their behaviors.
Django’s command-line management tooling for working with sites includes powerful tools for managing a project’s data layer. The most useful ones automatically create migration scripts for your data, when you want to alter your models and migrate the underlying data to use the new models. Each change set is saved as its own migration script, so all migrations for a database are retained across the lifetime of your application. This makes it easier to maintain data-backed apps where the schema might change over time.
Peewee has two big claims to fame. One, it’s a small but powerful library, around 6,600 lines of code in a single module. Two, it’s expressive without being verbose. While Peewee natively handles only a few databases, they’re among the most common ones: SQLite, PostgreSQL, MySQL/MariaDB, and CockroachDB.
Defining models and relationships in Peewee is a good deal simpler than in some other ORMs. One uses Python classes to create tables and their fields, but Peewee requires minimal boilerplate to do this, and the results are highly readable and easy to maintain. Peewee also has elegant ways to handle situations like foreign key references to tables that are defined later in code, or self-referential foreign keys.
Queries in Peewee use a syntax that hearkens back to SQL itself; for example,
Person.select(Person.name, Person.id).where(Person.age>20). Peewee also lets you return the results as rich Python objects, as named tuples or dictionaries, or as a simple tuple for maximum performance. The results can also be returned as a generator, for efficient iteration over a large rowset. Window functions and CTEs (Common Table Expressions) also have first-class support.
Peewee uses many common Python metaphors beyond classes. For instance, transactions can be expressed by way of a context manager, as in
with db.atomic():. You can’t use keywords like
not with queries, but Peewee lets you use operators like
Sophisticated behaviors like optimistic locking and top n objects per group aren’t supported natively, but the Peewee documentation has a useful collection of tricks to implement such things. Schema migration is not natively supported, but Peewee includes a
SchemaManager API for creating migrations along with other schema-management operations.
PonyORM‘s standout feature is the way it uses Python’s native syntax and language features to compose queries. For instance, PonyORM lets you express a
SELECT query as a generator expression:
query = select (u for u in User if u.name == "Davis").order_by(User.name). You can also use lambdas as parts of queries for filtering, as in
query.filter(lambda user: user.is_approved is True). The generated SQL is also always accessible.
When you create database tables with Python objects, you use a class to declare the behavior of each field first, then its type. For instance, a mandatory, distinct
name field would be
name = Required(str, unique=True). Most common field types map directly to existing Python types, such as
bytes (for BLOB data), and so on. One potential point of confusion is that large text fields use PonyORM’s
LongStr type; the Python
str type is basically the underlying database’s
PonyORM automatically supports JSON and PostgreSQL-style
Array data types, as more databases now support both types natively. Where there isn’t native support, PonyORM can often shim things up—for example, SQLite versions earlier than 3.9 can use
TEXT to store JSON, but more recent versions can work natively via an extension module.
Some parts of PonyORM hew less closely to Python’s objects and syntax. To describe one-to-many and many-to-many relationships in PonyORM, you use
Set(), a custom PonyORM object. For one-to-one relationships, there are
PonyORM has some opinionated behaviors worth knowing about before you build with it. Generated queries typically have the
DISTINCT keyword added automatically, under the rationale that most queries shouldn’t return duplicates anyway. You can override this behavior with the
.without_distinct() method on a query.
A major omission from PonyORM’s core is that there’s no tooling for schema migrations yet, although it’s planned for a future release. On the other hand, the makers of PonyORM offer a convenient online database schema editor as a service, with basic access for free and more advanced feature sets for $9/month.
SQLAlchemy is one of the best-known and most widely used ORMs. It provides powerful and explicit control over just about every facet of the database’s models and behavior. SQLAlchemy 2.0, released early in 2023, introduced a new API and data modeling system that plays well with Python’s type linting and data class systems.
SQLAlchemy uses a two-level internal architecture consisting of Core and ORM. Core is for interaction with database APIs and rendering of SQL statements. ORM is the abstraction layer, providing the object model for your databases. This decoupled architecture means SQLAlchemy can, in theory, use any number or variety of abstraction layers, though there is a slight performance penalty. To counter this, some of SQLAlchemy’s components are written in C (now Cython) for speed.
SQLAlchemy lets you describe database schemas in two ways, so you can choose what’s most appropriate for your application. You can use a declarative system, where you create
Table() objects and supply field names and types as arguments. Or you can declare classes, using a system reminiscent of the way dataclasses work. The former is easier, but may not play as nicely with linting tools. The latter is more explicit and correct, but requires more ceremony and boilerplate.
SQLAlchemy values correctness over convenience. For instance, when bulk-inserting values from a file, date values have to be rendered as Python date objects to be handled as unambiguously as possible.
Querying with SQLAlchemy uses a syntax reminiscent of actual SQL queries—for example,
select(User).where(User.name == "Davis"). SQLachemy queries can also be rendered as raw SQL for inspection, along with any changes needed for a specific dialect of SQL supported by SQLAlchemy (for instance, PostgreSQL versus MySQL). The expression construction tools can also be used on their own to render SQL statements for use elsewhere, not just as part of the ORM. For debugging queries, a handy
echo=True options` lets you see SQL statements in the console as they are executed.
Various SQLAlchemy extensions add powerful features not found in the core or ORM. For instance, the “horizontal sharding” add-on transparently distributes queries across multiple instances of a database. For migrations, the Alembic project lets you generate change scripts with a good deal of flexibility and configuration.
SQLObject is easily the oldest project in this collection, originally created in 2002, but still being actively developed and released. It supports a very wide range of databases, and early in its lifetime supported many common Python ORM behaviors we might take for granted now—like using Python classes and objects to describe database tables and fields, and providing high levels of abstraction for those activities.
With most ORMs, by default, changes to objects are only reflected in the underlying database when you save or sync. SQLObject reflects object changes immediately in the database, unless you alter that behavior in the table object’s definition.
Table definitions in SQLObject use custom types to describe fields—for example,
StringCol() to define a string field, and
ForeignKey() for a reference to another table. For joins, you can use a
MultipleJoin() attribute to get a table’s one-to-many back references, and
RelatedJoin() for many-to-many relationships.
sqlmeta class gives you more control over a given table’s programmatic behaviors—for instance, if you want to provide your own custom algorithm for how Python class names are translated into database table names, or a table’s default ordering.
The querying syntax is similar to other ORMs, but not always as elegant. For instance, an OR query across two fields would look like this:
A whole slew of custom query builder methods are available for performing different kinds of join operations, which is useful if you explicitly want, say, a
FULLOUTERJOIN instead of a
SQLObject has little in the way of utilities. Its biggest offering there is the ability to dump and load database tables to and from CSV. However, with some additional manual work, its native admin tool lets you record versions of your database’s schema and perform migrations; the upgrade process is not automatic.
Tortoise ORM is the youngest project profiled here, and the only one that is asynchronous by default. That makes it an ideal companion for async web frameworks like FastAPI, or applications built on asynchronous principles, generally.
Creating models with Tortoise follows roughly the same pattern as other Python ORMs. You subclass Tortoise’s
Model class, and use field classes like
ManyToManyField to define fields and their relationships. Models can also have a
Meta inner class to define additional details about the model, such as indexes or the name of the created table. For relationship fields, such as
OneToOne, the field definition can also specify delete behaviors such as a cascading delete.
Queries in Tortoise do not track as closely to SQL syntax as some other ORMs. For instance,
User.filter(rank="Admin") is used to express a
SELECT/WHERE query. An
.exclude() clause can be used to further refine results; for example,
User.filter(rank="Admin").exclude(status="Disabled"). This approach does provide a slightly more compact way to express common queries than the
.select().where() approach used elsewhere.
The Signals feature lets you specify behaviors before or after actions like saving or deleting a record. In other ORMs this would be done by, say, subclassing a model and overriding
.save(). With Tortoise, you can wrap a function with a decorator to specify a signal action, outside of the model definition. Tortoise also has a “router” mechanism for allowing reads and writes to be applied to different databases if needed. A very useful function not commonly seen in ORMs is
.explain(), which executes the database’s plan explainer on the supplied query.
Async is still a relatively new presence in Python’s ecosystem. To get a handle on how to use Tortoise with async web frameworks, the documentation provides examples for FastAPI, Quart, Sanic, Starlette,
aiohttp, and others. For those who want to use type annotations (also relatively new to the Python ecosystem), a Pydantic plugin can generate Pydantic models from Tortoise models, although it only supports serialization and not deserialization of those models. An external tool, Aerich, generates migration scripts, and supports both migrating to newer and downgrading to older versions of a schema.
The most widely used of the Python ORMs, SQLAlchemy, is almost always a safe default choice, even if newer and more elegant tools exist. Peewee is compact and expressive, with less boilerplate needed for many operations, but it lacks more advanced ORM features like a native mechanism for schema migrations.
Django’s ORM is mainly for use with the Django web framework, but its power and feature set, especially its migration management system, make it a strong reason to consider Django as a whole. PonyORM’s use of native Python metaphors makes it easy to grasp conceptually, but be aware of its opinionated defaults.
SQLObject, the oldest of the ORMs profiled here, has powerful features for evoking exact behaviors (e.g., joins), but it’s not always elegant to use and has few native utilities. And the newest, Tortoise ORM, is async by default, so it complements the new generation of async-first web frameworks.