Welcome to this user-friendly guide on utilizing the DuckDB SQLAlchemy driver. DuckDB is designed for fast analytical queries, making it a popular choice for data analysis in Python environments. In this article, we’ll walk through installation, usage, and some important considerations to keep in mind as you embark on your DuckDB journey.
Installation
To get started with the DuckDB Engine, you can easily install it using pip:
sh
pip install duckdb-engine
Additionally, there’s a conda feedstock available. You can find the instructions for usage in its repository.
Usage
Once installed, using the DuckDB Engine is straightforward. It fits seamlessly with SQLAlchemy, allowing you to create databases and manage models. Here’s an analogy that explains this concept:
Imagine you have a magical library (DuckDB) where every book (data) is meticulously organized in a special way (SQLAlchemy) so that you can easily find and retrieve any book you want with just a few commands!
Here’s a quick example of how to set up a basic database and add data:
python
from sqlalchemy import Column, Integer, Sequence, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session
Base = declarative_base()
class FakeModel(Base): # type: ignore
__tablename__ = 'fake'
id = Column(Integer, Sequence('fakemodel_id_sequence'), primary_key=True)
name = Column(String)
eng = create_engine('duckdb::memory:')
Base.metadata.create_all(eng)
session = Session(bind=eng)
session.add(FakeModel(name='Frank'))
session.commit()
frank = session.query(FakeModel).one()
assert frank.name == 'Frank'
Usage in IPython/Jupyter
Are you looking to run queries using DuckDB in your Jupyter notebooks? The integration is smooth with the IPython-SQL interface. Check out the DuckDB documentation or see an amazing demo by Alex Monahan on his blog.
Configuration
You can further configure DuckDB by passing specific connection arguments to the create_engine function. Here’s how:
python
create_engine(
'duckdb::memory:',
connect_args={
'read_only': False,
'config': {
'memory_limit': '500mb'
}
}
)
Check out the DuckDB docs for more configuration parameters.
How to Register a Pandas DataFrame
To register a pandas DataFrame with DuckDB, you will interact with the SQLAlchemy engine as follows:
python
conn = create_engine('duckdb::memory:').connect()
# With SQLAlchemy 1.3
conn.execute('register', (dataframe_name, pd.DataFrame(...)))
# With SQLAlchemy 1.4+
conn.execute(text('register(:name, :df)'), name='test_df', df=df)
conn.execute('select * from dataframe_name')
Things to Keep in Mind
While working with DuckDB, here are a few critical points:
- Auto-incrementing ID Columns: DuckDB does not yet support the SERIAL datatype that PostgreSQL uses. Instead, utilize SQLAlchemy’s Sequence object for auto-incrementing keys. For additional details, refer to the SQLAlchemy Sequence documentation.
- Pandas read_sql() Chunksize: As of version 0.5.0 of DuckDB, the pandas.read_sql() method works without issues. However, avoid supplying a chunksize parameter to prevent exceptions.
- Unsigned Integer Support: DuckDB supports unsigned integers, accessible through duckdb_engine.datatypes.
Alembic Integration
For managing database migrations, integrate SQLAlchemy’s Alembic by adding an implementation class for the DuckDB dialect:
python
from alembic.ddl.impl import DefaultImpl
class AlembicDuckDBImpl(DefaultImpl):
# Alembic implementation for DuckDB.
__dialect__ = duckdb
This integration prevents errors when generating or applying migrations.
Preloading Extensions (Experimental)
Starting from DuckDB version 0.9.0, there’s built-in support for autoloading extensions. To experiment with preloading extensions, you can use this code snippet:
python
from sqlalchemy import create_engine
create_engine(
'duckdb::memory:',
connect_args={
'preload_extensions': ['https'],
'config': {
's3_region': 'ap-southeast-1'
}
}
)
The Name
Just a quick note—the package is called duckdb-engine out of spontaneity. Renaming it would be too challenging now!
Troubleshooting
If you run into any issues while using DuckDB with SQLAlchemy, here are some common troubleshooting ideas:
- Double-check that you have the latest version of both DuckDB and SQLAlchemy installed.
- Ensure that the connection arguments you’re passing are correctly formatted as expected by DuckDB.
- If you encounter issues with queries not executing, inspect if you are trying to use features unsupported by DuckDB that’s derived from PostgreSQL.
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Final Thoughts
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

