How to Use DuckDB with SQLAlchemy

Sep 10, 2023 | Programming

Welcome to this user-friendly guide on utilizing the DuckDB SQLAlchemy driver. DuckDB is designed for fast analytical queries, making it a popular choice for data analysis in Python environments. In this article, we’ll walk through installation, usage, and some important considerations to keep in mind as you embark on your DuckDB journey.

Installation

To get started with the DuckDB Engine, you can easily install it using pip:

sh
pip install duckdb-engine

Additionally, there’s a conda feedstock available. You can find the instructions for usage in its repository.

Usage

Once installed, using the DuckDB Engine is straightforward. It fits seamlessly with SQLAlchemy, allowing you to create databases and manage models. Here’s an analogy that explains this concept:

Imagine you have a magical library (DuckDB) where every book (data) is meticulously organized in a special way (SQLAlchemy) so that you can easily find and retrieve any book you want with just a few commands!

Here’s a quick example of how to set up a basic database and add data:

python
from sqlalchemy import Column, Integer, Sequence, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session

Base = declarative_base()

class FakeModel(Base):  # type: ignore
    __tablename__ = 'fake'
    
    id = Column(Integer, Sequence('fakemodel_id_sequence'), primary_key=True)
    name = Column(String)

eng = create_engine('duckdb::memory:')
Base.metadata.create_all(eng)

session = Session(bind=eng)
session.add(FakeModel(name='Frank'))
session.commit()
frank = session.query(FakeModel).one()
assert frank.name == 'Frank'

Usage in IPython/Jupyter

Are you looking to run queries using DuckDB in your Jupyter notebooks? The integration is smooth with the IPython-SQL interface. Check out the DuckDB documentation or see an amazing demo by Alex Monahan on his blog.

Configuration

You can further configure DuckDB by passing specific connection arguments to the create_engine function. Here’s how:

python
create_engine(
    'duckdb::memory:',
    connect_args={
        'read_only': False,
        'config': {
            'memory_limit': '500mb'
        }
    }
)

Check out the DuckDB docs for more configuration parameters.

How to Register a Pandas DataFrame

To register a pandas DataFrame with DuckDB, you will interact with the SQLAlchemy engine as follows:

python
conn = create_engine('duckdb::memory:').connect()
# With SQLAlchemy 1.3
conn.execute('register', (dataframe_name, pd.DataFrame(...))) 

# With SQLAlchemy 1.4+
conn.execute(text('register(:name, :df)'), name='test_df', df=df)
conn.execute('select * from dataframe_name')

Things to Keep in Mind

While working with DuckDB, here are a few critical points:

  • Auto-incrementing ID Columns: DuckDB does not yet support the SERIAL datatype that PostgreSQL uses. Instead, utilize SQLAlchemy’s Sequence object for auto-incrementing keys. For additional details, refer to the SQLAlchemy Sequence documentation.
  • Pandas read_sql() Chunksize: As of version 0.5.0 of DuckDB, the pandas.read_sql() method works without issues. However, avoid supplying a chunksize parameter to prevent exceptions.
  • Unsigned Integer Support: DuckDB supports unsigned integers, accessible through duckdb_engine.datatypes.

Alembic Integration

For managing database migrations, integrate SQLAlchemy’s Alembic by adding an implementation class for the DuckDB dialect:

python
from alembic.ddl.impl import DefaultImpl

class AlembicDuckDBImpl(DefaultImpl):
    # Alembic implementation for DuckDB.
    __dialect__ = duckdb

This integration prevents errors when generating or applying migrations.

Preloading Extensions (Experimental)

Starting from DuckDB version 0.9.0, there’s built-in support for autoloading extensions. To experiment with preloading extensions, you can use this code snippet:

python
from sqlalchemy import create_engine

create_engine(
    'duckdb::memory:',
    connect_args={
        'preload_extensions': ['https'],
        'config': {
            's3_region': 'ap-southeast-1'
        }
    }
)

The Name

Just a quick note—the package is called duckdb-engine out of spontaneity. Renaming it would be too challenging now!

Troubleshooting

If you run into any issues while using DuckDB with SQLAlchemy, here are some common troubleshooting ideas:

  • Double-check that you have the latest version of both DuckDB and SQLAlchemy installed.
  • Ensure that the connection arguments you’re passing are correctly formatted as expected by DuckDB.
  • If you encounter issues with queries not executing, inspect if you are trying to use features unsupported by DuckDB that’s derived from PostgreSQL.

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Final Thoughts

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

Stay Informed with the Newest F(x) Insights and Blogs

Tech News and Blog Highlights, Straight to Your Inbox