How to Get Started with MyScaleDB: The SQL Vector Database for Scalable AI

May 28, 2022 | Programming

MyScaleDB is a SQL vector database for developers who want to build production-grade, scalable AI applications. It lets you use your existing SQL skills to manage large volumes of data. In this guide, you’ll learn how to set up MyScaleDB and troubleshoot common setup issues.

What is MyScaleDB?

MyScaleDB is built on top of ClickHouse and optimized for AI applications. It manages and processes diverse data types, including structured, text, and vector data, in a single system. The key benefits of using MyScaleDB are:

  • Fully SQL-Compatible: Use familiar SQL syntax with vector-related functions, making vector search and SQL-vector join queries as seamless as ever.
  • Production-Ready for AI applications: A unified platform for managing various data formats, enhancing retrieval accuracy through metadata filtering.
  • Unmatched Performance and Scalability: Leverage cutting-edge architecture for lightning-fast vector operations.

Quick Start with MyScaleDB

Using MyScale Cloud

The simplest way to use MyScaleDB is through the MyScale Cloud service. You can sign up for a free pod supporting 5 million vectors and access the MyScaleDB QuickStart documentation directly for additional guidance.

Self-Hosted Installation

If you prefer self-hosting, you can use Docker to run MyScaleDB. Follow these steps:

Using MyScaleDB Docker Image

To pull and run the MyScaleDB Docker image (version 1.7.1 in this example), execute the following command:

docker run --name myscaledb --net=host myscale/myscaledb:1.7.1

Note: The default configuration only allows access from localhost IP addresses, so make sure to specify the --net=host option when you start the container with docker run.
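Once the container is running, you may want to confirm that the server is accepting connections before moving on. The sketch below assumes the ClickHouse HTTP interface is reachable on port 8123 of the local machine and polls its /ping endpoint, which answers "Ok." when the server is up. The function name, the MYSCALE_URL variable, and the retry count are illustrative choices, not part of MyScaleDB.

```shell
# Poll the HTTP interface until it answers "Ok." or the retries run out.
# MYSCALE_URL is an assumed default; change it if you mapped a different port.
MYSCALE_URL="${MYSCALE_URL:-http://localhost:8123/ping}"

wait_for_myscaledb() {
  retries="${1:-30}"   # number of one-second attempts (illustrative default)
  while [ "$retries" -gt 0 ]; do
    # -f: fail on HTTP errors, -s: silent, -S: still report errors
    if [ "$(curl -fsS "$MYSCALE_URL" 2>/dev/null)" = "Ok." ]; then
      echo "MyScaleDB is ready"
      return 0
    fi
    retries=$((retries - 1))
    sleep 1
  done
  echo "MyScaleDB did not respond at $MYSCALE_URL" >&2
  return 1
}
```

Run wait_for_myscaledb from a second terminal, since docker run without -d keeps the first one attached to the container.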

Using Docker Compose

Set up your directory structure as follows, including the docker-compose.yaml file:

myscaledb
|-- docker-compose.yaml
|-- volumes
    |-- config
    |   |-- users.d
    |       |-- custom_users_config.xml
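The layout above can be created from the shell. This is a minimal sketch: the XML written to custom_users_config.xml is an assumption based on the standard ClickHouse users configuration format, and it allows the default user to connect from localhost and from the 10.0.0.0/24 subnet used in the Docker Compose example. Adjust the addresses, and add a password, to suit your environment.

```shell
# Create the directory layout expected by the compose file.
# PROJECT_DIR is a hypothetical variable; it defaults to the project directory name.
PROJECT_DIR="${PROJECT_DIR:-myscaledb}"
mkdir -p "$PROJECT_DIR/volumes/config/users.d"

# Assumed contents: a ClickHouse users.d override that lists the networks
# from which the default user may connect.
cat > "$PROJECT_DIR/volumes/config/users.d/custom_users_config.xml" <<'EOF'
<clickhouse>
    <users>
        <default>
            <networks>
                <ip>::1</ip>
                <ip>127.0.0.1</ip>
                <ip>10.0.0.0/24</ip>
            </networks>
        </default>
    </users>
</clickhouse>
EOF
```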

Here’s a sample configuration for your docker-compose.yaml:

version: '3.7'
services:
  myscaledb:
    image: myscale/myscaledb:1.7.1
    tty: true
    ports:
      - "8123:8123"
      - "9000:9000"
      - "8998:8998"
      - "9363:9363"
      - "9116:9116"
    networks:
      myscaledb_network:
        ipv4_address: 10.0.0.2
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/data:/var/lib/clickhouse
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/log:/var/log/clickhouse-server
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/config/users.d/custom_users_config.xml:/etc/clickhouse-server/users.d/custom_users_config.xml
    deploy:
      resources:
        limits:
          cpus: "16.00"
          memory: "32Gb"
networks:
  myscaledb_network:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 10.0.0.0/24

After updating your configuration file, execute the following commands to start your MyScaleDB instance:

cd myscaledb
docker-compose up -d

Access your MyScaleDB command line interface:

docker exec -it myscaledb-myscaledb-1 clickhouse-client

Now, let’s jump into executing SQL statements!

MyScaleDB Example Usage

Creating a table with a vector column and inserting data can be done with SQL commands. Here’s how you can approach this:

  • Create a Table with Vector Column:

CREATE TABLE default.wiki_abstract(
    id UInt64,
    body String,
    title String,
    url String,
    body_vector Array(Float32),
    CONSTRAINT check_length CHECK length(body_vector) = 384
) ENGINE = MergeTree ORDER BY id;

  • Insert Data into Your Tables:

INSERT INTO default.wiki_abstract
    SELECT * FROM s3('https://myscale-datasets.s3.ap-southeast-1.amazonaws.com/wiki_abstract_with_vector.parquet', 'Parquet');

  • Create the Vector Index:

ALTER TABLE default.wiki_abstract
    ADD VECTOR INDEX vec_idx body_vector TYPE SCANN('metric_type=Cosine');

  • Execute Vector Search:

SELECT id, title, distance(body_vector, array(...)) AS distance
    FROM default.wiki_abstract
    ORDER BY distance ASC
    LIMIT 5;
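In practice the query vector comes from an embedding model, so the search statement is usually generated rather than typed by hand. The helper below is a hypothetical sketch (build_search_sql is not a MyScaleDB tool) that prints the search statement for a vector passed as comma-separated floats, written as a ClickHouse array literal. The vector must contain 384 values to match the table's length constraint.

```shell
# Hypothetical helper: print a top-k vector search statement.
#   $1: query vector as comma-separated floats (384 values for this table)
#   $2: number of rows to return (defaults to 5)
build_search_sql() {
  printf 'SELECT id, title, distance(body_vector, [%s]) AS distance FROM default.wiki_abstract ORDER BY distance ASC LIMIT %s;\n' \
    "$1" "${2:-5}"
}

# Example usage (assumes the Docker Compose container name used in this guide
# and a file query_vector.csv holding the comma-separated vector):
#   docker exec -i myscaledb-myscaledb-1 clickhouse-client \
#     --query "$(build_search_sql "$(cat query_vector.csv)" 5)"
```

The helper does no validation, so pass it only vectors produced by your own code.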

Troubleshooting

If you face issues while setting up MyScaleDB, here are some troubleshooting tips:

  • Check the Docker service status if your containers aren’t starting.
  • Ensure your configuration file paths are correct and directories are mounted properly.
  • If you encounter connection problems, verify that the correct port is exposed and accessible.
  • Inspect logs for any error messages that can provide insights into the issue.
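The checks above map onto a handful of standard Docker commands. The function below is a sketch that runs them in order; the function name and the CONTAINER variable are illustrative, and the default container name assumes the Docker Compose setup from this guide (use myscaledb if you started the container with docker run).

```shell
# CONTAINER is an assumed default; override it to match your container name.
CONTAINER="${CONTAINER:-myscaledb-myscaledb-1}"

diagnose_myscaledb() {
  # 1. Is the Docker daemon reachable at all?
  if ! docker info >/dev/null 2>&1; then
    echo "Docker daemon is not running or not reachable" >&2
    return 1
  fi

  # 2. Container state, including stopped containers (-a)
  docker ps -a --filter "name=$CONTAINER"

  # 3. Host paths mounted into the container
  docker inspect --format '{{json .Mounts}}' "$CONTAINER"

  # 4. Published port mappings (empty when using --net=host)
  docker port "$CONTAINER"

  # 5. Most recent log lines
  docker logs --tail 50 "$CONTAINER"
}
```

Call diagnose_myscaledb after a failed start and read the output from top to bottom; the first step that looks wrong is usually the place to start.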

Conclusion

With MyScaleDB, you can effectively manage both structured and vectorized data, making it a robust option for developers in the AI landscape. Utilize the tools and documentation provided to start building sophisticated datasets and applications now!
