Welcome to this tutorial, where we explore how millions of Wikipedia articles were indexed with Upstash Vector. The project shows how to build a semantic search engine and a RAG (retrieval-augmented generation) chatbot on top of a vector database and a large language model.
Project Overview
In this project, we prepared and embedded Wikipedia articles to build a semantic search engine and a RAG chatbot. The main steps were:
- Preparing and embedding Wikipedia articles
- Indexing the vectors using Upstash Vector
- Building a Wikipedia semantic search engine
- Implementing a RAG chatbot
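Preparing the articles mostly means splitting each one into passages short enough to embed. The sketch below shows one way to do that, under stated assumptions: paragraphs are packed into passages of at most `maxChars` characters, and the `WikiArticle` and `ChunkRecord` types, the `chunkArticle` function, and the `<articleId>-<n>` ID scheme are all hypothetical, not the project's actual preprocessing. The `data` field name is chosen to mirror how Upstash Vector accepts raw text for indexes with a built-in embedding model.

```typescript
// Hypothetical input shape for one Wikipedia article.
interface WikiArticle {
  id: string;
  title: string;
  url: string;
  text: string;
}

// Hypothetical output shape: one passage to embed, plus metadata returned with search hits.
interface ChunkRecord {
  id: string;
  data: string;
  metadata: { title: string; url: string };
}

// Split an article into passages of at most maxChars characters.
// Consecutive paragraphs are packed together (joined by "\n") while they fit;
// paragraphs longer than maxChars are hard-split into maxChars-sized slices.
function chunkArticle(article: WikiArticle, maxChars = 1000): ChunkRecord[] {
  if (maxChars < 1) throw new RangeError("maxChars must be at least 1");

  const paragraphs = article.text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const passages: string[] = [];
  let current = "";
  for (const paragraph of paragraphs) {
    for (let start = 0; start < paragraph.length; start += maxChars) {
      const piece = paragraph.slice(start, start + maxChars);
      if (current.length === 0) {
        current = piece;
      } else if (current.length + 1 + piece.length <= maxChars) {
        current += "\n" + piece;
      } else {
        passages.push(current);
        current = piece;
      }
    }
  }
  if (current.length > 0) passages.push(current);

  return passages.map((data, n) => ({
    id: `${article.id}-${n}`,
    data,
    metadata: { title: article.title, url: article.url },
  }));
}
```

Keeping the title and URL in metadata means a search hit can link straight back to its source article.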
Key Features
- Indexed over 144 million vectors from Wikipedia articles in 11 languages
- Used the BGE-M3 embedding model for multilingual support
- Implemented semantic search with cross-lingual capabilities
- Created a RAG chatbot using Upstash RAG Chat SDK
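Semantic search ranks stored passages by how close their embeddings are to the query embedding. Upstash Vector performs this search server-side, so the sketch below is only an illustration of the idea: a brute-force cosine-similarity ranking over tiny hand-made vectors (real BGE-M3 dense vectors have 1024 dimensions). The `Doc` type and the `cosine` and `topK` functions are hypothetical. Because BGE-M3 maps text in different languages into the same vector space, a query in one language can rank passages written in another, which is what makes cross-lingual search possible.

```typescript
// Hypothetical stored item: an ID plus its embedding.
interface Doc {
  id: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force nearest-neighbour search: score every document, keep the k best.
function topK(query: number[], docs: Doc[], k: number): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A production vector database replaces the linear scan with an approximate nearest-neighbour index, which is what makes searching well over a hundred million vectors practical.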
Technologies Used
- Upstash Vector: For storing and querying vector embeddings
- Upstash Redis: For storing chat sessions
- Upstash RAG Chat SDK: For building the RAG Chat application
- SentenceTransformers: For generating embeddings
- Meta-Llama-3-8B-Instruct: The LLM, accessed through the QStash LLM APIs
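In a RAG chatbot, the LLM (here Meta-Llama-3-8B-Instruct) is not queried on its own: relevant passages are first retrieved from the vector index and inserted into the prompt as context. The Upstash RAG Chat SDK handles this internally; the sketch below is only a hypothetical illustration of the prompt-assembly step, and its `RetrievedPassage` type, `buildRagPrompt` function, and prompt wording are assumptions rather than the SDK's actual template.

```typescript
// Hypothetical shape of a passage returned by the vector search.
interface RetrievedPassage {
  title: string;
  text: string;
  score: number;
}

// Assemble a grounded prompt: keep the highest-scoring passages,
// number them so the model can refer to them, and append the user's question.
function buildRagPrompt(question: string, passages: RetrievedPassage[], maxPassages = 3): string {
  const context = passages
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.score - a.score)
    .slice(0, maxPassages)
    .map((p, i) => `[${i + 1}] ${p.title}: ${p.text}`)
    .join("\n");

  return [
    "Answer the question using only the context below. If the context is insufficient, say so.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Numbering the passages lets the model point to its sources, and the explicit "say so" instruction is one common way to discourage answers that go beyond the retrieved context.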
How to Run the Project Locally
Follow these simple steps to get the project up and running on your local machine:
- In the Upstash Console, set up your databases:
  - Create a new Vector database with embedding model support, ideally choosing the BGE-M3 model for multilingual capabilities.
  - Create a new Redis database for chat session storage.
  - Copy the credentials for both the Redis and Vector databases, along with your QStash credentials for using Upstash-hosted LLMs.
- Put the credentials into a .env file in the root of the project. Your .env file should look like this:

    UPSTASH_VECTOR_REST_URL=
    UPSTASH_VECTOR_REST_TOKEN=
    UPSTASH_REDIS_REST_TOKEN=
    UPSTASH_REDIS_REST_URL=
    QSTASH_TOKEN=

- Populate your Vector index. This project uses namespaces to store each language separately; for English, put your vectors in the "en" namespace.
- Install the necessary dependencies:

    pnpm install

- Run the development server:

    pnpm dev
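Because the index keeps each language in its own namespace (for example `en` for English), records have to be grouped by language before they are written. The hypothetical helper below does that grouping and splits each group into batches; the `LangRecord` shape, the `batchByNamespace` name, the use of lowercase language codes as namespace names beyond `en`, and the default batch size are all assumptions. Each resulting batch could then be written with the Upstash Vector SDK's namespace support (e.g. `index.namespace("en").upsert(batch)`), which the sketch deliberately does not call so that it stays self-contained.

```typescript
// Hypothetical record shape; `lang` is a language code such as "en" or "de".
interface LangRecord {
  id: string;
  lang: string;
  data: string;
  metadata: { [key: string]: string };
}

// What gets upserted: the record without its language tag.
type VectorRecord = Omit<LangRecord, "lang">;

// Group records by namespace (assumed to be the lowercase language code)
// and split each group into batches of at most batchSize records.
// batchSize is an arbitrary illustrative value, not a documented Upstash limit.
function batchByNamespace(
  records: LangRecord[],
  batchSize = 100,
): { [namespace: string]: VectorRecord[][] } {
  const grouped: { [namespace: string]: VectorRecord[] } = {};
  for (const record of records) {
    const namespace = record.lang.toLowerCase();
    if (!Object.prototype.hasOwnProperty.call(grouped, namespace)) grouped[namespace] = [];
    grouped[namespace].push({ id: record.id, data: record.data, metadata: record.metadata });
  }

  const batched: { [namespace: string]: VectorRecord[][] } = {};
  for (const namespace of Object.keys(grouped)) {
    const items = grouped[namespace];
    batched[namespace] = [];
    for (let i = 0; i < items.length; i += batchSize) {
      batched[namespace].push(items.slice(i, i + batchSize));
    }
  }
  return batched;
}
```

Normalising the language code to lowercase guards against a record tagged "EN" ending up in a separate namespace from the ones tagged "en".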
Troubleshooting Tips
If you encounter any issues while setting up or running the project, here are some common solutions:
- Double-check your .env file to make sure all credentials are entered correctly, with no extra spaces.
- Make sure the Upstash services you are trying to access are active and properly configured.
- If the application fails to retrieve data, verify that your namespace is correctly specified in your vector index operations.
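The .env check from the troubleshooting tips can be automated with a small startup check. The sketch below is hypothetical (the `checkEnv` name and its messages are assumptions); it reports any of the five variables from the .env template that are missing, empty, or padded with whitespace. In a Node.js app it could be called as `checkEnv(process.env)` before the Upstash clients are created.

```typescript
// Variable names taken from the .env template in this tutorial.
const REQUIRED_VARS = [
  "UPSTASH_VECTOR_REST_URL",
  "UPSTASH_VECTOR_REST_TOKEN",
  "UPSTASH_REDIS_REST_TOKEN",
  "UPSTASH_REDIS_REST_URL",
  "QSTASH_TOKEN",
];

// Return a list of human-readable problems; an empty list means every variable is set cleanly.
function checkEnv(env: { [name: string]: string | undefined }): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_VARS) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      problems.push(`${name} is missing or empty`);
    } else if (value !== value.trim()) {
      // Stray spaces around a token are easy to miss when pasting credentials.
      problems.push(`${name} has leading or trailing whitespace`);
    }
  }
  return problems;
}
```

Failing fast with a clear message at startup is usually easier to debug than an authentication error surfacing later from the first vector or Redis request.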
Conclusion
At fxis.ai, we believe that projects like this one are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
Check out our live demo to see the project in action!

