Project S.A.T.U.R.D.A.Y


A toolbox for vocal computing built with Pion, whisper.cpp, and Coqui TTS. Build your own personal, self-hosted J.A.R.V.I.S. powered by WebRTC.



View Demo · Getting Started · Request Features

Table of Contents

  1. About The Project
  2. Getting Started
  3. Roadmap
  4. Discord
  5. Built With
  6. Bugs
  7. Contributing
  8. License
  9. Support Me
  10. Contact Me
  11. Troubleshooting
## About the Project

Project S.A.T.U.R.D.A.Y is a toolbox for vocal computing. It provides tools to build elegant vocal interfaces to modern LLMs. The goal of this project is to foster a community of like-minded individuals who want to bring forth the technology we have been promised in sci-fi movies for decades. It aims to be highly modular and flexible while staying decoupled from specific AI models, allowing for seamless upgrades when new AI technology is released.

### How It Works

Think of Project S.A.T.U.R.D.A.Y as a versatile toolbox: just as a toolbox contains various tools for different tasks, this project consists of multiple tools that help create vocal computing applications. Each tool has two critical parts:

- **Engine**: The brains behind a tool, housing the logic for specific actions. For instance, in the Speech-to-Text (STT) tool, the engine contains the logic that detects your voice and processes it accordingly.
- **Backend**: The power source of a tool, which runs the actual operations. It connects the different components, allowing them to work together seamlessly.

The project includes three main types of tools:

#### STT (Speech-to-Text)

These tools act as the ears of the system, converting spoken language into text.

#### TTT (Text-to-Text)

These tools function as the brains, processing the text produced by the STT tools.

#### TTS (Text-to-Speech)

These tools are the mouth of the system, turning the processed text back into spoken language.

### Diagram

Here is a diagram of how the main demo currently works:

Saturday demo diagram

## Getting Started

The demo that comes with this repository allows you to create your own personal, self-hosted J.A.R.V.I.S.-like assistant.

**DISCLAIMER**: This has primarily been tested on M1 Pro and Max processors, as it demands considerable processing power for local inference.
Your experience may vary depending on your hardware and operating system. To run the demo, several prerequisites are required.

### Prerequisites

Make sure you have the following installed:

- [Golang](https://golang.org/doc/install)
- [Python](https://www.python.org/downloads)
- [Make](https://www.gnu.org/software/make)
- A C compiler

Three processes must run concurrently:

- **RTC**: The RTC server hosts the web page and connects to WebRTC.
- **Client**: This is where the voice data is processed and responded to.
- **TTS**: The TTS server converts text back into speech.

**Note**: Start the RTC and TTS servers before launching the client.

### Steps to Start

1. **RTC**

   From the project root, run:
   ```sh
   make rtc
   ```
2. **TTS**

   **FIRST TIME SETUP**: When running the TTS server for the first time, install dependencies with:
   ```sh
   cd tts/server/coqui-tts
   pip install -r requirements.txt
   ```
   Then, back at the project root, execute:
   ```sh
   make tts
   ```
3. **Client**

   For the client, run:
   ```sh
   make client
   ```
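The STT → TTT → TTS flow described in How It Works can be sketched in Go with toy engines standing in for whisper.cpp, an LLM, and Coqui TTS. The interface and type names below are illustrative only, not the project's actual API:

```go
package main

import (
	"fmt"
	"strings"
)

// STTEngine converts captured audio into text (the "ears").
type STTEngine interface {
	Transcribe(audio []byte) (string, error)
}

// TTTEngine turns the transcript into a response (the "brains").
type TTTEngine interface {
	Respond(prompt string) (string, error)
}

// TTSEngine synthesizes the response back into audio (the "mouth").
type TTSEngine interface {
	Synthesize(text string) ([]byte, error)
}

// Pipeline wires the three engines together, mirroring the demo diagram.
type Pipeline struct {
	STT STTEngine
	TTT TTTEngine
	TTS TTSEngine
}

// Run pushes one utterance through STT -> TTT -> TTS.
func (p Pipeline) Run(audio []byte) ([]byte, error) {
	text, err := p.STT.Transcribe(audio)
	if err != nil {
		return nil, err
	}
	reply, err := p.TTT.Respond(text)
	if err != nil {
		return nil, err
	}
	return p.TTS.Synthesize(reply)
}

// Toy engines: each real engine would wrap a model behind the same interface.
type echoSTT struct{}

func (echoSTT) Transcribe(audio []byte) (string, error) { return string(audio), nil }

type upperTTT struct{}

func (upperTTT) Respond(prompt string) (string, error) { return strings.ToUpper(prompt), nil }

type bytesTTS struct{}

func (bytesTTS) Synthesize(text string) ([]byte, error) { return []byte(text), nil }

func main() {
	p := Pipeline{STT: echoSTT{}, TTT: upperTTT{}, TTS: bytesTTS{}}
	out, err := p.Run([]byte("hello saturday"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // prints "HELLO SATURDAY"
}
```

Because each engine sits behind an interface, swapping in a new model only means providing a new implementation — which is how the project stays decoupled from specific AI models.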
## Roadmap

### Local Inference

Expanding TTT inference to run locally with something like [llama.cpp](https://github.com/ggerganov/llama.cpp) is the current priority.

### Ease of Use

Continued improvement of setup and configuration is the next major goal.

### Building With S.A.T.U.R.D.A.Y

Encouraging others to build applications with this toolkit opens up opportunities for enhancement and for discovering new features.

## Discord

Join the Discord to stay up to date!

## Built With

This project utilizes several open-source packages:

- [Pion](https://github.com/pion)
- [whisper.cpp](https://github.com/ggerganov/whisper.cpp)
- [Coqui TTS](https://github.com/coqui-ai/TTS)

## Bugs

Bugs are a part of development. Please report any issues or requests for clarification, and feel free to join us on Discord for help.

## Contributing

Contributions make the open-source community thrive. To contribute:

1. Fork the project.
2. Create your feature branch with `git checkout -b feature/AmazingFeature`.
3. Commit your changes.
4. Push to the branch.
5. Open a Pull Request.

## License

MIT

## Support Me

If you find value in the project and wish to support it, consider buying me a coffee.

## Contact Me

- GitHub: GRVYDEV
- Twitter: @grvydev
- Email: grvy@aer.industries

## Troubleshooting

If you encounter any issues during setup, consider the following:

- Ensure all dependencies are installed correctly.
- Follow the process startup order exactly: RTC, then TTS, and finally the client.
- For persistent issues, please open an issue on the GitHub page.