Have you ever wished you could chat with your documents, extracting insights or clarifying concepts directly? This guide walks through a simple yet powerful script that lets you chat with PDF, .docx, and .txt files using Google Gemini Pro models through an AI Studio API key, without needing to rely on Google Cloud Vertex AI.
Getting Started
Before we jump into the code, make sure you have the following prerequisites:
- A Google API Key from AI Studio.
- Python installed on your Windows 10 machine.
- The necessary library, google-generativeai (imported in Python as google.generativeai), installed. You can do this via pip:
pip install google-generativeai
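The library needs to be given your AI Studio key before any request is made. A minimal sketch of one way to do that is shown here, assuming the key is stored in an environment variable named GOOGLE_API_KEY (the variable name and both helper functions are illustrative choices, not something the library requires):

```python
import os

API_KEY_VAR = "GOOGLE_API_KEY"  # assumed variable name; any name works


def load_api_key(env=os.environ):
    """Return the API key from the environment, or raise a clear error."""
    key = env.get(API_KEY_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set the {API_KEY_VAR} environment variable first.")
    return key


def configure_gemini():
    """Hand the key to the client library (imported lazily on first use)."""
    import google.generativeai as genai
    genai.configure(api_key=load_api_key())
```

In a Windows Command Prompt, running set GOOGLE_API_KEY=your-key before starting Python makes the key available for that session. Keeping the key out of the script itself also makes it harder to commit by accident.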
The Python Script
Here’s a straightforward Python script that accomplishes the task:
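import os
from datetime import date

import google.generativeai as genai

# Reading PDF and .docx files also needs: pip install pypdf python-docx
# The API key is read from the GOOGLE_API_KEY environment variable.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
MODEL_NAME = "gemini-pro"  # replace with a Gemini model available to your key


def read_document(path):
    """Return the plain text of a .pdf, .docx, or .txt file."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".pdf":
        from pypdf import PdfReader
        return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    if ext == ".docx":
        from docx import Document
        return "\n".join(p.text for p in Document(path).paragraphs)
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def chat_with_documents(input_folder, log_folder, output_responses_folder):
    # Step 1: List all document files in the input folder
    doc_files = sorted(f for f in os.listdir(input_folder)
                       if f.lower().endswith(('.pdf', '.docx', '.txt')))
    print("Available documents:")
    for idx, file in enumerate(doc_files):
        print(f"{idx}: {file}")

    # Step 2: User selects documents
    selected_indices = input("Select document numbers (comma-separated): ")
    selected_files = [doc_files[int(idx)] for idx in selected_indices.split(',')]

    # Step 3: Read and concatenate text, then ask the API for the real token count
    combined_text = "\n\n".join(
        read_document(os.path.join(input_folder, file)) for file in selected_files)
    model = genai.GenerativeModel(MODEL_NAME)
    total_tokens = model.count_tokens(combined_text).total_tokens
    print(f"Total token count: {total_tokens}")

    # Step 4: Obtain instructions for the AI model
    instructions = input("Enter the instructions for the AI: ")
    # Send the instructions followed by the combined text to Gemini
    response = model.generate_content(f"{instructions}\n\n{combined_text}")

    # Step 5: Print and log the response (one log file per day, appended)
    print(response.text)
    os.makedirs(log_folder, exist_ok=True)
    os.makedirs(output_responses_folder, exist_ok=True)
    log_path = os.path.join(log_folder, f"{date.today().isoformat()}.log")
    with open(log_path, 'a', encoding='utf-8') as log_file:
        log_file.write(f"Instructions: {instructions}\nResponse: {response.text}\n")
    # Save output response as plain text, named after the last selected file
    stem = os.path.splitext(selected_files[-1])[0]
    out_path = os.path.join(output_responses_folder, f"Output_{stem}.txt")
    with open(out_path, 'w', encoding='utf-8') as out_file:
        out_file.write(response.text)


# Replace with actual folder paths
if __name__ == "__main__":
    chat_with_documents("input_documents_folder", "log_folder", "output_responses_folder")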
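Step 2 of the script trusts whatever is typed at the prompt: a stray space is tolerated by int(), but a trailing comma or a letter raises an error, and a negative number silently picks a file from the end of the list. A possible hardening is sketched below; parse_selection is a hypothetical helper name, and the choice to drop duplicates is an assumption about the desired behavior:

```python
def parse_selection(raw, count):
    """Turn '0, 2,2,' into [0, 2]; raise ValueError on bad input."""
    indices = []
    for part in raw.split(','):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas and doubled separators
        idx = int(part)  # raises ValueError for non-numeric input
        if not 0 <= idx < count:
            raise ValueError(f"{idx} is out of range (0 to {count - 1})")
        if idx not in indices:
            indices.append(idx)  # keep first occurrence, preserve order
    if not indices:
        raise ValueError("no documents selected")
    return indices
```

The result could then replace the list comprehension in Step 2, for example selected_files = [doc_files[i] for i in parse_selection(selected_indices, len(doc_files))].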
Understanding the Script: A Kitchen Analogy
Imagine you’re in a kitchen and want to prepare a meal (chat with your documents). Each ingredient (document) needs to be selected and prepared. Here’s how the script works in terms of our kitchen:
- **Document Selection**: Just like picking your ingredients from the pantry, the script lists all available files for you to choose.
- **Preparation**: Once selected, the ingredients are mixed together (the text is combined) to make a delicious dish (the response from the AI).
- **Cooking Instructions**: When you provide instructions, you’re giving the recipe to the chef (the AI model) on how to use the combined ingredients to create a specific outcome.
- **Serving and Logging**: Finally, just like serving the dish, the script outputs the response and saves a log of what was done in the kitchen.
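To make the "recipe plus ingredients" idea concrete, here is a small sketch of how the instructions and the selected documents might be assembled into a single prompt string. The build_prompt name and the "--- Document: ... ---" separators are illustrative assumptions; Gemini does not require any particular delimiter, but labelling each file can help the model tell the sources apart.

```python
def build_prompt(instructions, documents):
    """Combine instructions and (name, text) pairs into one prompt string."""
    parts = [f"Instructions:\n{instructions.strip()}"]
    for name, text in documents:
        # Label each document so answers can refer to the file they came from
        parts.append(f"--- Document: {name} ---\n{text.strip()}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("Summarize each file in one sentence.",
                       [("notes.txt", "Alpha"), ("plan.txt", "Beta")]))
```

The returned string is what would be sent to the model in place of the raw concatenated text.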
Troubleshooting
- If you encounter issues running the script, ensure your API key is set correctly and that you have the permissions required.
- Check that the document paths are accurate, and that the files are not corrupted or locked.
- For saving issues, confirm that the folders for logs and outputs exist beforehand.
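Most of these checks can be automated before the script talks to the API. The following is a rough sketch rather than a complete diagnostic: preflight is a hypothetical helper, and it assumes the key lives in an environment variable named GOOGLE_API_KEY.

```python
import os


def preflight(input_folder, log_folder, output_folder, env=os.environ):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if not env.get("GOOGLE_API_KEY"):  # assumed variable name
        problems.append("GOOGLE_API_KEY is not set")
    if not os.path.isdir(input_folder):
        problems.append(f"input folder not found: {input_folder}")
    for folder in (log_folder, output_folder):
        try:
            os.makedirs(folder, exist_ok=True)  # create the folder if missing
        except OSError as exc:
            problems.append(f"cannot create {folder}: {exc}")
    return problems
```

Calling preflight with the same three folder paths as the main function, and printing any returned messages, surfaces path and key problems before any request is sent.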
Conclusion
With this guide, you now have a way to hold a dialogue with your documents: select the files, give the model your instructions, and get a logged, saved response. Start leveraging the power of Google Gemini with just a few lines of Python code!