How to Chat with PDF and Docs Using Google Gemini on Your Local Computer

Have you ever wished you could chat with your documents to extract insights or clarify concepts directly? In this guide, we set up a simple yet powerful script that lets you chat with PDF, .docx, and .txt files using Google Gemini Pro models through the Gemini API, without needing Vertex AI on Google Cloud.

Getting Started

Before we jump into the code, make sure you have the following prerequisites:

  • A Google API key from Google AI Studio.
  • Python 3 installed on your machine (this guide was written on Windows 10, but the script itself is not tied to one operating system).
  • The google-generativeai library, which you can install via pip: `pip install google-generativeai`
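Rather than pasting the API key into the script, a common practice is to keep it in an environment variable. The sketch below assumes the variable is named `GOOGLE_API_KEY` (a convention used in Google's examples, not something this guide requires) and fails with a readable message when it is missing. `get_api_key` is a hypothetical helper; its return value is what you would pass to `genai.configure(api_key=...)`.

```python
import os


def get_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    GOOGLE_API_KEY is an assumed variable name; pass another name if you
    stored the key elsewhere.
    """
    key = os.environ.get(var_name, "").strip()  # ignore stray whitespace
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set. "
            "Create a key in Google AI Studio and set the variable first."
        )
    return key
```

In the script you would then call `genai.configure(api_key=get_api_key())` once before creating the model.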

The Python Script

Here’s a straightforward Python script that accomplishes the task:


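```python
import os
from datetime import datetime

import google.generativeai as genai  # pip install google-generativeai
from pypdf import PdfReader          # pip install pypdf (reads .pdf)
import docx                          # pip install python-docx (reads .docx)

# Any Gemini model available to your API key works here; this name is an example.
MODEL_NAME = "gemini-1.5-pro"


def read_document(path):
    """Return the plain text of a .pdf, .docx, or .txt file."""
    lower = path.lower()
    if lower.endswith(".pdf"):
        # PDFs are binary, so extract the text page by page
        return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    if lower.endswith(".docx"):
        # .docx files are zipped XML; python-docx exposes the paragraphs
        return "\n".join(p.text for p in docx.Document(path).paragraphs)
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def chat_with_documents(input_folder, log_folder, output_responses_folder):
    # Assumes the AI Studio key is stored in the GOOGLE_API_KEY environment variable
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(MODEL_NAME)

    # Step 1: List all document files in input folder
    doc_files = sorted(f for f in os.listdir(input_folder)
                       if f.lower().endswith((".pdf", ".docx", ".txt")))
    print("Available documents:")
    for idx, file in enumerate(doc_files):
        print(f"{idx}: {file}")

    # Step 2: User selects documents
    selected_indices = input("Select document numbers (comma-separated): ")
    selected_files = [doc_files[int(idx)] for idx in selected_indices.split(",")]

    # Step 3: Extract and concatenate the text of the selected files
    combined_text = "\n\n".join(
        read_document(os.path.join(input_folder, file)) for file in selected_files
    )
    # count_tokens asks the API for the real token count (split() would count words)
    total_tokens = model.count_tokens(combined_text).total_tokens
    print(f"Total token count: {total_tokens}")

    # Step 4: Obtain instructions for the AI model
    instructions = input("Enter the instructions for the AI: ")

    # Send the instructions and the document text together in one request
    response = model.generate_content([instructions, combined_text])

    # Step 5: Print and log the response, creating the folders if needed
    print(response.text)
    os.makedirs(log_folder, exist_ok=True)
    os.makedirs(output_responses_folder, exist_ok=True)
    now = datetime.now()
    log_path = os.path.join(log_folder, f"{now:%Y-%m-%d}.log")
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(f"Instructions: {instructions}\nResponse: {response.text}\n")

    # Save the response as a plain-text file named after the current time
    out_path = os.path.join(output_responses_folder, f"Output_{now:%Y%m%d_%H%M%S}.txt")
    with open(out_path, "w", encoding="utf-8") as out_file:
        out_file.write(response.text)


# Replace with actual folder paths
if __name__ == "__main__":
    chat_with_documents("input_documents_folder", "log_folder", "output_responses_folder")
```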
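Step 2 of the script trusts whatever the user types: a stray letter or an out-of-range number raises an exception, and a negative number such as `-1` silently selects the last file because Python allows negative indexing. A more defensive parser might look like the sketch below; `parse_selection` is a hypothetical helper, not part of any library.

```python
from typing import List


def parse_selection(raw: str, num_files: int) -> List[int]:
    """Turn input such as "0, 2,2" into a sorted list of unique valid indices.

    Raises ValueError with a readable message on any invalid entry.
    """
    indices = set()
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing or doubled commas
        if not part.isdecimal():
            raise ValueError(f"'{part}' is not a non-negative whole number")
        idx = int(part)
        if idx >= num_files:
            raise ValueError(f"{idx} is out of range (0 to {num_files - 1})")
        indices.add(idx)  # a set drops duplicate selections
    if not indices:
        raise ValueError("no documents selected")
    return sorted(indices)
```

Inside the script, Step 2 would then become `selected_files = [doc_files[i] for i in parse_selection(selected_indices, len(doc_files))]`, ideally wrapped in a loop that asks again when a `ValueError` is raised.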

Understanding the Script: A Kitchen Analogy

Imagine you’re in a kitchen and want to prepare a meal (chat with your documents). Each ingredient (document) needs to be selected and prepared. Here’s how the script works in terms of our kitchen:

  • **Document Selection**: Just like picking your ingredients from the pantry, the script lists all available files for you to choose.
  • **Preparation**: Once selected, the ingredients are mixed together (the text of the chosen files is combined into one block).
  • **Cooking Instructions**: When you provide instructions, you’re giving the recipe to the chef (the AI model) on how to use the combined ingredients to create a specific dish (the response from the AI).
  • **Serving and Logging**: Finally, just like serving the dish, the script prints the response, appends the instructions and response to a dated log file, and saves the response in the output folder.
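The "preparation" step can be made a little more careful. Plain concatenation loses track of where one document ends and the next begins, which can make questions like "which file says X?" harder for the model to answer. One common approach is to label each document and place the instructions first. The sketch below assumes the texts have already been extracted into a dictionary keyed by file name; `build_prompt` and the `=== BEGIN name ===` delimiter format are illustrative choices, not something Gemini requires.

```python
from typing import Dict


def build_prompt(instructions: str, documents: Dict[str, str]) -> str:
    """Combine user instructions and several documents into one prompt string.

    Each document is wrapped in labelled delimiters so the model can refer
    to it by file name. The delimiter style is an arbitrary choice.
    """
    parts = [instructions.strip(), ""]
    for name, text in documents.items():  # dicts keep insertion order
        parts.append(f"=== BEGIN {name} ===")
        parts.append(text.strip())
        parts.append(f"=== END {name} ===")
        parts.append("")  # blank line between documents
    return "\n".join(parts).rstrip() + "\n"


example = build_prompt(
    "Summarize each document in one sentence.",
    {"notes.txt": "Gemini can read long inputs.", "todo.txt": "Buy milk."},
)
print(example)
```

The resulting string can be sent as a single prompt, for example with `model.generate_content(prompt)` in the google-generativeai SDK.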

Troubleshooting

  • If requests fail, check that your API key is valid and configured correctly (for example, passed to `genai.configure(api_key=...)`) and that the Gemini API is available for your account and region.
  • Check that the document paths are accurate, and that the files are not corrupted or locked.
  • If saving fails, confirm that the log and output folders exist and that you have write access to them.
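The file and folder checks above can be automated before any request is sent. The sketch below uses only the standard library; `preflight` is a hypothetical helper. It creates missing output folders with `os.makedirs(..., exist_ok=True)` and reports input files that cannot be opened, which covers wrong paths and files the operating system refuses to open (for example, some files locked by another program). It does not detect a corrupted PDF or .docx file; that only shows up when a parser tries to read it.

```python
import os
from typing import Iterable, List


def preflight(input_paths: Iterable[str], output_folders: Iterable[str]) -> List[str]:
    """Create output folders if needed and return a list of problems found.

    An empty list means every input file could be opened for reading.
    """
    for folder in output_folders:
        os.makedirs(folder, exist_ok=True)  # no error if it already exists

    problems = []
    for path in input_paths:
        try:
            with open(path, "rb") as f:
                f.read(1)  # reading one byte confirms the file is accessible
        except OSError as exc:  # missing file, permission error, directory, ...
            problems.append(f"{path}: {exc.strerror or exc}")
    return problems
```

Calling `preflight(paths_of_selected_files, [log_folder, output_responses_folder])` right after Step 2 and stopping if the returned list is non-empty would surface these problems before any API call is made.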

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

Conclusion

With this guide, you can now put questions and instructions to your documents directly from your own computer: pick your PDF, .docx, or .txt files, tell Gemini what you need, and get a logged response back. Start using Google Gemini with just a few lines of Python code!

© 2024 All Rights Reserved
