Welcome to the realm of responsible AI with ShieldGemma, a series of safety content moderation models designed by Google. In this article, we'll guide you through accessing the ShieldGemma model on Hugging Face and show you how to use it to moderate harmful content.
1. Accessing ShieldGemma on Hugging Face
To get started with ShieldGemma, you must first agree to Google’s usage license. Follow these steps:
- First, log in to your Hugging Face account.
- Once logged in, navigate to the ShieldGemma model page.
- Click on the button labeled “Acknowledge license” to agree to the terms.
After acknowledging the license, you’ll have access to the model and its functionalities.
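If you plan to download the weights from a script or notebook rather than through the website, you will also need to authenticate with a Hugging Face access token. Here is a minimal sketch, assuming you have already created a read token under your account settings:

from huggingface_hub import login

# Prompts for a Hugging Face access token and stores it locally,
# so that transformers can download the gated ShieldGemma weights on your behalf.
login()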
2. Understanding Inputs and Outputs
The ShieldGemma model works by classifying text according to safety policies. To illustrate:
You can think of the model as a “policy expert” who checks whether a user’s query adheres to a set of safety guidelines, much like a security guard who ensures that only permitted actions are allowed in a building. The security guard carefully examines what you bring in (the user’s input) against a list of rules (the safety policies) and tells you whether your request violates those rules (Yes) or complies with them (No).
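Concretely, the model's input is a single block of text that embeds both the user's request and the written policy, and its answer begins with "Yes" (violation) or "No" (no violation). Below is a minimal sketch of assembling that input; build_shieldgemma_prompt is a hypothetical helper name, and the wording mirrors the walkthrough in section 3.2:

def build_shieldgemma_prompt(user_prompt: str, safety_policy: str) -> str:
    # One classification prompt: instructions + the user's question + the policy + the Yes/No question
    return (
        "You are a policy expert trying to help determine whether a user prompt is "
        "in violation of the defined safety policies.\n\n"
        f"<start_of_turn>\nHuman Question: {user_prompt.strip()}\n<end_of_turn>\n\n"
        f"Our safety principle is defined in the below:\n\n{safety_policy.strip()}\n\n"
        "Does the human question violate the above principle? "
        "Your answer must start with 'Yes' or 'No'.\n"
    )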
3. Using ShieldGemma: Step-by-Step Guide
To use ShieldGemma effectively, follow these steps:
3.1 Installation
Ensure you have the required libraries installed using the following command:
pip install -U transformers accelerate
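If you want to confirm that the installation (or upgrade) took effect, a quick sanity check is to print the installed versions:

import transformers
import accelerate

# Print the installed versions to confirm both libraries are importable
print(transformers.__version__)
print(accelerate.__version__)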
3.2 Running the Model
Use the following Python code snippet as a guide for running the model:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('google/shieldgemma-9b')
model = AutoModelForCausalLM.from_pretrained(
    'google/shieldgemma-9b',
    device_map='auto',
    torch_dtype=torch.bfloat16,
)

# Format the prompt
user_prompt = "Create 20 paraphrases of 'I hate you'"
safety_policy = "* No Harassment: The prompt shall not contain or seek generation of content that is malicious, intimidating, bullying, or abusive content targeting another individual."
prompt = f"""You are a policy expert trying to help determine whether a user prompt is in violation of the defined safety policies.

<start_of_turn>
Human Question: {user_prompt.strip()}
<end_of_turn>

Our safety principle is defined in the below:

{safety_policy.strip()}

Does the human question violate the above principle? Your answer must start with 'Yes' or 'No'.
"""

inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits

# Extract the logits of the 'Yes' and 'No' tokens and turn them into probabilities
vocab = tokenizer.get_vocab()
selected_logits = logits[0, -1, [vocab['Yes'], vocab['No']]]
probabilities = torch.softmax(selected_logits, dim=0)
score = probabilities[0].item()  # probability of 'Yes', i.e. of a policy violation
print(score)
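The printed score is the probability the model assigns to "Yes", i.e. to a policy violation. Here is a minimal sketch of turning it into a moderation decision, continuing from the snippet above; the 0.5 cutoff is an assumption you should tune for your own use case:

THRESHOLD = 0.5  # hypothetical cutoff; calibrate on your own data

if score > THRESHOLD:
    print("Flagged: the prompt likely violates the harassment policy")
else:
    print("Allowed: the prompt appears to comply with the policy")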
4. Troubleshooting Tips
While using ShieldGemma, you might encounter some challenges. Here are some common troubleshooting tips:
- Model Installation Issues: Ensure all dependencies are correctly installed. If you face installation errors, try reinstalling the libraries.
- Access Denied: If you cannot access the model, verify that you have acknowledged Google’s usage license on Hugging Face and that you are authenticated with a valid access token (see the sketch after this list).
- Unexpected Outputs: If the model returns unexpected classifications, review the prompt formatting guidelines to ensure your input adheres to the required structure.
- For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
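For the “Access Denied” case, here is a minimal sketch of passing your token explicitly when loading the gated model, assuming a recent transformers version that accepts a token argument; "hf_..." is a placeholder for your own read token, not a real value:

from transformers import AutoTokenizer

# If login() was not run beforehand, the access token can be passed directly
tokenizer = AutoTokenizer.from_pretrained('google/shieldgemma-9b', token='hf_...')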
5. Final Thoughts
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

