Google unveils open source Gemma 3 model with 128k context window




Even as large language and reasoning models remain popular, organizations increasingly turn to smaller models to run AI processes with fewer energy and cost concerns. 

While some organizations distill larger models into smaller versions, model providers like Google continue to release small language models (SLMs) as an alternative to large language models (LLMs), offering lower running costs without sacrificing performance or accuracy. 

With that in mind, Google has released the latest version of its small model, Gemma, which features expanded context windows, larger parameters and more multimodal reasoning capabilities. 

Gemma 3, which Google says offers the same processing power as its larger Gemini 2.0 models, is best suited to smaller devices like phones and laptops. The new model comes in four sizes: 1B, 4B, 12B and 27B parameters. 

With a larger context window of 128K tokens (Gemma 2's was 8K), Gemma 3 can take in more information and handle more complicated requests. Google updated Gemma 3 to work in 140 languages; analyze images, text and short videos; and support function calling to automate tasks and agentic workflows. 
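Function calling works by having the model emit a structured request that the host application executes on its behalf. Below is a minimal, model-agnostic sketch of that loop; the tool name and JSON shape are illustrative, not Gemma's actual schema:

```python
import json

# A toy tool the model can call; in practice this could be any business function.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call_json: str) -> str:
    # The model emits a structured call; the application parses and executes it.
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Simulated model output requesting a tool invocation.
result = dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}')
print(result)  # Sunny in Paris
```

In an agentic workflow, the tool's result would be fed back to the model so it can decide the next step.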

Gemma gives a strong performance

To reduce computing costs even further, Google has introduced quantized versions of Gemma. Think of quantized models as compressed models. This happens through the process of “reducing the precision of the numerical values in a model’s weights” without sacrificing accuracy. 

Google said Gemma 3 “delivers state-of-the-art performance for its size.” Gemma 3 27B, specifically, came in second only to DeepSeek-R1 in Chatbot Arena Elo score tests, topping DeepSeek-V3, OpenAI’s o3-mini, Meta’s Llama-405B and Mistral Large. 

By quantizing Gemma 3, users can improve performance, run the model and build applications “that can fit on a single GPU and tensor processing unit (TPU) host.” 
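The idea behind quantization can be sketched in a few lines. This is a generic int8 illustration, not Gemma's actual quantization scheme: float32 weights are mapped to 8-bit integers plus a scale factor, cutting storage roughly fourfold at a small cost in precision.

```python
import numpy as np

def quantize_int8(weights):
    # Scale so the largest-magnitude weight maps to 127 (int8 max).
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from the compressed integers.
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.07, 0.45, -0.33], dtype=np.float32)
q, scale = quantize_int8(w)
w_approx = dequantize(q, scale)
# w_approx is close to w, but each weight is stored in 1 byte instead of 4.
```

Production schemes are more sophisticated (per-channel scales, 4-bit formats, calibration data), but the trade-off is the same: less memory and compute per weight for a small accuracy loss.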

Gemma 3 integrates with developer tools like Hugging Face Transformers, Ollama, JAX, Keras, PyTorch and others. Users can also access Gemma 3 through Google AI Studio, Hugging Face or Kaggle. Companies and developers can request access to the Gemma 3 API through AI Studio. 

ShieldGemma for security

Google said it has built safety protocols into Gemma 3, including a safety checker for images called ShieldGemma 2. 

“Gemma 3’s development included extensive data governance, alignment with our safety policies via fine-tuning and robust benchmark evaluations,” Google writes in a blog post. “While thorough testing of more capable models often informs our assessment of less capable ones, Gemma 3’s enhanced STEM performance prompted specific evaluations focused on its potential for misuse in creating harmful substances; their results indicate a low-risk level.”

ShieldGemma 2 is a 4B parameter image safety checker built on the Gemma 3 foundation. It finds and prevents the model from responding with images containing sexually explicit content, violence and other dangerous material. Users can customize ShieldGemma 2 to suit their specific needs. 

Small models and distillation on the rise

Since Google first released Gemma in February 2024, SLMs have seen an increase in interest. Other small models like Microsoft’s Phi-4 and Mistral Small 3 indicate that enterprises want to build applications with models as powerful as LLMs, but not necessarily use the entire breadth of what an LLM is capable of. 

Enterprises have also begun turning to smaller versions of the LLMs they prefer through distillation. To be clear, Gemma is not a distillation of Gemini 2.0; rather, it is trained with the same dataset and architecture. Unlike a distilled model, Gemma does not learn from a larger model’s outputs. 

Organizations often prefer to fit a model to the use case. Instead of deploying an LLM like o3-mini or Claude 3.7 Sonnet for a simple code editor, a smaller model, whether an SLM or a distilled version, can handle those tasks without the cost and overhead of a much larger model. 
