Complete Guide to the Gemini API with Python

Why This Matters

Google's Gemini API is one of the most capable available in 2025. With a context window of up to 2 million tokens (on gemini-1.5-pro) and multimodal capabilities (text, image, audio, and video), it outperforms many competitors on complex technical tasks. Mastering this API gives you a significant edge for building production-grade AI applications.

Prerequisites

  • Python 3.9+
  • A Google account and API key (free at aistudio.google.com)
  • Basic Python knowledge

Installation

pip install google-generativeai python-dotenv

Create a .env file at the root:

GEMINI_API_KEY=your_api_key_here

Step 1: Client Initialization

import google.generativeai as genai
import os
from dotenv import load_dotenv

load_dotenv()
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Choose the model
model = genai.GenerativeModel("gemini-1.5-pro")

Step 2: Simple Text Generation

response = model.generate_content("Explain the concept of tokenization in NLP.")
print(response.text)

Step 3: Streaming (for long responses)

for chunk in model.generate_content("Write an article about LLMs.", stream=True):
    print(chunk.text, end="", flush=True)

Step 4: Image Analysis (multimodal)

import PIL.Image

img = PIL.Image.open("screenshot.png")
response = model.generate_content(["What do you see in this image?", img])
print(response.text)

Gemini Model Comparison

| Model            | Context    | Speed  | Cost | Best for               |
|------------------|------------|--------|------|------------------------|
| gemini-1.5-pro   | 2M tokens  | Slow   | $$$  | Long document analysis |
| gemini-1.5-flash | 1M tokens  | Fast   | $    | Real-time applications |
| gemini-1.0-pro   | 32k tokens | Medium | $$   | General use            |
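If you switch models depending on the workload, a small helper can encode the tradeoffs in the table above. This is an illustrative sketch only; the pick_model function and its thresholds are my own convention, not part of the SDK:

```python
# Illustrative helper (not part of the SDK): choose a Gemini model name
# based on the rough tradeoffs in the comparison table.
def pick_model(context_tokens: int, realtime: bool = False) -> str:
    """Return a model name suited to the prompt size and latency needs."""
    if context_tokens > 1_000_000:
        return "gemini-1.5-pro"    # only listed option beyond 1M tokens
    if realtime:
        return "gemini-1.5-flash"  # fast and cheap, 1M-token window
    return "gemini-1.5-flash"      # sensible default for most use cases

print(pick_model(500_000, realtime=True))  # gemini-1.5-flash
print(pick_model(1_500_000))               # gemini-1.5-pro
```

The returned name can be passed straight to genai.GenerativeModel(...).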

โš ๏ธ Common Mistakes

  • ResourceExhausted: you've hit the rate limit. Add a short delay between calls (e.g. time.sleep(1)), or better, retry with exponential backoff.
  • InvalidArgument: the image is too large. Resize it to 4 MB or less before sending.
  • Truncated response: increase max_output_tokens in generation_config.
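For rate-limit errors in particular, a fixed sleep works but exponential backoff is more robust. Here is a minimal sketch; the delay schedule and the generic exception filter are assumptions you should tune (in real code you would catch the SDK's ResourceExhausted exception rather than Exception):

```python
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, retriable=(Exception,)):
    """Call fn(), retrying with exponentially growing delays on failure."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retriable:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Usage with the Gemini client would look like:
# response = with_backoff(lambda: model.generate_content(prompt))
```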

Key Takeaways

  • The API is free up to a certain quota, which is enough to get started
  • gemini-1.5-flash is roughly 10x cheaper than pro and covers the vast majority of use cases
  • Streaming significantly improves UX for long responses
  • The 2M token context window (gemini-1.5-pro) allows injecting entire codebases
  • Always handle rate-limit errors with exponential backoff
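To make the "inject an entire codebase" point concrete, one approach is to concatenate source files into a single large prompt. The path-header formatting below is my own convention, not anything the API requires:

```python
from pathlib import Path

def build_codebase_prompt(root: str, question: str, exts=(".py",)) -> str:
    """Concatenate source files under root into one large prompt string."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix in exts and path.is_file():
            parts.append(f"### File: {path}\n{path.read_text()}")
    parts.append(f"### Question\n{question}")
    return "\n\n".join(parts)

# prompt = build_codebase_prompt("src/", "Find concurrency bugs.")
# response = model.generate_content(prompt)  # needs a large-context model
```

For very large repositories, check the prompt's token count against the model's context window before sending.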
