Getting Started with Gemini Pro on Google AI Studio
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Gemini Pro is now broadly available, and getting started is mostly a matter of creating an API key in Google AI Studio, pasting it into a Colab notebook, and then wiring up two models: Gemini Pro for text generation and Gemini Pro Vision for image understanding. The practical payoff is immediate: developers can test prompts in the AI Studio console, then reproduce the same behavior in code with generation settings, streaming, and safety controls.
After accepting terms in Google AI Studio, the key step is generating an API key. Users can either create a new key tied to a fresh project or create one inside an existing Google Cloud project. Once the key is copied, it’s used in Google Colab via the notebook’s secrets configuration. With the key in place, the AI Studio interface lets users try Gemini Pro and Gemini Pro Vision directly using freeform prompts, structured prompts, or chat prompts. A simple prompt—like drafting an email announcing Gemini Pro availability—returns a completion, while chat mode supports multi-turn conversation where follow-up prompts continue the context.
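In Colab, the stored secret can be read back with the notebook's `userdata` helper and handed to the SDK. A minimal sketch of that setup step, assuming the secret was saved under the name GOOGLE_API_KEY:

```python
# Sketch of the Colab setup step; assumes the key was saved in
# Colab's Secrets panel under the name GOOGLE_API_KEY.
import google.generativeai as genai
from google.colab import userdata  # only available inside a Colab runtime

genai.configure(api_key=userdata.get("GOOGLE_API_KEY"))
```

Outside Colab, the same `genai.configure(api_key=...)` call works with a key read from an environment variable instead.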
Beyond prompting, the console provides controls that also map cleanly into code. Temperature and other generation parameters can be adjusted, and safety settings can be edited to decide what content categories to block and how strongly to block them. Categories mentioned include harassment, hate speech, sexually explicit content, and dangerous content. The interface also provides prompt feedback, flagging whether a prompt violated any configured safety rules.
The Colab walkthrough then demonstrates the same workflow programmatically. It starts by listing available model options, including legacy bison models (text-bison and chat-bison), Gemini Pro, Gemini Pro Vision, and an embedding model, plus an AQA model mentioned for a future video. For text generation, the notebook uses model.generate_content with a basic question (e.g., "biggest planet"), then reads the result from response.text and prints it with Markdown. It also shows streaming by setting stream=True, which returns the output in chunks.
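The listing, single-shot, and streaming steps above can be sketched as follows, assuming the google-generativeai SDK is installed and already configured with an API key (the exact prompt wording is illustrative):

```python
import google.generativeai as genai

# List models that support content generation (includes the bison
# legacy models, gemini-pro, and gemini-pro-vision).
for m in genai.list_models():
    if "generateContent" in m.supported_generation_methods:
        print(m.name)

model = genai.GenerativeModel("gemini-pro")

# Single-shot generation: the full completion arrives at once.
response = model.generate_content("What is the biggest planet in our solar system?")
print(response.text)

# Streaming: the same call with stream=True yields the output in chunks.
response = model.generate_content(
    "What is the biggest planet in our solar system?", stream=True
)
for chunk in response:
    print(chunk.text, end="")
```

In a notebook, `response.text` can be wrapped in `IPython.display.Markdown` to render formatted output, as the walkthrough does.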
For more control, the notebook configures generation parameters such as temperature, top P, top K, and maximum output tokens. Safety settings are passed as a list of category dictionaries—harassment, hate speech, sexually explicit, and dangerous content—along with thresholds (for example, blocking hate speech at “medium and above”). The code then instantiates the model with both generation config and safety settings, and again checks response.prompt_feedback for objectionable content.
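Expressed as plain Python structures, the generation parameters and safety thresholds look like the following. The specific numeric values are illustrative, while the category and threshold strings follow the conventions the Gemini API accepts:

```python
# Generation parameters from the console, as a config the SDK accepts.
generation_config = {
    "temperature": 0.7,
    "top_p": 1.0,
    "top_k": 32,
    "max_output_tokens": 512,
}

# One entry per harm category; the threshold controls how aggressively
# that category is blocked (e.g., "medium and above").
safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
]
```

Both structures are then passed when instantiating the model, e.g. `genai.GenerativeModel("gemini-pro", generation_config=generation_config, safety_settings=safety_settings)`, and `response.prompt_feedback` reports whether the prompt tripped any of the configured thresholds.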
Chat mode is handled separately: model.start_chat creates a chat session, optionally seeded with history, and chat.send_message sends new user turns while maintaining conversation state. The notebook shows how chat history is stored as a list of role/parts entries.
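A minimal sketch of that chat flow, assuming a configured SDK (the prompts here are placeholders, not the notebook's exact wording):

```python
import google.generativeai as genai

model = genai.GenerativeModel("gemini-pro")

# Start a session; history can optionally be pre-seeded with prior turns.
chat = model.start_chat(history=[])

response = chat.send_message("In one sentence, explain how a computer works.")
print(response.text)

# A follow-up turn automatically reuses the accumulated context.
response = chat.send_message("Now give a more detailed explanation.")
print(response.text)

# The session records each turn with its role and content parts.
for turn in chat.history:
    print(turn.role, turn.parts)
```

The key contrast with `generate_content` is that the session object carries the conversation state, so each `send_message` call only needs the new user turn.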
Finally, Gemini Pro Vision is demonstrated using images sourced from NASA. When only an image is provided, the model returns general information about the pictured planet (Saturn). When text is added alongside the image—such as requesting the planet name and movies featuring it—the output becomes conditioned on both modalities. With two images (Earth and Saturn), the model can compare them even without explicitly naming which planet each image contains, then responds with differences and contextual facts for each.
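The three vision patterns described above can be sketched as follows, assuming a configured SDK; the image filenames are hypothetical stand-ins for the NASA images used in the video:

```python
import google.generativeai as genai
import PIL.Image

model = genai.GenerativeModel("gemini-pro-vision")

saturn = PIL.Image.open("saturn.jpg")  # hypothetical local file
earth = PIL.Image.open("earth.jpg")    # hypothetical local file

# Image only: the model returns general information about the picture.
response = model.generate_content(saturn)
print(response.text)

# Image plus text: the output is conditioned on both modalities.
response = model.generate_content(
    ["What planet is this, and what movies feature it?", saturn]
)
print(response.text)

# Two images in one request: the model can compare them directly.
response = model.generate_content(
    ["What are the differences between these two planets?", earth, saturn]
)
print(response.text)
```

Note that text and images are mixed by passing a single list to `generate_content`, in the order they should be read.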
The walkthrough closes by pointing to next steps: using Gemini Pro with LangChain and adding function calling, with those topics scheduled for follow-up videos.
Cornell Notes
Gemini Pro on Google AI Studio becomes usable once an API key is created and placed into a Colab notebook's secrets. Developers can test Gemini Pro in the console using freeform, structured, or chat prompts, then replicate those behaviors in code with model.generate_content and model.start_chat. Generation control comes from parameters like temperature, top P, top K, and maximum output tokens, while safety is enforced via category-based thresholds for harassment, hate speech, sexually explicit content, and dangerous content. Gemini Pro Vision extends the same workflow to images, supporting image-only descriptions, image-plus-text question answering, and multi-image comparisons (e.g., Earth vs. Saturn).
What are the minimum steps to start using Gemini Pro from Google AI Studio and Colab?
How does the notebook perform basic text generation with Gemini Pro?
How do streaming and generation parameters change the way outputs are produced?
How are safety settings applied in code, and what feedback is available?
What’s the difference between using Gemini Pro for single-turn generation versus chat?
How does Gemini Pro Vision handle images, and what kinds of tasks does it support?
Review Questions
- How would you modify generation behavior using temperature, top P, top K, and maximum output tokens, and what effect would you expect from each change?
- What information does response.prompt_feedback provide, and how could you use it to debug safety-related prompt blocks?
- In what ways do model.generate_content and model.start_chat differ in how conversation context is maintained?
Key Points
1. Create an API key in Google AI Studio (new project or existing Google Cloud project) and store it in Colab secrets before making any model calls.
2. Use Gemini Pro in the AI Studio console to test freeform, structured, and chat prompts, then mirror those prompts in code.
3. Tune output with generation config settings including temperature, top P, top K, and maximum output tokens.
4. Apply safety controls by setting thresholds for harassment, hate speech, sexually explicit content, and dangerous content, and check response.prompt_feedback for violations.
5. Enable streaming by setting stream=True to receive generated text in chunks rather than a single response.
6. Use Gemini Pro Vision by sending image inputs (image-only for general descriptions, image-plus-text for targeted Q&A, and multiple images for comparisons).