Getting Started with Gemini Pro on Google AI Studio
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Gemini Pro is now broadly available, and getting started is mostly a matter of creating an API key in Google AI Studio, pasting it into a Colab notebook, and then wiring up two models: Gemini Pro for text generation and Gemini Pro Vision for image understanding. The practical payoff is immediate: developers can test prompts in the AI Studio console, then reproduce the same behavior in code with generation settings, streaming, and safety controls.
After accepting terms in Google AI Studio, the key step is generating an API key. Users can either create a new key tied to a fresh project or create one inside an existing Google Cloud project. Once the key is copied, it’s used in Google Colab via the notebook’s secrets configuration. With the key in place, the AI Studio interface lets users try Gemini Pro and Gemini Pro Vision directly using freeform prompts, structured prompts, or chat prompts. A simple prompt—like drafting an email announcing Gemini Pro availability—returns a completion, while chat mode supports multi-turn conversation where follow-up prompts continue the context.
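In Colab, the stored secret can be read back with the notebook's `userdata` helper and handed to the SDK. A minimal sketch of that setup step, assuming the secret was saved under the name GOOGLE_API_KEY:

```python
# Sketch of the Colab setup step; assumes the key was saved in
# Colab's Secrets panel under the name GOOGLE_API_KEY.
import google.generativeai as genai
from google.colab import userdata  # only available inside a Colab runtime

genai.configure(api_key=userdata.get("GOOGLE_API_KEY"))
```

Outside Colab, the same `genai.configure(api_key=...)` call works with a key read from an environment variable instead.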
Beyond prompting, the console provides controls that also map cleanly into code. Temperature and other generation parameters can be adjusted, and safety settings can be edited to decide what content categories to block and how strongly to block them. Categories mentioned include harassment, hate speech, sexually explicit content, and dangerous content. The interface also provides prompt feedback, flagging whether a prompt violated any configured safety rules.
The Colab walkthrough then demonstrates the same workflow programmatically. It starts by listing available model options, including legacy bison models (text-bison and chat-bison), Gemini Pro, Gemini Pro Vision, and an embedding model, plus an AQA model mentioned for a future video. For text generation, the notebook uses model.generate_content with a basic question (e.g., "biggest planet"), then reads the result from response.text and prints it with Markdown. It also shows streaming by setting stream=True, which returns the output in chunks.
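The listing, single-shot, and streaming steps above can be sketched as follows, assuming the google-generativeai SDK is installed and already configured with an API key (the exact prompt wording is illustrative):

```python
import google.generativeai as genai

# List models that support content generation (includes the bison
# legacy models, gemini-pro, and gemini-pro-vision).
for m in genai.list_models():
    if "generateContent" in m.supported_generation_methods:
        print(m.name)

model = genai.GenerativeModel("gemini-pro")

# Single-shot generation: the full completion arrives at once.
response = model.generate_content("What is the biggest planet in our solar system?")
print(response.text)

# Streaming: the same call with stream=True yields the output in chunks.
response = model.generate_content(
    "What is the biggest planet in our solar system?", stream=True
)
for chunk in response:
    print(chunk.text, end="")
```

In a notebook, `response.text` can be wrapped in `IPython.display.Markdown` to render formatted output, as the walkthrough does.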
For more control, the notebook configures generation parameters such as temperature, top P, top K, and maximum output tokens. Safety settings are passed as a list of category dictionaries—harassment, hate speech, sexually explicit, and dangerous content—along with thresholds (for example, blocking hate speech at “medium and above”). The code then instantiates the model with both generation config and safety settings, and again checks response.prompt_feedback for objectionable content.
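Expressed as plain Python structures, the generation parameters and safety thresholds look like the following. The specific numeric values are illustrative, while the category and threshold strings follow the conventions the Gemini API accepts:

```python
# Generation parameters from the console, as a config the SDK accepts.
generation_config = {
    "temperature": 0.7,
    "top_p": 1.0,
    "top_k": 32,
    "max_output_tokens": 512,
}

# One entry per harm category; the threshold controls how aggressively
# that category is blocked (e.g., "medium and above").
safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
]
```

Both structures are then passed when instantiating the model, e.g. `genai.GenerativeModel("gemini-pro", generation_config=generation_config, safety_settings=safety_settings)`, and `response.prompt_feedback` reports whether the prompt tripped any of the configured thresholds.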
Chat mode is handled separately: model.start_chat creates a chat session, optionally seeded with history, and chat.send_message sends new user turns while maintaining conversation state. The notebook shows how chat history is stored as a list of role/parts entries.
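A minimal sketch of that chat flow, assuming a configured SDK (the prompts here are placeholders, not the notebook's exact wording):

```python
import google.generativeai as genai

model = genai.GenerativeModel("gemini-pro")

# Start a session; history can optionally be pre-seeded with prior turns.
chat = model.start_chat(history=[])

response = chat.send_message("In one sentence, explain how a computer works.")
print(response.text)

# A follow-up turn automatically reuses the accumulated context.
response = chat.send_message("Now give a more detailed explanation.")
print(response.text)

# The session records each turn with its role and content parts.
for turn in chat.history:
    print(turn.role, turn.parts)
```

The key contrast with `generate_content` is that the session object carries the conversation state, so each `send_message` call only needs the new user turn.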
Finally, Gemini Pro Vision is demonstrated using images sourced from NASA. When only an image is provided, the model returns general information about the pictured planet (Saturn). When text is added alongside the image—such as requesting the planet name and movies featuring it—the output becomes conditioned on both modalities. With two images (Earth and Saturn), the model can compare them even without explicitly naming which planet each image contains, then responds with differences and contextual facts for each.
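The three vision patterns described above can be sketched as follows, assuming a configured SDK; the image filenames are hypothetical stand-ins for the NASA images used in the video:

```python
import google.generativeai as genai
import PIL.Image

model = genai.GenerativeModel("gemini-pro-vision")

saturn = PIL.Image.open("saturn.jpg")  # hypothetical local file
earth = PIL.Image.open("earth.jpg")    # hypothetical local file

# Image only: the model returns general information about the picture.
response = model.generate_content(saturn)
print(response.text)

# Image plus text: the output is conditioned on both modalities.
response = model.generate_content(
    ["What planet is this, and what movies feature it?", saturn]
)
print(response.text)

# Two images in one request: the model can compare them directly.
response = model.generate_content(
    ["What are the differences between these two planets?", earth, saturn]
)
print(response.text)
```

Note that text and images are mixed by passing a single list to `generate_content`, in the order they should be read.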
The walkthrough closes by pointing to next steps: using Gemini Pro with LangChain and adding function calling, with those topics scheduled for follow-up videos.
Cornell Notes
Gemini Pro on Google AI Studio becomes usable once an API key is created and placed into a Colab notebook's secrets. Developers can test Gemini Pro in the console using freeform, structured, or chat prompts, then replicate those behaviors in code with model.generate_content and model.start_chat. Generation control comes from parameters like temperature, top P, top K, and maximum output tokens, while safety is enforced via category-based thresholds for harassment, hate speech, sexually explicit content, and dangerous content. Gemini Pro Vision extends the same workflow to images, supporting image-only descriptions, image-plus-text question answering, and multi-image comparisons (e.g., Earth vs. Saturn).
What are the minimum steps to start using Gemini Pro from Google AI Studio and Colab?
How does the notebook perform basic text generation with Gemini Pro?
How do streaming and generation parameters change the way outputs are produced?
How are safety settings applied in code, and what feedback is available?
What’s the difference between using Gemini Pro for single-turn generation versus chat?
How does Gemini Pro Vision handle images, and what kinds of tasks does it support?
Review Questions
- How would you modify generation behavior using temperature, top P, top K, and maximum output tokens, and what effect would you expect from each change?
- What information does response.prompt_feedback provide, and how could you use it to debug safety-related prompt blocks?
- In what ways do model.generate_content and model.start_chat differ in how conversation context is maintained?
Key Points
1. Create an API key in Google AI Studio (new project or existing Google Cloud project) and store it in Colab secrets before making any model calls.
2. Use Gemini Pro in the AI Studio console to test freeform, structured, and chat prompts, then mirror those prompts in code.
3. Tune output with generation config settings including temperature, top P, top K, and maximum output tokens.
4. Apply safety controls by setting thresholds for harassment, hate speech, sexually explicit content, and dangerous content, and check response.prompt_feedback for violations.
5. Enable streaming by setting stream=True to receive generated text in chunks rather than a single response.
6. Use Gemini Pro Vision by sending image inputs (image-only for general descriptions, image-plus-text for targeted Q&A, and multiple images for comparisons).