
Building & Testing YOUR Open AI GPTs!

MattVidPro · 5 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Prompting for copyrighted characters often requires describing visual traits and settings rather than naming characters, but guardrails still frequently block or censor outputs.

Briefing

OpenAI GPTs can be built and tested quickly, but real-world experimentation is often throttled by usage caps and inconsistent feature behavior—especially when trying to generate copyrighted-character images or run image generation inside GPTs. During a live, interactive session, MattVidPro tested several community GPTs (including an “upscaler,” the “Jailbreakinator,” and a “Pokey GPT” that appears to query a Pokémon API) and then pushed one GPT into a long, chat-driven “lemon army” battle that ultimately ended when GPT-4 usage limits kicked in.

The most concrete takeaway came from the “Jailbreakinator” GPT, designed to produce uncensored-style AI art prompts. In practice, it could sometimes get close to recognizable branding (for example, Mario-like characters in McDonald’s settings), but OpenAI’s guardrails still frequently censored or blocked the most direct requests for copyrighted characters. The workaround that repeatedly surfaced was prompt engineering that avoids explicit names—describing “video game icons” through visual traits (hat color, mustache, overalls, setting details) rather than saying the character’s name. Even then, results varied: some prompts produced “somewhat close” images, while others were blocked or returned generic substitutes.

The session also highlighted a practical limitation: document “scanning” inside GPTs doesn’t behave like vector-based retrieval. When asked about forcing a GPT to read uploaded augmentation docs before answering, the response was that GPTs tend to search documents with keyword matching rather than true semantic retrieval, making thorough “scan then answer” behavior finicky.

Beyond censorship, the stream showcased how GPTs can chain capabilities. A voice-generation GPT appeared to call an external API (with OpenAI voice options like “alloy” and “Nova” being tested), and then a second GPT (“Pokey GPT”) was used to answer matchup-style questions by pulling structured data—suggesting an API-backed Pokedex rather than a static text dump. The “multiverse” style GPT turned text into interactive, branching story prompts, with the audience voting on actions in real time.

However, the most dramatic moment wasn’t a technical failure—it was the system’s limits. The “lemon battle” GPT ran through multiple rounds of narration and image-generation attempts, including a “juicer” weapon scenario narrated with ElevenLabs Turbo. Image generation repeatedly failed with server errors inside GPTs, while narration often worked. Eventually, GPT-4 usage caps cut off further testing, forcing a switch to other models or accounts and ending the session’s ability to explore more community GPTs.

By the end, the stream delivered a clear, practical message for builders and tinkerers: GPTs are powerful for rapid prototyping and interactive demos, but reliable results depend on understanding guardrails, retrieval behavior, and—most importantly—how usage caps and backend load affect image generation and long sessions. The community Discord remained the hub for sharing GPT links and tips, with viewers also discussing future directions like GPT actions, API-based “chatGPT-like” websites, and better retrieval (vector embeddings) for uploaded knowledge.

Cornell Notes

The session tested multiple community-built GPTs and showed both what’s feasible and what breaks under real constraints. The “Jailbreakinator” could sometimes approximate copyrighted characters using visual descriptions instead of names, but OpenAI guardrails still blocked or censored many requests. Document handling inside GPTs was described as keyword search rather than vector-based retrieval, making “scan all docs before answering” unreliable. Chained GPTs demonstrated API-backed features like voice narration and a Pokémon “Pokedex” that appears to query an API. The biggest limiter was operational: GPT-4 usage caps and intermittent image-generation/server errors repeatedly cut off experimentation mid-story.

Why did the “Jailbreakinator” sometimes get close to copyrighted characters but still fail on others?

It relied on prompt workarounds that describe characters by distinctive visual traits (e.g., red hat, thick mustache, blue overalls) and settings (e.g., golden arches) rather than using explicit character names. That approach occasionally produced “somewhat close” outputs, but the model still triggered policy enforcement when requests were too directly tied to copyrighted characters or when the prompt crossed guardrail thresholds. The results were inconsistent across attempts (some images were blocked; others were partially successful).

What’s the practical difference between “scanning documents” and true retrieval in GPTs?

When asked how to force a GPT to read uploaded augmentation docs before answering, the response emphasized that GPTs often search documents with keywords rather than using vector-based semantic retrieval. That means the model may miss relevant sections if the user’s prompt doesn’t include the right keywords, and it can be “finicky” to get consistent “look through everything first” behavior.
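To make that difference concrete, here is a toy sketch. It is not how GPT file search is actually implemented; the documents, the synonym table, and the function names are all invented for illustration. Literal keyword matching misses a relevant document unless the query happens to share words with it, while anything that matches on meaning (here crudely faked with synonym expansion, standing in for vector embeddings) can still find it:

```python
# Toy contrast: literal keyword lookup vs. a stand-in for semantic retrieval.
# The SYNONYMS table is a crude placeholder for what embeddings provide.

DOCS = [
    "The refund policy allows returns within 30 days.",
    "Shipping costs are waived on orders over $50.",
]

SYNONYMS = {"money back": "refund", "delivery": "shipping"}  # illustrative only

def keyword_search(query, docs):
    """Return docs sharing at least one literal word with the query —
    roughly the behavior the stream attributed to GPT document search."""
    words = set(query.lower().split())
    return [d for d in docs if words & set(d.lower().split())]

def pseudo_semantic_search(query, docs):
    """Expand the query with known synonyms before matching, approximating
    what vector retrieval buys you: hits without exact keyword overlap."""
    q = query.lower()
    for phrase, canonical in SYNONYMS.items():
        if phrase in q:
            q += " " + canonical
    return keyword_search(q, docs)
```

A query like "money back" finds nothing via literal matching but does find the refund document once the match is no longer purely lexical; real systems replace the synonym table with embedding vectors and cosine similarity.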

How did the stream demonstrate GPTs using external capabilities beyond plain text generation?

Two examples stood out: (1) a voice-generator GPT that produced narrated audio by calling an external voice API (OpenAI voices were tested, and ElevenLabs Turbo was later used to narrate the story); and (2) “Pokey GPT,” which appeared to query a Pokémon API to answer questions like who would win between Pikachu and Mewtwo, including factual details such as Pokémon counts and matchup reasoning.
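Readers who want to try the Pokey-GPT idea themselves can hit the public PokéAPI (pokeapi.co), which serves the kind of structured Pokémon data the stream appeared to use. The sketch below is a guess at the pattern, not the GPT’s actual action; the helper names and the base-stat-total “matchup” rule are invented for illustration:

```python
import json
import urllib.request

# Public Pokémon REST API; the /pokemon/{name} resource returns base stats.
POKEAPI = "https://pokeapi.co/api/v2/pokemon/{}"

def fetch_pokemon(name):
    """Fetch one Pokémon's raw JSON record from PokéAPI (network required)."""
    with urllib.request.urlopen(POKEAPI.format(name.lower())) as resp:
        return json.load(resp)

def base_stats(payload):
    """Reduce a PokéAPI payload to a {stat name: base value} mapping."""
    return {s["stat"]["name"]: s["base_stat"] for s in payload["stats"]}

def naive_matchup(stats_a, stats_b):
    """Invented rule: higher base-stat total 'wins'. Real matchup reasoning
    would also weigh types, moves, and abilities."""
    return "a" if sum(stats_a.values()) >= sum(stats_b.values()) else "b"
```

For example, `naive_matchup(base_stats(fetch_pokemon("pikachu")), base_stats(fetch_pokemon("mewtwo")))` would favor Mewtwo, whose base-stat total is far higher than Pikachu’s.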

Why did image generation fail even when narration worked?

During the “lemon army” multiverse story, narration continued, but image generation frequently returned errors like “error creating images” or “encountering issues generating the image,” and the streamer noted backend/server problems. The pattern suggested that text generation and voice narration were more reliable than image generation inside GPTs at that moment, likely due to load, quotas, or separate image-generation pipelines.

What ultimately stopped further experimentation during the session?

GPT-4 usage caps. After a long interactive story with multiple rounds and attempts to generate images, the account hit a GPT-4 limit (“usage cap for GPT 4” / “GPT usage cap”), which prevented continuing with additional GPT testing. The streamer discussed switching models/accounts and noted that caps can be separate from API speed/behavior, making long live testing difficult.

Review Questions

  1. When prompt engineering avoids explicit names, what kinds of visual details are most likely to help produce recognizable character-like outputs—and why does this still not guarantee success?
  2. How would keyword-based document search change the way you design instructions for a GPT that must answer using specific uploaded materials?
  3. What operational constraints (usage caps, server load, image-generation reliability) should a builder plan for when designing an interactive GPT demo?

Key Points

  1. Prompting for copyrighted characters often requires describing visual traits and settings rather than naming characters, but guardrails still frequently block or censor outputs.
  2. Uploaded-document “reading” inside GPTs may behave like keyword search instead of vector/semantic retrieval, making “scan everything first” instructions unreliable.
  3. GPTs can chain external capabilities—voice narration and API-backed data lookups—so complex experiences can be built from multiple GPTs.
  4. Image-generation reliability can lag behind text/voice generation; server errors can interrupt demos even when narration continues.
  5. Long interactive sessions are constrained by GPT-4 usage caps, which can abruptly end testing and require model/account workarounds.
  6. For builders, the most important engineering work is often around retrieval quality, policy constraints, and quota/latency planning—not just prompt writing.
  7. Community sharing (e.g., via Discord) is a practical way to discover and test GPTs, since many experiments depend on third-party builds.

Highlights

The “Jailbreakinator” could sometimes approximate copyrighted characters using trait-based prompts, but OpenAI policy enforcement still blocked many attempts—showing partial workarounds rather than a full bypass.
Document use inside GPTs was described as keyword search, not vector-based retrieval, making “read all docs before answering” behavior inconsistent.
A multiverse-style GPT became an audience-driven story with real-time voting, but image generation repeatedly failed while narration worked—then GPT-4 usage caps ended the session.
“Pokey GPT” appeared to query a Pokémon API (not just a static text Pokedex), enabling structured matchup reasoning like Pikachu vs. Mewtwo.
The biggest practical limiter was operational: GPT-4 usage caps and backend load repeatedly interrupted experimentation mid-stream.
