AI Frontiers: Jesper Hvirring Henriksen (OpenAI DevDay)
Based on OpenAI's video on YouTube. If you like this summary, support the original creators by watching, liking, and subscribing to their content.
Be My Eyes launched “Be My AI” on GPT-4V to give blind and low-vision users an always-available alternative to volunteer video calls.
Briefing
Be My Eyes has launched “Be My AI” on GPT-4V, giving blind and low-vision users an always-available visual assistant that describes images in real time, with no volunteer video call required. The core shift is independence: users can get help 24/7 when they don’t want to rely on others, don’t feel like talking to a stranger, or simply need quick answers at inconvenient times. The service builds on Be My Eyes’ existing model of over half a million users supported by more than seven million volunteers, adding an AI option for moments when human assistance isn’t ideal.
The practical impact centers on image understanding across both everyday life and the web. Many apps and websites are accessible at a basic level, yet their images often lack meaningful alt text, leaving screen reader users unable to interpret photos and graphics. With Be My Eyes and Be My AI, users can request thorough descriptions of images encountered online or inside apps, from confusing group chat photos to inaccessible website graphics that screen readers can’t fully convey. A cited example from a US government site involves a chart whose visible alt text is generic (“Bar chart showing rising global temperatures since 1880”), while the AI-generated description offers a far more usable interpretation; the talk frames the difference as “another world” for users who previously had no meaningful access to the visual content.
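To make the workflow concrete, the sketch below shows how an app could send such an image to a vision-capable OpenAI model via the Chat Completions API and ask for a thorough, screen-reader-friendly description. This is an illustration only, not Be My Eyes’ implementation: the model name, prompt wording, helper function, and example URL are all assumptions.

```python
# Minimal sketch of requesting a detailed image description from a
# vision-capable OpenAI model. Not Be My Eyes' production code; the model
# name, prompt wording, and example URL are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def describe_image(image_url: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model; GPT-4V-era names differ
        messages=[
            {
                "role": "system",
                "content": (
                    "Describe images thoroughly for a blind user, including "
                    "any visible text, layout, colors, and relevant details."
                ),
            },
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What does this image show?"},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
        ],
    )
    return response.choices[0].message.content


# Example: an inaccessible chart encountered on a website (hypothetical URL)
print(describe_image("https://example.com/temperature-chart.png"))
```

The key point the sketch captures is that the request carries the image itself plus an instruction about depth of description, so the answer is not limited by whatever alt text the page author provided.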
The rollout also emphasizes that the descriptions are not only accurate but often conversational and lightly witty, which matters for usability. Beta testing results are presented through user outcomes and usage metrics: one long-time user (Caroline), who previously made about two volunteer calls per year, joined the beta in March and has since completed more than 700 image descriptions. Another beta tester (Lucy Edwards) demonstrates the workflow in daily tasks: checking eggs for shells, reading an oat milk expiry date, sorting laundry, identifying food on a plate, interpreting product packaging, and even getting captions for Instagram photos that lack audio description.
Operationally, Be My Eyes describes a staged deployment: a small beta group in March when GPT-4 launched, followed by a broader iOS rollout about six weeks before the talk. Usage has reached roughly a million image descriptions per month, with satisfaction ratings around 95% after accounting for downtime and system errors. Language support has expanded to about 36 languages, attributed largely to prompting the model to respond in the user’s language.
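The language expansion is attributed to prompt design rather than per-language models. A hypothetical sketch of that idea follows, folding the user’s language into the system prompt; the wording and helper function are assumptions, not the product’s actual prompt.

```python
# Hypothetical sketch: steer the reply language via the system prompt, the
# approach the talk credits for reaching roughly 36 languages. Wording is
# illustrative, not Be My Eyes' actual prompt.
def build_system_prompt(user_language: str) -> str:
    return (
        "You are a visual assistant for blind and low-vision users. "
        "Describe images thoroughly and conversationally. "
        f"Always respond in {user_language}."
    )


print(build_system_prompt("Danish"))  # e.g. a Danish-speaking user's session
```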
Beyond consumer use, Be My AI is also deployed in enterprise support workflows, including Microsoft’s Disability Answer Desk. Users can start with a chatbot for computer troubleshooting rather than escalating to a call; the service reports that 9 out of 10 users who begin with chat do not escalate. The broader conclusion is that AI systems that can see, hear, and respond in human-like ways are expected to materially improve accessibility in assistive technologies, turning visual understanding into a practical, scalable support layer for people who need it most.
Cornell Notes
Be My Eyes added “Be My AI,” built on GPT-4V, to provide blind and low-vision users an always-on alternative to volunteer video calls. The service focuses on turning images, both in daily life and on the web, into detailed, usable descriptions, especially where websites and apps lack meaningful alt text. Early deployments show high engagement (about a million image descriptions per month) and strong satisfaction (around 95% after discounting downtime and system errors). Language support expanded to roughly 36 languages, largely by prompting the model to answer in the user’s language. The same capability is also used in enterprise support, including Microsoft’s Disability Answer Desk, where most users resolve issues via chat without escalating to calls.
What problem does Be My AI solve that volunteer video calls can’t always?
Why does image description matter for accessibility on the web and in apps?
What does the transcript suggest about the quality of GPT-4V descriptions?
How quickly did Be My AI scale, and what usage/satisfaction numbers were reported?
How did language support expand, and what was the approach?
How is Be My AI used outside consumer assistance?
Review Questions
- How does the lack of meaningful alt text affect accessibility, and how does Be My AI address that gap?
- What metrics were used to describe Be My AI’s early performance (usage volume and satisfaction), and what do they imply about user adoption?
- In what ways does the transcript connect visual AI capabilities to both consumer accessibility and enterprise support workflows?
Key Points
1. Be My Eyes launched “Be My AI” on GPT-4V to give blind and low-vision users an always-available alternative to volunteer video calls.
2. The service targets situations where independence matters: users can get help without scheduling or relying on others.
3. Be My AI improves access to images online and in apps, especially when images lack meaningful alt text for screen readers.
4. Reported early results include about a million image descriptions per month and roughly 95% satisfaction after discounting downtime and system errors.
5. Language support expanded to about 36 languages by prompting the model to respond in the user’s language.
6. Be My AI is also used in enterprise support, including Microsoft’s Disability Answer Desk, where most users resolve issues via chat without escalating to calls.