
HUGE AI News! ChatGPT Update & Leak, Gov Regulation, AI Music & Video

MattVidPro · 5 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

ChatGPT is rolling out multimodal capability, accepting image and document input and producing rich output within a single chat experience.

Briefing

A major shift is underway in how people will use ChatGPT: multimodal input and output is rolling out, letting users work with images and files inside a single chat instead of bouncing between separate tools. Early reports show a file-upload icon appearing on the ChatGPT phone app for some users, and the change is expected to expand what can be uploaded—PDFs and other document types—while keeping tasks consolidated. The practical payoff is less friction: one interface for vision-style prompts, document understanding, and image transformations.

Alongside that product change, a leaked or reported technical detail is fueling debate about model efficiency. Microsoft-linked research claims the free ChatGPT tier (described as “ChatGPT 3.5”) is roughly 20 billion parameters—far smaller than the 180 billion parameter size many had assumed. If accurate, the gap implies OpenAI’s smaller model is performing disproportionately well for its scale, suggesting more efficient architecture and training rather than brute-force size.

The biggest policy development comes from the U.S. government, which released an executive order aimed at establishing federal control over advanced AI. The framework centers on requiring developers of “very strong and powerful” AI systems to provide critical information to the government, including safety test reports. Those safety checks are described as “red team tests,” a method used to probe how systems behave under adversarial conditions. The order also assigns roles across agencies: the National Institute of Standards and Technology is set to define AI safety testing standards, while Homeland Security will create an AI safety board focused on critical infrastructure vulnerabilities.

The order extends beyond safety testing into discrimination and privacy. It calls for guidance to avoid algorithmic discrimination and references “equity,” a point likely to spark controversy. It also directs attention to privacy-preserving techniques, with the stated goal of protecting data even as AI capabilities accelerate. Worker protections are another pillar, with principles intended to mitigate AI-related harms and job displacement—though the feasibility of keeping pace with rapid technological change remains an open question.

A key uncertainty is how much the rules will affect open-source models. Several commentators argue the requirements may hinge on substantial compute thresholds—on the order of training runs using massive GPU resources—meaning smaller open-source efforts could largely escape immediate reporting burdens. Still, the government reserves the right to lower thresholds later, so open-source developers may not feel fully insulated.

Legal and industry updates add pressure from multiple directions. A lawsuit involving DeviantArt and Midjourney saw claims dismissed, with the reasoning tied to difficulty proving infringement and the lack of substantial similarity to protected works. Meanwhile, Stability AI’s CEO posted that a previously discussed six-month pause in AI activity appears to have ended, and Midjourney has rolled out a faster website experience and moved away from relying on Discord.

Finally, the entertainment side of AI keeps accelerating. “Gen music AI,” tied to the Gen-1 music generation research, is launching as a public-facing business with high-fidelity, stereo audio demos. Open-source AI video generation is also advancing, including work aimed at producing longer clips—up to 512 frames—addressing one of the field’s biggest limitations: maintaining consistency over time as video length grows.

Cornell Notes

ChatGPT is moving toward true multimodal use, with reports of image and file upload support appearing in the same chat experience—reducing the need to switch between separate tools. Technical claims tied to Microsoft research suggest the free ChatGPT tier may be far smaller than expected (about 20B parameters rather than 180B), implying strong efficiency rather than sheer scale. The U.S. executive order introduces federal oversight for advanced AI, including mandatory sharing of critical information and safety test results using red-team testing, plus standards-setting by NIST and review by a Homeland Security AI safety board. The rules may focus on very large training runs, which could limit immediate impact on open-source models, though thresholds could change. Legal outcomes and rapid product updates—from Midjourney to AI music and video—show the pace of change is not slowing.

What does “multimodal on both input and output” change for everyday ChatGPT use?

It shifts ChatGPT from being primarily text-focused (with separate vision/image workflows) toward handling images and other files within the same conversation. Reports mention a file-upload icon appearing for some users on the ChatGPT phone app, which implies users can upload PDFs and other file types directly. The practical result is fewer context switches: users can keep tools and tasks in one place while still doing vision-style transformations and document understanding.

Why does a claim of ~20B parameters for the free ChatGPT tier matter?

Parameter counts are often used as a rough proxy for model capacity. If the free tier is around 20 billion parameters—much smaller than an assumed 180 billion—it suggests the model is achieving strong performance without needing massive scale. That points to efficiency in model design and training, and it reframes expectations about what “small” models can do.
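To make the scale difference concrete, a back-of-envelope calculation (assuming 2 bytes per parameter, the common fp16/bf16 serving format; the byte size is an assumption, not from the source) shows how much smaller the claimed model's memory footprint would be:

```python
# Back-of-envelope weight memory for the two reported parameter counts.
# Assumes 2 bytes per parameter (fp16/bf16), a common serving precision.
BYTES_PER_PARAM = 2

def weights_gib(params_billions: float) -> float:
    """Approximate weight memory in GiB for a dense model."""
    return params_billions * 1e9 * BYTES_PER_PARAM / 2**30

small = weights_gib(20)    # the ~20B-parameter claim
large = weights_gib(180)   # the ~180B-parameter assumption

print(f"20B model:  ~{small:.0f} GiB of weights")   # ~37 GiB
print(f"180B model: ~{large:.0f} GiB of weights")   # ~335 GiB
```

A roughly 9x gap in weight memory is why the claim matters operationally: a ~20B model can be served far more cheaply, which would help explain how a free tier stays economical.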

What are the core mechanisms in the U.S. AI executive order related to safety?

Developers of very strong AI systems are required to share critical information with the government, including safety test reports. Those tests are described as “red team tests,” which are adversarial evaluations meant to stress systems and uncover risky behaviors. NIST is tasked with setting AI safety testing standards, while Homeland Security will establish an AI safety board to review threats to critical infrastructure.

How might the executive order affect open-source AI?

The concern is whether reporting and oversight thresholds will capture open-source models. One counterpoint is that the requirements appear to target models trained with extremely large compute—described in terms of massive GPU usage and very high flops—so smaller open-source efforts may not be immediately covered. However, the government retains the ability to lower thresholds later, meaning open-source developers could face future compliance pressure.
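The executive order's reporting threshold is widely reported as 10^26 floating-point operations for a training run. A rough sketch (assuming an illustrative ~10^15 FLOP/s sustained throughput per high-end accelerator; both figures are assumptions for scale, not measured numbers from the source) shows why typical open-source runs sit far below it:

```python
# Rough scale check: single-accelerator time to reach a 1e26-FLOP
# training run. The sustained throughput is an illustrative assumption.
THRESHOLD_FLOPS = 1e26       # widely reported reporting threshold
GPU_FLOP_PER_SEC = 1e15      # assumed sustained throughput per GPU

seconds = THRESHOLD_FLOPS / GPU_FLOP_PER_SEC
gpu_years = seconds / (365 * 24 * 3600)
print(f"Single-GPU time to 1e26 FLOPs: ~{gpu_years:,.0f} GPU-years")
# Equivalently: thousands of GPUs running for months, far beyond
# a typical small open-source training budget.
```

Under these assumptions the threshold works out to thousands of GPU-years, which is why commentators expect only frontier-scale labs to be covered initially.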

What legal development suggests AI image generators may face fewer immediate shutdown risks?

A lawsuit involving DeviantArt and Midjourney had infringement claims dismissed. The dismissal hinged on the difficulty of proving infringement and the lack of substantial similarity to protected works, with emphasis on the need to show generated outputs reference or contain protected elements from specific copyrighted works. The takeaway is that, at least in this case, derivative-theory copyright claims struggled without stronger similarity allegations.

What’s new in AI music and AI video that signals where the market is heading?

On music, “Gen music AI” is launching as a business tied to Gen-1 music generation, with demos emphasizing high-fidelity, stereo audio (including 48 kHz audio). On video, open-source generation is improving in quality and consistency, and a new method for text-to-video aims to extend clip length up to 512 frames (roughly 20+ seconds at 24 fps), targeting a major limitation: keeping coherence over longer sequences.
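The clip-length claim is easy to sanity-check: at a standard 24 fps playback rate, 512 frames works out to just over 21 seconds, consistent with the "roughly 20+ seconds" figure.

```python
# Sanity check on the text-to-video clip-length claim: 512 frames at 24 fps.
FRAMES = 512
FPS = 24

duration_s = FRAMES / FPS
print(f"{FRAMES} frames at {FPS} fps = {duration_s:.1f} seconds")  # 21.3 seconds
```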

Review Questions

  1. Which specific change to ChatGPT’s interface is most likely to reduce user friction when working with documents and images?
  2. What role do red team tests play in the U.S. executive order’s approach to AI safety?
  3. Why might a parameter-count claim (20B vs 180B) shift expectations about model performance?

Key Points

  1. ChatGPT is rolling out multimodal capability that supports both image/document input and multimodal output within a single chat experience.

  2. A reported technical claim suggests the free ChatGPT tier may be far smaller (~20B parameters) than widely assumed (~180B), implying efficiency gains.

  3. The U.S. executive order introduces federal oversight requiring developers to share critical information and safety test results using red-team testing.

  4. NIST is set to define AI safety testing standards, while Homeland Security will create an AI safety board focused on critical infrastructure risks.

  5. The order addresses discrimination guidance, privacy-preserving techniques, and worker-protection principles, but the practicality of keeping pace with rapid AI progress is uncertain.

  6. Compliance impact on open-source may depend on compute thresholds, with the government reserving the right to lower requirements later.

  7. Recent legal outcomes and fast-moving product updates (Midjourney, AI music, AI video) indicate the AI ecosystem is changing quickly on both technical and regulatory fronts.

Highlights

ChatGPT’s multimodal rollout is expected to bring image and file workflows into one interface, including uploads like PDFs for some users.
A Microsoft-linked claim pegs the free ChatGPT tier at about 20B parameters—dramatically lower than prior assumptions—suggesting strong efficiency.
The U.S. executive order requires safety information sharing and red-team testing reports, with NIST setting standards and Homeland Security reviewing critical infrastructure risks.
A lawsuit involving DeviantArt and Midjourney was dismissed, with courts emphasizing difficulty proving infringement without substantial similarity to protected works.
Gen music AI is launching Gen-1 music generation as a public business, while open-source video work pushes toward longer, more consistent clips (up to 512 frames).
