
Everything New in The World of AI!

MattVidPro · 6 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

OpenAI made DALL·E 3 available to ChatGPT Plus users inside the ChatGPT mobile app, enabling on-phone image generation.

Briefing

OpenAI has rolled out DALL·E 3 across ChatGPT Plus, and the biggest practical change is that DALL·E 3 is now available directly inside the ChatGPT mobile app. That means subscribers can generate images on their phones without switching tools—though some users complain the ChatGPT interface is more restrictive, blocking requests involving famous characters unless users bypass safeguards. For beginners, the convenience still makes ChatGPT a low-friction entry point, even if power users often prefer Microsoft Bing’s comparatively looser environment for image generation.

Midjourney is responding on two fronts: higher-resolution output and a push toward a broader web and mobile experience. A new upscaler with 2X and 4X options is live, targeting up to 16 megapixels—described as “photograph size.” Early results look detailed at a distance and natural overall, but close-up inspection shows the upscaling can behave more like a detail enhancer than a perfect recreation of fine textures, and background elements sometimes remain inconsistent. Midjourney is also testing a redesigned website in beta that loads faster and improves browsing, but generation on the site is not yet available in that beta. Access requires at least 1,000 generations on an account, suggesting Midjourney is gating features while it stabilizes the new platform.

The most notable Midjourney-related move may be the launch of a mobile app ecosystem under the name “Niji Journey,” built with Spellbrush and positioned as an AI anime generator. The app supports prompt-based generation, image-to-image, and community-style live feeds where users can watch others’ generations in real time. Midjourney images can also be generated inside the app, and the social, feed-driven design hints at a strategy to reduce reliance on Discord—where Midjourney’s workflow has historically lived.

Beyond image generation, the roundup highlights open-source progress in “audio understanding.” A model called SALMONN (Speech Audio Language Music Open Neural Network) can analyze audio to produce text descriptions and interpretations, including recognizing background sounds such as gunfire and explosions. The same system can also respond to uploaded music—turning a piano-and-vocal track into an interpretive description of mood and structure. The practical implication is accessibility: audio-to-text systems could help people who are deaf by describing what’s happening around them without needing other people as intermediaries.

A separate AI art development, “PixArt Alpha,” is pitched as a fast-training text-to-image diffusion transformer that can train in about 10% of the time compared with Stable Diffusion-family approaches, with claims of competitive image quality against models like Google’s Imagen and Midjourney. The efficiency angle matters because it could make home experimentation more feasible, especially since the project plans to co-release code and weights.

On the autonomous-agent front, HyperWrite’s assistant (run by CEO Matt Shumer) received an update that improves task completion and adds source citation. A demo shows the assistant searching for restaurants, then making an OpenTable reservation by navigating the booking flow and filling in details automatically. Finally, Meta shared research toward near-real-time decoding of visual perception from brain activity recorded with magnetoencephalography (MEG), producing rough reconstructions of what participants viewed after only one second—accurate enough to identify broad categories and colors, while still missing fine details like faces.

Cornell Notes

OpenAI’s DALL·E 3 is now available to ChatGPT Plus users inside the ChatGPT mobile app, making image generation more convenient—though some requests are blocked more often than in other interfaces. Midjourney is upgrading output with a new 2X/4X upscaler aimed at up to 16 megapixels, testing a faster website redesign, and launching a mobile app ecosystem (Niji Journey) that emphasizes community-style live generation. Open-source SALMONN pushes audio understanding by converting audio (including background events and music) into text descriptions, with potential accessibility benefits. PixArt Alpha claims much faster training for text-to-image diffusion models and plans to release code and weights, lowering barriers to experimentation. HyperWrite’s autonomous assistant update demonstrates end-to-end task execution like booking an OpenTable reservation, while Meta’s MEG research moves toward decoding visual perception from brain activity.

What changed for DALL·E 3 users, and why does it matter in practice?

DALL·E 3 was rolled out across ChatGPT Plus and made available inside the ChatGPT app on mobile. That removes the need to switch platforms just to generate images, lowering the barrier for casual creation. The tradeoff is stricter content controls in the ChatGPT interface—requests involving famous characters (e.g., Mario or Sonic) may be blocked unless safeguards are bypassed—so some users prefer Microsoft Bing for fewer restrictions.

How does Midjourney’s new upscaler change the quality and workflow of generated images?

Midjourney introduced 2X and 4X upscaling options, targeting outputs up to 16 megapixels. In reported examples, zoomed-out views look photograph-like with convincing natural detail (like cat hairs), while close-up inspection shows limitations: textures can become “detail-enhanced” rather than truly accurate, and background elements sometimes fail to render cleanly. The practical takeaway is that upscaling improves usable resolution, but it’s not a guarantee of perfect micro-detail.

Why is Midjourney’s move toward a mobile app (Niji Journey) strategically important?

Niji Journey, built with Spellbrush, is positioned as an AI anime generator but can also generate Midjourney images. It supports prompt entry, image-to-image, and uploading images, plus a live community feed showing generations in real time. That social, feed-based design suggests Midjourney wants to keep users engaged without relying on Discord, where its traditional workflow has been centered.

What can SALMONN do with audio, and what real-world use cases does that enable?

SALMONN (Speech Audio Language Music Open Neural Network) can analyze audio and generate text outputs, functioning like an audio-to-text system. It can interpret background sounds—such as gunshots and explosions—to infer a context (e.g., a war-zone scenario). It can also respond to music by producing interpretive descriptions of mood and structure. The accessibility angle highlighted is describing environments for people who are deaf, reducing dependence on others for information.

What’s the significance of PixArt Alpha’s claimed training speed?

PixArt Alpha is described as a transformer-based text-to-image diffusion model with image quality competitive with state-of-the-art models (including Google’s Imagen, SDXL, and Midjourney). The key differentiator is efficiency: it reportedly trains in about 10% of the time required by Stable Diffusion-family approaches. That efficiency could make faster iteration and broader experimentation more feasible, especially since code and weights are planned for release.

How does HyperWrite’s autonomous assistant update demonstrate real autonomy?

The demo shows the assistant not only generating a list of restaurants but also completing a multi-step task: making an OpenTable reservation for a specific time and party size. It navigates to the reservation flow, fills in fields (name, email, phone), and returns a confirmation link. A second example involves drafting and sending a well-researched marketing email with cited sources, illustrating end-to-end task execution rather than just suggestions.

What does Meta’s MEG research claim to achieve, and what are its limits?

Meta’s system uses magnetoencephalography (MEG) to decode visual perception from brain activity in near real time. In examples, participants viewed images for about one second, and the system produced rough visual reconstructions—often capturing broad categories (e.g., a fluffy animal) and color similarities. The reconstructions aren’t identical to the original images and can miss details like faces, but they’re accurate enough to identify what’s being seen at a high level.

Review Questions

  1. Which interface—ChatGPT mobile or Microsoft Bing—was described as having more restrictions for DALL·E 3 requests, and what kinds of requests were mentioned?
  2. What resolution target and upscaling options did Midjourney introduce, and what quality tradeoffs were observed when zooming in?
  3. How do SALMONN and MEG differ in input type and output type, and what potential applications were highlighted for each?

Key Points

  1. OpenAI made DALL·E 3 available to ChatGPT Plus users inside the ChatGPT mobile app, enabling on-phone image generation.

  2. ChatGPT’s DALL·E 3 workflow may block requests involving famous characters more often than Microsoft Bing, pushing some users to alternative interfaces.

  3. Midjourney added 2X/4X upscaling aimed at up to 16 megapixels, improving usability at higher resolution while sometimes struggling with background detail.

  4. Midjourney’s redesigned website is in beta with faster loading, but image generation on the site is not yet enabled; access is gated by account generations.

  5. Midjourney’s Niji Journey mobile app emphasizes community live feeds and supports prompt-based and image-to-image generation, potentially reducing reliance on Discord.

  6. SALMONN is an open-source audio-to-text model that can interpret background events and music, with accessibility implications such as describing environments for people who are deaf.

  7. HyperWrite’s autonomous assistant update demonstrates end-to-end task completion, including making an OpenTable reservation and sending a researched marketing email with citations.

Highlights

DALL·E 3 is now reachable from the ChatGPT mobile app for ChatGPT Plus subscribers, but some famous-character requests are blocked more aggressively than in other interfaces.
Midjourney’s new 2X/4X upscaler targets up to 16 megapixels; close-up detail can look enhanced rather than perfectly faithful, especially in backgrounds.
Niji Journey turns Midjourney-style generation into a social, live-feed experience—prompting a shift away from Discord-centric workflows.
SALMONN can convert audio into text interpretations, including inferring context from background sounds like gunfire and explosions.
Meta’s MEG research suggests rough, near-real-time reconstruction of what people viewed from brain activity after only one second.
