
Automate Anything with Make.com, Here’s How

David Ondrej · 5 min read

Based on David Ondrej's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Use Airtable as the state tracker for multi-stage AI pipelines: store movie definitions, scene prompts, aspect ratios, and returned task/image/video IDs.

Briefing

The centerpiece is a practical, credit-aware pipeline that turns a single text prompt into a complete faceless "AI video": Midjourney generates scene images, Go API orchestrates the Midjourney calls where no direct API exists, Luma Labs animates the upscaled frames, 11Labs adds voice, and a video-editing layer stitches four scenes into a final output with captions. The payoff is less about one-off creativity and more about building a repeatable automation that scales across prompts while keeping quality under control.

Steven G Pope walks through a Make.com setup built around Airtable as the control plane. A "movies" table defines a project, then scenes are added with prompts and aspect ratios. When a user selects a scene and ticks a trigger checkbox, Make.com sends an API request via Go API to Midjourney. Because Midjourney takes time to render, the workflow relies on webhooks: once the render finishes, Go API calls a webhook, Make.com retrieves the resulting task/image ID, and the Airtable record is updated so the generated images appear in the interface.
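
In the video this webhook step lives entirely inside Make.com's visual builder, but a minimal sketch helps make the mechanics concrete. The sketch below stands a small Flask endpoint in for Make.com's webhook module; the payload keys (`task_id`, `image_url`) and Airtable field names are assumptions, not Go API's documented callback schema.

```python
# Minimal sketch of the callback side of the flow, assuming hypothetical payload keys
# and Airtable field names; Make.com's webhook module plays this role in the video.
import os
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

AIRTABLE_TOKEN = os.environ["AIRTABLE_TOKEN"]
BASE_ID = "appXXXXXXXXXXXXXX"        # placeholder Airtable base ID
SCENES_TABLE = "scenes"              # assumed table name

@app.route("/midjourney-callback", methods=["POST"])
def midjourney_callback():
    payload = request.get_json(force=True)
    task_id = payload.get("task_id")          # assumed key in the callback payload
    image_url = payload.get("image_url")      # assumed key

    headers = {"Authorization": f"Bearer {AIRTABLE_TOKEN}"}

    # Find the scene record that stored this task_id when the job was submitted.
    search = requests.get(
        f"https://api.airtable.com/v0/{BASE_ID}/{SCENES_TABLE}",
        headers=headers,
        params={"filterByFormula": f"{{task_id}} = '{task_id}'"},
    ).json()

    for record in search.get("records", []):
        # Write the finished image back so it shows up in the Airtable interface.
        requests.patch(
            f"https://api.airtable.com/v0/{BASE_ID}/{SCENES_TABLE}/{record['id']}",
            headers={**headers, "Content-Type": "application/json"},
            json={"fields": {"image_url": image_url, "status": "image_ready"}},
        )
    return jsonify({"ok": True})

if __name__ == "__main__":
    app.run(port=8000)
```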

From there, the pipeline selects one of the generated images for upscaling. Another Midjourney API call produces the higher-resolution frame, again using task IDs and webhooks to return results into Airtable. Only after the upscaled image is ready does the workflow move to video generation: it creates a “video prompt” describing motion and camera behavior (for example, animating cars driving on the Los Angeles freeway, clouds drifting, or acting “like you’re a drone” flying forward). Go API also integrates with Luma Labs, enabling the same orchestration approach even though neither Midjourney nor Luma Labs is presented as a straightforward API-first product.
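
These follow-up stages are also built as Make.com modules in the video; the sketch below only illustrates the shape of the two calls. The endpoint URLs, payload fields, and webhook addresses are placeholders, not Go API's real routes.

```python
# Rough sketch of the upscale request and the Luma animation request,
# with placeholder URLs and field names.
import os
import requests

GOAPI_KEY = os.environ["GOAPI_KEY"]
HEADERS = {"X-API-Key": GOAPI_KEY, "Content-Type": "application/json"}

def request_upscale(origin_task_id: str, index: int) -> str:
    """Ask the Midjourney bridge to upscale one of the four generated images."""
    resp = requests.post(
        "https://example-goapi-host/mj/upscale",              # placeholder URL
        headers=HEADERS,
        json={"origin_task_id": origin_task_id, "index": index,
              "webhook_url": "https://hook.make.com/xxxx"},   # Make.com resumes here
    )
    resp.raise_for_status()
    return resp.json()["task_id"]                             # assumed response key

def request_luma_video(image_url: str, motion_prompt: str) -> str:
    """Animate the upscaled frame using a motion/camera prompt."""
    resp = requests.post(
        "https://example-goapi-host/luma/generation",          # placeholder URL
        headers=HEADERS,
        json={"image_url": image_url, "prompt": motion_prompt,
              "webhook_url": "https://hook.make.com/yyyy"},
    )
    resp.raise_for_status()
    return resp.json()["task_id"]

if __name__ == "__main__":
    up_task = request_upscale("mj-task-123", index=2)          # "image 2" chosen in Airtable
    video_task = request_luma_video(
        "https://cdn.example.com/upscaled.png",
        "Cars driving on the Los Angeles freeway, clouds drifting, "
        "camera flying forward like a drone",
    )
    print(up_task, video_task)
```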

Video generation introduces a more complex wait strategy. Instead of sleeping for a fixed long interval, Make.com uses a repeater loop that checks status every 60 seconds, then exits the loop once the job is complete and writes the finished video back to Airtable. The automation deliberately avoids fully hands-off generation for every step because video rendering can burn credits quickly; Pope favors approving or curating key outputs to protect quality and cost.
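
A rough plain-code equivalent of that repeater loop, assuming a placeholder status endpoint and an assumed `completed` status value:

```python
# Polling sketch mirroring the Make.com repeater: check every 60 seconds,
# stop as soon as the Luma job reports it is finished.
import time
import requests

def wait_for_video(task_id: str, api_key: str, max_checks: int = 30) -> str | None:
    headers = {"X-API-Key": api_key}
    for _ in range(max_checks):                                # bounded, like a repeater count
        status = requests.get(
            f"https://example-goapi-host/luma/task/{task_id}", # placeholder URL
            headers=headers,
        ).json()
        if status.get("status") == "completed":                # assumed status value
            return status.get("video_url")                     # written back to Airtable next
        time.sleep(60)                                         # check again in a minute
    return None                                                # still rendering; handle upstream
```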

After the four scenes are produced, a separate editing automation assembles them into a single deliverable. The workflow uses an API-based editing platform powered by FFmpeg to stitch scene clips together. In a related example, the system also generates audio and captions automatically—using 11Labs for AI voice—so the final product can be formatted for short-form distribution.
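
The hosted editing platform isn't named beyond being FFmpeg-based, so the sketch below shows the equivalent local FFmpeg steps instead: concatenate the four scene clips (assuming they share codec and resolution), then mux in a voice track such as an 11Labs export. File names are placeholders.

```python
# Local stand-in for what an FFmpeg-backed editing API does: concat the scene clips,
# then overlay the AI voice track.
import pathlib
import subprocess

def stitch_scenes(clips: list[str], voice_track: str, output: str = "final.mp4") -> None:
    # 1) Write a concat list for FFmpeg's concat demuxer.
    list_path = pathlib.Path("concat_list.txt")
    list_path.write_text("".join(f"file '{c}'\n" for c in clips))

    # 2) Stitch the scene clips together without re-encoding.
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", str(list_path), "-c", "copy", "stitched.mp4"],
        check=True,
    )

    # 3) Lay the voice track (e.g., an 11Labs export) over the stitched video.
    subprocess.run(
        ["ffmpeg", "-y", "-i", "stitched.mp4", "-i", voice_track,
         "-map", "0:v", "-map", "1:a", "-c:v", "copy", "-shortest", output],
        check=True,
    )

if __name__ == "__main__":
    stitch_scenes(
        ["scene1.mp4", "scene2.mp4", "scene3.mp4", "scene4.mp4"],  # placeholder clip names
        voice_track="voiceover.mp3",
    )
```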

Beyond the build, the conversation turns to strategy: automation shouldn’t chase perfection in one model or one tool. The most durable advantage is designing workflows that benefit from model improvements—so when Midjourney, Luma Labs, or 11Labs updates, the automation can swap in better outputs without rebuilding from scratch. The discussion also emphasizes systematic debugging (change one variable at a time) and using AI tools to accelerate development when no-code platforms hit limits, such as Make.com’s file-size constraints for moving large video assets. Finally, the pipeline is positioned as a foundation for monetization through templates plus coaching/consulting—helping businesses apply AI practically rather than chasing flashy experiments that don’t hold up in production.

Cornell Notes

The workflow described turns a text prompt into a multi-scene faceless AI video by chaining Midjourney image generation, Midjourney upscaling, Luma Labs animation, and 11Labs voice—then assembling everything into one final video with captions. Make.com acts as the orchestrator, while Airtable stores movie and scene definitions and tracks task IDs as jobs complete. Go API bridges the gap by enabling Midjourney and Luma Labs calls through API-like requests plus webhook callbacks, even though direct APIs aren’t available in the usual way. The build uses status-check loops to avoid long idle waits and deliberately limits full automation to control credit costs. The broader lesson: design automations that improve as underlying models improve, and treat iteration and debugging as part of the process.

How does the pipeline coordinate Midjourney renders when Midjourney doesn’t provide a simple “wait for result” API flow?

Make.com triggers Midjourney through Go API and then relies on webhooks to resume the workflow when rendering finishes. Airtable stores the “movies” and “scenes” records, including aspect ratio and prompts. After Go API completes the Midjourney job, it calls a webhook endpoint; Make.com receives the resulting task/image ID, updates the Airtable record, and the generated images appear for selection (e.g., choosing “image 2” for the next step).

Why does the build separate image generation, upscaling, and video prompting instead of going straight from prompt to animation?

The workflow first generates multiple Midjourney scene images, then lets the user pick the best one before spending credits on upscaling. Upscaling produces a higher-resolution frame that becomes the input for Luma Labs animation. Only after seeing the upscaled image does the system generate a video prompt describing motion and camera behavior (e.g., cars driving on the Los Angeles freeway, clouds drifting, or a drone-like forward fly). This sequencing helps maintain quality because video prompts are easier to tune once the visual baseline exists.

What credit-cost problem does the automation try to avoid, and how?

Video generation can be expensive in credits, especially if every scene is fully automated without review. The workflow therefore uses partial human-in-the-loop decisions—such as approving which images to upscale and which outputs to animate—rather than running every possible branch automatically. The goal is to spend credits on the most promising inputs while still benefiting from automation for the repetitive orchestration work.
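
One way to picture that gate is a checkbox filter on the Airtable side. The sketch below assumes a hypothetical `approved` checkbox and `video_url` field, so only human-approved scenes reach the credit-heavy stages.

```python
# Sketch of a human-in-the-loop gate, assuming hypothetical Airtable field names.
import os
import requests

AIRTABLE_TOKEN = os.environ["AIRTABLE_TOKEN"]
BASE_ID = "appXXXXXXXXXXXXXX"   # placeholder base ID
SCENES_TABLE = "scenes"         # assumed table name

def approved_scenes() -> list[dict]:
    """Return only scene records a human has ticked for upscaling/animation."""
    resp = requests.get(
        f"https://api.airtable.com/v0/{BASE_ID}/{SCENES_TABLE}",
        headers={"Authorization": f"Bearer {AIRTABLE_TOKEN}"},
        # Approved by a person, and no video generated yet.
        params={"filterByFormula": "AND({approved} = 1, {video_url} = '')"},
    )
    resp.raise_for_status()
    return resp.json().get("records", [])

# Downstream, only these records feed the costly video-generation call,
# so credits are spent on candidates a person has signed off on.
```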

How does Make.com handle variable Luma Labs render times without wasting time?

Instead of a single long sleep, Make.com uses a repeater loop that checks job status every 60 seconds. This reduces idle time when renders finish in 5 minutes rather than 10, while still polling until the status becomes “okay.” Once complete, Make.com writes the finished video back to Airtable and breaks out of the loop using a variable condition.

What role does Airtable play beyond storing prompts?

Airtable functions as the orchestration ledger. It defines movies and scenes (including aspect ratios), stores prompts, and—critically—tracks task IDs and completion results as each stage finishes. When Go API and webhooks return outputs, Make.com updates Airtable records so downstream steps (upscaling, video generation, and final assembly) can reliably pull the correct assets.
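
The "ledger" write happens at submit time: the task ID that Go API returns is stored on the scene record so later webhooks and status checks can be matched back to the right scene. A minimal sketch, with assumed field names:

```python
# Sketch of recording a returned task ID on the scene record at submit time.
import os
import requests

AIRTABLE_TOKEN = os.environ["AIRTABLE_TOKEN"]
BASE_ID = "appXXXXXXXXXXXXXX"   # placeholder base ID
SCENES_TABLE = "scenes"         # assumed table name

def record_task(record_id: str, task_id: str, stage: str) -> None:
    """Store the returned task ID and current stage ('image', 'upscale', or 'video')."""
    resp = requests.patch(
        f"https://api.airtable.com/v0/{BASE_ID}/{SCENES_TABLE}/{record_id}",
        headers={"Authorization": f"Bearer {AIRTABLE_TOKEN}",
                 "Content-Type": "application/json"},
        json={"fields": {"task_id": task_id, "stage": stage}},
    )
    resp.raise_for_status()
```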

Why is the “model-agnostic” mindset treated as a durability strategy?

The workflow is designed so that improvements in underlying models automatically improve outputs. When a new version of Midjourney, Luma Labs, or 11Labs arrives, the automation can swap to the updated model without rebuilding the whole pipeline. That approach aims to survive the fast churn of AI tooling, avoiding the risk of betting on a single model that quickly becomes outdated.

Review Questions

  1. If you had to redesign this automation from scratch, which components would you treat as the “control plane” (state tracking) versus the “execution plane” (API calls and webhooks), and why?
  2. Where in the pipeline would you insert quality gates to control credit usage, and what signals (e.g., which Airtable fields) would you use to decide?
  3. How would you modify the status-check loop if render times became highly variable (e.g., sometimes 30 seconds, sometimes 30 minutes) while still avoiding excessive polling costs?

Key Points

  1. Use Airtable as the state tracker for multi-stage AI pipelines: store movie definitions, scene prompts, aspect ratios, and returned task/image/video IDs.

  2. Bridge non-API-first tools with an orchestration layer (Go API) plus webhooks so Make.com can resume exactly when renders finish.

  3. Split the workflow into stages—image generation, upscaling, then video prompting—so video motion prompts can be tuned after seeing the visual output.

  4. Control credit burn by adding approval steps (human-in-the-loop) for the most expensive stages like video generation and upscaling.

  5. Use Make.com repeater loops with frequent status checks (e.g., every 60 seconds) instead of long fixed waits to reduce idle time.

  6. As models update, design the automation so it benefits immediately from better Midjourney/Luma Labs/11Labs outputs without rebuilding the entire system.

  7. Treat systematic debugging as a core skill: change one variable at a time and use AI tools to diagnose errors quickly when integrations fail.

Highlights

  • Airtable + webhooks turn slow, asynchronous renders into a reliable pipeline: Midjourney finishes, Go API pings a webhook, and Make.com updates the exact Airtable record tied to that job.
  • The workflow's "quality-first" choice is deliberate: it generates multiple scene images, then upscales and animates only the selected best candidates to avoid wasting credits.
  • Luma Labs rendering is handled with a polling repeater loop that checks status every 60 seconds, exiting as soon as the job is "okay."
  • Durability comes from model-agnostic automation: when underlying models improve, the same pipeline can swap in better outputs rather than starting over.
  • Final assembly uses an FFmpeg-based editing API to stitch four scene clips into a single upload-ready video, with captions and AI voice layered in from 11Labs.

Topics

Mentioned

  • Steven G Pope
  • David Ondrej
  • API
  • FFmpeg
  • AI
  • GPT
  • LLM
  • JSON
  • CPU
  • UI
  • API calls