
How I Use AI to take perfect notes...without typing

Thomas Frank Explains · 5 min read

Based on Thomas Frank Explains's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Set up a Pipedream workflow that triggers on new audio uploads to a specific Google Drive folder, not your entire Drive.

Briefing

A hands-off workflow can turn spoken voice notes into structured Notion pages—complete with a transcript, a concise summary, and actionable lists—by chaining OpenAI’s Whisper transcription with ChatGPT-style summarization inside an automation triggered by new audio uploads.

The core setup uses four building blocks: a Notion account with a notes database, an OpenAI account for API access, a cloud storage folder (Google Drive in the tutorial, with Dropbox as an alternative) to hold incoming audio, and Pipedream to connect everything. The automation watches a specific “audio upload” folder. When a new audio file lands there, Pipedream downloads it into a temporary directory, sends the audio to Whisper to produce text, then feeds that transcript into ChatGPT to generate a title, a summary, and lists such as main points and action items. The final step creates a new page in the chosen Notion database, so the voice note becomes a searchable record inside the user’s “second brain.”

Behind the scenes, the workflow is built as a sequence of steps inside Pipedream. First comes a Google Drive trigger: “emit a new event anytime a new file is added” to a selected folder. After testing the trigger, the workflow extracts the uploaded file’s ID and—crucially—its file extension so the system can handle different audio formats (the tutorial highlights M4A from iPhone Voice Notes, but the same logic supports other Whisper-supported types). Next, a Google Drive “download file” action pulls the audio into Pipedream’s /tmp storage, because Whisper can’t access Google Drive directly.
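The extraction step can be sketched as a small Pipedream Node.js code step. The exact shape of the trigger event is an assumption here (Google Drive file metadata exposes `id` and `name`); the step name `extract_info` is likewise illustrative.

```javascript
// Sketch of a code step that pulls the file ID and extension out of the
// Google Drive trigger event. Field names (`id`, `name`) are assumptions
// based on Google Drive's file metadata.
function getFileInfo(event) {
  const { id, name } = event;
  // Everything after the last dot is the extension, e.g. "m4a".
  const ext = name.includes(".") ? name.split(".").pop().toLowerCase() : "";
  return { id, ext };
}

// In Pipedream this would typically be wrapped in a component so later
// steps can reference steps.extract_info.$return_value.id / .ext:
//
// export default defineComponent({
//   async run({ steps }) {
//     return getFileInfo(steps.trigger.event);
//   },
// });
```

Carrying the extension forward is what lets the same workflow handle M4A, MP3, WAV, and other Whisper-supported formats without manual edits.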

The Whisper step uses OpenAI’s transcription capability with the audio file path pointing to the downloaded temp file. Two practical issues are flagged: temp files may disappear if testing takes too long, and Pipedream’s default execution timeout (30 seconds) can cut off longer transcriptions. The workaround is to re-upload a test file if the temp directory expires, and to raise the workflow timeout to 180 seconds in execution control.

For summarization, a ChatGPT API step is configured with a carefully designed prompt. The tutorial emphasizes that output quality depends heavily on prompt structure, and it uses a delimiter-based format so the response can be parsed into separate fields. The system instructions force responses in Markdown, with example formatting that includes headings and bullet lists. A temperature setting around 0.2 keeps results consistent and straightforward.

One additional “formatter” step, implemented with a small Node.js code block, splits the ChatGPT output into distinct Notion-ready properties (title, summary, transcript, and additional lists). It also reformats the transcript into short paragraphs so Notion receives readable blocks rather than a single wall of text. Finally, a Notion step creates a new page in the chosen database, sets the page title and properties (including a “type” value such as “AI transcription”), and inserts the formatted Markdown content.
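The formatter's two jobs can be sketched as pure functions: one splits the delimited model output into named fields, and one chunks the transcript into paragraphs of at most three sentences. The delimiter pattern is an assumption matching the prompt format described above.

```javascript
// Split delimited model output (e.g. "Title\n---summary---\n...") into
// an object keyed by section name, with everything before the first
// delimiter treated as the title.
function splitSections(output) {
  const parts = output.split(/---([a-z -]+)---/);
  const result = { title: parts[0].trim() };
  for (let i = 1; i < parts.length - 1; i += 2) {
    result[parts[i].trim()] = parts[i + 1].trim();
  }
  return result;
}

// Break a transcript into paragraphs of at most three sentences each,
// so Notion receives short readable blocks instead of a wall of text.
function chunkTranscript(transcript, maxSentences = 3) {
  const sentences = transcript.match(/[^.!?]+[.!?]+(\s|$)/g) || [transcript];
  const paragraphs = [];
  for (let i = 0; i < sentences.length; i += maxSentences) {
    paragraphs.push(sentences.slice(i, i + maxSentences).join("").trim());
  }
  return paragraphs;
}
```

A later Notion step would map `title` to the page title property and append the remaining sections and paragraph blocks as page content.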

Once deployed, the system runs automatically: upload a voice note to the watched folder and Notion receives a fully structured page. Limitations remain—Whisper’s file size cap is noted (25 MB)—and the tutorial points to a code-heavy variant for longer audio. The result is a faster bridge between real-time thinking and long-term capture, replacing slow thumb-typing with spoken input that lands in Notion already organized for review and follow-up.

Cornell Notes

The workflow turns voice notes into structured Notion pages by automating three steps: (1) watch a cloud folder for new audio, (2) transcribe the audio with OpenAI Whisper, and (3) summarize the transcript with ChatGPT into a title, summary, and lists like main points and action items. Pipedream orchestrates the process, downloading the audio to /tmp so Whisper can access it, then creating a new page in a Notion database with Markdown-formatted content. Prompt design matters: delimiter markers and Markdown-only system instructions make the output parseable and consistent. A small formatter code step splits the model output into separate Notion fields and formats the transcript into short paragraphs for readability.

How does the automation decide when to start, and what exactly triggers it?

Pipedream uses a Google Drive trigger set to “emit a new event anytime a new file is added” to a specific watched folder (e.g., an “audio upload test” folder). The workflow starts when a new audio file is uploaded into that folder, producing an event object that includes the file ID and file extension. Those values are then referenced dynamically in later steps.

Why does the workflow download audio into a temp directory before calling Whisper?

Whisper can’t directly read from Google Drive. Pipedream first downloads the uploaded file into its /tmp storage using a “download file” action. The Whisper step then points to the local temp file path (e.g., /tmp/recording.<extension>) so transcription can run.

What two common problems can break transcription during setup, and how are they handled?

First, temp files may expire while testing, causing errors like “recording no longer exists.” The fix is to upload another test file and re-run the download/transcription steps. Second, Pipedream workflows default to a 30-second timeout; longer audio can exceed that. The fix is to raise the execution timeout to 180 seconds in workflow settings.

How does the prompt structure make the ChatGPT output usable for Notion fields?

The prompt asks for a title under 15 words and then uses delimiter markers (e.g., “---summary---” and other labeled sections) so the response can be split into components. It also requests specific headings and lists (main points, action items, follow-up questions, etc.). System instructions require Markdown-only output, which Pipedream can parse into Notion-friendly formatting.

Why add a formatter step even though ChatGPT already returns structured text?

The tutorial notes that Notion page layout needs separate fields (like page title vs. page content). A small Node.js formatter step splits the ChatGPT response into properties such as title, summary, transcript, and additional info. It also converts the transcript into short paragraphs (max ~3 sentences each) so Notion displays it as readable blocks rather than one long wall of text.

What practical limitation affects how long an audio note can be, and what’s the workaround?

Whisper has a 25 MB file limit. For longer recordings (like hour-long voice notes or long podcasts), the tutorial points to a code-heavy variant of the workflow that can help get around the limitation, rather than relying on the simplified no-code setup.

Review Questions

  1. What information from the Google Drive trigger (ID, file extension) must be carried into later steps, and why?
  2. How do delimiter markers and Markdown-only system instructions improve the reliability of downstream parsing into Notion fields?
  3. What changes would you make if your transcriptions frequently time out or fail during testing?

Key Points

  1. Set up a Pipedream workflow that triggers on new audio uploads to a specific Google Drive folder, not your entire Drive.

  2. Download each uploaded audio file into Pipedream’s /tmp storage before sending it to OpenAI Whisper.

  3. Dynamically use the uploaded file’s extension (e.g., M4A) so the workflow works across different audio formats.

  4. Raise Pipedream’s execution timeout (e.g., to 180 seconds) to handle longer Whisper transcriptions.

  5. Use a delimiter-based prompt plus Markdown-only system instructions so ChatGPT output can be parsed into title, summary, and lists.

  6. Add a formatter step to split ChatGPT output into separate Notion properties and to format the transcript into short paragraphs.

  7. Deploy the workflow so new voice notes automatically create structured Notion pages with transcripts, summaries, and action items.

Highlights

A watched Google Drive folder acts as the trigger: upload audio, and Pipedream starts transcription and summarization automatically.
The workflow dynamically extracts the uploaded file extension (like M4A) so Whisper receives the correct file type without manual edits.
Markdown-only system instructions plus delimiter markers make ChatGPT responses reliably separable into Notion-ready sections.
A small Node.js formatter step improves Notion layout by splitting title vs. content and breaking transcripts into short paragraph blocks.
Whisper’s 25 MB limit is a real constraint; longer audio requires a different (code-heavy) approach.
