science communicators and educators

From research paper to narrated explainer video

Turn a dense academic paper into a short, narrated, captioned explainer video.

5 steps · Verified

The hard part of "paper to video" is not the video. It is keeping the science intact along the way, because an error baked into rendered audio or a finished avatar clip is expensive to fix. This recipe front-loads accuracy. Undermind first runs a citation-chasing search to surface the relevant paper(s). Elicit condenses their findings into a tight, plain-language explainer script that you can read and correct. Only then does the media pipeline begin: ElevenLabs voices the *finalized* script, HeyGen turns the script and voiceover into a talking-head avatar clip, and Submagic burns in social-ready captions.

Each output is the next step's literal input: paper → script → voiceover → avatar video → captioned cut. The decisive move is to treat the script as the verification gate. Lock and fact-check it before step 3, because regenerating a HeyGen avatar render or re-voicing a wrong number costs credits and time, whereas editing text is free.

The recipe breaks down in two cases. Papers whose core contribution is a proof or a single figure need a screen-share or B-roll, because a talking head cannot carry a diagram. Very new papers may not yet be indexed by the discovery step; in that case, summarize from the PDF directly.
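The chain and its gate can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's API: `Script`, `run_media_stages`, and the three stage functions are hypothetical names, and the stages are stand-ins that only label their input so the hand-off order is visible. The real HeyGen step takes both the script and the voiceover; the sketch simplifies that to a single chained input.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Script:
    text: str
    verified: bool = False  # set to True only after a human fact-check


class UnverifiedScriptError(RuntimeError):
    """Raised when a paid media stage is asked to run on an unlocked script."""


def run_media_stages(script: Script, stages: List[Callable[[str], str]]) -> List[str]:
    """Feed each stage the previous stage's output and return every artifact in order."""
    if not script.verified:
        # The gate: refuse to spend credits on copy that has not been checked.
        raise UnverifiedScriptError("fact-check and lock the script before step 3")
    artifacts = []
    current = script.text
    for stage in stages:
        current = stage(current)
        artifacts.append(current)
    return artifacts


# Hypothetical stand-ins for steps 3 to 5; real calls would go to each service.
def voice(script_text: str) -> str:
    return f"voiceover({script_text})"


def avatar(audio: str) -> str:
    return f"avatar_video({audio})"


def caption(video: str) -> str:
    return f"captioned_cut({video})"
```

Calling `run_media_stages(Script("draft"), [voice, avatar, caption])` raises `UnverifiedScriptError`, while the same call on a script with `verified=True` returns the voiceover, avatar video, and captioned cut in that order.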

Prerequisites

  • An arXiv or PDF link to the source paper
  • Accounts for each app in the chain
  • A target length and audience for the final clip
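A target length translates directly into a word budget for the script. The sketch below assumes a narration pace of 150 words per minute, which is an illustrative default rather than a figure from this recipe or from any TTS vendor; measure your chosen voice and adjust. The function names are hypothetical.

```python
import math

# Assumption: a typical narration pace. Tune this per voice and language.
ASSUMED_WORDS_PER_MINUTE = 150


def word_budget(target_seconds: float, wpm: int = ASSUMED_WORDS_PER_MINUTE) -> int:
    """Return the largest whole number of words that fits the target length."""
    if target_seconds <= 0 or wpm <= 0:
        raise ValueError("target_seconds and wpm must be positive")
    return math.floor(target_seconds * wpm / 60)


def words_over_budget(script_text: str, target_seconds: float,
                      wpm: int = ASSUMED_WORDS_PER_MINUTE) -> int:
    """Return how many words must be cut (0 if the script already fits)."""
    return max(0, len(script_text.split()) - word_budget(target_seconds, wpm))
```

Under that assumption a 60-second clip allows about 150 words and a 90-second clip about 225, which is a useful constraint to give the summarization step before any audio is generated.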

The workflow

  1. Undermind · Literature search

    Pose your topic as a natural-language question; let the agent search, read, and rank the most relevant paper(s) along citation trails.

    Picking the right paper up front is the whole game: Undermind's citation-chasing search surfaces the genuinely relevant study that a keyword search would miss, so the explainer is built on real literature.

  2. Elicit · Summarization

    Summarize the chosen paper's contribution, headline result, and one limitation into a tight, plain-language explainer script.

    Elicit summarizes paper takeaways against your specific question with sentence-level citations, so the script you hand to the voice step is condensed yet traceable, not a vibe-based rewrite.

  3. ElevenLabs · Speech synthesis (TTS)

    Synthesize the finalized, fact-checked script into a natural-sounding voiceover narration.

    Best-in-class TTS naturalness is what makes the explainer listenable rather than robotic; doing it after the script is locked means you only ever voice verified copy, never a draft.

    Alternatives for this step (top 5 of 44, ranked by license, cost, and platform footprint):
    • Sesame: open source · free · single platform
    • Big-AGI: open source · bring your own key · single platform
    • Leon: open source · bring your own key
    • LiveKit: open core · free tier
    • Neuphonic: open core · free tier

  4. HeyGen · Text-to-video

    Feed the script and the ElevenLabs voiceover into HeyGen to render a lip-synced talking-head avatar explainer video.

    HeyGen turns a script-plus-voice into a finished presenter clip with lip-sync, giving the explainer a face and pacing without a camera, studio, or editor.

  5. Submagic · Subtitle generation

    Upload the rendered video to auto-generate and burn in styled captions sized for TikTok, Reels, and Shorts.

    Most social viewers watch muted, so burned-in captions are non-optional for reach; Submagic auto-captions in dozens of languages and formats the cut for vertical feeds in one pass.
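Part of the script check that sits between steps 2 and 3 can be automated. The sketch below flags every numeric figure in the script that does not appear verbatim in the cited source sentences, since a wrong number is the costliest thing to re-voice. It is a rough aid under stated assumptions, not a replacement for reading the script: `untraced_numbers` is a hypothetical helper, the cited sentences are assumed to be pasted in by hand, and the pattern only handles plain integers, decimals, and percentages (it would split "1,000" and ignore spelled-out numbers).

```python
import re
from typing import List

# Matches integers, decimals, and an optional trailing percent sign.
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def untraced_numbers(script: str, cited_sentences: List[str]) -> List[str]:
    """Return numbers in the script that appear in none of the cited sentences."""
    source_numbers = set()
    for sentence in cited_sentences:
        source_numbers.update(NUMBER.findall(sentence))
    return [n for n in NUMBER.findall(script) if n not in source_numbers]


script = "The model cut error by 12% across 3 datasets."
cited = [
    "We observe a 12% reduction in error.",
    "Experiments span 4 datasets.",
]
print(untraced_numbers(script, cited))  # ['3']
```

An empty result does not prove the script is correct, because a number can match the source and still be attached to the wrong claim. A non-empty result is a cheap signal to recheck the script before spending credits on audio or video.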
