science communicators and educators

From research paper to narrated explainer video

Turn a dense academic paper into a short, narrated, captioned explainer video.

5 steps · Verified 2026-06-26

The hard part of "paper to video" isn't the video. It is keeping the science intact along the way, because an error baked into rendered audio or a finished avatar clip is expensive to fix. This recipe front-loads accuracy: Undermind first runs a citation-chasing search to surface the relevant paper or papers, Elicit condenses the findings into a tight, plain-language explainer script you can read and correct, and only then does the media pipeline begin. ElevenLabs voices the *finalized* script, HeyGen turns that script and voiceover into a talking-head avatar clip, and Submagic burns in social-ready captions. Each output is the next step's literal input: paper → script → voiceover → avatar video → captioned cut.

The decisive move is treating the script as the verification gate. Lock and fact-check it before step 3, because regenerating a HeyGen avatar render or re-voicing a wrong number costs credits and time, whereas editing text is free.

The recipe breaks down in two cases. Papers whose core contribution is a proof or a single figure need a screen-share or B-roll, because a talking head can't carry a diagram. Very new papers may not be indexed by the discovery step yet, in which case you summarize from the PDF directly.
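The verification gate described above can be sketched in a few lines. This is a minimal illustration, not any vendor's API: `Script`, `require_locked`, `voice`, `render_avatar`, and `caption` are hypothetical names, and the three media functions are stand-ins that only return labels rather than calling ElevenLabs, HeyGen, or Submagic. The idea is that a digest of the reviewed text is recorded at lock time, so a draft or a post-review edit is refused before any paid step runs.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Script:
    text: str
    locked_digest: Optional[str] = None  # set once a human has fact-checked the text

    def lock(self) -> None:
        """Record a digest of the reviewed text; any later edit invalidates it."""
        self.locked_digest = _digest(self.text)

    def is_locked(self) -> bool:
        return self.locked_digest == _digest(self.text)


def require_locked(script: Script) -> str:
    """Gate before any paid media step: refuse drafts and post-review edits."""
    if not script.is_locked():
        raise ValueError("script changed or was never fact-checked; review it first")
    return script.text


# Hypothetical stand-ins for steps 3 to 5. Each returns a label for its artifact.
def voice(script_text: str) -> str:
    return f"voiceover({len(script_text)} chars)"


def render_avatar(script_text: str, voiceover: str) -> str:
    return f"avatar_video[{voiceover}]"


def caption(video: str) -> str:
    return f"captioned[{video}]"


def run_media_pipeline(script: Script) -> str:
    """Each output is the next step's input: script, voiceover, video, captioned cut."""
    text = require_locked(script)
    audio = voice(text)
    video = render_avatar(text, audio)
    return caption(video)
```

Calling `run_media_pipeline` on an unlocked script raises `ValueError`. After `script.lock()` it runs, and editing `script.text` afterwards invalidates the lock until the script is reviewed and locked again.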

Prerequisites

A topic question for the literature search, or an arXiv or PDF link if you already have the source paper
Accounts for each app in the chain
A target length and audience for the final clip
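
A target length translates into a word budget for the script. The sketch below assumes a narration pace of 150 words per minute, which is an illustrative figure rather than a measured one, and the function names are hypothetical. Measure your chosen voice's actual pace and adjust the default.

```python
def word_budget(target_seconds: float, words_per_minute: float = 150.0) -> int:
    """Maximum script length in words for a clip of the given duration.

    The 150 wpm default is an assumption, not a property of any TTS voice.
    """
    return int(target_seconds * words_per_minute / 60)


def fits_target(script: str, target_seconds: float, words_per_minute: float = 150.0) -> bool:
    """True if the script's whitespace-separated word count is within budget."""
    return len(script.split()) <= word_budget(target_seconds, words_per_minute)


print(word_budget(60))  # 150
print(word_budget(45))  # 112
```

Running this check before locking the script avoids discovering after the render that the clip overruns its slot.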

The workflow

1
Undermind · Literature search
Pose your topic as a natural-language question; let the agent search, read, and rank the most relevant paper(s) along citation trails.
Picking the right paper up front is the whole game. Undermind's citation-chasing search surfaces the genuinely relevant study that a keyword search would miss, so the explainer is built on real literature.
Swap this step (20)
- Asta: open source · free
- FutureHouse: free
- Semantic Scholar: free
- OpenEvidence: free
- AnswerThis: free tier · single platform
Top 5 of 20 · ranked by license, cost, and platform footprint
2
Elicit · Summarization
Summarize the chosen paper's contribution, headline result, and one limitation into a tight, plain-language explainer script.
Elicit summarizes paper takeaways against your specific question with sentence-level citations, so the script you hand to the voice step is condensed yet traceable — not a vibe-based rewrite.
Swap this step (78)
- Glarity: open core · free tier
- AFFiNE: open core · free tier
- Semantic Scholar: free
- Yahoo Scout: free
- Qwen Chat: free
Top 5 of 78 · ranked by license, cost, and platform footprint
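
One cheap check to run on the step 2 script before locking it is to confirm that every number in it also appears in the source text it was summarized from. The sketch below is a hypothetical helper, not an Elicit feature, and it only compares numeric tokens literally. It will not catch a correct number attached to the wrong claim, and it flags numbers that were legitimately rounded or reworded, so it supplements a human read rather than replacing one.

```python
import re

# Integers or decimals, with an optional trailing percent sign.
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def unsupported_numbers(script: str, source: str) -> list[str]:
    """Return numbers that appear in the script but not in the source text."""
    source_numbers = set(NUMBER.findall(source))
    flagged: list[str] = []
    for token in NUMBER.findall(script):
        if token not in source_numbers and token not in flagged:
            flagged.append(token)
    return flagged


source = "Accuracy improved from 71.2% to 84.5% across 3 benchmarks."
script = "Accuracy jumped to 85.4% on 3 benchmarks."
print(unsupported_numbers(script, source))  # ['85.4%']
```

Here the transposed digits in `85.4%` are flagged while the benchmark count passes, which is the kind of error that is free to fix in text and costly to fix after voicing.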
3
ElevenLabs · Speech synthesis (TTS)
Synthesize the finalized, fact-checked script into a natural-sounding voiceover narration.
Best-in-class TTS naturalness is what makes the explainer listenable rather than robotic; doing it after the script is locked means you only ever voice verified copy, never a draft.
Swap this step (44)
- Sesame: open source · free · single platform
- Big-AGI: open source · bring your own key · single platform
- Leon: open source · bring your own key
- LiveKit: open core · free tier
- Neuphonic: open core · free tier
Top 5 of 44 · ranked by license, cost, and platform footprint
4
HeyGen · Text-to-video
Feed the script and the ElevenLabs voiceover into HeyGen to render a lip-synced talking-head avatar explainer video.
HeyGen turns a script-plus-voice into a finished presenter clip with lip-sync, giving the explainer a face and pacing without a camera, studio, or editor.
Swap this step (45)
- Wan: open core · free tier
- Genmo: open core · free tier
- Civitai: open core · free tier
- Odyssey: free · single platform
- Google AI Studio: free · single platform
Top 5 of 45 · ranked by license, cost, and platform footprint
5
Submagic · Subtitle generation
Upload the rendered video to auto-generate and burn in styled captions sized for TikTok, Reels, and Shorts.
Most social viewers watch muted, so burned-in captions are non-optional for reach; Submagic auto-captions in dozens of languages and formats the cut for vertical feeds in one pass.
Swap this step (25)
- TurboScribe: free tier · single platform
- Maestra: free tier · single platform
- Perso AI: free tier · single platform
- Vizard: free tier · single platform
- VEED.IO: free tier · single platform
Top 5 of 25 · ranked by license, cost, and platform footprint
