How to Make Quran Recitation Videos with Synced Captions
Recitation videos with Arabic text and a translation on screen are the most-shared format in Islamic social media. The recitation is the easy part — you already have that. The captions are what take people hours. This guide covers the three ways to do them, what each actually takes, and the styling details that separate clean videos from amateur ones.
Why Quran captions are harder than normal subtitles
If you have ever tried captioning a recitation by hand, you have hit at least one of these problems:
- The script itself. The Quran is written in Uthmani script, which differs from standard typed Arabic. Copy ayahs from a random website into a video editor and you often get missing harakat, broken ligatures, or a font that silently substitutes the wrong glyphs. The text must come from a proper digital mushaf and be rendered with a Quran-specific typeface such as KFGQPC Uthmanic Hafs, published by the King Fahd Complex, which prints the Madinah Mushaf.
- Timing. Recitation is not speech. A single word can stretch over several seconds of madd, and there are long silent pauses at the end of each ayah. Subtitles need to land exactly when each word is recited, or the video feels off in a way viewers notice immediately.
- Where to break the text. Splitting an ayah mid-thought looks wrong to anyone who knows the Quran. Lines should break at waqf marks — the small letters above the text marking where a reciter pauses — not after an arbitrary number of words.
- Translation pairing. Each Arabic segment needs the matching portion of a real published translation, not a machine translation of the transcript.
Generic auto-caption tools fail on all four points at once, because speech-to-text models are trained on conversational Arabic, not tajwid-governed classical recitation. We wrote a separate breakdown of why auto-captioning fails on Quranic Arabic if you want the technical detail.
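A quick way to catch stripped harakat before you paste text into an editor is to count them. The sketch below is a rough check, not a validator. It assumes the marks you care about are the eight in the Unicode range U+064B to U+0652 (fathatan through sukun) and ignores the additional Quranic annotation marks that Uthmani text also uses. The name `harakat_ratio` is ours, chosen for illustration.

```python
import unicodedata

# Assumption: only the eight basic marks, U+064B (fathatan) .. U+0652 (sukun).
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def harakat_ratio(text: str) -> float:
    """Return the number of harakat divided by the number of Arabic letters."""
    letters = sum(
        1
        for ch in text
        if "\u0600" <= ch <= "\u06FF" and unicodedata.category(ch) == "Lo"
    )
    marks = sum(1 for ch in text if ch in HARAKAT)
    return marks / letters if letters else 0.0


# "bismi" with and without its marks, written as escapes to survive copy-paste.
vocalized = "\u0628\u0650\u0633\u0652\u0645\u0650"
stripped = "\u0628\u0633\u0645"
print(harakat_ratio(vocalized))  # 1.0
print(harakat_ratio(stripped))   # 0.0
```

A ratio near zero on text that should be fully vocalized suggests the source stripped the marks. It says nothing about whether your font renders them correctly, so you still need to check the result on screen.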
First: record a recitation that captions well
Whatever method you use, the recording matters. A few habits make every later step easier:
- Quiet room, close mic. Background noise is the main cause of bad automatic alignment and muddy audio on small phone speakers. Record within arm's length of the phone or mic.
- Recite continuously. Pauses between ayahs are fine — alignment handles them — but avoid speaking, coughing, or restarting mid-recording. If others say "ameen" in the background (for example in taraweeh recordings), expect to trim or accept slightly looser timing.
- Film vertical if the destination is vertical. A 9:16 phone recording fills TikTok, Reels, and Shorts without cropping. Landscape works too, but the crop will cut the sides.
- Keep takes short, and under any length limit your captioning tool or plan imposes. Most short-form videos run 30–90 seconds; one or two ayahs recited well outperform a rushed page.
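To see how much a landscape recording loses when cropped to vertical, the arithmetic is worth doing once. This is a minimal sketch that assumes a centered crop and integer pixel sizes; the function name `center_crop` is ours, and real editors may round differently.

```python
def center_crop(width: int, height: int, target_w: int = 9, target_h: int = 16):
    """Return (x, y, w, h) of the largest centered crop with ratio target_w:target_h."""
    if width * target_h > height * target_w:
        # Source is wider than the target: keep the full height, trim the sides.
        crop_h = height
        crop_w = height * target_w // target_h
    else:
        # Source is as narrow as, or narrower than, the target: keep the full width.
        crop_w = width
        crop_h = width * target_h // target_w
    x = (width - crop_w) // 2
    y = (height - crop_h) // 2
    return x, y, crop_w, crop_h


print(center_crop(1920, 1080))  # (656, 0, 607, 1080)
print(center_crop(1080, 1920))  # (0, 0, 1080, 1920)
```

A 1920×1080 landscape frame keeps only a 607-pixel-wide strip, a little under a third of the picture, so frame the reciter in the center if you plan to crop.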
Method 1: Automatic captioning (minutes)
AyahFlow automates the entire caption layer. The flow:
- Upload your video (MP4/MOV) or audio file (MP3, WAV, M4A and others) in the browser.
- Detection. The AI listens to the recitation, identifies the surah, then verifies word-by-word against the Uthmani text to find the exact ayah range — including partial ayahs and repeated verses. You never type verse numbers.
- Alignment. A speech model specialized for Quranic recitation matches every word of the known text to its exact timestamp in your audio. Because the text is known in advance, this is forced alignment rather than transcription — it doesn't guess words, only timing.
- Review and style. A segment editor shows each caption with its timing. Captions split at waqf marks by default. You pick fonts, sizes, colors, text position, background dimming, and one of 16 translation languages. The preview matches the final render exactly.
- Render. The final video renders in the cloud in about a minute and downloads ready to post in 9:16, 1:1, 4:5, or 16:9.
Three videos are free (no card), so the honest advice is to test it on your own recitation rather than take our word for accuracy. Start here.
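For readers curious how word-level timings become caption segments, here is a sketch of splitting at waqf marks. It is an illustration under stated assumptions, not AyahFlow's implementation: it assumes alignment has already produced a start and end time for every word, and that each word carries a hypothetical `waqf_after` flag saying whether the mushaf marks a pause after it.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds
    waqf_after: bool = False  # hypothetical flag: a pause mark follows this word


@dataclass
class Segment:
    text: str
    start: float
    end: float


def _close(words):
    """Merge consecutive words into one caption segment."""
    return Segment(" ".join(w.text for w in words), words[0].start, words[-1].end)


def split_at_waqf(words):
    """Group words into segments, ending a segment after each waqf mark."""
    segments, current = [], []
    for word in words:
        current.append(word)
        if word.waqf_after:
            segments.append(_close(current))
            current = []
    if current:  # trailing words after the last pause mark
        segments.append(_close(current))
    return segments
```

Each segment inherits the start of its first word and the end of its last, so a long madd or a silent pause between ayahs never has to be timed by hand.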
Method 2: Manual editing in CapCut or Premiere (2–4 hours)
The traditional route, and still what most tutorials on YouTube teach. Summarized honestly:
- Find your recited passage on Quran.com or Tanzil and copy the Uthmani text ayah by ayah.
- Install an Uthmani-compatible font (KFGQPC Uthmanic Hafs). In CapCut mobile this means importing the font file; in Premiere, installing it system-wide. Verify the harakat render correctly — many fonts quietly break them.
- Create a text layer per segment, paste in the Arabic, and decide for yourself where to break lines (use the waqf marks in the mushaf as your guide).
- Add a second text layer per segment for the translation, copied from a published translation of your choice.
- Scrub the audio waveform and set in/out points for every segment by ear. This is the slow part — expect several minutes of fiddling per ayah to get word-level precision, and most people settle for ayah-level timing instead.
- Add a dim layer between the footage and text so the text stays readable, then export.
Done carefully, this produces excellent results, and you keep full manual control: any font, any animation, any layout. The cost is time (creators we've talked to report 2–4 hours per video), and the timing rarely matches forced alignment, because scrubbing by hand works at the segment level, not the word level.
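If your editor can import SubRip (.srt) subtitle files, you can note the in and out points in a list and generate the file instead of building every text layer by hand. The sketch below assumes each segment is a tuple of start time, end time, Arabic text, and translation; the function names are ours. Check how your editor renders imported Arabic before relying on it, since font handling varies.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SubRip timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SubRip text from (start, end, arabic, translation) tuples."""
    blocks = []
    for index, (start, end, arabic, translation) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{arabic}\n"
            f"{translation}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


print(srt_timestamp(3.5))      # 00:00:03,500
print(srt_timestamp(3661.25))  # 01:01:01,250
```

Write the result with `encoding="utf-8"` so the Arabic survives the round trip.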
Method 3: QuranCaption (free desktop app)
QuranCaption is a free, open-source desktop app for Windows, macOS, and Linux built for exactly this task. You install it, load your recitation, and it assists with subtitle timing and translations in many languages, then exports a styled video. It is a genuinely good free option, particularly if you produce long-form videos at a desk and want to keep everything local.
The trade-offs against a web tool: you need a computer (no phone workflow), the pipeline involves more manual steps, and rendering happens on your own hardware. We compare the options in more depth in the best Quran video makers, compared.
Styling that looks right
These defaults come from rendering thousands of recitation videos. They hold regardless of which tool you use:
- Arabic prominent, translation secondary. A ratio of roughly 2:1 works — for 1080×1920 output, Arabic around 64 px and the translation around 34 px is a proven starting point. Oversized Arabic (90 px+) forces ugly line breaks on longer ayahs.
- Dim the footage behind the text. A 25–35% black overlay keeps white text readable over any background without making the video feel dark.
- Center the text block vertically. Platform UI (captions, buttons, progress bar) eats the top ~11% and bottom ~16% of a 9:16 video. Text near the edges gets covered. Our platform guide has the exact safe zones.
- Break at waqf marks. If an ayah is long, split where the mushaf marks a pause. Viewers who recite along will feel the difference.
- Subtle fades. A 300–500 ms fade per caption segment reads as polished; hard cuts read like a slideshow.
- A serif translation font. Crimson Text, Cormorant Garamond, or similar pair well with Uthmani script. Default sans-serif fonts fight the Arabic visually.
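Two of these defaults are plain arithmetic, so here they are in pixels and seconds. This sketch uses the approximate 11% and 16% margins quoted above and a linear fade, which is one reasonable choice among several; the names `safe_band` and `fade_opacity` are ours.

```python
def safe_band(height: int, top_frac: float = 0.11, bottom_frac: float = 0.16):
    """Return (top_y, bottom_y) of the vertical band that platform UI leaves clear."""
    top = round(height * top_frac)
    bottom = height - round(height * bottom_frac)
    return top, bottom


def fade_opacity(t: float, start: float, end: float, fade: float = 0.4) -> float:
    """Opacity (0.0 to 1.0) of a caption at time t, fading linearly in and out."""
    if t <= start or t >= end:
        return 0.0
    return min(1.0, (t - start) / fade, (end - t) / fade)


print(safe_band(1920))  # (211, 1613)
```

For a 1080×1920 render, text should stay between y = 211 and y = 1613. A caption shorter than twice the fade time never reaches full opacity with this formula, which is a hint that the segment is too short to read anyway.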
Mistakes that mark a video as amateur
- Arabic text with broken or missing harakat (wrong font, or text from a non-mushaf source).
- Captions that lag or lead the voice by half a second.
- Ayah text split mid-phrase instead of at a pause mark.
- Translation visibly machine-generated instead of a published edition like Saheeh International.
- Text sitting under the TikTok caption area or Instagram UI.
- Exporting once and reposting the platform-watermarked file to other platforms.
Caption your next recitation in about a minute
Upload a clip — AyahFlow detects the ayahs, syncs every word, and renders it ready to post.
Try AyahFlow Free
3 free videos · No credit card required
Common questions
Do I need to know which ayahs I recited?
Not with automatic detection — AyahFlow identifies the surah and exact ayah range from the audio alone, including partial ayahs. For manual methods, yes: you'll look the passage up yourself.
Can I caption someone else's recitation?
Technically yes — the tools don't care whose voice it is. Get the reciter's permission before publishing, and credit them. Many famous reciters' recordings are rights-managed.
What if the detection gets an ayah wrong?
Review before you render. AyahFlow shows every detected segment in the editor, where you can correct text and timing. Never publish a Quran video without checking the text against a mushaf.
