How to Add Quran Subtitles to a Video Automatically

By the AyahFlow team · Updated June 2026 · 7 min read

You have a recitation video. You want the ayahs on screen in proper Arabic with a translation, synced to the voice. Here is why the auto-caption button in your editor can't do this, what actually works, and the exact steps.

Why auto-captions fail on Quran recitation

CapCut, Submagic, Zeemo, Captions.ai, Premiere's speech-to-text — all of them work the same way: a speech recognition model listens to the audio and guesses the words. That approach breaks down on recitation for four reasons:

Wrong training data. Arabic speech models are trained on modern conversational Arabic: news, podcasts, dialects. Quranic recitation is classical Arabic delivered under tajwid rules, with elongated madd vowels, nasalized ghunnah, and words stretched across seconds of melody. To a conversational model this barely sounds like speech, and the transcript comes out mangled.
No diacritics. Even when a model gets the words right, ASR output is plain undiacritized Arabic. The Quran must display full tashkeel in Uthmani orthography — وَلَقَدْ يَسَّرْنَا الْقُرْآنَ لِلذِّكْرِ — and the Uthmani spelling of many words differs from standard typed Arabic, so even a perfect transcript would display the wrong script.
No verse structure. A transcript is a wall of text. It doesn't know where ayah 5 ends and ayah 6 begins, which surah it came from, or where the waqf pause marks fall — so it can't break captions in places that make sense.
No translation pairing. Auto-translating a garbled transcript produces a garbled translation. Quran videos need a published translation (Saheeh International and others) matched to the right ayah.
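
The diacritics point is easy to see in a few lines of Python. This is a minimal sketch using only the standard library: it strips the combining marks from the ayah fragment quoted above, which approximates the bare text a typical speech model returns. The helper name strip_tashkeel is ours, chosen for illustration.

```python
import unicodedata

# The ayah fragment quoted in the article, with full tashkeel.
VERSE = "وَلَقَدْ يَسَّرْنَا الْقُرْآنَ لِلذِّكْرِ"

def strip_tashkeel(text: str) -> str:
    """Drop every Unicode nonspacing mark (category Mn), which covers the Arabic harakat."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

bare = strip_tashkeel(VERSE)
marks_removed = len(VERSE) - len(bare)

print(bare)                               # same words, no vowel marks
print(f"{marks_removed} marks removed")
```

Going the other way, from bare text back to correct tashkeel, is not a mechanical operation, which is why the display text has to come from a mushaf source rather than from the transcript.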

One transcription mistake in ordinary content is a typo. In Quranic content it changes the words of the Quran — which is why serious creators never publish unreviewed auto-captions on recitation.

The approach that works: alignment, not transcription

The text of the Quran is already known, word for word. So the right tool doesn't transcribe at all. It does two narrower jobs:

Identify the passage. Work out which surah and which ayahs are being recited. AyahFlow does this with an AI model that listens to the audio, predicts the surah, then verifies its guess word-by-word against the actual Uthmani text — so the output is an exact ayah range, not a guess.
Align the known text to the audio. A technique called forced alignment takes the known words and finds each word's exact start and end time in the recording, using a speech model specialized for Quranic recitation. Silence, pauses between ayahs, and stretched syllables are absorbed cleanly because the model only decides when each word happens, never what the words are.

The result is the canonical mushaf text — correct script, full tashkeel, verse boundaries, waqf marks — with word-level timestamps. That's the data a proper Quran subtitle needs, and it's what no transcription pipeline can produce.
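
To make "the model only decides when each word happens" concrete, here is a toy forced aligner in Python. It is a sketch under stated assumptions, not AyahFlow's implementation: the score matrix is hand-made (a real system would get per-frame scores from an acoustic model and would also model silence), and the name forced_align and its parameters are hypothetical. The word order is fixed in advance, so the dynamic program can only choose where each word boundary falls.

```python
from typing import List, Tuple

def forced_align(scores: List[List[float]], frame_sec: float) -> List[Tuple[float, float]]:
    """Align a fixed word sequence to audio frames.

    scores[t][w] is a log-score that frame t belongs to word w (assumed input).
    Returns one (start_sec, end_sec) pair per word. Words stay in order and
    each word receives at least one frame.
    """
    n_frames = len(scores)
    n_words = len(scores[0]) if scores else 0
    if n_words == 0 or n_frames < n_words:
        raise ValueError("need at least one frame per word")

    neg_inf = float("-inf")
    best = [[neg_inf] * n_words for _ in range(n_frames)]
    advanced = [[False] * n_words for _ in range(n_frames)]
    best[0][0] = scores[0][0]  # the first frame must belong to the first word

    for t in range(1, n_frames):
        for w in range(n_words):
            stay = best[t - 1][w]                               # same word continues
            move = best[t - 1][w - 1] if w > 0 else neg_inf     # next word begins here
            if move > stay:
                best[t][w] = move + scores[t][w]
                advanced[t][w] = True
            elif stay > neg_inf:
                best[t][w] = stay + scores[t][w]

    # Walk back from the last frame of the last word to recover the boundaries.
    starts = [0] * n_words
    ends = [0] * n_words
    ends[-1] = n_frames
    w = n_words - 1
    for t in range(n_frames - 1, 0, -1):
        if advanced[t][w]:
            starts[w] = t
            w -= 1
            ends[w] = t
    return [(s * frame_sec, e * frame_sec) for s, e in zip(starts, ends)]

# Six half-second frames, two words: frames 0-2 favor word 0, frames 3-5 favor word 1.
toy_scores = [[0, -5], [0, -5], [0, -5], [-5, 0], [-5, 0], [-5, 0]]
timings = forced_align(toy_scores, frame_sec=0.5)
print(timings)  # [(0.0, 1.5), (1.5, 3.0)]
```

Note that the function never outputs text. Its only output is timing, which is why a mistake here shifts a caption by a fraction of a second instead of changing a word.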

Step by step: subtitle a recitation video

Upload the file. Go to ayahflow.ai, create a free account, and upload your MP4 or MOV. Audio-only files (MP3, WAV, M4A, FLAC) work too. Free accounts get 3 videos up to 5 minutes each.
Wait for detection and alignment. For a typical 1-minute clip this takes well under a minute. You'll land in the editor with every caption segment laid out on the timeline.
Review the segments. Check the detected ayah range against a mushaf. Segments split at waqf marks by default; merge or re-split them if you prefer longer or shorter captions.
Pick the translation. Choose from 16 languages — English (Saheeh International), Urdu, Indonesian, Bengali, Turkish, Hindi, Persian, French, German, Malayalam, Tamil, Chinese, Spanish, Albanian, Sindhi, Divehi — or turn the translation off for Arabic-only captions.
Style it. Adjust the Arabic font (Uthmanic Hafs), sizes, colors, vertical position, background dim, fades, and ayah-number medallions. The preview is identical to the final render.
Render and download. Pick 9:16, 1:1, 4:5, or 16:9. The video renders in the cloud in about a minute and downloads as an HD MP4 with the subtitles burned in.
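
The default segmentation in the review step can be pictured as a simple grouping pass over the word timings. The Python sketch below is an illustration, not the product's code. It assumes waqf signs are the Unicode Quranic annotation marks U+06D6 through U+06DB and that each sign is attached to the word it follows; the Word type and the name split_at_waqf are hypothetical, and the sample words are Latin placeholders.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_sec, end_sec)

# Assumption: waqf signs are the small high marks U+06D6..U+06DB.
WAQF_MARKS = {chr(cp) for cp in range(0x06D6, 0x06DC)}

def split_at_waqf(words: List[Word]) -> List[List[Word]]:
    """Close a caption segment after every word that carries a waqf sign."""
    segments: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        current.append(word)
        if any(ch in WAQF_MARKS for ch in word[0]):
            segments.append(current)
            current = []
    if current:  # trailing words after the last waqf sign
        segments.append(current)
    return segments

# Placeholder words; the second one carries U+06DA (small high jeem).
sample = [("w1", 0.0, 1.0), ("w2\u06da", 1.0, 2.2), ("w3", 2.6, 3.4)]
segments = split_at_waqf(sample)
print(len(segments))  # 2
```

Each segment's on-screen time then runs from the start of its first word to the end of its last word, and merging two segments in the editor amounts to concatenating two of these lists.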

Burned-in subtitles vs. SRT files

A common question: can I just generate an SRT subtitle file and upload it alongside the video? For Quran content, burned-in (hardcoded) subtitles win in almost every case:

Feature	Burned-in captions	SRT / platform captions
Uthmani script & fonts	Exact mushaf rendering	Platform default font, harakat often break
Word-by-word highlighting	Possible	Not supported
Arabic + translation together	Both on screen, styled separately	One track at a time
TikTok / Reels / Shorts	Always visible	Vertical platforms ignore SRT uploads
Accessibility / search indexing	Not machine-readable; pair with a text caption in the post	Machine-readable

If you publish long-form on YouTube, doing both is ideal: burned-in Arabic and translation for presentation, plus a clean SRT for accessibility.
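
For the SRT half of that setup, the file format is simple enough to write by hand from segment timings. A minimal Python sketch, assuming you already have (start, end, text) tuples for each caption; the function names and the placeholder cue texts are ours, not an AyahFlow export API.

```python
from typing import List, Tuple

def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp that SRT expects."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(cues: List[Tuple[float, float, str]]) -> str:
    """Build SRT text: numbered cues, a timing line, the caption, a blank line between cues."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

# Placeholder captions; real cues would carry the published translation text.
cues = [(0.0, 2.5, "[translation of ayah 1]"), (2.5, 4.0, "[translation of ayah 2]")]
srt_text = to_srt(cues)
print(srt_text)
```

Save the result as UTF-8 with an .srt extension before uploading it as a caption track.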

If you'd rather do it manually

The manual route still works and gives you total control: copy the Uthmani text from Quran.com, install the KFGQPC font, and scrub the waveform to time each segment in CapCut or Premiere. Budget 2–4 hours per video. The full recitation video guide walks through it step by step, alongside free tools like QuranCaption.

Subtitle your recitation in about a minute

Accurate Uthmani text, word-level sync, translations in 16 languages.

3 free videos · No credit card required