Download YouTube Subtitles as SRT or VTT — Professional Timing, Free
Downloading YouTube subtitles sounds simple. But open the SRT file from any basic subtitle downloader in Premiere Pro or DaVinci Resolve and you immediately see the problem: hundreds of two-second blocks, text flickering on and off before anyone can read it. YouTube's auto-caption system creates subtitle entries every 2–4 seconds, optimized for caption display during live playback — not for editors importing subtitle tracks.
INDXR.AI resegments the output before you download. The result follows broadcast subtitle standards: 3–7 seconds per block, maximum 42 characters per line, no mid-sentence cuts. Import it into your editor and it's clean enough to use without manual cleanup.
The Problem with Raw YouTube Subtitle Files
YouTube generates captions at the granularity of its speech recognition — short bursts of 2–4 seconds, usually 5–15 words each. This produces SRT files like:
1
00:00:02,000 --> 00:00:04,200
so one of the most important things
2
00:00:04,200 --> 00:00:06,100
to understand about this topic
3
00:00:06,100 --> 00:00:08,400
is that it changes depending onThree subtitle blocks for one sentence. In a video player, the rapid switching is visually jarring. In a video editor, it creates a cluttered timeline and requires manual merging before the file is usable.
Professional subtitle standards (BBC Subtitle Guidelines, Netflix Timed Text Style Guide, EBU Tech 3264) call for blocks of 3–7 seconds, a maximum of two lines, and 42 characters per line. These standards exist because human readers need time to read and comprehend text before it disappears.
What INDXR.AI Exports
After resegmentation, the same transcript looks like:
1
00:00:02,000 --> 00:00:08,400
So one of the most important things
to understand about this topicOne block. Complete sentence. Readable. Ready to import.
The resegmentation algorithm respects sentence boundaries — it doesn't merge segments across full stops or question marks. A sentence that ends at 5.2 seconds won't be forced into the previous block just to hit a duration target.
The resegmentation strategy depends on the transcript source. For AI Transcription (AssemblyAI), segments are merged until a sentence boundary is detected — a block closes on a period, question mark, or exclamation point, producing semantically complete subtitle units of 3–7 seconds. For auto-captions, which have no punctuation, a time-based merge is used instead: segments accumulate until the block reaches 3 seconds. Both approaches are a significant improvement over raw 2-second YouTube segments, but AI Transcription produces cleaner sentence-aligned blocks.
VTT output follows the same resegmentation and adds a header comment with the video title and language — useful for LMS platforms (Canvas, Moodle, Articulate 360) that use the header to associate subtitle files with source content.
UTF-8 BOM encoding is included by default for both SRT and VTT. This matters for editors and systems that may misinterpret UTF-8 text without the BOM — particularly for non-Latin script content.
When Auto-Captions Don't Exist
About 20% of YouTube videos have no auto-generated captions — non-English content YouTube hasn't processed, videos from smaller creators, older uploads, content with poor audio quality (YouTube Help). Basic subtitle downloaders return empty files or errors for these videos.
INDXR.AI detects this upfront and offers AI Transcription as a fallback. Enable the toggle, confirm the credit cost (1 credit per minute), and the audio is transcribed by AssemblyAI Universal-3 Pro. The resulting SRT/VTT is higher quality than auto-caption output — proper punctuation, accurate word boundaries, and clean segment timing.
For audio files you've already downloaded, the Audio Upload tab accepts MP3, MP4, WAV, M4A, OGG, FLAC, and WEBM files up to 500MB and produces the same resegmented SRT/VTT output.
Compatibility with Video Editors
All major non-linear editors import SRT directly:
- DaVinci Resolve: File → Import → Subtitles. Supports SRT for timeline caption tracks.
- Premiere Pro: Captions workspace → Import captions from file. SRT imports as a caption track.
- Final Cut Pro: Import → Captions. Supports SRT with CEA-608 compatibility.
- CapCut: Captions → Import. SRT and VTT both accepted.
- Kdenlive: Project → Add Clip → subtitle file.
VTT is the correct format for HTML5 <video> elements and web-based players that don't accept SRT natively.
LMS platforms that require VTT: Canvas, Moodle, Articulate 360, and most SCORM-compliant platforms accept VTT for accessibility compliance. INDXR.AI exports both formats from the same extraction.
Pricing
SRT and VTT export from auto-caption YouTube videos is free — no account required for a single extraction, no credit cost for the export itself. For videos without auto-captions, AI Transcription costs 1 credit per minute (a 30-minute video: 30 credits, approximately €0.36 at Plus pricing).
There is no additional charge for the SRT or VTT export format itself — all export formats are included at no credit cost once you have a transcript. See the pricing page for credit packages.
For videos where no captions are available at all, see YouTube Transcript Not Available for a complete troubleshooting guide, or how INDXR.AI works for a full overview of the extraction and resegmentation pipeline.
Frequently Asked Questions
- Why don't raw YouTube SRT files work well in video editors?
- YouTube creates subtitle entries every 2–4 seconds for display synchronization during playback. Editors need longer segments — 3–7 seconds — for readable on-screen text. The difference is between a subtitle file designed for watching and one designed for editing. INDXR.AI resegments to the editing standard.
- What's the maximum characters per line in INDXR.AI's SRT output?
- 42 characters per line, maximum two lines per block — the broadcast industry standard recommended by the BBC Subtitle Guidelines and Netflix Timed Text Style Guide. Lines that would exceed 42 characters are wrapped to a second line rather than truncated.
- Can I get both SRT and VTT from the same extraction?
- Yes. After extracting a transcript, you can export in any format from the export menu — SRT, VTT, and all other formats are available without re-extracting.
- Does the SRT output work for videos in non-Latin scripts?
- Yes. UTF-8 BOM encoding handles Arabic, Chinese, Japanese, Korean, Hebrew, and other scripts correctly in both SRT and VTT. The encoding is set automatically — no manual configuration required.
- What if the video has no auto-captions?
- INDXR.AI detects this before extraction and offers AI Transcription as a fallback. Enable it, confirm the credit cost for that video's length, and the audio is transcribed. The resulting SRT and VTT follow the same resegmentation as caption-based exports.



