Subtitle Translation Service: What It Includes & Pitfalls

Most content managers who need subtitle translation know they need something more than captions. What they're often less clear on is how many distinct steps that involves, how many of them a low-cost vendor will skip, and how the failures from skipped steps show up in ways that are invisible in a quote but obvious to a viewer.

A subtitle translation service converts the spoken audio of a video from one language into timed, readable text displayed on screen in a different language. It works by combining source transcription, translation adapted for on-screen reading constraints, cultural adaptation, re-timing for the target language, QA by a native speaker, and format delivery. It is most commonly used by EdTech platforms, OTT distributors, corporate L&D teams, and any organisation distributing video content across language markets where viewers need to read rather than hear the content.

What it actually includes — step by step

A full subtitle translation service covers six distinct stages. The price difference between a good vendor and a cheap one almost always comes down to which of these six are genuinely included and which are treated as optional or skipped entirely.

Source transcription is the first step: converting the spoken audio into accurate text in the original language. Everything downstream is built on this. An error here — a misheard product name, a wrong number, a garbled technical term — compounds through every subsequent stage. Good vendors check the source transcript against the audio before the translation step begins. Cheaper vendors move straight from automated transcription to translation without this check.

Translation with condensation is where subtitle translation diverges most sharply from document translation. Speech typically runs at 100 to 150 words per minute. Each translated line has to fit within its on-screen display time and stay readable at a standard pace of 180 to 200 words per minute. A sentence that takes 12 seconds to say in English may need to be condensed so that the Hindi subtitle can be read in about 8 seconds, because written Hindi often runs longer than English and the viewer also has to watch the picture. A document translator doesn't think in these constraints; a specialist subtitle translator does.
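To make the arithmetic concrete, here is a minimal sketch in Python. It assumes word count is an acceptable proxy for reading load (a simplification, since word lengths differ between languages) and takes 150 and 180 words per minute from the ranges above. The function names are hypothetical and do not come from any subtitle tool.

```python
def words_spoken(seconds: float, speech_wpm: float = 150.0) -> int:
    """Approximate number of words spoken in the given time."""
    return int(seconds * speech_wpm / 60.0)


def max_readable_words(display_seconds: float, reading_wpm: float = 180.0) -> int:
    """Largest whole number of words a viewer can read in the display time."""
    return int(display_seconds * reading_wpm / 60.0)


def needs_condensing(text: str, display_seconds: float,
                     reading_wpm: float = 180.0) -> bool:
    """True when the line has more words than the display time allows."""
    return len(text.split()) > max_readable_words(display_seconds, reading_wpm)
```

Under these assumptions, 12 seconds of speech at 150 words per minute is about 30 words, while 8 seconds of reading at 180 words per minute allows about 24, so the translator has roughly six words to lose without losing the meaning.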

Cultural adaptation goes beyond vocabulary. Idioms, measurements, currency references, cultural touchpoints, and humour that work in the source language frequently don't map directly into the target. A subtitle that translates a British idiom literally into Tamil is technically accurate and practically confusing. Cultural adaptation means rewriting these moments so they land in the target language, not just converting them word for word.

Re-timing is necessary because a translated subtitle takes a different amount of time to read than the original. A line timed to 2.5 seconds of English audio may need 3.2 seconds as Marathi text, because the Devanagari rendering often runs longer on screen and takes longer to read. Re-timing adjusts each subtitle's in-point and out-point for the target language, not the source. Many budget vendors skip this step, producing subtitles that appear and disappear at the wrong moment; viewers register it as something being "off" even when they can't name it.
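One part of re-timing can be sketched in a few lines of Python: extending a cue's out-point so the translated text stays on screen long enough to read. This is an illustration under stated assumptions, not a vendor's actual method. It assumes a cap of 17 characters per second, a hypothetical minimum gap of 0.1 seconds before the next cue, and it only moves out-points, whereas real re-timing also adjusts in-points and respects shot changes. `Cue` and `retime` are made-up names. Note that `len()` counts Unicode code points, which overstates the visible length of Devanagari text because vowel signs are separate code points.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start: float  # in-point, in seconds
    end: float    # out-point, in seconds
    text: str     # translated subtitle text


def retime(cues: List[Cue], max_cps: float = 17.0, min_gap: float = 0.1) -> List[Cue]:
    """Extend out-points so each cue meets the reading-speed cap where possible.

    A cue is never shortened, and it is never extended into the next cue.
    """
    result = []
    for i, cue in enumerate(cues):
        # Out-point the text would need at the target reading speed.
        wanted_end = cue.start + len(cue.text) / max_cps
        if i + 1 < len(cues):
            # Leave a small gap before the next cue appears.
            wanted_end = min(wanted_end, cues[i + 1].start - min_gap)
        result.append(Cue(cue.start, max(cue.end, wanted_end), cue.text))
    return result
```

When the next cue starts too soon for the extension to fit, the line still exceeds the cap after this pass, which is the signal that it needs condensing rather than more time.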

Native-speaker QA is a review pass by someone fluent in the target language who was not involved in the translation. This catches what the translator normalises: unnatural phrasing that's grammatically correct but wouldn't be said that way, terminology inconsistency across the video, register mismatches between formal and colloquial speech, and errors in proper nouns. For Indic language subtitle translation, this step is particularly critical — the gap between what a general translation model produces and what a native Tamil or Telugu speaker would actually write is wide enough to undermine the viewer's trust in the content entirely.

Format delivery is the final output: SRT for most video platforms, VTT for web delivery, TTML or EBU-STL for broadcast and OTT, and burned-in subtitles rendered into the video file for platforms that don't accept separate subtitle tracks. Not every vendor delivers all formats, and some charge separately for format conversions that should be standard.
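To show how small some of these conversions are, here is a minimal Python sketch that turns SRT text into WebVTT. The two formats differ mainly in the `WEBVTT` header line and in the millisecond separator (a comma in SRT, a full stop in VTT). The function name is hypothetical, and the sketch ignores styling tags and positioning, so treat it as an illustration rather than a production converter.

```python
import re

# Matches an SRT timestamp such as 00:01:02,345 and captures both halves.
_SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT subtitle text to WebVTT text."""
    # Drop a leading byte-order mark and normalise line endings.
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n")
    converted = []
    for line in body.split("\n"):
        if "-->" in line:
            # Only timing lines are rewritten; subtitle text is left untouched.
            line = _SRT_TIMESTAMP.sub(r"\1.\2", line)
        converted.append(line)
    return "WEBVTT\n\n" + "\n".join(converted).strip() + "\n"
```

The numeric SRT counters are kept because WebVTT accepts them as cue identifiers. Conversions to TTML or EBU-STL involve more work than this, which is a fair thing to ask a vendor about when a quote itemises formats.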

The failure modes that only appear in the output

These are the problems buyers discover after the work has been delivered, not before they commission it. Knowing them in advance changes what you ask for in a brief.

Timing drift: subtitles that appear half a second before or after the corresponding speech. The most common cause is re-timing not being done after translation. Viewers register this immediately even when they can't articulate it, and it makes the content feel lower quality regardless of how accurate the translation is.
Overly literal translation: text that is grammatically correct but reads as translated. The specific tell is unnatural word order, formal register used in casual conversational content, or idioms rendered literally. Indian EdTech content delivered in stiff, formal Hindi sounds like an official government document — learners disengage.
Reading speed violations: too many characters on screen for the display time available. A commonly used maximum is approximately 17 characters per second. Above this, viewers can't read the subtitle and watch the video simultaneously, so they choose one. This is especially common in translations from English into Indic languages, which run longer in text form.
Terminology inconsistency: the same product name, technical term, or concept translated differently at different points in the video. Usually happens when no glossary was maintained across the project. For OTT series and course content spanning many episodes, terminology drift becomes noticeable and erodes the professional quality of the output.
Code-switching errors: for Hinglish and Indian content where speakers mix Hindi and English mid-sentence, a vendor without specific Indic language capability will often mistranscribe or mistranslate the English fragments within a Hindi sentence — because their model treats the code-switch as an error rather than normal speech.
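Of the failure modes above, reading speed violations are the easiest to check mechanically on a delivered file. The following Python sketch flags cues that exceed the limit of roughly 17 characters per second. It assumes cues are already parsed into `(start, end, text)` tuples with times in seconds, and that spaces count as characters while line breaks do not; both the tuple layout and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

CueTuple = Tuple[float, float, str]  # (start seconds, end seconds, text)


def chars_per_second(text: str, start: float, end: float) -> float:
    """Characters shown per second of display time, ignoring line breaks."""
    duration = end - start
    if duration <= 0:
        raise ValueError("cue must end after it starts")
    return len(text.replace("\n", "")) / duration


def reading_speed_violations(cues: List[CueTuple], max_cps: float = 17.0) -> List[int]:
    """Return the indexes of cues that exceed the reading-speed limit."""
    return [
        index
        for index, (start, end, text) in enumerate(cues)
        if chars_per_second(text, start, end) > max_cps
    ]
```

Running a check like this on a vendor's sample gives a quick count of unreadable lines, though it says nothing about whether the phrasing sounds natural; that still needs a native speaker.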

Pricing structures and what they signal

Subtitle translation is most commonly priced per minute of finished video, since the work is anchored to video length rather than word count alone. Globally, rates for video subtitle translation run from $5 to $25 per minute depending on language pair, quality tier, and whether the service includes human QA. In India, managed subtitle translation services for common Indic language pairs run roughly ₹500 to ₹1,200 per minute for a service that includes re-timing and native-speaker QA.

A quote that comes in significantly below this range almost always signals that one or more of the six steps described above is missing or automated without human review. The cost of correcting poor output — re-doing re-timing, running a second translation pass, or starting over with a better vendor — typically exceeds the saving on the original quote.

Approach	Quality level	Typical India cost per minute	Re-timing included	Native-speaker QA
Generic AI tool (self-serve)	Variable; often poor on Indic content	Near zero	Rarely	No
Freelancer (India)	Variable by individual	₹150 to ₹500	Sometimes	Rarely
Mid-tier managed service	Consistent; QA included	₹500 to ₹1,200	Yes	Yes
AI-native managed pipeline (Indic-specialist)	High on trained languages; fast iteration	Competitive at volume	Yes	Yes — native-speaker QA built in
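For budgeting, the per-minute ranges translate into a project estimate with simple multiplication. The Python sketch below assumes the same per-minute rate applies to every target language and defaults to the mid-tier managed range of ₹500 to ₹1,200; actual quotes may add charges for transcription, rush delivery, or extra formats. The function name is hypothetical.

```python
from typing import Tuple


def cost_range_inr(video_minutes: int, languages: int = 1,
                   low_rate: int = 500, high_rate: int = 1200) -> Tuple[int, int]:
    """Estimate the (low, high) project cost in rupees.

    Rates are per minute of finished video, per target language.
    """
    billable_minutes = video_minutes * languages
    return billable_minutes * low_rate, billable_minutes * high_rate
```

For example, a 90-minute course translated into three languages is 270 billable minutes, which at these rates comes to between ₹135,000 and ₹324,000. A quote far below the lower figure is worth checking against the six steps.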

What to ask any vendor before committing

Do you re-time subtitles for the target language after translation, or do you use the source language timing?
Who performs QA — a native speaker of the target language who wasn't involved in the translation, or an automated grammar check?
Do you maintain a glossary for brand and product terminology across the project?
What file formats are included in the quoted price, and which cost extra?
Can you provide a short sample translated and re-timed from our actual content before we commit to the full project?

Where it works and where it doesn't

Where it works

EdTech course content distributed to regional Indian language markets where learner comprehension depends directly on reading accuracy and natural phrasing
OTT content across Indian languages where viewer experience — and platform reputation — is tied to subtitle quality across every episode
Corporate training and compliance content where a mistranslated instruction has real operational consequences

Where it doesn't

Same-language captioning — if you only need subtitles in the original language, you need a subtitling service without translation, which is a meaningfully different and cheaper scope
Very short one-off videos where the vendor briefing overhead doesn't justify a managed engagement
Content where the video quality itself is so poor that subtitle accuracy won't materially affect the viewer experience

FAQ

Is subtitle translation the same as video translation?

Subtitle translation is one form of video translation. The other main form is dubbing, which replaces the original audio. Subtitle translation is generally faster and cheaper and preserves the original speaker's voice, which matters for content where the speaker's identity is part of the value.

What's the difference between a subtitle translator and a document translator?

Subtitle translators work within on-screen time and reading speed constraints that document translators don't encounter. Condensation, re-timing, and character-per-second limits are specific to subtitle work — a document translator doing subtitle work without this training produces technically correct text that doesn't work on screen.

How do I know if a vendor handles Indic language subtitle translation well?

Ask for a sample from your actual content in the target language before committing. Specifically check: does the timing feel right, does the phrasing sound like a native speaker, and does the translator handle code-switching correctly if your content mixes Hindi and English? Those three things reveal more about capability than any credentials list.

Should I provide a source script to the vendor?

Yes, where you have one. A clean source transcript reduces transcription error risk, shortens turnaround, and typically reduces cost. If you don't have a script, most vendors will transcribe from audio — factor in additional time and a slightly higher rate for this step.

A subtitle translation service includes six steps: source transcription, translation with condensation for reading speed, cultural adaptation, re-timing for the target language, native-speaker QA, and format delivery. The failure modes that matter — timing drift, literal translation, reading speed violations, and terminology inconsistency — are invisible in a quote and only appear in the finished output. A vendor who skips re-timing and native-speaker QA is not offering a cheaper version of the same service. They are offering a different and significantly inferior service at a lower price.

If you are distributing video content across Indian languages and want to see what a managed subtitle translation pipeline built for Indic accuracy looks like on your actual content, ButterCut handles re-timing, native-speaker QA, and Indic language depth as standard. Book a free demo to compare on your own material.
