The platform that's been burned by bad Hindi subtitles doesn't need to be convinced the problem is real. What they need is a precise vocabulary for the failure modes so they can identify them in vendor samples before committing, articulate the specific problem to a new vendor so it gets fixed, and build a QA checklist that doesn't rely on having a senior native-speaker reviewer on every batch.
This post names the specific failure modes in Hindi, Tamil, and Marathi subtitle production — the ones that appear in output from generic AI tools and under-qualified vendors — and gives you a practical diagnostic for each one.
Subtitle quality failure in Indian languages falls into four categories: semantic failures (the text says something different from what was spoken), register failures (the text is accurate but sounds wrong for the content context), timing failures (the text appears at the wrong moment relative to the speech), and rendering failures (the text doesn't display correctly in the target script). Each category has language-specific manifestations in Hindi, Tamil, and Marathi, and each requires a different correction approach. A QA process that doesn't check all four categories will miss at least one class of errors in every batch.
Category 1: Semantic failures — the text says something wrong
These are the errors that get noticed immediately by any native speaker and generate the most urgent corrections. They're also the easiest for a non-native reviewer to miss entirely.
Hindi idiom mistranslation. Hindi is rich with idioms whose literal translation produces a sentence that is grammatically correct but means something entirely different from the original. "Haath khade karna" (literally "to raise hands") means to give up, not to raise one's hands physically. "Naak mein dum karna" (literally "to put breath in the nose") means to harass someone. A subtitle that translates these literally is producing factually wrong content that will confuse a native-speaking viewer and undermine trust in the subtitle quality across the entire piece. Generic AI translation models handle common idioms adequately; less frequent idioms, regional expressions, and conversational slang are where errors cluster.
Tamil false cognates and transliteration errors. Tamil has a significant number of words that share phonetic similarity with English or Hindi words but carry different meanings. Transliteration errors — where the AI renders a Tamil word based on how it sounds rather than what it means — produce sentences that are phonetically plausible but semantically wrong. Tamil also uses different registers for formal written text, spoken dialogue, and colloquial speech, and a model trained on formal Tamil text will render spoken dialogue in a register that sounds artificially elevated and unnatural to a viewer watching contemporary Tamil content.
Marathi dialect and vocabulary variation. Marathi spoken in Mumbai differs meaningfully from Marathi spoken in Pune, Nagpur, or the Vidarbha and Konkan regions. A model trained on standard written Marathi or on Pune-dialect data will misrender regional vocabulary, produce wrong transcriptions of regional pronunciations, and miss colloquial terms entirely. Marathi subtitle production at platform scale requires either a model specifically trained on the dialect distribution of the content being subtitled, or a native-speaker QA pass that catches dialect-specific errors before the output is approved.
Category 2: Register failures — the text is accurate but sounds wrong
Register failures are the most common quality complaint from content teams who have moved from poor to average vendors. The transcription is correct, the translation is accurate, but something feels wrong — the subtitle sounds like a formal document or a translation rather than something a real person said.
Formal Hindi in casual content. Standard Hindi ("Shuddh Hindi") and colloquial conversational Hindi are significantly different. A model that produces subtitle text in formal Hindi for a Hinglish-speaking reality TV cast, a casual OTT drama, or an influencer-style EdTech presenter is technically accurate and tonally wrong. Viewers watching casual content that has been subtitled in formal Hindi register it as a quality failure immediately, even if they can't name the specific problem. The correct register for Hindi subtitle output is the register the speaker uses, not the register the model was trained on.
Tamil diglossia. Tamil has one of the world's most significant diglossic divides — the gap between formal written Tamil (centamil) and spoken colloquial Tamil (kotuntamil). These are sufficiently different that a subtitle produced in the written register for spoken dialogue can be difficult to follow for a viewer reading at normal pace, because the vocabulary and grammar structures are meaningfully different. A vendor who produces Tamil subtitles in written-Tamil register for spoken-Tamil content is producing output that is correct for academic text and wrong for platform subtitle use. This is one of the most common complaints from OTT platforms subtitling Tamil content through general-purpose vendors.
Marathi honorific misuse. Marathi uses a complex system of honorifics — "tum" versus "aap" versus "tu," with various verb forms corresponding to each level of formality — and content that mixes registers between characters (for example, a character speaking respectfully to an elder while the elder responds casually) requires the subtitle to track and reproduce those relationships accurately. Models that default to a single register across all speakers in a scene produce subtitles where the social dynamics of the dialogue are flattened or reversed, which affects viewer comprehension of the relationship being depicted.
Category 3: Timing failures — the text appears at the wrong moment
Timing failures are the category most immediately visible to viewers and the least often caught in textual QA reviews, because they require watching the video rather than reading the subtitle file.
Devanagari display time calibration. Hindi and Marathi subtitles in Devanagari script typically require more display time than their English equivalents, because Devanagari characters are visually more complex than Roman letters and require more processing time for a reader scanning at normal pace. The standard for subtitle reading speed — approximately 17 characters per second maximum for broadcast, 180-200 words per minute for platform content — was developed for Roman-script languages. When a Hindi subtitle is timed to the English reading speed standard, it appears and disappears faster than a native Devanagari reader can comfortably process it. Vendors who use the same timing calibration for Hindi and English output are producing Devanagari subtitles that feel rushed to native readers.
Tamil script timing complexity. Tamil script involves complex glyph rendering with combined characters (uyirmey eluttukkal), where vowel-consonant combinations render as single visual units that differ significantly from their component characters. A subtitle timed for character count based on the raw code point count of Tamil text may display significantly less text than it appears to contain at code level, because several "characters" combine into single rendered glyphs. The visual character count that determines display time is different from the underlying code point count, and timing calibration that ignores this renders Tamil subtitles with timing that doesn't match the visual content on screen.
Code-switching timing drift. When a speaker moves between Hindi and English mid-sentence — the Hinglish pattern described in SNL115 — the subtitle output may correctly capture both language segments but time them to the wrong segment of the audio if the model's word-boundary detection is calibrated for monolingual speech. This produces subtitles where the English words appear half a second before or after the speaker says them, creating a noticeable desynchronisation that viewers register as a quality failure even when the text itself is accurate.
Category 4: Script rendering failures — the text doesn't display correctly
Devanagari conjunct consonant errors. Devanagari script uses conjunct consonants (sanyukta akshar) — combinations of two or more consonants that render as a single fused glyph rather than as sequential characters. Common conjuncts include "ksha" (क्ष), "tra" (त्र), and "jnya" (ज्ञ). A subtitle renderer that doesn't support correct Zero Width Joiner handling or doesn't have the required font glyphs available will display these conjuncts as disconnected separate characters rather than as the intended fused form. This is immediately visible to native Hindi and Marathi readers and signals that the subtitle system was not built for Devanagari output.
Tamil vowel marker rendering. Tamil uses vowel markers (uyir mei) that attach to consonant glyphs to form vowel-consonant combinations, and the correct rendering of these combinations depends on the font and rendering engine having the correct glyph data for each combination. Incorrect rendering produces Tamil text where vowel markers appear displaced, attached to the wrong consonant, or rendered as separate characters — visually broken text that impedes reading comprehension regardless of whether the underlying text data is correct.
Romanised Hindi for forced narrative. Netflix's Hindi Timed Text Style Guide specifies specific rules for Romanised Hindi forced narratives — the on-screen text labels used for text that appears in the visual frame rather than as dialogue. The guide specifies that Urdu terms that have been normalised in everyday Hindi speech should not be forced into Romanised rendering, and that English dialogue on screen should not be covered by Romanised Hindi forced narratives. Vendors unfamiliar with the Netflix Hindi style guide produce forced narrative output that fails these specifications, triggering platform rejection during QC.
The pre-engagement QA checklist
Before committing to any vendor for Hindi, Tamil, or Marathi subtitle production, run this checklist on a sample from your actual content — not a demo clip the vendor provides:
- Semantic check: Have a native speaker of the target language watch 5 minutes of subtitled content and flag any phrase where the subtitle says something materially different from what was spoken. If the rate is above 2 to 3 per 5 minutes, the vendor's model needs significant correction overhead on your content type.
- Register check: Ask the native speaker whether the subtitle language sounds like the way the speaker actually talks, or like a formal document. Any consistent register mismatch indicates a training data problem that won't self-correct.
- Timing check: Watch a 2-minute segment and count how many times a subtitle appears or disappears at a moment that feels wrong relative to the speech. More than 2 to 3 instances in 2 minutes indicates a timing calibration problem.
- Rendering check: View the subtitle on the actual playback device and platform it will appear on — not in the vendor's editor. Check specifically for conjunct consonant rendering in Devanagari and vowel marker alignment in Tamil. Rendering issues visible in the editor may differ from rendering issues visible in the delivery environment.
- Correction retention check: After corrections are returned, request a second sample of similar content and check whether the same errors recur. If the correction rate on the second sample is the same as the first, the vendor's pipeline is not incorporating corrections — each project starts from the same baseline error rate regardless of how much feedback has been provided.
What a pipeline that learns looks like
The difference between a vendor and a pipeline is whether corrections are incorporated. A vendor who receives a correction file applies those corrections to the current batch. A pipeline that's built to learn applies those corrections to improve the model's output on subsequent batches of similar content. The practical difference for a content platform producing subtitle output at volume is that the correction rate on batch six is lower than on batch one — which means the total ops cost of subtitle production goes down over time rather than remaining flat.
For a platform producing 20 hours of Hindi subtitle output per month, the difference between a 15% correction rate and a 5% correction rate — the range that separates a generic vendor from a learning pipeline — is approximately 6 hours of reviewer time per month. Over a year, that's 72 hours of reviewer time that either gets spent on corrections or gets redirected to higher-value work. ButterCut's Indic subtitle pipeline is built around exactly this compounding: client vocabulary, brand guidelines, and correction patterns are maintained and applied to every subsequent batch rather than being discarded between engagements.
FAQ
How many errors per minute is acceptable in Hindi subtitle output?
For production use on a content platform, a semantic error rate above 1 per minute of content is too high for publication without correction. A timing error visible to a casual viewer more than once per 2 minutes indicates a calibration problem. Rendering errors should be zero on a correctly configured delivery pipeline — any visible Devanagari or Tamil script rendering failure is a platform-configuration issue that needs to be fixed before the content goes live.
How do I tell if a Tamil subtitle is in the wrong register?
Ask a Tamil-speaking reviewer to read the subtitle alongside the audio and note any phrase where the subtitle uses a word or grammatical structure that the speaker didn't use. In Tamil, the most common tell is verb forms — spoken Tamil uses specific colloquial verb conjugations that written Tamil replaces with formal equivalents. If the subtitle consistently uses the formal form when the speaker uses the colloquial form, the model is in the wrong register.
Can timing calibration for Devanagari be fixed after the fact?
Yes, but it requires retiming the entire subtitle file rather than editing individual lines. If a vendor's output is consistently displaying Devanagari text faster than the reading speed standard for Devanagari, the fix is a system-level timing adjustment, not line-by-line editing. A vendor who doesn't support this adjustment cannot fix timing calibration errors on an existing file.
What should a brand vocabulary glossary include for Hindi subtitle production?
At minimum: all proper names (characters, locations, brand names) with their preferred spelling and script, recurring technical or domain-specific terms with their preferred translation, and any brand-specific phrases or slogans with the approved Hindi rendering. A glossary that's maintained across all projects with a vendor, rather than rebuilt each time, is what prevents the terminology inconsistency that accumulates across a content library over time.
Hindi, Tamil, and Marathi subtitle quality failures fall into four categories: semantic errors where the text says something different from what was spoken, register errors where the text sounds wrong for the content context, timing errors where the text appears at the wrong moment, and rendering errors where the Devanagari or Tamil script doesn't display correctly. A QA process that only catches semantic errors — the most visible kind — is missing the three categories that are harder to spot and just as damaging to viewer experience. The vendor whose correction rate doesn't improve between the first and fifth batch is not a pipeline. They're a vendor resetting your quality baseline on every project.
If your platform is managing Indic subtitle quality through a correction cycle that doesn't improve over time, ButterCut is built to change that — a managed pipeline that incorporates your corrections, maintains your brand vocabulary, and produces less rework with each successive batch. Book a free demo to run the pre-engagement checklist on your own content.
Sources
- Netflix Partner Help Center, Hindi Timed Text Style Guide — forced narrative rules, Romanised Hindi specifications, Urdu term handling
- GhostCut, Subtitle Generator for Marathi Video — rich consonant clusters, retroflex sounds, conjunct consonants, regional dialect mixing in Marathi ASR
- Caller Digital, India's Voice AI Accuracy Problem — code-switching timing and word boundary detection failures in Indian language ASR
- Auto Interview AI, Vernacular AI Voice Agents India 2026 — dialect variation: Mumbai Hindi vs UP Hindi, Chennai Tamil vs Madurai Tamil
- Amberscript, Quality Assurance in Subtitling — reading speed standards, CPS limits for subtitle QA

