
Chulho Baek

November 10, 2024

6 min read

SyncSub: Automating Subtitle Generation for Edited Videos

Background

With the explosive growth of video consumption across platforms, subtitles have evolved from an accessibility feature to a critical tool for global reach, SEO optimization, and viewer retention.
However, when videos are re-edited—cut, shortened, or rearranged—manually syncing subtitles to the new version remains a time-consuming and error-prone task.

To solve this, we developed SyncSub, a solution that automatically generates subtitles for edited videos using existing subtitle and audio data from the original version.

Tech Stack

  • Python + ffmpeg: For media processing and audio extraction
  • Whisper + Audio Embedding Matching: For speech-based segment alignment
  • SRT Processing Pipeline: For generating synced subtitle files
  • AWS S3: For hosting prototype and demo interfaces
  • (Planned) Streamlit: For internal UI and SaaS integration
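
As a quick illustration of where Whisper fits in this stack, the sketch below produces timestamped transcript segments from extracted audio; the model size and file name are placeholders rather than the settings used in SyncSub.

# Sketch: timestamped transcription with Whisper (model size is an arbitrary choice here)
import whisper

model = whisper.load_model("base")
result = model.transcribe("original.wav")

# Each segment carries start/end times plus text, which later stages can align against
for seg in result["segments"]:
    print(f'{seg["start"]:.2f} --> {seg["end"]:.2f}: {seg["text"]}')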

After evaluating multiple approaches (text-based, video-based), we chose the audio-based subtitle syncing approach as the most robust and scalable solution.

Problem Definition

  • Manually syncing subtitles for re-edited content is resource-intensive
  • Text-based or video-based syncing often fails with noisy or unscripted content
  • Existing subtitle files become unusable if the video sequence changes
  • Media companies (CJENM, SBS) require subtitle reusability across edited content

Solution Process

1) Evaluated Approaches

Text-based syncing (comparing STT transcripts) and video-based syncing (comparing frames) were evaluated first, but both break down on noisy or unscripted content and when the video sequence changes.

2) Adopted: Audio-Based SyncSub

  • Extract .wav audio from original and edited videos
  • Use embedding models to find matching segments
  • Re-map original subtitle timestamps to the edited video
# Sample Code: extract audio and compare segment embeddings
# (simplified sketch: mean MFCC vectors stand in for the audio embedding
#  model used in the production pipeline)
import ffmpeg
import librosa
from sklearn.metrics.pairwise import cosine_similarity

# Extract .wav audio from both the original and the edited video
ffmpeg.input('original.mp4').output('original.wav').run()
ffmpeg.input('edited.mp4').output('edited.wav').run()

def embed_segment(path, offset, duration, sr=16000):
    """Load one audio segment and return a fixed-size embedding (mean MFCC vector)."""
    audio, _ = librosa.load(path, sr=sr, offset=offset, duration=duration)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

# Embed one subtitle segment from the original and a candidate window from the
# edit (the offsets and durations here are illustrative values)
emb_original = embed_segment('original.wav', offset=12.0, duration=3.0)
emb_edited = embed_segment('edited.wav', offset=5.0, duration=3.0)

# Cosine similarity indicates whether the candidate window matches the original segment
score = cosine_similarity([emb_original], [emb_edited])[0, 0]
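
To illustrate the third step, re-mapping timestamps, here is a minimal sketch using the open-source srt library; the matches dictionary is a hypothetical output of the matching stage, not SyncSub's actual data structure.

# Sketch: re-map original subtitle timestamps onto the edited video
from datetime import timedelta
import srt  # pip install srt

with open('original.srt', encoding='utf-8') as f:
    original_subs = list(srt.parse(f.read()))

# Hypothetical output of the audio-matching stage:
# {original subtitle index: start time (seconds) in the edited video}
matches = {1: 0.0, 2: 3.2, 5: 7.9}

remapped = []
for sub in original_subs:
    if sub.index not in matches:
        continue  # this segment was cut from the edited video
    duration = sub.end - sub.start
    new_start = timedelta(seconds=matches[sub.index])
    remapped.append(srt.Subtitle(index=len(remapped) + 1,
                                 start=new_start,
                                 end=new_start + duration,
                                 content=sub.content))

with open('edited.srt', 'w', encoding='utf-8') as f:
    f.write(srt.compose(remapped))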

Results & Achievements

  • 100x speed improvement
    • Previous method: ~20–30 min for 30 min video
    • New method: ~15–25 seconds
  • Improved accuracy
    • Fine-tuned segment separation (e.g., splitting at pauses of about 3 seconds) leads to more precise matches (see the sketch after this list)
  • Successfully tested on real-world content
    • Applied to SBS’s 7인의 부활 Ep.5 edited cut
    • Delivered production-ready subtitles using the original transcript
  • Web Interface & SaaS potential
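
As a rough illustration of the pause-based segmentation mentioned above, the following sketch splits an audio track into segments separated by long pauses using librosa's silence detection; top_db and the 3-second gap are illustrative values rather than the tuned production settings.

# Sketch: split audio into speech segments separated by long pauses (~3 s)
import librosa

audio, sr = librosa.load('original.wav', sr=16000)

# Non-silent intervals as (start_sample, end_sample) pairs
intervals = librosa.effects.split(audio, top_db=30)

# Merge intervals whose gap is shorter than 3 seconds, so that only
# long pauses act as segment boundaries
min_gap = 3.0 * sr
segments = [list(intervals[0])]
for start, end in intervals[1:]:
    if start - segments[-1][1] < min_gap:
        segments[-1][1] = end          # extend the current segment
    else:
        segments.append([start, end])  # a long pause starts a new segment

# Convert to seconds for matching against subtitle timestamps
segment_times = [(s / sr, e / sr) for s, e in segments]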

Lessons Learned & Next Steps

  • Audio is more reliable than visuals for syncing
    • Visual changes don’t necessarily alter dialogue
  • STT output variance makes text-based comparison unreliable
    • Use text similarity only as a fallback (see the sketch after this list)
  • Next: Multilingual subtitle syncing
    • Match Korean original with multilingual subtitle sets for global reuse
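
For the fallback idea mentioned above, a minimal sketch might compare Whisper transcripts of two candidate segments with a simple ratio from Python's difflib; the transcripts and the 0.6 threshold here are illustrative, not values from the SyncSub pipeline.

# Sketch: text-similarity fallback when audio matching is ambiguous
from difflib import SequenceMatcher

def text_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two STT transcripts."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

original_text = "so what happened after the broadcast ended"
candidate_text = "so what happened after the broadcast ended that night"

# Only trust the text signal as a tie-breaker, not as the primary match criterion
if text_similarity(original_text, candidate_text) > 0.6:
    print("fallback: accept candidate segment")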


SyncSub is more than a tool—it’s a scalable AI workflow that transforms how edited video subtitles are produced.
If your team reuses or edits video content regularly, this could cut subtitle costs and time by over 90%, while increasing subtitle consistency and SEO reach.

Let us know if you want to test it on your content. We're actively evolving SyncSub into a full SaaS offering.