About This Tool

How to Use
  1. Upload a video or audio file (MP4, WebM, MP3, WAV, OGG, M4A)
  2. Choose a model: Whisper Tiny is faster, while Whisper Base is more accurate
  3. Click 'Load Model & Transcribe'. The model is downloaded on first use and cached for later runs
  4. Wait for transcription to complete (time depends on file length and device speed)
  5. Review and edit the generated subtitle entries if needed
  6. Click a subtitle to seek the video preview to that moment
  7. Download the subtitles as SRT or WebVTT (.vtt) for use in video editors or players
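SRT and WebVTT differ mainly in the "WEBVTT" header and the separator before the milliseconds (a comma in SRT, a period in WebVTT). Below is a minimal TypeScript sketch of serializing subtitle entries to both formats. It assumes each entry is a hypothetical `{ start, end, text }` object with times in seconds, which is not necessarily how this tool stores them internally.

```typescript
// Hypothetical shape of one subtitle entry; times are in seconds.
interface SubtitleEntry {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as HH:MM:SS followed by milliseconds, using the given
// separator ("," for SRT, "." for WebVTT).
function formatTimestamp(seconds: number, msSeparator: string): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${msSeparator}${pad(ms, 3)}`;
}

// SRT: numbered cues, comma before milliseconds, blank line between cues.
function toSrt(entries: SubtitleEntry[]): string {
  return entries
    .map(
      (e, i) =>
        `${i + 1}\n${formatTimestamp(e.start, ",")} --> ${formatTimestamp(e.end, ",")}\n${e.text}\n`
    )
    .join("\n");
}

// WebVTT: "WEBVTT" header line, period before milliseconds; cue numbers are optional
// in WebVTT, so this sketch omits them.
function toVtt(entries: SubtitleEntry[]): string {
  const cues = entries
    .map((e) => `${formatTimestamp(e.start, ".")} --> ${formatTimestamp(e.end, ".")}\n${e.text}\n`)
    .join("\n");
  return `WEBVTT\n\n${cues}`;
}
```

For example, `toSrt([{ start: 0, end: 2.5, text: "Hello" }])` produces the lines `1`, `00:00:00,000 --> 00:00:02,500` and `Hello`.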
Common Use Cases
  • Adding accessibility captions to your videos
  • Creating subtitles for YouTube, Vimeo, or social media content
  • Transcribing interviews, podcasts, or meeting recordings
  • Generating subtitle files for offline video players
  • Quickly producing a first-draft transcript for editing
Tips & Tricks
  • The AI model (~40–75MB) is downloaded once and cached in your browser for future visits
  • Whisper Tiny is recommended for quick results; Whisper Base gives noticeably better accuracy
  • Audio quality greatly affects accuracy — reduce background noise before transcribing
  • Files over 100MB may take a long time; consider trimming the video first
  • You can edit any subtitle text, start time, or end time directly in the list below the video
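When a start or end time is edited by hand, the typed value has to be parsed back into seconds and sanity-checked. The sketch below assumes the same hypothetical `{ start, end, text }` entry shape; `parseTimestamp` and `findTimingProblems` are illustrative names, not functions of this tool. It only reports overlapping entries rather than rejecting them, because WebVTT allows overlapping cues.

```typescript
// Hypothetical entry shape; times are in seconds.
interface EditableEntry {
  start: number;
  end: number;
  text: string;
}

// Parse "HH:MM:SS,mmm", "HH:MM:SS.mmm" or "MM:SS.mmm" into seconds.
// Returns null when the string is not a valid timestamp.
function parseTimestamp(value: string): number | null {
  const match = /^(?:(\d+):)?([0-5]?\d):([0-5]?\d)[,.](\d{1,3})$/.exec(value.trim());
  if (match === null) return null;
  const hours = match[1] !== undefined ? Number(match[1]) : 0;
  const minutes = Number(match[2]);
  const seconds = Number(match[3]);
  // Right-pad the fraction so ".5" means 500 ms, not 5 ms.
  const ms = Number((match[4] + "00").slice(0, 3));
  return hours * 3600 + minutes * 60 + seconds + ms / 1000;
}

// Report entries whose end is not after their start, and entries that begin
// before the previous entry ends (overlaps are flagged, not rejected).
function findTimingProblems(entries: EditableEntry[]): string[] {
  const problems: string[] = [];
  entries.forEach((e, i) => {
    if (e.end <= e.start) problems.push(`#${i + 1}: end is not after start`);
    if (i > 0 && e.start < entries[i - 1].end) problems.push(`#${i + 1}: overlaps previous entry`);
  });
  return problems;
}
```

For example, `parseTimestamp("01:02:03.500")` returns `3723.5`, and a malformed value such as `"00:61:00,000"` returns `null`.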
