The WebVTT format, explained

WebVTT (Web Video Text Tracks) is the W3C standard for captions on the web. It is the only subtitle format the HTML5 <track> element accepts, and it can do far more than SubRip.

Reference · updated June 2026

Unlike SRT, WebVTT has a real specification — a living standard maintained by the W3C. It was designed for HTML5 video, so it is the format you must use with the <track> element, and it underpins captioning in Video.js, Plyr, hls.js and most browser-based players. It keeps SRT's simple cue model but adds positioning, styling, comments, named cues, voices, chapters and metadata.

The header

Every WebVTT file begins with a signature line, in one of two forms:

WEBVTT
WEBVTT - Optional title or description after a hyphen

The word WEBVTT must be the first thing in the file (optionally after a UTF-8 BOM). A file that doesn't start with it is not valid WebVTT — which is exactly why renaming a .srt to .vtt doesn't work, and why the SRT to VTT converter adds this line.
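The signature rule is easy to check before handing a file to a player. The following is a minimal sketch, not a full validator; the function name has_webvtt_signature is hypothetical, and it assumes the file has already been decoded to a string.

```python
import re


def has_webvtt_signature(text: str) -> bool:
    """Return True if the text starts with a plausible WebVTT signature line."""
    # An optional byte order mark may precede the signature.
    if text.startswith("\ufeff"):
        text = text[1:]
    # Only look at the first line; WebVTT lines end with LF, CR, or CRLF.
    first_line = re.split(r"\r\n|\n|\r", text, maxsplit=1)[0]
    if not first_line.startswith("WEBVTT"):
        return False
    rest = first_line[len("WEBVTT"):]
    # Any header text must be separated from the signature by a space or tab.
    return rest == "" or rest[0] in " \t"
```

A renamed .srt file fails this check because its first line is a cue number rather than the signature.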

The cue

intro                              ← optional cue identifier
00:01:14.800 --> 00:01:17.200 line:90% align:center   ← time + settings
<v Alex>Are you there?</v>          ← payload, with a voice tag

The differences from SRT are deliberate and important:

  • Period decimal separator — 00:01:14.800, not a comma. This is mandated by the spec.
  • Optional hours — the hour field may be omitted for short content: 01:14.800 is valid.
  • Cue identifier — an optional label on the line before the timestamp. It can be a name (a hook for styling) rather than a number, and it can be omitted entirely.
  • Cue settings — everything after the timestamp positions the cue (see below).
  • UTF-8 is mandatory — WebVTT is always UTF-8, full stop.
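The timestamp rules above (period separator, optional hours) can be sketched as a small parser. This is an illustrative helper under stated assumptions, not a reference implementation: the name parse_timestamp is hypothetical, and it returns milliseconds as a plain integer.

```python
import re

# Optional hours (two or more digits), then mm:ss.ttt with a period before the milliseconds.
_TIMESTAMP = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")


def parse_timestamp(value: str) -> int:
    """Convert a WebVTT timestamp to milliseconds; raise ValueError if malformed."""
    match = _TIMESTAMP.fullmatch(value)
    if match is None:
        raise ValueError(f"not a WebVTT timestamp: {value!r}")
    hours, minutes, seconds, millis = match.groups()
    total_minutes = int(hours or 0) * 60 + int(minutes)
    total_seconds = total_minutes * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)
```

Both 00:01:14.800 and 01:14.800 parse to the same value, while the SRT form 00:01:14,800 is rejected because of the comma.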

Cue settings: positioning

The space-separated tokens after the timestamp control where and how a cue is drawn:

  • line: — vertical position (a line number or percentage; line:0 is the top).
  • position: — horizontal position as a percentage.
  • size: — the width of the cue box as a percentage.
  • align: — text alignment: start, center, end, left, right.
  • vertical: — vertical writing mode (rl / lr) for languages like Japanese.
  • region: — attaches the cue to a named REGION block for scrolling roll-up captions.

None of this has an SRT equivalent, so converting VTT to SRT necessarily drops cue settings — our converter tells you which cues it affected rather than discarding them silently.
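To see which cues carry settings, a tool has to separate the settings from the two timestamps. The sketch below shows one way to do that; split_timing_line is a hypothetical name, and the function does not validate setting names or values.

```python
def split_timing_line(line: str) -> tuple[str, str, dict[str, str]]:
    """Split a cue timing line into (start, end, settings)."""
    start, arrow, rest = line.partition("-->")
    if not arrow:
        raise ValueError(f"no '-->' separator in {line!r}")
    tokens = rest.split()
    if not tokens:
        raise ValueError(f"missing end timestamp in {line!r}")
    # The first token after the arrow is the end time; the rest are settings.
    end, *setting_tokens = tokens
    settings: dict[str, str] = {}
    for token in setting_tokens:
        name, colon, value = token.partition(":")
        # Keep only well-formed name:value pairs; skip anything else.
        if colon and name and value:
            settings[name] = value
    return start.strip(), end, settings
```

For the cue shown earlier, the third element of the result is {"line": "90%", "align": "center"}; a non-empty dictionary marks a cue that would lose information in an SRT export.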

STYLE, NOTE and REGION blocks

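Alongside cues, WebVTT allows three kinds of block. NOTE blocks may appear anywhere after the header; STYLE and REGION blocks must come before the first cue: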

STYLE
::cue { color: #fff; background: rgba(0,0,0,.6); }
::cue(.warn) { color: #f2c100; }

NOTE This is a comment. It never appears on screen.

REGION
id:speaker width:40% lines:3 scroll:up

  • STYLE — embeds CSS using the ::cue pseudo-element, so you can style captions directly in the file.
  • NOTE — a comment, ignored by the player; useful for translator notes and timing markers.
  • REGION — defines a scrolling area for roll-up captions, common in live and broadcast-style output.
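Because blocks are separated by blank lines, a file can be split and each block labelled by its first line. The following is a rough sketch of that idea, assuming a well-formed file; classify_blocks is a hypothetical name, and a real parser handles many more edge cases.

```python
import re


def classify_blocks(vtt_text: str) -> list[str]:
    """Label each blank-line-separated block that follows the WEBVTT header."""
    # Normalise line endings and drop an optional byte order mark.
    text = vtt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    blocks = [block for block in re.split(r"\n{2,}", text.strip("\n")) if block]
    kinds = []
    for block in blocks[1:]:  # blocks[0] is the header block
        lines = block.split("\n")
        first = lines[0]
        # A cue has "-->" on its first line, or on its second if it has an identifier.
        if "-->" in first or (len(lines) > 1 and "-->" in lines[1]):
            kinds.append("cue")
        elif first.rstrip(" \t") == "STYLE":
            kinds.append("style")
        elif first.rstrip(" \t") == "REGION":
            kinds.append("region")
        elif first == "NOTE" or first.startswith(("NOTE ", "NOTE\t")):
            kinds.append("note")
        else:
            kinds.append("unknown")
    return kinds
```

Run on a file holding the three blocks above followed by one cue, this returns the labels style, note, region and cue, in that order.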

Inline payload tags

Inside a cue, WebVTT supports a richer tag set than SRT:

  • <b> <i> <u> — the familiar bold/italic/underline.
  • <v Speaker> — a voice tag identifying who is speaking (also used for styling and accessibility).
  • <c.classname> — a class span, targetable from a STYLE block or external CSS.
  • <ruby> / <rt> — ruby annotations for East Asian typography.
  • Timestamp tags — inline <00:01:15.500> markers that reveal text word-by-word, the basis of karaoke-style captions.

Because cue text is parsed as markup, literal & and < characters must be escaped as &amp; and &lt; — another thing the SRT→VTT converter handles for you.
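The escaping step can be sketched in a few lines. This is an illustrative helper, not the converter's actual code: escape_cue_text is a hypothetical name, and it assumes the input is plain text with no tags that should survive. It also escapes > as &gt;, which keeps the sequence --> out of the payload.

```python
def escape_cue_text(text: str) -> str:
    """Escape characters that WebVTT would otherwise parse as markup."""
    # Escape & first so the ampersands added by the later steps are not escaped again.
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )
```

Text that contains tags worth keeping, such as <i>, would need those tags protected before a step like this runs.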

Chapters and metadata

WebVTT isn't only for captions. A <track kind="chapters"> uses the same cue syntax to define chapter markers, and kind="metadata" carries arbitrary timed data (cue points, ad markers, lyrics) that JavaScript reads but the browser never renders. The single, simple cue model is reused across all of these roles.
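Since a chapters track is ordinary WebVTT, generating one takes little code. The sketch below assumes chapters are given as (start in milliseconds, title) pairs sorted by start time, that each chapter ends where the next begins, and that titles are already escaped; format_timestamp and build_chapters are hypothetical names.

```python
def format_timestamp(total_ms: int) -> str:
    """Format milliseconds as a WebVTT timestamp with hours included."""
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def build_chapters(chapters: list[tuple[int, str]], duration_ms: int) -> str:
    """Build the text of a chapters track from (start_ms, title) pairs."""
    blocks = ["WEBVTT"]
    for index, (start_ms, title) in enumerate(chapters):
        # The last chapter runs to the end of the media.
        is_last = index + 1 == len(chapters)
        end_ms = duration_ms if is_last else chapters[index + 1][0]
        timing = f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
        blocks.append(f"{timing}\n{title}")
    # Blocks are separated by one blank line; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"
```

The resulting text is saved as a .vtt file and referenced from a <track kind="chapters"> element like any other track.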

Serving WebVTT correctly

A valid file can still fail to load for two server-side reasons worth knowing:

  • MIME type — the file must be served as text/vtt. A file that a misconfigured server sends as text/plain may be ignored by some browsers.
  • Same-origin / CORS — the browser fetches <track> files under same-origin rules. A file on another origin loads only if the <video> element has a crossorigin attribute and the server sends matching CORS headers.

If your converted file is valid but the player ignores it, the problem is almost always one of these — not the subtitles themselves. For everything else, the SRT vs VTT comparison lays out when to use which format, and the editor opens and exports both.
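For local testing, both server-side conditions can be met with the Python standard library. This is a development-only sketch under stated assumptions: VttHandler and serve are hypothetical names, and the wildcard CORS header is a convenience for testing, not a production recommendation.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class VttHandler(SimpleHTTPRequestHandler):
    """Static file handler that serves .vtt as text/vtt with a CORS header."""

    # Map .vtt explicitly so the result does not depend on system MIME tables.
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map, ".vtt": "text/vtt"}

    def end_headers(self) -> None:
        # Assumption: any origin may read these files; restrict this outside testing.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def serve(directory: str = ".", port: int = 8000) -> None:
    """Serve the directory on localhost until interrupted."""
    handler = partial(VttHandler, directory=directory)
    with ThreadingHTTPServer(("127.0.0.1", port), handler) as server:
        server.serve_forever()
```

Calling serve("media") would serve the media folder at http://127.0.0.1:8000/, and the response headers for a .vtt file can then be checked in the browser's network panel.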

Tools for this