~/writing/hyperframes
Engineering · Dev Diary

How we taught Chrome to export video

Apr 19, 2026 · Miguel Ángel

The browser on your laptop renders three billion pixels a second. It composites video and CSS and WebGL and fonts and shadows with sub-pixel precision, sixty times a second, on every tab you've ever opened. We've spent thirty years using it to sell shoes. It is the most capable rendering engine humans have ever built, and until this year nobody had seriously asked it for a file you could keep.

html · the smallest HyperFrames composition that still counts
<div
  data-composition-id="hello"
  data-start="0"
  data-duration="3"
  data-width="1920"
  data-height="1080"
>
  <h1>hello, video.</h1>
</div>

Nine lines of HTML. Three seconds of video. It renders to an MP4 you can upload to Instagram. No editor, no timeline UI, no proprietary format. Just the web, coerced into standing still.
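Those four `data-*` attributes are all a renderer needs to plan the work. A hypothetical sketch of that step — the names `CompositionSpec` and `framePlan` are mine, not the HyperFrames API:

```typescript
// Hypothetical helper: derive a render plan from the data-* attributes above.
// The shape and names are illustrative, not the real HyperFrames internals.
interface CompositionSpec {
  id: string;
  start: number;     // seconds
  duration: number;  // seconds
  width: number;     // pixels
  height: number;    // pixels
}

function framePlan(spec: CompositionSpec, fps = 30) {
  const totalFrames = Math.round(spec.duration * fps);
  // One entry per frame: the exact timestamp to seek to before capture.
  const timestamps = Array.from(
    { length: totalFrames },
    (_, i) => spec.start + i / fps,
  );
  return { totalFrames, timestamps };
}

const plan = framePlan({
  id: "hello", start: 0, duration: 3, width: 1920, height: 1080,
});
console.log(plan.totalFrames); // 90 — the three-second "hello" scene at 30 fps
```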

[embedded player: composition/compositions/ignite/index.html · played live via @hyperframes/player · tap the scrubber]

That little scene above isn't a GIF, and it isn't a baked MP4. It's the exact same HTML file you'd feed the producer — being interpreted, right now, by a 4 KB web component that boots the same runtime our render pipeline does. Edit the file, refresh, see it move. Render it, upload it, the frames are identical to what you just watched. No parallel universes.

Browsers are sloppy timekeepers. They paint when they feel like it, drop frames under load, and time drifts by microseconds every refresh. Fine for a UI. Catastrophic for video.

Nine hundred frames at 30 fps have to land on exactly the right tick. Every time you press render. On every machine. The browser is sloppy in precisely the way video cannot tolerate. So we stole the clock.

Chrome has a piece of its DevTools Protocol called BeginFrame. It isn't in any docs you'd bump into. It's the internal seam the compositor uses to keep time, left exposed for a handful of us who ask politely. BeginFrame lets you pry the clock out of Chrome's hands and drive it yourself. "Pretend 33.33 milliseconds has passed. Run every animation to that instant. Paint the frame. Hand me the pixels." Then again. And again. Nine hundred times in a row, at whatever speed your CPU can keep up with. The browser renders in wall time. We render in frame time. They have nothing to do with each other anymore.
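In DevTools Protocol terms that loop is the `HeadlessExperimental.beginFrame` command. A sketch of the capture loop under that assumption — `send` stands in for whatever CDP client you use (Puppeteer's `CDPSession.send`, chrome-remote-interface, etc.), and the session plumbing is elided:

```typescript
// Sketch: drive Chrome's compositor in virtual time via CDP's
// HeadlessExperimental.beginFrame. `Send` abstracts the CDP client;
// wiring up the session is assumed, not shown.
type Send = (
  method: string,
  params: object,
) => Promise<{ screenshotData?: string }>;

const FPS = 30;
const FRAME_MS = 1000 / FPS; // 33.33… ms of pretend time per frame

async function captureFrames(send: Send, totalFrames: number): Promise<string[]> {
  const frames: string[] = [];
  for (let i = 0; i < totalFrames; i++) {
    // "Pretend 33.33 ms has passed. Paint the frame. Hand me the pixels."
    const { screenshotData } = await send("HeadlessExperimental.beginFrame", {
      frameTimeTicks: i * FRAME_MS,
      interval: FRAME_MS,
      screenshot: { format: "png" },
    });
    if (screenshotData) frames.push(screenshotData); // base64-encoded PNG
  }
  return frames;
}
```

The important property is that `frameTimeTicks` advances by exactly `FRAME_MS` every iteration, regardless of how long the real CPU takes: frame time and wall time are fully decoupled.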

ts · the entire runtime contract, in two lines
window.__player.seek(time)       // put the world at t seconds
window.__player.renderReady      // the frame at t is ready to capture

That is the whole surface. Everything else — a GSAP timeline, a React component tree, a WebGL shader, a Lottie file, a <video> element holding a 4K clip — negotiates through seek(t). If your animation library can sit still at an arbitrary moment, it works with HyperFrames. If it can't, we adapt it, or we use GSAP, which has been obsessively, almost religiously correct about seeking for two decades.

Once time is a pure function, the render is embarrassingly parallel. A sixty-second composition at 30 frames per second is 1,800 independent seeks. No frame depends on the one before it. So we spawn six Chrome processes, give each a slice of the timeline, and let them race. A streaming encoder eats the frames as they land. What used to be a coffee break is now a shrug. I still catch myself grinning every time a render that used to take eleven minutes finishes in forty seconds.

ts · parallelCoordinator.ts — worker count auto-tunes to your CPU
const workers = calculateOptimalWorkers(totalFrames, requested)
const chunks  = splitFramesAcrossWorkers(totalFrames, workers)

await Promise.all(
  chunks.map((slice, workerId) =>
    captureSlice({ workerId, ...slice, serverUrl, outputDir })
  )
)
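`splitFramesAcrossWorkers` is ordinary range chunking. A plausible version — the shipped `parallelCoordinator.ts` may differ in details:

```typescript
// Plausible reconstruction of the chunking step, not the shipped code:
// divide [0, totalFrames) into contiguous slices, spreading any remainder
// across the first workers so slice sizes differ by at most one frame.
interface Slice {
  startFrame: number; // inclusive
  endFrame: number;   // exclusive
}

function splitFramesAcrossWorkers(totalFrames: number, workers: number): Slice[] {
  const base = Math.floor(totalFrames / workers);
  const remainder = totalFrames % workers;
  const slices: Slice[] = [];
  let start = 0;
  for (let w = 0; w < workers; w++) {
    const size = base + (w < remainder ? 1 : 0);
    slices.push({ startFrame: start, endFrame: start + size });
    start += size;
  }
  return slices;
}

// 1,800 frames across 6 workers: six contiguous slices of 300 frames each.
console.log(splitFramesAcrossWorkers(1800, 6)[0]); // { startFrame: 0, endFrame: 300 }
```

Because no frame depends on any other, the slices never need to communicate; the only shared state is the output directory the encoder reads from.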

And then the part I actually care about. Claude, Cursor, Gemini, Codex — the most advanced pattern matchers the species has ever built — have read a hundred billion lines of HTML. None of them have read After Effects. None of them ever will. So when you ask an agent for a ten-second product intro with a fade-in title and background music, a real one, ready to render, it can finally reply in a language it speaks natively. No new format. No new DSL. No new API. Just HTML, written by a model that has always known HTML, turned into video by a pipeline that takes HTML at its word.

I'm not going to tell you this is finished. WebGPU shader transitions work on Chromium and nowhere else. Lottie adapters still wobble when you scrub fast. A font swap can ruin your week. We are a small team with a to-do list that grows faster than we do. But the bet is proving out. Every Monday I ship another piece on top of this engine, and every Monday I'm surprised the web was already the best video stack nobody had bothered to finish. HyperFrames just wrote the last mile.

If you use Claude Code, Cursor, Gemini CLI, or Codex, one command teaches your agent the whole framework: the composition grammar, the runtime contract, GSAP house style, the rules that keep renders deterministic. Paste this into a terminal and your assistant starts speaking HyperFrames:

bash · one command, every agent learns the framework
npx skills add heygen-com/hyperframes
heygen-com/hyperframes · Apache 2.0

Open-source video rendering framework. Write HTML, render video. Built for agents.

Clone the repo: github.com/heygen-com/hyperframes

Go write a video. Write a dumb one. Write the one you couldn't be bothered to open After Effects for. Then send it to me.