How we increased our rendering speeds by 70x using the WebCodecs API
We started exploring the WebCodecs API to enable faster rendering. Within two releases in a single week, we were able to increase our speeds by up to 70x!
Revideo is an open-source TypeScript framework for programmatic video editing. It lets you create video templates using code and render them via an API.

Two weeks ago, rendering speed was one of our biggest issues with Revideo. Unless you parallelized your rendering jobs across serverless functions, rendering a video often took too long to put into production. Luckily, a comment on our Show HN encouraged us to look closer into the WebCodecs API. Doing so, we were able to achieve up to 70x faster rendering speeds!

What were the bottlenecks in our Rendering Pipeline?

Revideo is based on Motion Canvas, which lets people define animations using generator functions and the HTML Canvas API. Each yield in the generator function corresponds to a frame in a video - if your generator function contains 300 yields, the resulting video will be 10 seconds long (if you specify it to use 30 fps). To export a video to mp4, we loop through all yield statements in your generator function, draw the corresponding frame to an HTML Canvas and then pass it to an encoder. Previously, two parts of this pipeline took up the majority of time:
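To make the relationship between yields, frames and duration concrete, here is a minimal sketch in plain TypeScript. It does not use the Motion Canvas or Revideo APIs: fadeIn and countFrames are hypothetical stand-ins for a scene and for the export loop.

```typescript
// Hypothetical stand-in for a scene: every `yield` marks the end of one frame.
function* fadeIn(totalFrames: number): Generator<number, void, void> {
  for (let frame = 0; frame < totalFrames; frame++) {
    // A real scene would update node properties here;
    // this sketch just yields the opacity for the current frame.
    yield (frame + 1) / totalFrames;
  }
}

// Step through the generator the way an exporter would: one step per frame.
function countFrames(scene: Generator<unknown, void, void>): number {
  let count = 0;
  while (!scene.next().done) {
    count++; // a real exporter would draw to the canvas and encode here
  }
  return count;
}

const fps = 30;
const renderedFrames = countFrames(fadeIn(300));
const durationSeconds = renderedFrames / fps;
console.log(`${renderedFrames} frames at ${fps} fps = ${durationSeconds} s`);
// prints "300 frames at 30 fps = 10 s"
```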

Decoding videos used in <Video/> tags: In most cases, our users use <Video/> elements inside their videos. To render these videos, we have to extract frames from the source video files to draw them to the canvas. During rendering, Motion Canvas loads videos as HTMLVideoElements, sets the .currentTime attribute to the required time and waits for the browser to seek to the correct position. This approach is quite slow, which is why in v0.4.2 of Revideo, we had already replaced it with our FFmpeg extractor. The FFmpeg extractor used a separate Node.js process to extract video frames with FFmpeg and sent them to the browser through a websocket. Despite getting a big speedup compared to the naive seeking approach, we still encountered cases where it took hundreds of milliseconds to extract individual frames. Especially when the source videos were big (4K), the extractor often timed out.

Passing frames to the FFmpeg encoder via canvas.toBlob(): In v0.4.2, we used FFmpeg to encode the individual canvas frames to an mp4 video. Again, this FFmpeg process ran inside a Node.js process, and we streamed frames to it via a websocket connection. To do so, we first needed to convert the canvas frame data to a format we could send over the socket, which is why we used .toBlob(). It turns out that the .toBlob() call took a long time: we had to wait up to 500ms for a single 4K frame. This clearly needed to be faster.
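As a rough illustration of both bottlenecks, here is a minimal sketch of the two browser calls involved. The helper names seekTo and canvasToBlob and the two small interfaces are made up for this example and are not taken from Motion Canvas or Revideo. Waiting on the seeked event and using image/png as the output type are assumptions as well.

```typescript
// Structural types so the sketch is not tied to a real page:
// an HTMLVideoElement and an HTMLCanvasElement both satisfy them.
interface SeekableVideo {
  currentTime: number;
  addEventListener(type: 'seeked', listener: () => void, options?: { once?: boolean }): void;
}

interface BlobCanvas {
  toBlob(callback: (blob: Blob | null) => void, type?: string, quality?: number): void;
}

// Bottleneck 1: set currentTime, then wait until the browser has finished seeking.
function seekTo(video: SeekableVideo, timeSeconds: number): Promise<void> {
  return new Promise(resolve => {
    // Register the listener before assigning currentTime so the event cannot be missed.
    video.addEventListener('seeked', () => resolve(), { once: true });
    video.currentTime = timeSeconds;
  });
}

// Bottleneck 2: serialize the whole canvas into an image Blob for every frame.
function canvasToBlob(canvas: BlobCanvas, type = 'image/png'): Promise<Blob> {
  return new Promise((resolve, reject) => {
    canvas.toBlob(
      blob => (blob ? resolve(blob) : reject(new Error('toBlob() returned null'))),
      type,
    );
  });
}
```

Both helpers are asynchronous and have to be awaited once per frame, so any per-call latency is multiplied by the number of frames in the video.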

Speedup 1 (v0.4.3): Decode Videos using the WebCodecs API 💨

We were able to fix the issue of slow decoding by replacing our FFmpeg frame extractor with an extractor based on the WebCodecs API (you can take a look at the PR here), concretely using the mp4 parser from mp4box.js. The WebCodecs API gives you very low-level control over videos, which is why we initially kept running into edge cases we hadn't foreseen. For instance, some files contain metadata fields that describe time skips within the video, and we had to implement custom logic to account for them. FFmpeg, on the other hand, handles things like these out of the box.
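Once a demuxer such as mp4box.js has split the file into encoded samples, feeding them to the WebCodecs decoder might look roughly like this. This is a sketch under assumptions, not Revideo's actual extractor: the DemuxedSample shape, toMicroseconds and decodeSamples are hypothetical names, and the decoder config (codec string plus any codec-specific description) is assumed to come from the demuxer. It also ignores the time-skip metadata mentioned above and any backpressure handling.

```typescript
// Hypothetical shape of a demuxed sample; times are in track timescale units.
interface DemuxedSample {
  data: ArrayBuffer;
  presentationTime: number;
  duration: number;
  timescale: number; // units per second
  isKeyframe: boolean;
}

// WebCodecs expects timestamps and durations in microseconds.
function toMicroseconds(value: number, timescale: number): number {
  return Math.round((value * 1_000_000) / timescale);
}

async function decodeSamples(
  config: VideoDecoderConfig,
  samples: DemuxedSample[],
  onFrame: (frame: VideoFrame) => void, // the callback must call frame.close() when done
): Promise<void> {
  const decoder = new VideoDecoder({
    output: onFrame,
    error: e => console.error('decode error', e),
  });
  decoder.configure(config);

  for (const sample of samples) {
    decoder.decode(
      new EncodedVideoChunk({
        type: sample.isKeyframe ? 'key' : 'delta',
        timestamp: toMicroseconds(sample.presentationTime, sample.timescale),
        duration: toMicroseconds(sample.duration, sample.timescale),
        data: sample.data,
      }),
    );
  }

  await decoder.flush(); // resolves once all queued chunks have been output
  decoder.close();
}
```

Each decoded VideoFrame can be drawn to the canvas with drawImage() and should be closed afterwards so the browser can release its memory.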

Using the new decoder, we already achieved 10x speedups - additionally, our project structure became a lot simpler, as we were able to remove the "hacky" websocket connection and move more work to the browser. Now, we wanted to see if we could even move encoding to the browser.

Speedup 2 (v0.4.4): Encode Videos with WASM 🏎️

Now that video decoding was fixed, we still faced the issue of the slow .toBlob() operation. We fixed this by encoding our videos directly in the browser using a WebAssembly implementation (Pull Request). We found that with the WebCodecs API, you can create a VideoFrame object directly from a canvas element and pass it to the WebCodecs VideoEncoder class, which saves us the long waiting times for the .toBlob() operation!
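Here is a minimal sketch of that idea, assuming a browser with WebCodecs support. The function names, the codec string avc1.640033 (H.264 High profile, level 5.1), the bitrate and the keyframe interval are illustrative assumptions rather than the settings Revideo uses.

```typescript
const MICROSECONDS_PER_SECOND = 1_000_000;
const KEYFRAME_INTERVAL = 60; // assumed: force a keyframe every 60 frames

// VideoFrame timestamps are expressed in microseconds.
function frameTimestampUs(frameIndex: number, fps: number): number {
  return Math.round((frameIndex * MICROSECONDS_PER_SECOND) / fps);
}

function createCanvasEncoder(
  width: number,
  height: number,
  fps: number,
  onChunk: (chunk: EncodedVideoChunk, metadata?: EncodedVideoChunkMetadata) => void,
): VideoEncoder {
  const encoder = new VideoEncoder({
    output: onChunk, // receives encoded chunks, e.g. to hand them to a muxer
    error: e => console.error('encode error', e),
  });
  encoder.configure({
    codec: 'avc1.640033', // assumed codec string, check support before relying on it
    width,
    height,
    framerate: fps,
    bitrate: 10_000_000, // assumed value
  });
  return encoder;
}

function encodeCanvasFrame(
  encoder: VideoEncoder,
  canvas: HTMLCanvasElement,
  frameIndex: number,
  fps: number,
): void {
  // Wrap the current canvas contents in a VideoFrame, no toBlob() round trip needed.
  const frame = new VideoFrame(canvas, {
    timestamp: frameTimestampUs(frameIndex, fps),
    duration: frameTimestampUs(1, fps),
  });
  encoder.encode(frame, { keyFrame: frameIndex % KEYFRAME_INTERVAL === 0 });
  frame.close(); // release the frame as soon as it has been queued
}
```

After the last frame, awaiting encoder.flush() makes sure all pending chunks have been delivered to the output callback before the file is finalized.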

The VideoEncoder class alone only gives you an encoded video stream. To create a full mp4 file, we still needed a muxer. Luckily, we found the mp4-wasm module, which implements exactly this in C, compiled to WebAssembly. We integrated it, our tests worked fine, so we decided to ship it! This gave us another massive speedup.

Can we get to fully client-side rendering?

Both of the mentioned changes moved operations that were previously done in a Node.js process with FFmpeg to the browser. Therefore, some of our users asked us if this would enable them to do fully client-side rendering. This is not yet possible for two reasons:

Audio processing is still handled with FFmpeg: As part of our rendering pipeline, we need to generate and merge audio tracks with the resulting video. We have not yet tried to move this into the browser, as it doesn't really affect our rendering speeds. Technically, however, it should be possible to build and should not present a huge hurdle.

Browser compatibility: The WebCodecs API is only fully supported in Chrome. This is not an issue for server-side rendering, as we can just use Chrome with Puppeteer regardless of our users' choice. However, if you want to render on the client device, you will either run into issues when your users use Safari or Firefox, or have to force them to use Chrome. Getting support for such features takes time, but at some point it will probably happen for most browsers. In Firefox, the WebCodecs API is already supported as an experimental feature.
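If you experiment with client-side rendering anyway, a feature check along these lines can tell you whether to fall back to server-side rendering. This is only a sketch: supportsWebCodecs and canEncode are hypothetical helper names, while VideoEncoder.isConfigSupported() is part of the WebCodecs API.

```typescript
const WEBCODECS_GLOBALS = ['VideoEncoder', 'VideoDecoder', 'VideoFrame', 'EncodedVideoChunk'];

// Check whether the WebCodecs constructors exist on the given global scope.
function supportsWebCodecs(scope: object = globalThis): boolean {
  return WEBCODECS_GLOBALS.every(api => api in scope);
}

// Even when the API exists, a specific codec or resolution may not be supported.
async function canEncode(config: VideoEncoderConfig): Promise<boolean> {
  if (!supportsWebCodecs()) return false;
  const { supported } = await VideoEncoder.isConfigSupported(config);
  return supported === true;
}
```

A page could call canEncode() with its intended encoder config on load and only offer in-browser rendering when it resolves to true.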

Last modified: Mon 1. Jul 2024