Engineering
Seeking faster video rendering with HTMLVideoElement
How we got rid of the bottleneck in our video rendering pipeline.
💡
Revideo is a TypeScript framework to create videos with
code. People use Revideo to create TikToks, YouTube Shorts,
ads, or demo videos programmatically. Revideo is MIT licensed
and will stay that way. We plan to monetize it down the line
by building additional tooling around the open-source
framework.
We're building Revideo on top of Motion Canvas, an animation
library, which in turn wraps the browser's canvas API. This
gives us a strong foundation to build on, but it also comes
with downsides. While the canvas API is a very powerful way
to compose different elements into a scene, there are some
things that canvas is not meant for, which we need to work
around.
When making videos with Revideo, a common use case is to take an
already existing video file and lay it down as part of a scene.
This works through an HTMLVideoElement, which can then fairly
easily be projected onto the canvas. This is highly optimized
and works great for normal playback, which we use when
previewing a scene. When rendering the final video, however, we
noticed an increasing render time per frame as the render
progressed. A few seconds into the render we went from 30ms per
frame all the way up to 700ms!
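As a quick illustration of the playback path, here is a minimal sketch of drawing a video element onto a canvas; the element setup, file path, and sizing are made up for the example and are not Revideo's actual code:

```ts
// Minimal sketch: project an HTMLVideoElement onto a canvas during playback.
// Paths and element lookup are illustrative only.
const video = document.createElement('video');
video.src = '/media/clip.mp4';
video.muted = true;

const canvas = document.querySelector('canvas')!;
const ctx = canvas.getContext('2d')!;

async function play() {
  await video.play();
  const drawFrame = () => {
    // drawImage accepts a video element directly and copies the
    // currently decoded frame onto the canvas.
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    requestAnimationFrame(drawFrame);
  };
  requestAnimationFrame(drawFrame);
}

play();
```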
Charting the time taken per frame shows just how steadily the
render time grows over the course of the render. This is the
render time per frame for a 30-second video in which we show a
Minecraft jump-and-run clip without any additional objects in
the scene.
To understand where this lag comes from, we dove into the code
to identify what we were spending all this time on. Looking at the
flame chart of the render process, we could see that JavaScript was
not actually doing that much between frames. This told us that
what we were looking for was an I/O process taking all this
time to wrap up before JS could get back to work.
Digging deeper into the call stack, we identified the culprit.
Unlike playback in the web preview, we don't simply play the
video back when rendering to an mp4. Some frames might take
longer to process than others, and for simple scenes without
video-in-video, the process usually even runs faster than real
time. To keep the video lined up with the rest of the scene, we
instead set the current time of the HTMLVideoElement for each
frame. This operation is incredibly slow because the browser
seeks through the entire mp4 file for every (!!!) frame, making
it more and more costly as the video goes on. Bingo!
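Conceptually, the rendering loop behaved roughly like the following sketch (simplified, not our exact code): assign currentTime, wait for the browser to finish seeking, then draw the frame.

```ts
// Rough sketch of the per-frame seeking approach. For every output frame we
// set currentTime and wait for the 'seeked' event before drawing the frame.
async function renderWithSeeking(
  video: HTMLVideoElement,
  ctx: CanvasRenderingContext2D,
  fps: number,
  durationInFrames: number,
) {
  for (let frame = 0; frame < durationInFrames; frame++) {
    video.currentTime = frame / fps;
    // 'seeked' fires once the browser has decoded the frame at the requested
    // time. This is the step that gets slower and slower as the video goes on.
    await new Promise<void>(resolve =>
      video.addEventListener('seeked', () => resolve(), {once: true}),
    );
    ctx.drawImage(video, 0, 0, ctx.canvas.width, ctx.canvas.height);
  }
}
```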
After trying different settings for our video tag, we were left
stumped. The browser seems to discard all of its prior decoding
work every time the currentTime value is reassigned. This
obviously doesn't need to be the case. After all, stepping
through a video frame by frame is what video encodings are
optimized for.
We needed a fix to speed things up. For this we came up with two options:
- We could try to find a way to seek through the video faster.
- Or we could try to find a way to render the video without seeking.
We were a little wary of the first option, as we didn't want to
introduce additional complexity and dependencies into our project.
We much preferred sticking with something we already know well.
The solution we landed on was a lot simpler. From our
Vite Node.js backend, we spawn an ffmpeg child process (ffmpeg is
already a dependency of the project anyway) and pipe all the
frames we need, at the appropriate frame rate, back into the
Node process. From there, we send them through a WebSocket
connection to the client, populate image tags with the data, and
then project those images onto the canvas.
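In simplified form, the extraction side looks roughly like this; the WebSocket setup, frame rate, and file name here are assumptions for the sketch, not the exact Revideo implementation:

```ts
// Simplified sketch: decode the source video once with ffmpeg and stream one
// PNG per frame to the client over a WebSocket (using the 'ws' package).
import {spawn} from 'child_process';
import {WebSocketServer} from 'ws';

const wss = new WebSocketServer({port: 8080});

wss.on('connection', socket => {
  const ffmpeg = spawn('ffmpeg', [
    '-i', 'source.mp4',
    '-vf', 'fps=30',      // resample to the project frame rate (assumed 30fps)
    '-f', 'image2pipe',   // stream images on stdout instead of writing files
    '-vcodec', 'png',
    'pipe:1',
  ]);

  // Drain ffmpeg's log output so the process does not block on a full buffer.
  ffmpeg.stderr.resume();

  // In reality the PNG frames need to be re-assembled from the chunked stdout
  // stream; here we simply forward the raw chunks to the client.
  ffmpeg.stdout.on('data', chunk => socket.send(chunk));
  ffmpeg.on('close', () => socket.close());
});
```

On the client, each received frame can be turned into a Blob and an object URL, assigned to an img element, and drawn onto the canvas with drawImage, much like the video element before.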
The biggest downside to this approach is that we don't know ahead
of rendering how many frames of the video we will need, leading
to some frames being processed unnecessarily. We deal with this
by only spawning an ffmpeg process for 10 seconds of video at a
time (or less if the source video ends earlier). This still means
that we need to re-seek every time we spawn a new ffmpeg process
every 10 seconds, but this is already streets ahead of anything
we had before (a rough sketch of the chunking is below).
Benchmarking this approach against the old version shows just how
much we've sped up the process.
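The chunking could look roughly like this; the helper name and exact arguments are our own for illustration, not necessarily what Revideo ships. Each chunk seeks once with -ss and then decodes linearly for at most 10 seconds.

```ts
// Hypothetical sketch of the 10-second chunking: each chunk spawns a fresh
// ffmpeg process that seeks once and then decodes linearly.
import {spawn, ChildProcess} from 'child_process';

const CHUNK_SECONDS = 10;

function extractChunk(src: string, chunkIndex: number, fps: number): ChildProcess {
  const start = chunkIndex * CHUNK_SECONDS;
  return spawn('ffmpeg', [
    '-ss', String(start),         // one seek per chunk instead of per frame
    '-i', src,
    '-t', String(CHUNK_SECONDS),  // stop after at most 10 seconds of output
    '-vf', `fps=${fps}`,
    '-f', 'image2pipe',
    '-vcodec', 'png',
    'pipe:1',
  ]);
}
```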
In these 30 seconds of video, we can clearly see the two spikes
where the ffmpeg process needs to restart. Not perfect, but it'll
do for now. Rendering this particular video went from 6 minutes
down to real time: 30 seconds! We were happy with our result and
called it a day.
If you have a better idea for how to solve this problem, we
would love to hear it!
Last modified: Tue 16. Apr 2024