Engineering
Seeking faster video rendering with HTMLVideoElement
How we got rid of the bottleneck in our video rendering pipeline.
💡
Revideo is a TypeScript framework to create videos with
code. People use Revideo to create TikToks, YouTube Shorts,
ads, or demo videos programmatically. Revideo is MIT licensed
and will stay that way. We plan to monetize it down the line
by building additional tooling around the open-source
framework.
We're building Revideo on top of Motion Canvas, an animation
library, which in turn wraps the browser's canvas API. This
gives us a strong foundation to build on, but it also comes
with downsides. While the canvas API is a very powerful way
to compose different elements into a scene, there are some
things that canvas is not meant for, which we need to work
around.
When making videos with Revideo, a common use case is to take an
already existing video file and lay it down as part of a scene.
This works through an HTMLVideoElement, which can then fairly
easily be projected onto the canvas. This is highly optimized
and works great for normal playback, which we use when
previewing a scene. When rendering the final video, however, we
noticed an increasing render time per frame as the render
progressed. A few seconds into the render we went from 30ms per
frame all the way up to 700ms!
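As a quick illustration of the playback path, here is a minimal sketch of drawing a video element onto a canvas; the element setup, file path, and sizing are made up for the example and are not Revideo's actual code:

```ts
// Minimal sketch: project an HTMLVideoElement onto a canvas during playback.
// Paths and element lookup are illustrative only.
const video = document.createElement('video');
video.src = '/media/clip.mp4';
video.muted = true;

const canvas = document.querySelector('canvas')!;
const ctx = canvas.getContext('2d')!;

async function play() {
  await video.play();
  const drawFrame = () => {
    // drawImage accepts a video element directly and copies the
    // currently decoded frame onto the canvas.
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    requestAnimationFrame(drawFrame);
  };
  requestAnimationFrame(drawFrame);
}

play();
```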
Charting the time taken per frame shows just how steadily the
render time grows over the course of the render. This is the
render time per frame for a 30-second video in which we show a
Minecraft jump-and-run clip without any additional objects in
the scene.
To understand where this lag comes from, we dove into the code
to identify what we were spending all this time on. Looking at the
flame chart of the render process, we could see that JavaScript was
not actually doing that much between frames. This told us that
what we were looking for was an I/O process taking all this
time to wrap up before JS could get back to work.
Digging deeper into the call stack, we identified the culprit.
Unlike playback in the web preview, we don't simply play the
video back when rendering to an mp4. Some frames might take
longer to process than others, and for simple scenes without
video-in-video, the process usually even runs faster than real
time. To keep the video lined up with the rest of the scene, we
instead set the current time of the HTMLVideoElement for each
frame. This operation is incredibly slow because the browser
seeks through the entire mp4 file for every (!!!) frame, making
it more and more costly as the video goes on. Bingo!
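Conceptually, the rendering loop behaved roughly like the following sketch (simplified, not our exact code): assign currentTime, wait for the browser to finish seeking, then draw the frame.

```ts
// Rough sketch of the per-frame seeking approach. For every output frame we
// set currentTime and wait for the 'seeked' event before drawing the frame.
async function renderWithSeeking(
  video: HTMLVideoElement,
  ctx: CanvasRenderingContext2D,
  fps: number,
  durationInFrames: number,
) {
  for (let frame = 0; frame < durationInFrames; frame++) {
    video.currentTime = frame / fps;
    // 'seeked' fires once the browser has decoded the frame at the requested
    // time. This is the step that gets slower and slower as the video goes on.
    await new Promise<void>(resolve =>
      video.addEventListener('seeked', () => resolve(), {once: true}),
    );
    ctx.drawImage(video, 0, 0, ctx.canvas.width, ctx.canvas.height);
  }
}
```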
After trying different settings for our video tag, we were left
stumped. The browser seems to discard all of its prior decoding
work every time the currentTime value is reassigned. This
obviously doesn't need to be the case. After all, stepping
through a video frame by frame is what video encodings are
optimized for.
We needed a fix to speed things up. For this we came up with two options:
- We could try to find a way to seek through the video faster.
- Or we could try to find a way to render the video without seeking.
We were a little wary of the first option, as we didn't want to
introduce additional complexity and dependencies into our project.
We much preferred sticking with something we already know well.
The solution we landed on was a lot simpler. From our
Vite Node.js backend, we spawn an ffmpeg child process (ffmpeg is
already a dependency of the project anyway) and pipe all the
frames we need, at the appropriate frame rate, back into the
Node process. From there, we send them through a WebSocket
connection to the client, populate image tags with the data, and
then project those images onto the canvas.
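In simplified form, the extraction side looks roughly like this; the WebSocket setup, frame rate, and file name here are assumptions for the sketch, not the exact Revideo implementation:

```ts
// Simplified sketch: decode the source video once with ffmpeg and stream one
// PNG per frame to the client over a WebSocket (using the 'ws' package).
import {spawn} from 'child_process';
import {WebSocketServer} from 'ws';

const wss = new WebSocketServer({port: 8080});

wss.on('connection', socket => {
  const ffmpeg = spawn('ffmpeg', [
    '-i', 'source.mp4',
    '-vf', 'fps=30',      // resample to the project frame rate (assumed 30fps)
    '-f', 'image2pipe',   // stream images on stdout instead of writing files
    '-vcodec', 'png',
    'pipe:1',
  ]);

  // Drain ffmpeg's log output so the process does not block on a full buffer.
  ffmpeg.stderr.resume();

  // In reality the PNG frames need to be re-assembled from the chunked stdout
  // stream; here we simply forward the raw chunks to the client.
  ffmpeg.stdout.on('data', chunk => socket.send(chunk));
  ffmpeg.on('close', () => socket.close());
});
```

On the client, each received frame can be turned into a Blob and an object URL, assigned to an img element, and drawn onto the canvas with drawImage, much like the video element before.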
The biggest downside to this approach is that we don't know ahead
of rendering how many frames of the video we will need, leading
to some frames being processed unnecessarily. We deal with this
by only spawning an ffmpeg process for 10 seconds of video at a
time (or less if the source video ends earlier). This still means
that we need to re-seek every time we spawn a new ffmpeg process
every 10 seconds, but this is already streets ahead of anything
we had before (a rough sketch of the chunking is below).
Benchmarking this approach against the old version shows just how
much we've sped up the process.
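The chunking could look roughly like this; the helper name and exact arguments are our own for illustration, not necessarily what Revideo ships. Each chunk seeks once with -ss and then decodes linearly for at most 10 seconds.

```ts
// Hypothetical sketch of the 10-second chunking: each chunk spawns a fresh
// ffmpeg process that seeks once and then decodes linearly.
import {spawn, ChildProcess} from 'child_process';

const CHUNK_SECONDS = 10;

function extractChunk(src: string, chunkIndex: number, fps: number): ChildProcess {
  const start = chunkIndex * CHUNK_SECONDS;
  return spawn('ffmpeg', [
    '-ss', String(start),         // one seek per chunk instead of per frame
    '-i', src,
    '-t', String(CHUNK_SECONDS),  // stop after at most 10 seconds of output
    '-vf', `fps=${fps}`,
    '-f', 'image2pipe',
    '-vcodec', 'png',
    'pipe:1',
  ]);
}
```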
In these 30 seconds of video, we can clearly see the two spikes
where the ffmpeg process needs to restart. Not perfect, but it'll
do for now. Rendering this particular video went from 6 minutes
down to real time: 30 seconds! We were happy with our result and
called it a day.
If you have a better idea for how to solve this problem, we
would love to hear it!
Last modified: Tue 16. Apr 2024