Hyperlapse – First person videos finally become watchable!


We present a method for converting first-person videos, for example, captured with a helmet camera during activities such as rock climbing or bicycling, into hyperlapse videos: time-lapse videos with a smoothly moving camera.

At high speed-up rates, simple frame sub-sampling coupled with existing video stabilization methods does not work, because the erratic camera shake present in first-person videos is amplified by the speed-up.

Source: research.microsoft.com

Ryan Seifert’s insight:

We have all seen helmet videos from skydivers (if you haven’t, Jeb Corliss has one of the best), and more recently helmet cams have appeared on bicyclists, surfers, and even pets! I have even spotted helmet cameras on my jogs around my relatively mundane neighborhood. Normally these videos are watched at an increased speed (who wants to watch a 45-minute ride for the 30 seconds of action?), but the sped-up footage is painful to view. Hyperlapse is a newly created method to stabilize and smooth out these videos.

Johannes Kopf, Michael Cohen, and Richard Szeliski developed the new method to generate the smoother video. The process (see the technical video below) is substantially more complicated than the familiar stabilizer functionality commonly used. The new system consists of three stages: scene reconstruction, path planning, and image-based rendering.

Scene reconstruction builds a 3D model of the scene, leveraging multiple frames from the video to do so. This gives the system the ability to actually change the viewpoint in the resulting rendering, replacing an abrupt viewpoint change with a smoother one. This is one of the key properties that allow the system to generate the silky-smooth resulting videos.

Path planning is split into two stages: the first optimizes for smooth transitions, path length, and proximity (the path should stay near the input frames), and the second optimizes for rendering quality. The resulting path can differ slightly from the path actually taken by the camera wearer (or pet!), but it will still be approximately the same.

The final step is actually rendering the video. Because each new viewpoint can be slightly different from any original frame, the system merges multiple frames together, selecting the areas of each frame that yield the best quality in the resulting video.
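To make the first path-planning stage concrete, here is a minimal sketch of the idea in Python/NumPy: iteratively pull each camera position toward the midpoint of its neighbors (smoothness) while also pulling it back toward its originally recorded position (proximity). The function name, weights, and iteration count are all illustrative assumptions on my part; the paper’s actual optimization is richer and also accounts for camera orientation, velocity, and the rendering-quality term of the second stage.

```python
import numpy as np

def plan_smooth_path(input_positions, smoothness=10.0, step=0.1, iterations=500):
    """Toy version of the first path-planning stage: trade off path
    smoothness against staying near the recorded camera positions.
    All parameter values here are illustrative, not from the paper."""
    path = input_positions.astype(float).copy()
    for _ in range(iterations):
        # Smoothness term: pull each point toward its neighbors' midpoint.
        smooth_pull = (path[:-2] + path[2:]) / 2.0 - path[1:-1]
        # Proximity term: pull each point back toward its input position.
        data_pull = input_positions[1:-1] - path[1:-1]
        path[1:-1] += step * (smoothness * smooth_pull + data_pull) / (smoothness + 1.0)
    return path

# Example: a shaky forward walk along x with jitter in y.
rng = np.random.default_rng(0)
shaky = np.stack([np.arange(100.0), rng.normal(0, 0.5, 100), np.zeros(100)], axis=1)
smooth = plan_smooth_path(shaky)
```

Increasing the `smoothness` weight yields a steadier path that strays further from the input positions, which is exactly the tension the second stage resolves by also scoring how well each candidate viewpoint can be rendered from the available frames.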

The result is quite amazing, but there are still some artifacts you can notice when watching the videos. Stepping through a video frame by frame, you will see that objects can suddenly appear, and the boundary areas where images are merged are easily identifiable. These artifacts are hard to notice when viewing at full speed, though.

The new technique is very resource intensive. The research paper mentions that it took roughly 305 hours to process a 10-minute video! Most of the computational time is consumed by the source selection step, which computes at roughly one minute per frame. I suspect that cloud computing (such as Amazon Web Services and Azure) will be heavily utilized to allow even a mobile phone app to take part in the video editing process. It will be interesting to see how this video editing will be used!
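The ~305-hour figure lines up with that per-frame cost. A quick back-of-envelope check, assuming 30 fps footage (the frame rate is my assumption, not stated above):

```python
frames = 10 * 60 * 30   # a 10-minute video at an assumed 30 fps = 18,000 frames
minutes = frames * 1    # ~1 minute of source selection per frame
print(minutes / 60)     # -> 300.0 hours, consistent with the reported ~305
```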



3 thoughts on “Hyperlapse – First person videos finally become watchable!”

  • Jody Hampton

    The result is quite simply breathtaking. It looks like something shot for a movie using a stabilised dollycam; the fact that they were able to achieve the same thing using nothing but a GoPro, their software, and likely a week of post-processing on a high-end desktop PC is simply amazing.

    I hope we see this technology actually become readily available. There might still be work to be done, but in general, if they can reproduce the demo videos with other content, then they’re on to something people would want.

  • Roderick Mendoza

    It’s striking, but it’s far more believable when you realize that they play it back at a much faster speed than the source, so they have tons of extra samples from which to extract information. They basically use all that data in the extra frames (which would otherwise simply get tossed away in a regular time-lapse) to construct a 3D scene. This wouldn’t look nearly as good if they had to play it back at normal speed.