By combining several dozen action cameras with a convolutional neural network, volumetric video is becoming even more realistic.
Google researchers have developed a technology for showing viewers highly realistic three-dimensional video. Action cameras mounted on a polymer hemisphere let a viewer see what is happening in the video from different angles and directions. A paper describing the work will be presented at the SIGGRAPH 2020 conference, scheduled for mid-July.
People perceive the world in three dimensions because they have two eyes, so the visual cortex receives images from two different points in space. Head movements add motion parallax (changes in the projection on the retina as the viewer moves through space), which lets us judge the depth of objects and their positions relative to one another.
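The effect of motion parallax can be illustrated with the standard pinhole camera relation, in which a sideways move of the viewpoint shifts a point in the image by an amount inversely proportional to its depth. The sketch below is only an illustration of that relation: the function name parallax_shift_px and the example numbers (a focal length of 1000 pixels and a 6 cm head movement) are assumptions, not values from the article.

```python
def parallax_shift_px(focal_length_px, translation_m, depth_m):
    """Image shift, in pixels, of a point at depth_m when a pinhole
    camera moves sideways by translation_m (shift = f * t / z)."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_length_px * translation_m / depth_m


# Assumed example values: the same 6 cm move shifts a nearby object
# ten times more than an object ten times farther away.
near_shift = parallax_shift_px(1000, 0.06, 1.0)   # object 1 m away
far_shift = parallax_shift_px(1000, 0.06, 10.0)   # object 10 m away
```

Nearby objects sweep across the field of view faster than distant ones, and that difference is the depth cue a single fixed camera position cannot provide.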
In stereo cameras and virtual reality headsets, binocular vision is reproduced by giving each eye frames captured from a slightly different angle. Motion parallax cannot be reproduced this way, however, because the camera was at a fixed position during shooting, and that position cannot be changed afterward.
Google engineers have been working on this problem for years. By combining hardware and software methods, they created a technology that lets the viewer watch what is happening in a video as if from different viewpoints. Frames from 46 cameras mounted on a hemispherical surface are fed to a convolutional neural network, which in turn splits them into many layers according to the distance to objects in the scene.
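To give a rough idea of how a stack of depth-sorted layers can be turned back into a picture, the sketch below blends semi-transparent RGBA layers from the farthest to the nearest using the standard "over" operator. This is a minimal sketch under stated assumptions, not the researchers' implementation: the function name composite_layers, the array layout and the flat (non-spherical) layers are all chosen here for illustration.

```python
import numpy as np


def composite_layers(layers):
    """Blend RGBA layers ordered from farthest to nearest.

    layers: array-like of shape (num_layers, height, width, 4) with values
    in [0, 1]; the last channel is non-premultiplied alpha.
    Returns an RGB image of shape (height, width, 3).
    """
    layers = np.asarray(layers, dtype=np.float64)
    out = np.zeros(layers.shape[1:3] + (3,))
    for layer in layers:  # iterate far to near
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        # "Over" operator: the nearer layer covers what is behind it
        # in proportion to its opacity.
        out = rgb * alpha + out * (1.0 - alpha)
    return out


# Assumed toy input: an opaque red far layer behind a half-transparent
# green near layer, each one pixel in size.
far = [[[1.0, 0.0, 0.0, 1.0]]]
near = [[[0.0, 1.0, 0.0, 0.5]]]
image = composite_layers([far, near])
```

In a layered renderer, each layer would first be reprojected for the new viewpoint and only then blended; it is the different on-screen movement of near and far layers that produces parallax.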
Each video is divided into 160 layers with a resolution of 1800 by 1350 pixels, after which the layers are optimized: each group of eight layers is merged into a polygonal mesh, onto which the image is applied as a texture. The resulting textures are packed into a texture atlas with a resolution of 3240 by 5760 pixels. In this way the initial data stream, in which many layer images correspond to each video frame, becomes two separate streams: the first contains images that can be compressed efficiently, and the second contains the polygonal meshes.
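Taking the figures above at face value, a few lines of arithmetic show how much the representation shrinks. The sketch below only restates the numbers from the text; the function name layer_budget and the assumption that the 160 layers split evenly into groups of eight are illustrative.

```python
NUM_LAYERS = 160
LAYER_SIZE = (1800, 1350)   # pixels, as given in the text
LAYERS_PER_MESH = 8
ATLAS_SIZE = (3240, 5760)   # pixels, as given in the text


def layer_budget():
    """Return (mesh count, pixels in all layers, pixels in the atlas,
    ratio of layer pixels to atlas pixels) for one video frame."""
    num_meshes = NUM_LAYERS // LAYERS_PER_MESH
    layer_pixels = NUM_LAYERS * LAYER_SIZE[0] * LAYER_SIZE[1]
    atlas_pixels = ATLAS_SIZE[0] * ATLAS_SIZE[1]
    return num_meshes, layer_pixels, atlas_pixels, layer_pixels / atlas_pixels


num_meshes, layer_pixels, atlas_pixels, ratio = layer_budget()
```

Under these assumptions, the 160 layers yield 20 meshes, and the atlas holds about 18.7 million pixels per frame compared with roughly 389 million in the full layer stack, a reduction of about 21 times before any video compression is applied.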
Using this approach, the developers created several videos in which the viewing angle can be changed by moving the cursor. The bitrate of the stream (the number of bits processed and transmitted per unit of time) ranges from 150 to 300 megabits per second. This makes it possible to stream realistic volumetric video to users of virtual reality headsets who have gigabit Internet connections.
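To put the bitrate range in perspective, the sketch below estimates what share of a gigabit link such a stream would occupy and how much data one minute of video amounts to. The function name stream_stats, the 1000 Mbit/s link capacity and the use of decimal megabytes are assumptions made for this illustration; real connections rarely deliver their full nominal speed.

```python
def stream_stats(bitrate_mbps, link_mbps=1000, seconds=60):
    """Return (fraction of the link used, megabytes transferred)
    for a stream of bitrate_mbps lasting the given number of seconds."""
    link_fraction = bitrate_mbps / link_mbps
    megabytes = bitrate_mbps * seconds / 8  # 8 bits per byte
    return link_fraction, megabytes


low = stream_stats(150)    # lower end of the range in the text
high = stream_stats(300)   # upper end of the range in the text
```

Under these assumptions the stream takes between 15 and 30 percent of a gigabit link, and one minute of video comes to between about 1.1 and 2.25 gigabytes.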