by Zhi Li, Kyle Swanson, Christos Bampis, Lukáš Krasula and Anne Aaron
Over the past few years, we have been striving to make VMAF a more usable tool not just for Netflix, but for the video community at large. This tech blog highlights our recent progress toward this goal.
VMAF is a video quality metric that Netflix jointly developed with a number of university collaborators and open-sourced on Github. VMAF was originally designed with Netflix’s streaming use case in mind, in particular, to capture the video quality of professionally generated movies and TV shows in the presence of encoding and scaling artifacts. Since its open-sourcing, we have started seeing VMAF being applied in a wider scope within the open-source community. To give a few examples, VMAF has been applied to live sports, video chat, gaming, 360 videos, and user generated content. VMAF has become a de facto standard for evaluating the performance of encoding systems and driving encoding optimizations.
VMAF stands for Video Multi-Method Assessment Fusion. It leans on Human Visual System modeling, or the simulation of low-level neural-circuits to gather evidence on how the human brain perceives quality. The gathered evidence is then fused into a final predicted score using machine learning, guided by subjective scores from training datasets. One aspect that differentiates VMAF from other traditional metrics such as PSNR or SSIM, is that VMAF is able to predict more consistently across spatial resolutions, across shots, and across genres (for example. animation vs. documentary). Traditional metrics, such as PSNR, are already able to do a good job evaluating the quality for the same content on a single resolution, but they often fall short when predicting quality across shots and different resolutions. VMAF fills this gap. For more background information, interested readers may refer to our first and second tech blogs on VMAF.
Recently, we migrated VMAF’s license from Apache 2.0 to BSD+Patent to allow for increased compatibility with other existing open source projects. In the rest of this blog, we highlight three other areas of recent development, as our efforts toward making VMAF a better quality metric for the community.
Improving the speed performance of VMAF has been a major theme over the past several years. Through low-level code optimization and vectorization, we sped up VMAF’s execution by more than 4x in the past. We also introduced frame-level multithreading and frame skipping, that allow VMAF to run in real time for 4K videos.
Most recently, we teamed up with Facebook and Intel to make VMAF even faster. This work took place in two steps. First, we worked with Ittiam to convert from the original floating-point based representation to fixed-point; and second, Intel implemented vectorization on the fixed-point data pipeline.
This work has allowed us to squeeze out another 2x speed gain on average while maintaining the numerical accuracy at the first decimal digit of the final score. The figure above shows the relative speed improvement under Intel Advanced Vector Extension 2 (Intel AVX2) and Intel AVX-512 intrinsics, for video at 4K, full HD and SD resolutions. Also notice that this is an ongoing effort, so stay tuned for more speed improvements.
The new BSD+Patent license allows for increased compatibility with existing open source projects. This brings us to the second area of development, which is on how VMAF can be integrated with them. For historical reasons, the libvmaf C library has been a minimal solution to integrate VMAF with FFmpeg. This year, we invested heavily on revamping the API. Today, we are annoucing the release of libvmaf v2.0.0. It comes with a new API that is much easier to use, integrate and extend.
This table above highlights the features achieved by the new API. A number of areas are worth highlighting:
- It is extensible without breaking the API.
- It is easy to add a new feature extractor. And this can easily support future evolution of the VMAF algorithms.
- It becomes very flexible to allocate memory and incrementally calculate VMAF at the frame level.
The last feature makes it possible to integrate VMAF in an encoding loop, guiding encoding decisions iteratively on a frame-by-frame basis.
One unique feature about VMAF that differentiates it from traditional metrics such as PSNR and SSIM is that VMAF can capture the visual gain from image enhancement operations, which aim to improve the subjective quality perceived by viewers.