Nvidia introduces AI to generate talking heads for video conferencing from 2D images



Nvidia AI researchers have introduced a model that generates talking heads for video conferencing from a single 2D image. It supports a wide range of manipulation, from rotating and translating a person’s head to motion transfer and video reconstruction. The model treats the first frame of a video as a 2D photo and then uses an unsupervised learning method to extract 3D keypoints from the video. In addition to outperforming other approaches in tests on benchmark datasets, the model matches H.264 video quality using one-tenth of the bandwidth previously required.

Nvidia research scientists Ting-Chun Wang, Arun Mallya, and Ming-Yu Liu published a paper on the model on the preprint repository arXiv on Monday. The results show that the latest model outperforms vid2vid, a few-shot GAN described in a paper published at NeurIPS last year, of which Wang was the lead author and Liu a co-author.

“By just changing the transformation of the keypoints, we can generate free-view videos. By only sending the keypoint transformations, we can achieve much better compression ratios than existing methods,” the paper said. “By drastically reducing bandwidth and providing an immersive experience, we believe this is an important step into the future of video conferencing.”
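
To make the bandwidth argument in the quote more concrete, the following Python sketch compares the size of one per-frame keypoint update with the size of one uncompressed frame. It is an illustration under stated assumptions, not the paper's actual encoding: the keypoint count, the float16 format, the rotation and translation layout, and the helper names pack_keypoint_update and raw_frame_bytes are all hypothetical.

```python
import struct

# All values below are assumptions for illustration, not figures from the paper.
NUM_KEYPOINTS = 20  # hypothetical number of 3D keypoints


def pack_keypoint_update(rotation, translation, offsets):
    """Serialize one frame's head pose and keypoint offsets as float16 values."""
    values = list(rotation) + list(translation)
    for point in offsets:
        values.extend(point)
    # "e" is the struct format character for a 2-byte half-precision float.
    return struct.pack(f"<{len(values)}e", *values)


def raw_frame_bytes(width, height, channels=3):
    """Size in bytes of one uncompressed 8-bit frame."""
    return width * height * channels


rotation = (0.1, -0.05, 0.0)     # yaw, pitch, roll (assumed layout)
translation = (0.0, 0.02, 0.0)   # x, y, z head translation (assumed layout)
offsets = [(0.0, 0.0, 0.0)] * NUM_KEYPOINTS  # one (x, y, z) offset per keypoint

payload = pack_keypoint_update(rotation, translation, offsets)
print(len(payload))              # 132 bytes per frame
print(raw_frame_bytes(256, 256)) # 196608 bytes per frame
```

The comparison here is against raw pixels, not against H.264, so it says nothing about the one-tenth figure reported for the model. It only shows why sending a few dozen numbers per frame, and reconstructing the face on the receiving end, needs far less data than sending the image itself.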

The model’s release follows the October debut of Maxine, an Nvidia video conferencing service. In addition to offering virtual backgrounds like Zoom does, Maxine delivers subtle AI-powered features like face alignment and noise cancellation alongside more conspicuous features like a conversational AI avatar or live translation.

Video calls on Microsoft Teams and Zoom also use forms of AI, for example to blur backgrounds and to power augmented reality animations and effects. The Nvidia paper was published a day before Salesforce acquired Slack for $27 billion, news that could shake up the corporate communications landscape and intensify the feud between Microsoft Teams and Slack. Microsoft also today introduced an update to the Teams calling experience.

Nvidia is one of the best-known companies in the world working on generative adversarial networks (GANs) like StyleGAN, models that can warp reality and blur the line between what is real and what is fake. Such AI models have potential applications in entertainment and gaming, as well as in disinformation or the creation of fake accounts. Although the widespread concern that deepfakes might accelerate disinformation in the run-up to the US presidential election in November thankfully went unrealized, GANs did make an appearance. In one case this fall, Russian state actors used fake profile images generated with GANs as part of an effort to create a fake news outlet staffed by real Russian writers to spread propaganda. In another incident in 2019, an AI-generated photo was used to create a profile for Katie Jones, a fake persona who reached out to Washington, D.C. political influencers and think tank researchers.

