Microsoft’s introduction of its VASA-1 AI system that can create highly realistic videos from a single photograph and an audio track is fueling discussions over its potential use in the spread of deepfakes.
Developed by Microsoft Research Asia and described in the paper "VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time," the system uses sophisticated machine-learning models to analyze a static image and an accompanying audio clip.
It then generates a synchronized high-quality video of the individual talking or singing with remarkable accuracy.
$MSFT Microsoft teases deepfake AI that's too good to release.
— Canadian Jennifer 🇨🇦 (@cdntradegrljenn) April 21, 2024
VASA-1 framework can turn a still image and a cloned voice file into a plausible video of a person talking.
Microsoft this week demoed VASA-1, a framework for creating videos of people talking from a still image,…
Although the Mona Lisa sample video published in Microsoft’s press release is clearly AI-generated and even comical, it still showcases the advanced model’s capabilities.
What truly raised alarms is how realistic the sample videos of people talking look; although the photographs used were AI-generated, they appear real, fueling fears that VASA-1 could be used for unsavory activities such as the creation and spread of deepfakes.
Microsoft just introduced VASA-1. #DeepFake
— JJohnnymeetei (@johnnykeith) April 18, 2024
Nuances like eye and lip movement still need some improvement to be fully human-like, but the quality of the generated videos is enough to make many people mistake them for real footage.
“Our premiere model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness,” the research’s abstract stated.
On the Threat of Deepfakes
Looking at the sample videos, it is not surprising that the introduction of VASA-1 has sparked a renewed debate on the ethical implications and potential misuse of AI in deepfake creation.
In January, X blocked searches for Taylor Swift on its platform after sexually explicit deepfake images of the pop star circulated widely.
In 2023, Graphika published a report detailing how deepfakes were used by pro-China bot accounts to post videos on Facebook and Twitter showing AI-generated newscasters sharing anti-American and pro-Chinese views.
Aside from being used to harass and demean people, causing significant reputational and emotional damage, deepfakes can also spread misinformation intended to sway public opinion.
These are just a couple of examples of the negative consequences of deepfakes, and this is why Microsoft’s VASA-1 is raising concerns.
Deepfakes just got scary real! Microsoft's VASA-1 can create disturbingly lifelike video from a single pic & audio.
— Abhishek Krishna (@abhicris) April 19, 2024
While not public yet, this tech could spark a deepfake wildfire.
Get ready for a deluge of fake vids more convincing than ever before. The future of…
“Microsoft's new AI tool is a deepfake nightmare machine. VASA-1 is an AI image-to-video model that can generate videos from just one photo and a speech audio clip,” user @medicboyc wrote on X.
“It's a new AI model that can turn 1 photo and 1 piece of audio into a fully lifelike human deepfake. Wild to drop this right before the election,” @rowancheung said in an X post.
Risks and Responsible AI Consideration
In its press release, Microsoft acknowledged these concerns and emphasized that VASA-1 is intended for positive applications, such as enhancing educational accessibility, providing companionship, and assisting individuals with communication challenges.
“It is not intended to create content that is used to mislead or deceive. However, like other related content generation techniques, it could still potentially be misused for impersonating humans,” Microsoft wrote.
“We are opposed to any behavior to create misleading or harmful contents of real persons, and are interested in applying our technique for advancing forgery detection.”
The company also clarified that it currently has no plans to release VASA-1.
“Given such context, we have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations.”
READ NEXT: Deepfakes Are Easy to Detect – Business POV Is Where Problems Start