A video of a young woman abroad imitating an NPC (a non-player character in a game) recently went viral. Her facial expressions and body movements resembled an NPC's so closely that viewers could not tell whether they were watching real footage or a game.

That is a real person simulating a virtual character. With the rise of the metaverse, the opposite is now happening: digital humans are being built as virtual replicas of real people.

Digital human avatars have already appeared in online meetings, live video streaming, and sports and fitness scenarios. The relatively mainstream solution in the industry today is camera-based: a camera films the face, AI algorithms recognise the facial expressions, and the result is mapped onto the avatar in the relevant usage scenario.

In the XR space, a previously disclosed Magic Leap patent document explores recognising facial expressions from camera images of the eye region alone (e.g. the eyebrows and changes in eye shape).

More recently, EarIO, a research project published by Cornell University, performed facial expression recognition using ‘headphones’.

During the demonstration, a staff member wore a device resembling an open headset, with a built-in battery, microphone, speaker, Bluetooth module and other hardware.

It works as follows. The device connects to a mobile phone over Bluetooth, and the accompanying program is set up on the phone. Once the wearer's face has been converted into an avatar, the speakers on both sides of the device emit audio signals towards the face at frequencies inaudible to the human ear, and the microphone captures the echoes.
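The emit-and-listen step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not EarIO's actual signal processing: the 48 kHz sample rate, the 18 to 21 kHz sweep, the chirp length and every function name are hypothetical choices, and the echo is simulated as a single delayed, attenuated copy of the emitted signal.

```python
import math

SAMPLE_RATE = 48_000                  # Hz; assumed, not taken from the article
F_START, F_END = 18_000.0, 21_000.0   # assumed near-inaudible sweep band, in Hz
CHIRP_LEN = 256                       # samples per emitted chirp (assumption)


def make_chirp(n=CHIRP_LEN):
    """Return a linear frequency sweep from F_START to F_END over n samples."""
    duration = n / SAMPLE_RATE
    sweep_rate = (F_END - F_START) / duration  # Hz per second
    chirp = []
    for i in range(n):
        t = i / SAMPLE_RATE
        phase = 2.0 * math.pi * (F_START * t + 0.5 * sweep_rate * t * t)
        chirp.append(math.sin(phase))
    return chirp


def simulate_echo(chirp, delay, gain, total):
    """Simulated microphone signal: the chirp delayed by `delay` samples and
    scaled by `gain`. A real face would return many overlapping echoes."""
    mic = [0.0] * total
    for i, sample in enumerate(chirp):
        if delay + i < total:
            mic[delay + i] += gain * sample
    return mic


def echo_profile(chirp, mic):
    """Cross-correlate the microphone signal with the emitted chirp.
    Each entry is the echo strength at one candidate delay (in samples)."""
    n = len(chirp)
    return [
        sum(chirp[i] * mic[lag + i] for i in range(n))
        for lag in range(len(mic) - n + 1)
    ]


def strongest_delay(profile):
    """Index of the delay whose correlation has the largest magnitude."""
    return max(range(len(profile)), key=lambda lag: abs(profile[lag]))
```

Feeding a chirp through `simulate_echo` with a delay of 40 samples and then calling `strongest_delay(echo_profile(chirp, mic))` recovers that delay. In a real device the whole profile, not just its peak, would shift as the skin moves, and that changing profile is what a model would learn from.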

As the facial muscles move when a person speaks, smiles, blinks or otherwise changes expression, the received echoes change too, producing a distinctive echo profile for each expression. A deep learning algorithm maps the collected acoustic data to the 52 facial expression parameters captured by a TrueDepth camera in the training data, and the result is rendered as a real-time facial expression.
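To make the mapping from echo data to the 52 expression parameters concrete, here is a minimal sketch. It deliberately swaps the deep learning model for a plain nearest-neighbour lookup so that it stays short and dependency-free. The class name, the 16-value profile length and the toy training pairs are all hypothetical; only the figure of 52 parameters comes from the text.

```python
import math

NUM_PARAMS = 52    # facial expression parameters, as stated in the article
PROFILE_LEN = 16   # length of one echo profile; an arbitrary assumption


def distance(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestNeighbourMapper:
    """Stand-in for the learned model: stores (echo profile, parameters)
    pairs and answers a query with the parameters of the closest profile."""

    def __init__(self):
        self.samples = []

    def add_training_pair(self, profile, params):
        # During training, `params` would come from the camera ground truth.
        if len(params) != NUM_PARAMS:
            raise ValueError(f"expected {NUM_PARAMS} parameters")
        self.samples.append((list(profile), list(params)))

    def predict(self, profile):
        if not self.samples:
            raise RuntimeError("no training data recorded yet")
        _, params = min(self.samples, key=lambda s: distance(s[0], profile))
        return params


# Toy training data: two made-up echo profiles with made-up parameters.
neutral_params = [0.0] * NUM_PARAMS
smile_params = [0.0] * NUM_PARAMS
smile_params[0] = 1.0  # pretend parameter 0 encodes "smile" (hypothetical)

mapper = NearestNeighbourMapper()
mapper.add_training_pair([0.0] * PROFILE_LEN, neutral_params)
mapper.add_training_pair([1.0] * PROFILE_LEN, smile_params)

# A new echo profile close to the "smile" example maps to its parameters.
predicted = mapper.predict([0.9] * PROFILE_LEN)
```

A trained neural network would interpolate between expressions rather than snapping to the closest stored example, but the interface is the same: an echo profile goes in and 52 values that drive the avatar come out.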

A camera-based setup is bulky and power-hungry. EarIO's headset form factor gives it a natural advantage on both counts: it samples at 86 Hz while consuming only 154 mW.
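For a sense of scale, the two figures above can be turned into a per-frame time and energy budget. The short sketch below does that arithmetic; the function names are illustrative, and the battery capacity in the last line is a purely hypothetical value, since the article does not state one.

```python
def frame_interval_ms(sample_rate_hz):
    """Time between two consecutive expression samples, in milliseconds."""
    return 1000.0 / sample_rate_hz


def energy_per_frame_mj(power_mw, sample_rate_hz):
    """Energy spent per sample, in millijoules (mW divided by Hz gives mJ)."""
    return power_mw / sample_rate_hz


def runtime_hours(battery_mwh, power_mw):
    """Idealised runtime on a full battery, ignoring all other power draw."""
    return battery_mwh / power_mw


interval = frame_interval_ms(86)       # roughly 11.6 ms between samples
energy = energy_per_frame_mj(154, 86)  # roughly 1.8 mJ per sample

# Hypothetical battery: 100 mAh at 3.7 V = 370 mWh (not from the article).
hours = runtime_hours(370, 154)        # roughly 2.4 hours under that assumption
```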

Of course, EarIO has its problems. It does not work out of the box: at least half an hour of data collection and training is required first. The data is also somewhat lacking in discriminative power, and there is a certain error rate. The team says it will continue to optimise the system to overcome these problems.

EarIO is reportedly already compatible with commercially available wireless video conferencing headsets, so virtual avatars can be used in video conferences.