Satellites must be deorbited within five years of completing missions, FCC rules

The US Federal Communications Commission (FCC) has adopted new rules to address the growing risk of “space junk” or abandoned satellites, rockets and other debris. The new “5-year rule” will require low-Earth orbit operators to deorbit their satellites within five years of completing their missions. That’s significantly less time than the previous guideline of 25 years.

“But 25 years is a long time,” FCC Chairwoman Jessica Rosenworcel said in a statement. “There is no reason to wait that long anymore, especially in low-earth orbit. The second space age is here. For it to continue to grow, we need to do more to clean up after ourselves so space innovation can continue to respond.”

Rosenworcel noted that around 10,000 satellites weighing “thousands of metric tons” have been launched since 1957, with over half of those now defunct. The new rule “will mean more accountability and less risk of collisions that increase orbital debris and the likelihood of space communication failures.”

However, some US representatives don’t necessarily agree with the decision. Members of the Committee on Science, Space, and Technology said in a letter that such decisions are often taken by NASA. By acting unilaterally, the FCC “could create uncertainty and potentially conflicting guidance” for the space industry. They asked the FCC to explain the decision to Congress, saying “this would ensure that procedural measures such as the Congressional Review Act are not necessary.”

NASA has said there are “23,000 pieces of debris larger than a softball orbiting the Earth.” It noted that China’s 2007 anti-satellite test “added more than 3,500 pieces of large, trackable debris and many more smaller debris to the debris problem.”

All Facebook and Instagram users in the US can now show off their NFTs

Meta is done rolling out support for non-fungible tokens or NFTs in the US. The company first started giving select creators in the country the option to display their tokens on Facebook and Instagram earlier this year. But now everyone in the US can display their collections on both platforms, whether they’re NFTs they’ve created and are selling or something they’ve purchased from creators. Those who have both social media apps can also cross-post their digital collectibles from either app so they don’t have to share them twice.

In addition, the ability to post NFT collections on Instagram is now live for users in 100 countries. The social network first announced that it was going to expand the feature’s availability back in August to countries in Africa, Asia-Pacific, the Middle East and the Americas. This probably doesn’t sound like good news for people who aren’t fond of NFTs or are concerned about their environmental impact. But it looks like Meta is set on making them a part of its platforms in its journey towards creating the metaverse it envisions. 

To display their collection, a user must have a supported digital wallet (Rainbow, Trust Wallet, Dapper, Coinbase Wallet or MetaMask) installed on their device. They then have to link the wallet to either or both apps via the “digital collectibles” tab in their app settings. Once their wallets are linked, they’ll be able to view their collectibles through the Facebook or Instagram app and share their NFTs to their feed.

New York joins California in aiming to make all auto sales hybrid or EV by 2035

New York is following California’s lead by mandating that all new cars, pickups and SUVs sold in the state must be either EVs or plug-in hybrids, Governor Kathy Hochul announced. To reach that goal, 35 percent of new cars must be zero-emission by 2026 and 60 percent by 2030. New school buses must also be zero emissions by 2035. A public hearing will be held before the rules are put into place.

Hochul ordered the state’s environmental agency to create standards similar to those adopted by California, which phase out all fossil-fuel-only car sales by 2035. Those rules went into effect last month and were designed to cut passenger vehicle pollution 25 percent by 2037, with 9.5 million fewer internal-combustion engine (ICE) only vehicles sold by 2035.

“We had to wait for California to take a step because there’s some federal requirements that California had to go first — that’s the only time we’re letting them go first,” the governor said in a press conference yesterday.

The state is following California’s actions for a reason. The Clean Air Act permits California to set its own pollution rules, but other states aren’t allowed to do that. However, they can follow California once it acts — so California must pave the way for any emissions rules implemented by individual states.

The governor also unveiled a $10 million Drive Clean Rebate program, which gives residents a $2,000 rebate toward the purchase of any of more than 60 EV and plug-in hybrid models, on top of the $7,500 federal tax credit. The state has spent $92 million on the program to date. It also announced the installation of its 100th fast charger as part of the EVolve charging network.

“With sustained state and federal investments, our actions are incentivizing New Yorkers, local governments, and businesses to make the transition to electric vehicles,” Hochul said.

Former eBay execs get prison time in cyberstalking case involving Twitter threats and fetal pig deliveries

Two of the eBay executives charged with staging a cyberstalking campaign against the creators of the eCommerceBytes newsletter have been sentenced to prison. The Justice Department says that these execs, along with five other former eBay employees, worked together to intimidate David and Ina Steiner. They apparently hatched a scheme targeting the Steiners shortly after Ina published an article in their newsletter about a lawsuit eBay filed accusing Amazon of poaching its sellers. David said the people involved in their harassment made their lives “a living hell.”

James Baugh, eBay’s former senior director of safety and security, was sentenced to almost five years in prison and ordered to pay a $40,000 fine. Meanwhile, David Harville, eBay’s former director of global resiliency and the last defendant in the case to plead guilty, got a two-year sentence and was ordered to pay a $20,000 fine.

According to the DOJ, the group sent disturbing deliveries to the couple’s home, including “a book on surviving the death of a spouse, a bloody pig mask, a fetal pig, a funeral wreath and live insects.” They also sent the couple threatening Twitter messages and posted on Craigslist to invite the public to partake in sexual encounters at the victims’ home. Authorities also said that Baugh, Harville and another eBay employee monitored the couple’s home in person with the intention of attaching a GPS tracker to their car. 

Based on the case’s court documents, Devin Wenig, who was eBay’s CEO at the time, sent another top exec a message that said “If you are ever going to take her down … now is the time” 30 minutes after Ina’s post was published. In turn, that executive sent Wenig’s message to Baugh, adding that Ina was a “biased troll who needs to get BURNED DOWN.” As The Washington Post notes, Wenig was not charged in the case but is facing a civil lawsuit from the Steiners, who accused him of attempting to “intimidate, threaten to kill, torture, terrorize, stalk and silence them.” He has denied any knowledge of the harassment campaign.

As for Baugh and Harville, both asked the Steiners for forgiveness, according to The Post. “I take 100% responsibility for this, and there is no excuse for what I have done. The bottom line is simply this: If I had done the right thing and been strong enough to make the right choice, we wouldn’t be here today, and for that I am truly sorry,” Baugh said.

Elon Musk’s texts with Jack Dorsey and Parag Agrawal detail tumultuous Twitter negotiations

A tranche of Elon Musk’s private messages has been made public as part of his ongoing lawsuit with Twitter. The messages, revealed in a court filing Thursday, shed new light on Musk’s behind-the-scenes negotiations with Twitter’s leadership, as well as his discussions with former CEO Jack Dorsey, and how Musk’s talks with CEO Parag Agrawal quickly soured.

The messages include the moment Musk tells Agrawal he wants to acquire Twitter and take it private, rather than join the board. Agrawal confronts Musk about an April 9th tweet questioning if “Twitter is dying.”

Agrawal writes to Musk:

You are free to tweet “is Twitter dying?” or anything else about Twitter – but it’s my responsibility to tell you that it’s not helping me make Twitter better in the current context. Next time we speak, I’d like to you provide you [sic] perspective on the level of the internal distraction right now and how it [sic] hurting our ability to do work. I hope the AMA will help people get to know you, to understand why you believe in Twitter, and to trust you – and I’d like the company to get to a place where we are more resilient and don’t get distracted but we aren’t there right now.

Musk responded less than a minute later. “What did you get done this week? I’m not joining the board. This is a waste of time. Will make an offer to take Twitter private.”

Twitter board chair Bret Taylor followed up with Musk a few minutes later asking to talk. “Fixing Twitter by chatting with Parag won’t work,” Musk tells Taylor. “Drastic action is needed. This is hard to do as a public company, as purging fake users will make the numbers look terrible, so restructuring should be done as a private company. This is Jack’s opinion too.”

The messages also provide a glimpse into the relationship between Dorsey and Musk. Dorsey has publicly said that “Elon is the singular solution I trust,” but he hasn’t commented publicly since Musk tried to back out of the acquisition and Twitter sued to force the deal through.

But in the newly released messages, it’s clear Dorsey has wanted Musk to take on an active role at Twitter for some time. Dorsey tells Musk that he wanted him to join Twitter’s board of directors long before Musk acquired a large stake in the company.

“Back when we had the activist come in, I tried my hardest to get you on our board and our board said no. That’s about the time I decided I needed to work to leave, as hard as it was for me,” Dorsey says. “I think the main reason is the board is just super risk averse and saw adding you as more risk, which I thought was completely stupid and backwards, but I only had one vote, and 3% of company, and no dual class shares. Hard set up. We can discuss more.”

Dorsey seemed to be referring to Elliott Management, the activist investor that attempted to oust Dorsey in early 2020.

Notably, this conversation occurred in late March, after Musk had acquired a multibillion-dollar stake in Twitter, but before his stake had been made public. He and Dorsey also discussed the Twitter cofounder’s belief that Twitter “can’t be a company.”

Dorsey writes to Musk:

I believe it must be an open source protocol, funded by a foundation of sorts that doesn’t own the protocol, only advances it. A bit like what Signal has done. It can’t have an advertising model. Otherwise you have surface area that governments and advertisers will try to influence and control. If it has a centralized entity behind it, it will be attacked. This isn’t complicated work, it just has to be done right so it’s resilient to what has happened to twitter.

Musk responds that the idea is “super interesting” and that “it’s worth both trying to move Twitter in a better direction and doing something new that’s decentralized.”

The following month, Dorsey also attempted to play mediator between Musk and Agrawal, at one point arranging a call between the three of them. “You and I are in complete agreement,” Musk tells Dorsey. “Parag is just moving far too slowly and trying to please people who will not be happy no matter what he does.”

“At least it became clear that you can’t work together,” Dorsey later responds. “That was clarifying.”

Intel claims its Arc A770 and A750 GPUs will outperform NVIDIA’s mid-range RTX 3060

Ahead of bringing its Arc desktop GPUs to everyone in a couple of weeks, Intel has revealed more details about what to expect from the graphics cards in terms of specs and performance. The A770, which starts at $329, will have 32 Xe cores, 32 ray-tracing units and a 2,100MHz graphics clock. In terms of RAM, it comes in 8GB and 16GB configurations, with up to 512 GB/s and 560 GB/s of memory bandwidth, respectively.

As for the A750, which Intel just announced will start at $289, it has 28 Xe cores, 28 ray-tracing units, a 2,050MHz graphics clock, 8GB of memory and up to 512 GB/s of memory bandwidth. All three cards, which will be available on October 12th, have a total board power of 225W.

Intel claims that, based on benchmarking tests, you’ll get more bang for your buck with these cards than NVIDIA’s mid-range GeForce RTX 3060. It says the A770 offers 42 percent greater performance per dollar vs. the RTX 3060, while the A750 is seemingly 53 percent better on a per-dollar basis.
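Intel’s “performance per dollar” framing is plain arithmetic: average frame rate divided by price, compared against the same ratio for the baseline card. The Python sketch below shows the calculation; the frame rates are hypothetical placeholders rather than Intel’s benchmark numbers, and the RTX 3060’s $329 launch MSRP is assumed as the comparison price.

```python
# Sketch of a performance-per-dollar comparison like the one Intel describes.
# The fps figures are hypothetical placeholders, not Intel's benchmark data.

def perf_per_dollar(avg_fps: float, price_usd: float) -> float:
    """Average frames per second delivered per dollar of list price."""
    return avg_fps / price_usd

def relative_advantage(card_ppd: float, baseline_ppd: float) -> float:
    """Percent advantage of a card's perf-per-dollar over a baseline card."""
    return (card_ppd / baseline_ppd - 1) * 100

cards = {
    "Arc A770": {"price": 329, "avg_fps": 90},   # price from the article; fps hypothetical
    "Arc A750": {"price": 289, "avg_fps": 82},   # price from the article; fps hypothetical
    "RTX 3060": {"price": 329, "avg_fps": 75},   # assumed launch MSRP; fps hypothetical
}

baseline = perf_per_dollar(cards["RTX 3060"]["avg_fps"], cards["RTX 3060"]["price"])
for name in ("Arc A770", "Arc A750"):
    ppd = perf_per_dollar(cards[name]["avg_fps"], cards[name]["price"])
    print(f"{name}: {relative_advantage(ppd, baseline):+.0f}% perf per dollar vs. RTX 3060")
```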

It claims that, in most of the games it tested, the A770’s 16GB configuration delivered better ray-tracing performance than the similarly priced RTX 3060 (which, in fairness, debuted back in early 2021). When it came to Fortnite, Intel says the A770 had 1.56 times the ray-tracing performance of the RTX 3060.

Of course Intel is going to tout its GPUs as being better than the competition. We’ll have to wait for the results of our own Intel Arc benchmarking tests to have a true sense of the performance.

In any case, it’s looking like NVIDIA is about to have more competition on the GPU front. Only this time, it’s from an established brand that just so happens to be behind many of the processors powering the PCs that might very well have used NVIDIA cards otherwise.

Twitter embraces TikTok-style ‘immersive’ video

Videos on Twitter will now look a lot more like TikTok. The company announced that it’s switching to a full-screen “immersive” video player for watching clips. It’s also borrowing the now-familiar “swipe up” gesture that will allow people to endlessly …

MSI Stealth 15M review: Coasting on its good looks

It’s only natural that a person’s tastes and preferences change over time. So after years of thirsting for big, beefy gaming laptops with shiny lights, I’ve started gravitating towards more understated all-rounders that don’t scream “Look at me.” And for the last few generations, MSI’s Stealth 15M line has been one of the best at balancing good performance with a discreet appearance. But unfortunately, it feels like MSI is coasting with the 2022 model. While there aren’t a ton of major flaws, things like the Stealth’s display, battery life and audio just aren’t quite as nice as I would’ve liked.

Design

For a gaming notebook, the Stealth 15M is about as incognito as it gets. It’s got a simple, somewhat boxy build with a matte black finish (which is a bit of a fingerprint magnet, by the way). The only visual flair, at least on the outside, is MSI’s dragon logo, which gets a new holographic treatment for 2022.

Then you open it up and you get MSI’s lovely Spectrum backlit keyboard, a big speaker grille that runs the width of the deck and a smallish touchpad. Along its sides, MSI offers a good selection of ports, including four USB 3.2 ports (two Type-A and two Type-C, one of which supports DisplayPort), a full-size HDMI 2.1 connector and a combo headphone/mic jack. And at just under four pounds (3.96 pounds), the Stealth is actually a touch lighter than a lot of other 15-inch gaming laptops (and some 14-inch systems too).

Display, sound and webcam

On paper, the Stealth 15M’s screen looks like a good match for its specs. It’s a 15.6-inch, 1920 x 1080 IPS panel with a 144Hz refresh rate, and it even has a matte finish to help prevent distracting reflections. The issue is that, between somewhat dull colors and a tested brightness of around 250 nits, movies and games look kind of lifeless. Sure, if you like gaming in darker environments, it’s not a big deal. But the mediocre light output also means that in sunny rooms, it can be difficult to read text, especially if you’re someone who prefers dark mode apps.


As for audio, the Stealth features dual two-watt speakers that can get pretty loud, though they’re lacking a bit of bass. Don’t get me wrong, they’re perfectly fine; I was just hoping for a little more considering the size of the grille. Perched above the display is a 720p webcam, which is serviceable but doesn’t deliver the kind of quality you’d want for live streaming. It’s there so you can show your face during Zoom meetings, and that’s about it.

But once again, while nothing is egregiously bad, I feel like MSI is doing the bare minimum here. Its speakers are just ok, its webcam doesn’t even capture full HD and that big chin below the display makes the whole laptop look sort of dated.

Performance


When it comes to performance, the Stealth has plenty of oomph thanks to an Intel Core i7-1280P CPU and an NVIDIA RTX 3060 GPU. Our review unit even comes with a 1TB SSD and 32GB of RAM, the latter of which is arguably overkill given the rest of the system’s specs. However, you’ll want to figure out where the Stealth’s fan speed settings are in the MSI Center app, because when this thing spins up, you’re in for more than just a subtle whoosh.

In Shadow of the Tomb Raider at 1920 x 1080 and highest settings, the Stealth averaged 106 fps, which is just a tiny bit better than the 102 fps we got from the similarly sized Alienware x14. Meanwhile, in Metro Exodus, the Stealth tied the Alienware’s performance, with both machines hitting 55 fps at full HD and ultra settings. So it’s not exactly face-melting horsepower, but still more than enough to play modern AAA titles with plenty of graphical bells and whistles enabled.

Keyboard and touchpad


One thing I really like about the Stealth 15M is its Spectrum keyboard. Not only do the keys have a soft, cushy press, they let just the right amount of light leak out the sides, adding a little razzle dazzle without searing your retinas. Sadly, you can’t adjust the color pattern like you can on a lot of other gaming laptops, but you can turn everything off if you want to go fully undercover. Below that you get a touchpad that measures just four inches wide and two and a half inches tall, which can feel a bit cramped at times. That said, an undersized touchpad isn’t as big of a deal as it might be on a more mainstream notebook, since most gamers will probably carry an external mouse; touchpads really aren’t ideal for gaming.

Battery life

Perhaps the biggest weakness of the Stealth 15M is its battery life. It comes with a 53.8Whr power cell, which feels frustratingly small compared to the Alienware x14, whose battery is 50 percent larger at 80Whr, despite both systems being about the same size. That results in some pretty disappointing longevity, with the Stealth lasting just four hours and 15 minutes on our local video rundown test versus 9:45 for the x14 and 5:42 for the more powerful Razer Blade 15.

Wrap-up


After using the Stealth 15M for a while, I’m not really mad, I’m just disappointed. I love the general design and aesthetic, and the Stealth delivers a great balance of performance and portability. In a lot of ways it feels like a more grown-up take on the thin-and-light gaming laptop.

The issue is that it almost feels like MSI has neglected the Stealth line. Compared to previous years, the main upgrades for 2022 are a refreshed CPU and GPU along with a new badge on its lid. That’s not nothing, but I know MSI can do better and I’m really hoping to see the Stealth get a full redesign sometime soon.

Ultimately, assuming you can stomach the short battery life, the value of the Stealth 15M hinges a lot on its price. I’ve seen this thing listed as high as $1,700 from retailers like Walmart, which is simply too much. At that point, you’re much better off going for a notebook with a slightly smaller screen like the Alienware x14 and getting very similar performance, or opting for Asus’ Zephyrus G14 and saving a couple hundred bucks in the process. But if you can nab the Stealth for under $1,400, a lot of the system’s trade-offs become much more palatable. I just wish this version of the Stealth felt more like James Bond and less like Agent Cody Banks.

Meta reportedly suspends all hiring, warns staff of possible layoffs

As with many other industries, the tech sector has been feeling the squeeze of the global economic slowdown this year. Meta isn’t immune to that. Reports in May suggested that the company would slow down the rate of new hires this year. Now, Bloomberg reports that Meta has put all hiring on hold. 

CEO Mark Zuckerberg is also said to have told staff that there’s likely more restructuring and downsizing on the way. “I had hoped the economy would have more clearly stabilized by now, but from what we’re seeing it doesn’t yet seem like it has, so we want to plan somewhat conservatively,” Zuckerberg reportedly told employees. 

The company is planning to reduce budgets for most of its teams, according to Bloomberg. Zuckerberg is said to be leaving headcount decisions in the hands of team leaders. Measures may include moving people to other teams and not hiring replacements for folks who leave.

Meta declined to comment on the report. The company directed Engadget to remarks Zuckerberg made during Meta’s most recent earnings call in July. “Given the continued trends, this is even more of a focus now than it was last quarter,” Zuckerberg said at the time. “Our plan is to steadily reduce headcount growth over the next year. Many teams are going to shrink so we can shift energy to other areas, and I wanted to give our leaders the ability to decide within their teams where to double down, where to backfill attrition, and where to restructure teams while minimizing thrash to the long-term initiatives.”

In an earnings report, Meta disclosed that its revenue for the April-June quarter dropped by one percent year-over-year. It was the first time the company had ever reported a decline in revenue.

Word of the hiring freeze ties in with a report from last week, which suggested that Meta has quietly been ushering some workers out the door rather than conducting formal layoffs. In July, it emerged that the company asked team heads to identify “low performers” ahead of possible downsizing. The company is said to have been cutting costs on other fronts too, such as shedding contractors and killing off some projects in its Reality Labs division. Those reportedly included a dual-camera smartwatch.

AI is already better at lip reading than we are

They Shall Not Grow Old, acclaimed Lord of the Rings director Peter Jackson’s 2018 documentary about the lives and aspirations of British and New Zealand soldiers living through World War I, had its hundred-plus-year-old silent footage modernized through both colorization and the recording of new audio for previously non-existent dialog. To get an idea of what the people featured in the archival footage were saying, Jackson hired a team of forensic lip readers to guesstimate their recorded utterances. Reportedly, “the lip readers were so precise they were even able to determine the dialect and accent of the people speaking.”

“These blokes did not live in a black and white, silent world, and this film is not about the war; it’s about the soldier’s experience fighting the war,” Jackson told the Daily Sentinel in 2018. “I wanted the audience to see, as close as possible, what the soldiers saw, and how they saw it, and heard it.”

That is quite the linguistic feat given that a 2009 study found that most people can only read lips with around 20 percent accuracy and the CDC’s Hearing Loss in Children Parent’s Guide estimates that, “a good speech reader might be able to see only 4 to 5 words in a 12-word sentence.” Similarly, a 2011 study out of the University of Oklahoma saw only around 10 percent accuracy in its test subjects.

“Any individual who achieved a CUNY lip-reading score of 30 percent correct is considered an outlier, giving them a T-score of nearly 80, three times the standard deviation from the mean. A lip-reading recognition accuracy score of 45 percent correct places an individual 5 standard deviations above the mean,” the 2011 study concluded. “These results quantify the inherent difficulty in visual-only sentence recognition.”
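For readers unfamiliar with the scale: T-scores are standardized so the mean is 50 and each 10 points is one standard deviation, which is how a score of nearly 80 works out to roughly three standard deviations above the mean. The quick check below uses that standard definition; it isn’t drawn from the study itself.

```python
# T-scores are defined with a mean of 50 and a standard deviation of 10.
def t_score(z: float) -> float:
    """Convert a z-score (standard deviations from the mean) to a T-score."""
    return 50 + 10 * z

print(t_score(3))        # 80.0 -> the outlier lip readers described in the 2011 study
print((80 - 50) / 10)    # 3.0 standard deviations above the mean
```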

For humans, lip reading is a lot like batting in the Major Leagues — consistently get it right even just three times out of ten and you’ll be among the best to ever play the game. For modern machine learning systems, lip reading is more like playing Go — just round after round of beating up on the meatsacks that created and enslaved you — with today’s state-of-the-art systems achieving well over 95 percent sentence-level word accuracy. And as they continue to improve, we could soon see a day where tasks from silent-movie processing and silent dictation in public to biometric identification are handled by AI systems.

Context matters


One would think that humans would be better at lip reading by now, given that we’ve been officially practicing the technique since the days of Spanish Benedictine monk Pedro Ponce de León, who is credited with pioneering the idea in the early 16th century.

“We usually think of speech as what we hear, but the audible part of speech is only part of it,” Dr. Fabian Campbell-West, CTO of lip reading app developer Liopa, told Engadget via email. “As we perceive it, a person’s speech can be divided into visual and auditory units. The visual units, called visemes, are seen as lip movements. The audible units, called phonemes, are heard as sound waves.”

“When we’re communicating with each other, face-to-face is often preferred because we are sensitive to both visual and auditory information,” he continued. “However, there are approximately three times as many phonemes as visemes. In other words, lip movements alone do not contain as much information as the audible part of speech.”

“Most lipreading actuations, besides the lips and sometimes tongue and teeth, are latent and difficult to disambiguate without context,” then-Oxford University researcher and LipNet developer Yannis Assael noted in 2016, citing Fisher’s earlier studies. These homophemes are the secret to Bad Lip Reading’s success.
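To see how homophemes collapse visually distinct words, here’s a toy sketch: it maps a few words to rough viseme sequences using the common observation that bilabial consonants (/p/, /b/, /m/) look essentially identical on the lips. The viseme classes and transcriptions are simplified for illustration and aren’t drawn from any particular lipreading system.

```python
# Toy illustration of homophemes: different phonemes sharing the same viseme.
# The viseme classes below are simplified for illustration, not a standard inventory.

VISEME_CLASS = {
    "p": "BILABIAL", "b": "BILABIAL", "m": "BILABIAL",  # bilabials look alike on the lips
    "ae": "OPEN_VOWEL",
    "t": "ALVEOLAR",
}

# Hand-written phoneme transcriptions for three visually confusable words.
WORDS = {
    "bat": ["b", "ae", "t"],
    "pat": ["p", "ae", "t"],
    "mat": ["m", "ae", "t"],
}

def to_visemes(phonemes):
    return tuple(VISEME_CLASS[p] for p in phonemes)

sequences = {word: to_visemes(ph) for word, ph in WORDS.items()}
for word, seq in sequences.items():
    print(word, "->", seq)

# All three words map to the same viseme sequence, which is why a lip reader
# (human or machine) needs surrounding context to tell them apart.
assert len(set(sequences.values())) == 1
```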

What’s wild is that Bad Lip Reading will generally work in any spoken language, whether it’s stress-accented like English or tonal like Vietnamese. “Language does make a difference, especially those with unique sounds that aren’t common in other languages,” Campbell-West said. “Each language has syntax and pronunciation rules that will affect how it is interpreted. Broadly speaking, the methods for understanding are the same.”

“Tonal languages are interesting because they use the same word with different tone (like musical pitch) changes to convey meaning,” he continued. “Intuitively this would present a challenge for lip reading, however research shows that it’s still possible to interpret speech this way. Part of the reason is that changing tone requires physiological changes that can manifest visually. Lip reading is also done over time, so the context of previous visemes, words and phrases can help with understanding.”

“It matters in terms of how good your knowledge of the language is because you’re basically limiting the set of ambiguities that you can search for,” Adrian KC Lee, ScD, professor and chair of the Speech and Hearing Sciences Department at the University of Washington, told Engadget. “Say, ‘cold’ and ‘hold,’ right? If you just sit in front of a mirror, you can’t really tell the difference. So from a physical point of view, it’s impossible, but if I’m holding something versus talking about the weather, you, by the context, already know.”

In addition to the general context of the larger conversation, much of what people convey when they speak comes across non-verbally. “Communication is usually easier when you can see the person as well as hear them,” Campbell-West said, “but the recent proliferation of video calls has shown us all that it’s not just about seeing the person, there’s a lot more nuance. There is a lot more potential for building intelligent automated systems for understanding human communication than what is currently possible.”

Missing a forest for the trees, linguistically

While human and machine lip readers have the same general end goal, the aims of their individual processes differ greatly. As a team of researchers from Iran University of Science and Technology argued in 2021, “Over the past years, several methods have been proposed for a person to lip-read, but there is an important difference between these methods and the lip-reading methods suggested in AI. The purpose of the proposed methods for lip-reading by the machine is to convert visual information into words… However, the main purpose of lip-reading by humans is to understand the meaning of speech and not to understand every single word of speech.”

In short, “humans are generally lazy and rely on context because we have a lot of prior knowledge,” Lee explained. And it’s that dissonance in process — the linguistic equivalent of missing a forest for the trees — that presents such a unique challenge to the goal of automating lip reading.

“A major obstacle in the study of lipreading is the lack of a standard and practical database,” said Mingfeng Hao of Xinjiang University. “The size and quality of the database determine the training effect of this model, and a perfect database will also promote the discovery and solution of more and more complex and difficult problems in lipreading tasks.” Other obstacles can include environmental factors like poor lighting and shifting backgrounds, which can confound machine vision systems, as can variances due to the speaker’s skin tone, the rotational angle of their head (which shifts the viewed angle of the mouth) and the obscuring presence of wrinkles and beards.

As Assael notes, “Machine lipreading is difficult because it requires extracting spatiotemporal features from the video (since both position and motion are important).” However, as Hao explains in 2020’s A Survey on Lip Reading Technology, “action recognition, which belongs to video classification, can be classified through a single image,” whereas lipreading “often needs to extract the features related to the speech content from a single image and analyze the time relationship between the whole sequence of images to infer the content.” It’s an obstacle that requires both natural language processing and machine vision capabilities to overcome.
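As an illustration of what “extracting spatiotemporal features” can look like in code, here is a minimal PyTorch sketch in the spirit of spatiotemporal-convolution-plus-recurrent pipelines such as LipNet: a small 3D-convolution front end looks at position and motion across a stack of lip-crop frames, and a GRU models the sequence before a per-frame character classifier. The layer sizes, input shapes and character count are arbitrary placeholders, not the architecture of any published system.

```python
import torch
import torch.nn as nn

class TinyLipReader(nn.Module):
    """Minimal spatiotemporal sketch: 3D conv front end + GRU sequence model.
    Shapes and layer sizes are illustrative placeholders."""

    def __init__(self, num_chars: int = 28):  # e.g. 26 letters + space + a blank token (assumed)
        super().__init__()
        # 3D convolutions see both position (H, W) and motion (T) in the lip crops.
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),   # pool space, keep time resolution
            nn.Conv3d(32, 64, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 4, 4)),    # keep T, squeeze H and W
        )
        self.gru = nn.GRU(64 * 4 * 4, 128, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * 128, num_chars)  # per-frame character logits

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 1, T, H, W) grayscale lip crops
        feats = self.frontend(frames)                 # (batch, 64, T, 4, 4)
        b, c, t, h, w = feats.shape
        feats = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        seq, _ = self.gru(feats)                      # (batch, T, 256)
        return self.classifier(seq)                   # (batch, T, num_chars)

# Example: a batch of two 75-frame, 64x128 grayscale mouth crops.
model = TinyLipReader()
logits = model(torch.randn(2, 1, 75, 64, 128))
print(logits.shape)  # torch.Size([2, 75, 28])
```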

Acronym soup

Today, speech recognition comes in three flavors, depending on the input source. What we’re talking about today falls under Visual Speech Recognition (VSR) research — that is, using only visual means to understand what is being conveyed. Conversely, there’s Automatic Speech Recognition (ASR), which relies entirely on audio (think “Hey Siri”), and Audio-Visual Automatic Speech Recognition (AV-ASR), which incorporates both audio and visual cues into its guesses.

“Research into automatic speech recognition (ASR) is extremely mature and the current state of the art is unrecognizable compared to what was possible when the research started,” Campbell-West said. “Visual speech recognition (VSR) is still at the relatively early stages of exploitation and systems will continue to mature.” Liopa’s SRAVI app, which enables hospital patients to communicate regardless of whether they can actively verbalize, relies on the latter methodology. “This can use both modes of information to help overcome the deficiencies of the other,” he said. “In future there will absolutely be systems that use additional cues to support understanding.”

“There are several differences between VSR implementations,” Campbell-West continued. “From a technical perspective the architecture of how the models are built is different … Deep-learning problems can be approached from two different angles. The first is looking for the best possible architecture, the second is using a large amount of data to cover as much variation as possible. Both approaches are important and can be combined.”

In the early days of VSR research, datasets like AVLetters had to be hand-labeled and -categorized, a labor-intensive limitation that severely restricted the amount of data available for training machine learning models. As such, initial research focused first on the absolute basics — alphabet and number-level identification — before eventually advancing to word- and phrase-level identification, with sentence-level being today’s state-of-the-art which seeks to understand human speech in more natural settings and situations.

In recent years, the rise of more advanced deep learning techniques, which train models on essentially the internet at large, along with the massive expansion of social and visual media posted online, has enabled researchers to generate far larger datasets, like the Oxford-BBC Lip Reading Sentences 2 (LRS2), which is based on thousands of spoken lines from various BBC programs. LRS3-TED gleaned 150,000 sentences from various TED programs, while the LSVSR (Large-Scale Visual Speech Recognition) database, among the largest currently in existence, offers 140,000 hours of audio segments with 2,934,899 speech statements and over 127,000 words.

And it’s not just English: Similar datasets exist for a number of languages such as HIT-AVDB-II, which is based on a set of Chinese poems, or IV2, a French database composed of 300 people saying the same 15 phrases. Similar sets exist too for Russian, Spanish and Czech-language applications.

Looking ahead

VSR’s future could wind up looking a lot like ASR’s past, says Campbell-West: “There are many barriers for adoption of VSR, as there were for ASR during its development over the last few decades.” Privacy is a big one, of course. Though the younger generations are less inhibited about documenting their lives online, Campbell-West said, “people are rightly more aware of privacy now than they were before. People may tolerate a microphone while not tolerating a camera.”

Regardless, Campbell-West remains excited about VSR’s potential future applications, such as high-fidelity automated captioning. “I envisage a real-time subtitling system so you can get live subtitles in your glasses when speaking to someone,” Campbell-West said. “For anyone hard-of-hearing this could be a life-changing application, but even for general use in noisy environments this could be useful.”

“There are circumstances where noise makes ASR very difficult but voice control is advantageous, such as in a car,” he continued. “VSR could help these systems become better and safer for the driver and passengers.”

On the other hand, Lee, whose lab at UW has researched Brain-Computer Interface technologies extensively, sees wearable text displays more as a “stopgap” measure until BCI tech further matures. “We don’t necessarily want to sell BCI to that point where, ‘Okay, we’re gonna do brain-to-brain communication without even talking out loud,’“ Lee said. “In a decade or so, you’ll find biological signals being leveraged in hearing aids, for sure. As little as [the device] seeing where your eyes glance may be able to give it a clue on where to focus listening.”

“I hesitate to really say ‘oh yeah, we’re gonna get brain-controlled hearing aids,’” Lee conceded. “I think it is doable, but you know, it will take time.”