Tag: speech recognition

Microsoft’s AI generates realistic speech with only 200 training samples

Modern text-to-speech algorithms are incredibly capable, and you need look no further for evidence than Google’s recently open-sourced SpecAugment or Translatotron — the latter can directly translate a person’s voice into another language while retaining tone and tenor. But there’s always room for improvement. Toward that end, researchers at Microsoft recently detailed in a paper (“Almost […]

Google’s Live Transcribe is getting sound events and transcription saving

Google today announced new features for its speech recognition transcription tool Live Transcribe: sound events and transcription saving. The news is being shared today to line up with Global Accessibility Awareness Day, and the update is “coming next month.” Google released Live Transcribe in February. The tool uses machine learning algorithms to turn audio into […]

Alexa speech normalization AI reduces errors by up to 81%

Text normalization is a fundamental processing step in most natural language systems. In the case of Amazon’s Alexa, “book me a table at 5:00 p.m.” might be transcribed by the assistant’s automatic speech recognizer as “five p m” and further reformatted to “5:00PM.” Conversely, Alexa might convert “5:30PM” to “five thirty p m” for Alexa’s […]
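The round trip described above — spoken form to written form and back — can be sketched with a toy rule-based normalizer. This is a minimal illustration of the idea, not Alexa's system, and the function name and word list are invented for the example:

```python
import re

def spoken_to_written(text):
    """Toy inverse text normalization: rewrite spoken-form times
    like "five p m" into written form "5:00PM"."""
    words_to_digits = {
        "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
        "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
        "eleven": 11, "twelve": 12,
    }
    number_words = "|".join(words_to_digits)
    pattern = re.compile(rf"\b({number_words}) (a m|p m)\b")

    def repl(match):
        hour = words_to_digits[match.group(1)]
        meridiem = match.group(2).replace(" ", "").upper()
        return f"{hour}:00{meridiem}"

    return pattern.sub(repl, text)

print(spoken_to_written("book me a table at five p m"))
# book me a table at 5:00PM
```

Production systems replace hand-written rules like these with learned sequence-to-sequence models, which is what makes the error-rate reductions in the paper possible.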

IBM’s AI performs state-of-the-art broadcast news captioning

Two years ago, researchers at IBM claimed state-of-the-art transcription performance with a machine learning system trained on two public speech recognition data sets, which was more impressive than it might seem. The AI system had to contend not only with distortions in the training corpora’s audio snippets, but with a range of speaking styles, overlapping […]

Amazon Alexa scientists retrain an English-language AI model on Japanese

Increasingly, to cut down on both training time and data collection, natural language processing researchers are turning to cross-lingual transfer learning, a technique that entails training an AI system in one language before retraining it in another. For instance, scientists at Amazon’s Alexa division recently employed it to adapt an English-language model to German. […]

ProBeat: Has Google’s word error rate progress stalled?

Back in 2013, Google’s speech recognition technology had a 23% word error rate. At I/O 2015, the company shared it had dropped to an 8% word error rate. At I/O 2017, it had fallen to a 4.9% word error rate, as you can see above. Put another way, Google transcribes roughly every 20th word incorrectly. Deep […]
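The "every Nth word" framing is just the reciprocal of the word error rate. A quick sketch of the arithmetic behind the figures above:

```python
def words_per_error(wer_percent):
    """Approximate 'one error every N words' for a given
    word error rate, expressed as a percentage."""
    return round(100 / wer_percent)

# The three milestones cited above:
for wer in (23, 8, 4.9):
    print(f"{wer}% WER -> roughly one error every {words_per_error(wer)} words")
```

A 4.9% word error rate works out to about one error in every 20 words, which is why a long dictated paragraph still tends to contain at least one mistake.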

Google’s SpecAugment achieves state-of-the-art speech recognition without a language model

Google AI researchers are applying computer vision techniques to spectrograms — visual representations of sound — to achieve state-of-the-art speech recognition system performance without the use of a language model. Researchers say the SpecAugment method requires no additional data and can be used without adaptation of underlying language models. “An unexpected outcome of our research was that models trained with […]
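SpecAugment's core trick is to augment training data by masking out random frequency bands and time steps of a spectrogram, the way vision researchers occlude patches of images. A minimal NumPy sketch of that masking idea follows; parameter names and defaults are illustrative, not the paper's:

```python
import numpy as np

def spec_augment(spec, num_freq_masks=1, num_time_masks=1,
                 max_freq=8, max_time=10, rng=None):
    """Sketch of SpecAugment-style masking on a (freq, time)
    spectrogram: zero out random frequency bands and time spans."""
    rng = rng or np.random.default_rng(0)
    out = spec.copy()
    n_freq, n_time = out.shape
    for _ in range(num_freq_masks):
        width = rng.integers(0, max_freq + 1)
        start = rng.integers(0, n_freq - width + 1)
        out[start:start + width, :] = 0.0  # mask a frequency band
    for _ in range(num_time_masks):
        width = rng.integers(0, max_time + 1)
        start = rng.integers(0, n_time - width + 1)
        out[:, start:start + width] = 0.0  # mask a span of time steps
    return out

masked = spec_augment(np.ones((80, 100)))
```

Because the augmentation operates directly on the spectrogram, no extra audio needs to be collected — the same utterance yields many different "occluded" training examples.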

Amazon’s AI system could cut Alexa speech recognition errors by 15%

A few months back, Amazon detailed some of the underlying systems that prevent Alexa from responding when someone says the wake word “Alexa” on TV, in an internet ad, or on the radio. But how does Amazon’s voice assistant filter out everyday background noise? A blog post and accompanying research paper (“End-to-End Anchored Speech Recognition”) […]

Amazon Alexa scientists find ways to improve speech and sound recognition

How do assistants like Alexa discern sound? The answer lies in two Amazon research papers scheduled to be presented at this year’s International Conference on Acoustics, Speech, and Signal Processing in Brighton, U.K. Ming Sun, a senior speech scientist in the Alexa Speech group, detailed them this morning in a blog post. “We develop[ed] a […]

Alexa researchers develop 2-mic speech recognition system that beats a 7-mic array

It’s a well-established fact that two mics are better than one when it comes to speech recognition. It’s an intuitive idea: sound waves reach multiple microphones with different time delays, and these delays can be used to boost the strength of a signal coming from a certain direction while diminishing those from other directions. Historically, however, […]
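The delay-based trick described above is classic delay-and-sum beamforming: shift each microphone's signal to undo its arrival delay, then average, so the steered direction adds coherently. A toy sketch under the simplifying assumption of integer-sample delays (real beamformers use fractional delays and far more sophisticated weighting):

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Toy delay-and-sum beamformer: undo each mic's integer-sample
    arrival delay, then average. Signals from the steered direction
    add coherently; off-axis sources are attenuated."""
    aligned = [np.roll(sig, -d) for sig, d in zip(signals, delays)]
    return np.mean(aligned, axis=0)

# Two mics hear the same sinusoid; mic2 receives it 5 samples later.
t = np.arange(200)
source = np.sin(2 * np.pi * t / 20)
mic1 = source
mic2 = np.roll(source, 5)
beam = delay_and_sum([mic1, mic2], delays=[0, 5])
# beam reconstructs the source exactly in this idealized case
```

The Alexa result referenced in the headline is notable precisely because it achieves comparable gains with only two microphones and learned processing, rather than a large physical array.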

Gboard on Pixel phones now uses an on-device neural network for speech recognition

On-device machine learning algorithms afford plenty of advantages, namely low latency and availability — because processing is performed locally as opposed to remotely on a server, connectivity has no bearing on performance. Google sees the wisdom in this: it today announced that Gboard, its cross-platform virtual keyboard app, now uses an end-to-end recognizer to power […]

Alexa researchers improve AI error rate up to 30% by reducing data imbalance

Imbalanced training data is a major hurdle for classifiers — that is, machine learning systems that sort inputs into classes. (Think object-detecting security cameras and smart speakers that distinguish among speakers.) When one category of samples disproportionately contributes to a corpus, the classifier naturally encounters it more often than others, and so runs the risk […]
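One of the simplest remedies for the imbalance problem described above is random oversampling: duplicate minority-class examples until every class matches the majority count. This sketch illustrates the general technique, not Amazon's specific method, and the function name is invented for the example:

```python
import random
from collections import Counter

def oversample(samples, labels, seed=0):
    """Naive random oversampling: duplicate minority-class examples
    until every class reaches the majority-class count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Pad the class with randomly re-drawn duplicates.
        picks = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(picks)
        out_y.extend([y] * len(picks))
    return out_x, out_y
```

Oversampling trades balance for repetition — the duplicated examples add no new information — which is why reweighting losses or synthesizing new minority samples are common alternatives.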

Microsoft’s Azure Kinect is a $399 dev kit for computer vision and speech solutions

HoloLens 2 wasn’t the only product Microsoft announced during its Sunday afternoon press event ahead of Mobile World Congress in Barcelona. Today it took the wraps off the Azure Kinect Developer Kit, an all-in-one perception system for computer vision and speech solutions that appears to be a more polished version of Project Kinect for Azure, […]

Google AI technique reduces speech recognition errors by 29%

Speech recognition is pretty darn good these days. State-of-the-art models like EdgeSpeechNet, which was detailed in a research paper late last year, are capable of achieving about 97 percent accuracy. But even the best systems sometimes stumble on uncommon and rare words. To narrow the gap, scientists at Google and the University of California propose […]

Researchers improve robots’ speech recognition by modeling human auditory processing

We rarely think too much about noises as we’re listening to them, but there’s an enormous amount of complexity involved in isolating audio from places like crowded city squares and busy department stores. In the lower levels of our auditory pathways, we segregate individual sources from backgrounds, localize them in space, and detect their motion patterns […]

Google releases dataset to help AI systems spot fake audio recordings

When Google announced the Google News Initiative in March 2018, it pledged to release datasets that would help “advance state-of-the-art research” on fake audio detection — that is, AI-generated clips intended to fool voice authentication systems. Today, it’s making good on that promise. The Google News team and Google’s AI research […]

Google researchers use AI to pick out voices in a crowd

Separating a single person’s voice from a noisy crowd is something most people do subconsciously — it’s called the cocktail party effect. Smart speakers like Google Home and Amazon’s Echo typically have a tougher time, but thanks to artificial intelligence (AI), they might one day be able to filter out voices as well as any […]
