“TV power on.” “Set thermostat to 65 degrees.” “Speaker, play classical music.”
Interacting with the devices in a Smart Home using voice is a trend that has the biggest players in the consumer electronics space taking notice. From speakers that hear and respond to commands to intelligent thermostats and home appliances, a completely voice-controlled home is on its way to becoming a reality.
To realize the full potential of the voice-controlled Smart Home, a smart device must understand the user’s voice despite the background noise that fills a home every day: children talking, TV or music playing, and home appliances running. There are many technical challenges involved in creating such a solution, including a key one that is often overlooked: pre-processing the speech to remove noise and interference before it reaches the speech recognition engine. Although this may seem obvious, for distant-speaker (also known as far-field) speech interface systems it is one of the most significant challenges.
“Far-field” typically refers to a system where the user is more than half a meter away from the product’s microphones. For example, a smartphone held next to a user’s face would constitute a “near-field” use case, but talking at arm’s length to a PC or tablet, or from across the room to a TV, stereo system, light switch, thermostat or Smart Home controller, is a “far-field” use case.
Far-field products face many challenges, including unknown direction of speech and noise, the requirement for full-duplex voice interaction, and a large dynamic range of speech and noise. While traditional beamforming technologies may work well in near-field use cases, they do not fare well in far-field Smart Home applications because they require multiple microphones and have fundamental technical limitations, such as directional constraints on the speech and noise. To achieve a true distant-speaker voice processing solution for consumer electronics and Smart Home applications, new and improved technologies are required, including a noise suppression technology based on Blind Source Separation (BSS), full-duplex Acoustic Echo Cancellation, and high-dynamic-range analog-to-digital conversion.
Unlike traditional beamforming voice processing technology that requires the user’s speech and background noise to emanate from different directions – which is extremely impractical in a far-field, reverberant environment – BSS enables excellent speech recognition performance from up to five meters away, at any angle, and with only two microphones, even when the user’s voice and noise are emanating from the same direction. This not only produces better noise suppression performance in typical Smart Home use cases but also helps to contain product costs.
Also key is the ability to suppress echo, that is, content played from the device’s speakers that travels back through the air into the device’s microphones, while still accurately capturing the user’s voice commands. A high-performance, full-duplex Acoustic Echo Canceller enables users to control devices with their voice even when the device’s speaker is playing loudly, as is the case with voice-controlled speakers, Smart TVs, set-top boxes or appliances with prompts.
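To make the idea of echo cancellation concrete, the following is a minimal sketch of a normalized least-mean-squares (NLMS) adaptive filter, a common textbook approach and not necessarily the method used in any particular product. The echo path, signal lengths, tap count and step size are all hypothetical values chosen for illustration, and the sketch omits the double-talk handling that a true full-duplex canceller would need.

```python
import random

def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the speaker echo from the mic signal.

    far_end: samples sent to the loudspeaker (the echo reference)
    mic:     samples captured by the microphone (echo plus any near-end speech)
    Returns the echo-reduced signal and the learned filter weights.
    """
    w = [0.0] * taps           # current estimate of the echo path
    history = [0.0] * taps     # most recent far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, history))
        e = d - echo_estimate  # residual: what is left after removing the echo
        norm = sum(xi * xi for xi in history) + eps
        step = mu * e / norm   # step size normalized by reference energy
        w = [wi + step * xi for wi, xi in zip(w, history)]
        out.append(e)
    return out, w

# Hypothetical test signal: white noise played by the speaker, returned to the
# microphone through a short, made-up echo path (assumption for illustration).
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.6, 0.3, -0.2, 0.1]
mic = [
    sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]

residual, weights = nlms_echo_cancel(far_end, mic)
print("learned echo path:", [round(v, 3) for v in weights[:4]])
```

In this idealized setup, with no near-end speech or noise, the learned weights approach the made-up echo path and the residual shrinks toward zero. Real rooms have much longer echo paths and simultaneous talkers, which is what makes full-duplex operation hard.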
A voice capture system for any Smart Home device needs to support a large dynamic range of signals, from soft speech at a distance to loud playback of audio content from the device’s speakers. This typical and challenging far-field use case drives the requirement for high SNR microphones and high dynamic range analog to digital converters in the device.
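A rough back-of-the-envelope calculation shows why dynamic range drives converter choice. The sketch below uses the standard rule of thumb that an ideal N-bit converter offers about 6.02 × N + 1.76 dB of dynamic range for a full-scale sine wave. The sound levels and recognition margin are hypothetical figures chosen only for illustration; they do not come from measurements of any real device.

```python
import math

def ideal_adc_dynamic_range_db(bits):
    """Theoretical dynamic range of an ideal N-bit converter (full-scale sine)."""
    return 6.02 * bits + 1.76

def min_bits_for_range(required_db):
    """Smallest ideal bit depth whose dynamic range covers required_db."""
    return math.ceil((required_db - 1.76) / 6.02)

# Hypothetical levels at the microphone (assumptions, not measured values).
LOUD_PLAYBACK_DB_SPL = 100.0   # device speaker playing loudly next to the mic
SOFT_SPEECH_DB_SPL = 40.0      # soft speech arriving from across the room
RECOGNITION_MARGIN_DB = 15.0   # assumed margin between soft speech and the noise floor

required_db = LOUD_PLAYBACK_DB_SPL - SOFT_SPEECH_DB_SPL + RECOGNITION_MARGIN_DB
print("required dynamic range (dB):", required_db)
print("minimum ideal bit depth:", min_bits_for_range(required_db))
```

Real converters and microphones fall short of the ideal figure, so a practical design would leave additional headroom beyond the bit depth this estimate suggests.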
In summary, conventional near-field speech enhancement methods such as beamforming incur significant microphone cost, require platform-specific tuning, and impose many constraints on microphone location, microphone matching and the directionality of the speech and noise, making them impractical for far-field voice-controlled Smart Home applications. It therefore becomes imperative to look at other solutions that can successfully address these far-field challenges. The robustness of the alternative solutions described here translates directly into better performance and significant cost savings for new voice-controlled Smart Home products.