We are often asked how voice biometrics works. Voice biometrics is the science of using a person’s voice as a uniquely identifying biological characteristic in order to authenticate them. Also referred to as voice verification or speaker recognition, voice biometrics enables fast, frictionless and highly secure access for a range of use cases, from call center, mobile and online applications to chatbots, IoT devices and physical access.
Massive advances in neural networks over the past 2-3 years have enabled the development of voice biometric algorithms that are faster, more accurate, and able to authenticate users with less speech. In fact, ID R&D is now able to exceed the accuracy of a 4-digit PIN in many use cases.
How does voice authentication work? Like other biometric modalities, voice biometrics offers significant security advantages over authentication methods that are based on something you know (like a password or answer to a “secret” question) or something you have (like your mobile phone). Voice biometrics also improves the customer experience by removing the frustration associated with cumbersome login processes and lost or stolen credentials.
Voice Biometric Advantages
Enhance the customer experience with fast, frictionless authentication
Improve security and minimize breaches due to compromised passwords, phishing, etc.
Reduce threats by identifying known fraudsters
Instantly identify users and personalize the interaction
Free agents from time spent verifying users and resetting passwords
Enable natural login for digital channels, including chatbots and virtual assistants
Use as part of a two-factor authentication process to increase security without adding effort
How do Voice Biometrics work?
There are over 70 body parts, each with a unique size and shape, that contribute to how a person speaks. Voice biometrics relies on the fact that human voice characteristics correlate strongly to the physiological qualities of how a person creates speech. Unlike other methods of authentication, voice biometrics does not rely on a secret such as the person remembering a passphrase. It isn’t what the person says that is being authenticated; it’s who is speaking.
Voice biometric systems work by extracting the characteristics that distinguish a person’s speech from that of other people. The result is a “voiceprint” analogous to a fingerprint. A voiceprint is also called a “voice template.”
Voice recognition systems enroll a known person by creating an initial template, often merging several templates from samples of that person’s speech for higher accuracy. The initial template is called the enrollment template or enrollment voiceprint.
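As a rough illustration of merging several templates into one enrollment template, the sketch below treats each template as a fixed-length numeric vector and averages the unit-length vectors. This is an assumption made for illustration only: the vector representation, the averaging rule, and the names `l2_normalize` and `merge_templates` are hypothetical and do not describe any particular product.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so templates are comparable."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def merge_templates(templates):
    """Average several per-sample templates into one enrollment template.

    Assumption: a template is a list of floats of equal length. Real
    systems may use a different, proprietary merging strategy.
    """
    if not templates:
        raise ValueError("at least one template is required")
    normalized = [l2_normalize(t) for t in templates]
    dim = len(normalized[0])
    if any(len(t) != dim for t in normalized):
        raise ValueError("all templates must have the same dimension")
    mean = [sum(t[i] for t in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


# Hypothetical 3-dimensional templates from three enrollment utterances.
samples = [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1], [1.0, 0.0, 0.3]]
enrollment_template = merge_templates(samples)
```

Averaging tends to smooth out sample-to-sample variation, which is one plausible reason several samples give a more stable enrollment template than a single one.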
To verify an enrolled person’s identity, the biometric voice recognition system captures a new speech sample, creates a template from the sample, and compares it against the enrollment template. A strong match between templates indicates that the same person spoke both samples, thus verifying the person’s identity. This manner of using voice recognition is called Speaker Verification. It is a one-to-one match between the enrollment template and someone claiming to be the enrolled person.
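The one-to-one comparison can be sketched as a similarity score checked against a threshold. The example below assumes templates are numeric vectors compared with cosine similarity; the function names and the threshold value of 0.7 are hypothetical, and a production system would calibrate its own scoring and threshold.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("templates must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("templates must be non-zero")
    return dot / (norm_a * norm_b)


def verify(enrollment_template, probe_template, threshold=0.7):
    """Return (is_match, score) for a one-to-one comparison.

    The threshold is an illustrative placeholder, not a recommended value.
    """
    score = cosine_similarity(enrollment_template, probe_template)
    return score >= threshold, score


# Hypothetical templates: one enrolled user and two new speech samples.
enrolled = [0.9, 0.1, 0.2]
same_speaker = [0.85, 0.15, 0.25]
other_speaker = [0.1, 0.9, 0.3]

is_match, score = verify(enrolled, same_speaker)
is_impostor_match, impostor_score = verify(enrolled, other_speaker)
```

Raising the threshold makes it harder for an impostor to be accepted, at the cost of rejecting more genuine users.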
Another way to use voice recognition is to compare a voice sample from an unknown identity against multiple enrollment templates. The goal is to find the person within the set of enrollment templates. This manner of using voice biometrics is called Speaker Identification. There are significant limits to accuracy for Speaker Identification, so businesses should consult with an expert to understand if a one-to-many use case with voice will be practical.
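A one-to-many search can be sketched as scoring the unknown sample against every enrolled template and keeping the best match if it clears a threshold. The code below is a simplified illustration: the cosine scoring, the threshold of 0.7, and the names `identify` and `chance_of_false_match` are assumptions. The second function shows one reason accuracy is harder in this mode: if each comparison is assumed to be independent with the same false accept rate, the chance of at least one false match grows with the number of enrolled people.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(probe, enrolled_templates, threshold=0.7):
    """Return (speaker_id or None, best_score) for a one-to-many search."""
    best_id, best_score = None, -1.0
    for speaker_id, template in enrolled_templates.items():
        score = cosine_similarity(probe, template)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_score < threshold:
        return None, best_score  # nobody in the set is a strong match
    return best_id, best_score


def chance_of_false_match(false_accept_rate, gallery_size):
    """Probability of at least one false match across the whole set,
    assuming independent comparisons (a simplifying assumption)."""
    return 1.0 - (1.0 - false_accept_rate) ** gallery_size


# Hypothetical enrolled set and two unknown samples.
gallery = {"alice": [0.9, 0.1, 0.2], "bob": [0.1, 0.9, 0.3]}
known_id, known_score = identify([0.85, 0.15, 0.25], gallery)
unknown_id, unknown_score = identify([0.1, 0.1, 1.0], gallery)
```

Under that independence assumption, a per-comparison false accept rate that looks small for one-to-one verification can add up quickly when the sample is compared against thousands of templates.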
The use of voice biometrics for authentication is increasing in popularity due to improvements in accuracy, fueled largely by advances in AI, and heightened customer expectations for easy and fast access to information. Frequent password-associated data breaches are another reason for broader adoption as companies look for ways to better protect customer data.
When it comes to accuracy, it’s not just about keeping the wrong person out. Companies also have to minimize “false rejects” that cause headaches for existing customers and agents. “Equal Error Rate” (EER) is the operating point at which the false accept rate and the false reject rate are equal. Of course, the goal is to make both of these error types extremely small, ideally not allowing any impostors through with only a negligible number of valid people getting rejected.
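To make the trade-off concrete, the sketch below estimates the two error rates at a given threshold and then searches for the threshold where they are closest, which approximates the EER. The score lists are made-up values for illustration, and the function names are hypothetical; real evaluations use far larger score sets.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_accept_rate, false_reject_rate) at one threshold."""
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Returns (eer, threshold) for the threshold where the two error
    rates are closest to each other.
    """
    best = None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        far, frr = error_rates(genuine_scores, impostor_scores, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


# Made-up similarity scores for genuine users and impostors.
genuine = [0.91, 0.85, 0.78, 0.66, 0.95]
impostor = [0.30, 0.42, 0.71, 0.15, 0.55]
eer, eer_threshold = equal_error_rate(genuine, impostor)
```

In practice a deployment rarely operates exactly at the EER point; the threshold is usually shifted toward fewer false accepts or fewer false rejects depending on the use case.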
Types of Voice Authentication
Voice authentication can be accomplished using text-dependent or text-independent speaker recognition.
Text-dependent voice verification requires a person to speak a specific passphrase, usually consisting of two to three words, like “My voice is my passphrase.”
Text-independent voice verification is a passive voice biometric approach in which the user can say anything, enabling authentication to happen quickly in the background during their normal interaction with an agent, IVR, or application.
IDVoice by ID R&D is a robust AI-driven biometric voice recognition engine that provides both text-dependent and text-independent voice verification for mobile, web and telephone channels, as well as physical access and IoT device integration. The product is built on an innovative convolutional neural network and an advanced, modified x-vector approach to feature extraction for unmatched accuracy, and is ranked #1 in the industry’s leading benchmark challenge.
While voice biometrics offers a secure way to authenticate users, it is not immune to threats. Advances in machine learning, recording technology and synthetic speech are enabling high-quality voice spoofing, or voice “deepfakes,” capable of tricking humans and voice biometric systems into thinking they are hearing a real person. These attacks can be used to gain unauthorized access to accounts.
Combatting voice spoofing requires liveness detection technology capable of distinguishing between a live voice and a recorded, synthetic or computer-generated version of the voice.
Unlike other solutions, ID R&D’s core voice authentication technology works in any language without retraining, works across channels with a calibration setting, and is designed from the beginning to be noise-tolerant.
A strong voice recognition biometrics algorithm will continue to work as expected and with high accuracy even if the user has a cold. If the voice sounds entirely different, then by design, the person would not be recognized as a match with the enrollee. In extreme cases like this, the user may be required to fall back to another biometric modality or means of authentication.
These are the same types of limitations that affect other biometric modalities. Some percentage of the population is unable to use fingerprint devices, for example, due to callus buildup from manual labor.
It’s important to use a product that supports model enrichment, where new voice templates are merged into previous templates in order to adapt to changes in a user’s voice over time. This change is slow, usually requiring several years to become material. Model enrichment allows continued verification of a voice that changes over time due to disease or age.
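One simple way to picture model enrichment is a weighted blend of the stored template with a newly captured one. The sketch below assumes templates are numeric vectors and uses an exponential moving average; both the update rule and the `weight` value are illustrative assumptions, not a description of how any specific product merges templates. It would be sensible to apply such an update only after the new sample has already been verified.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def enrich_template(current, new, weight=0.1):
    """Blend a newly captured template into the stored one.

    weight is the share given to the new template (hypothetical value);
    a small weight means the stored template drifts slowly.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if len(current) != len(new):
        raise ValueError("templates must have the same dimension")
    cur_n, new_n = l2_normalize(current), l2_normalize(new)
    blended = [(1.0 - weight) * c + weight * n for c, n in zip(cur_n, new_n)]
    return l2_normalize(blended)


# Hypothetical stored template and a newer template from the same user.
stored = [1.0, 0.0]
recent = [0.0, 1.0]
updated = enrich_template(stored, recent, weight=0.1)
```

A small weight keeps the template stable against one unusual sample while still letting it follow gradual, long-term change.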
Voice biometrics does not determine what is said; that is the domain of speech recognition. Voice biometrics recognizes the unique characteristics of the user’s speech.
Environmental issues may have an impact on voice authentication. For example, voice can be less reliable if the speaker’s voice level is not high enough compared to the background signal. In a loud environment, the person needs to speak louder than the background noise, but that will have a limit. Likewise, face biometrics may not work if lighting is inadequate. Rare performance issues due to environmental factors are addressed by using a multimodal biometric approach to offer a fallback means of authentication.
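The relationship between voice level and background level is commonly expressed as a signal-to-noise ratio (SNR) in decibels. The sketch below computes it from two lists of audio samples, one containing speech and one containing only background noise. The function names and the sample values are hypothetical, and how a given product measures or uses SNR is not specified here.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a list of audio samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech_samples, noise_samples):
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    noise_power = mean_power(noise_samples)
    speech_power = mean_power(speech_samples)
    if noise_power == 0:
        return math.inf
    if speech_power == 0:
        return -math.inf
    return 10.0 * math.log10(speech_power / noise_power)


# Made-up samples: speech at 10x the amplitude of the background noise.
speech = [0.5, -0.5, 0.5, -0.5]
noise = [0.05, -0.05, 0.05, -0.05]
ratio_db = snr_db(speech, noise)
```

A system could, for example, prompt the user to repeat the phrase or fall back to another modality when the measured SNR is too low, although the cut-off would need to be determined empirically.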
Voice biometrics can be spoofed. As such, voice biometric systems should be deployed with voice anti-spoofing, also known as “liveness detection,” to prevent bad actors from impersonating a real user. Methods that fraudsters use include voice recordings, computer-altered voice, and synthetic or “deepfake” voice. Voice anti-spoofing products can determine if a voice is live.
Voice biometrics can work in the telephone channel or a microphone channel, allowing deployment across a wide range of use cases, from the contact center to mobile applications and messenger apps to smart home devices. A modern voice biometric system should let a user enroll in one channel and subsequently authenticate on any channel that supports voice without re-enrolling.