A closer look at a Vice.com account of a synthetic spoof
An article called “How I Broke Into a Bank Account with AI-Generated Voice” was posted on Vice.com recently, showing how the author was able to spoof a bank’s voice verification system with a synthetic voice. The article certainly highlights the need for liveness detection in voice verification systems, but there’s much more to the story.
The article ignores the fact that, at least for call centers, the alternatives for authentication are limited by the telephony medium. ID R&D’s research team has spoofed the same system, and we’re well aware of the limitations of voice biometrics over a phone call. But a bank must consider important context when deciding how and when to use voice verification. What is the motivation for an attack, and what is the impact of a breach? What is customers’ tolerance for inconvenience? What are the alternative security measures?
We’ve all been subjected to various forms of user authentication during a phone call with a service provider. It’s most commonly some variant of knowledge-based authentication (KBA), and it’s rarely what you’d call a positive user experience. Plus, KBA comes with the responsibility of PII access for the bank. Fraudsters have access to this information and can use it if given the opportunity, and should we really be asking consumers to share personal information during a phone call? These are precisely the habits we’re trying to break.
Compared to alternatives like KBA, voice verification is actually a very effective security measure. Other authentication tools are now available within the call center channel, and banks are using them in conjunction with voice to increase security while maintaining a low-effort user experience. We can also point to an ideal way to step up security on demand when needed.
How “good” is voice verification?
Firms wanting to leverage biometrics for security are often misled by vendor claims that voice is on par with fingerprint or face biometrics in terms of security, and this misinformation gets passed on to consumers. We hear these claims as well and try to educate banks about the true level of security that should be expected with voice. ID R&D has decades of voice experience (and the world’s top-performing algorithms as demonstrated in competitions like SASV, VoxCeleb, and SdSV), so we are aware of what’s possible with voice and what its limitations are.
Two factors should be considered to fully evaluate the security of a biometric: 1) false accept rate and 2) liveness and spoof detection performance, explained here:
False accept rate (FAR). The false accept rate is the probability that an impostor’s sample biometrically matches the genuine reference sample. For fingerprint and face biometrics, the FAR can be in the range of 1 in 1 million to 1 in 10 million. For voice verification over a telephone call, the FAR is in the range of 1 in 500. For voice in higher-quality audio channels, such as can be achieved using mobile apps, we see FAR in the range of 1 in 50,000, which is five times better than the 1-in-10,000 chance of guessing a 4-digit PIN code. Clearly, voice biometrics over a phone call has limitations in comparison to fingerprints.
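To make the comparison concrete, here is a small worked calculation in Python. It uses only the figures quoted above plus the 1-in-10,000 chance of guessing a 4-digit PIN in one try. The dictionary labels and the helper name `strength_vs_pin` are illustrative, and the fingerprint/face entry takes the weaker end (1 in 1 million) of the quoted range.

```python
from fractions import Fraction

# Chance of guessing a 4-digit PIN in a single attempt.
PIN_GUESS = Fraction(1, 10_000)

# False accept rates quoted in the text (labels are illustrative).
FAR = {
    "voice, telephone": Fraction(1, 500),
    "voice, mobile app": Fraction(1, 50_000),
    "fingerprint/face (weaker end of range)": Fraction(1, 1_000_000),
}


def strength_vs_pin(far):
    """How many times less likely a false accept is than a lucky PIN guess."""
    return PIN_GUESS / far


for name, far in FAR.items():
    print(f"{name}: {float(strength_vs_pin(far)):g}x a 4-digit PIN")
```

A value below 1 means the channel is weaker than a PIN on this measure alone: telephone voice comes out at 0.05x, that is, twenty times more likely to produce a false accept than a single PIN guess is to succeed.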
Note that it’s easy to claim an FAR of 0% for a biometric system by simply rejecting every attempt, genuine ones included. Security must be balanced with convenience for legitimate users. So to be clear, the FARs quoted here correspond to a false reject rate (FRR), the rate at which genuine users are rejected, in the range of 5-10%.
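The trade-off between FAR and FRR can be sketched in a few lines of Python. This is a minimal illustration, not a description of any real system: it assumes the matcher returns a similarity score and accepts anything at or above a threshold, and the scores below are made up for the example.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) when scores at or above the threshold are accepted."""
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


# Made-up similarity scores, for illustration only.
genuine = [0.91, 0.85, 0.78, 0.66, 0.95, 0.59, 0.88, 0.73, 0.81, 0.97]
impostor = [0.12, 0.35, 0.48, 0.22, 0.61, 0.30, 0.09, 0.41, 0.55, 0.27]

# Raising the threshold lowers FAR but raises FRR.
for t in (0.5, 0.7, 1.01):
    far, frr = far_frr(genuine, impostor, t)
    print(f"threshold={t:.2f}  FAR={far:.0%}  FRR={frr:.0%}")
```

With the last threshold set above every possible score, FAR drops to 0% only because every genuine user is rejected too, which is why an FAR figure means little unless the FRR is stated alongside it.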
Spoof detection. The second factor determining biometric security is spoof detection (or liveness detection): the ability to detect that a synthetic voice or audio recording, rather than a live human being, is being presented. Its performance is measured by the attack presentation classification error rate, or APCER, which is the share of attacks that are wrongly accepted as live. A high APCER means that a fraudster can easily defeat the security measure.
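As a rough sketch, APCER can be computed from a labeled test run as the fraction of attack presentations that the liveness detector accepted as live. The function name, the attack-type labels, and the counts below are hypothetical and chosen only to show the arithmetic.

```python
def apcer(attack_accepted):
    """Fraction of attack presentations wrongly classified as live.

    attack_accepted: list of booleans, one per attack attempt,
    True when the detector accepted the attack as a live speaker.
    """
    if not attack_accepted:
        raise ValueError("need at least one attack presentation")
    return sum(attack_accepted) / len(attack_accepted)


# Hypothetical test results, grouped by attack type.
results = {
    "synthetic voice": [True] * 3 + [False] * 17,   # 3 of 20 got through
    "replayed recording": [True] * 1 + [False] * 19,  # 1 of 20 got through
}

for attack_type, decisions in results.items():
    print(f"{attack_type}: APCER = {apcer(decisions):.0%}")
```

Reporting the rate per attack type, rather than pooling everything, keeps a weakness against one kind of attack from being hidden by good results on another.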
The FAR and APCER performance achievable in call center applications is limited by the bandwidth constraints of the telephone channel. Today’s digital systems still provide just enough bandwidth for intelligible speech, and filter out high frequencies that would be useful to more accurately detect attacks.
Why use voice verification for call centers and IVRs?
Voice is a valuable security measure when considered in the context of a) what are the risk and cost of a breach, and b) what other options are out there? In an IVR, a customer can typically only check balance information, transfer money between their own accounts, and pay bills. There is therefore little incentive for a bad actor to break into the IVR.
Furthermore, even with a 1/500 FAR, the effort needed to defeat the system is not trivial. It’s difficult to get five minutes of good audio from a target person. Thus, the cost/reward ratio for the bad actor is high. As is typically the case with biometrics, it’s key to align the level of convenience with the security risk.
What is the alternative?
Voice verification reduces the effort to authenticate by replacing the only alternative banks have today in the call center: KBA. It also reduces costs for the bank by reducing agent handle time, making it better for the customer and better for the bank.
Is voice verification more secure than KBA? Yes; even with its limitations, it is more difficult to defeat voice verification than KBA. It is well known that bad actors are easily beating KBA using private information that is openly available on the dark web thanks to years of breaches. So while voice verification is not as strong as fingerprint, it is better, and much more convenient, than the KBA alternative, and fingerprints are obviously of little use on a phone call. In fact, some fraudsters know to try to bypass voice verification by claiming “dogs are barking in the background” in order to force the agent to switch to KBA.
High-risk transactions require a different approach
Banks generally understand that voice verification and KBA have limitations. When a customer requests a higher-risk transaction, like wiring funds, then other identity checks are added to the overall authentication process.
The risk of more serious fraud does increase in the case where a bad actor is authenticated in the IVR, and is then transferred to an agent as a verified person. This is a scenario where banks must ensure that the level of security matches the level of access; that there is a chain of trust that ensures the level of access does not increase without an increase in security applied.
The relatively weak security and difficulty of the fraud prevention challenge in the call center will likely drive more innovation to create better experiences with stronger security. Some examples:
One-time passwords (OTP) provide an additional security layer with moderate inconvenience to the customer (and some new vulnerabilities).
Passive voice verification. Rather than ask for a passphrase, simply perform biometric authentication using audio collected while the customer talks to an agent naturally. Much more voice is used for this process, increasing accuracy, and it is difficult for a bad actor to impersonate a victim while in a full conversation.
“Out-of-band” strong authentication. In this case, the call center pushes a text message to the caller’s device, which when clicked initiates a facial recognition process within a browser or mobile app. In this process, the device is verified to be in possession of the customer, and the customer proves their identity with a biometric face match.
Note that using the device’s biometrics only proves possession of the device and is not a true biometric factor. For a true second factor, you must capture the face in the mobile app with anti-spoofing detection and compare it against the face template stored on file at the bank.
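The chain-of-trust idea described in this section can be sketched as a simple policy check in Python: each action has a required assurance level, and a session that has not reached that level triggers a step-up rather than being allowed through. The level names, the action names, and the `authorize` function are all hypothetical; a real bank’s policy would be more granular.

```python
from enum import IntEnum


class Assurance(IntEnum):
    """Hypothetical assurance levels, ordered from weakest to strongest."""
    NONE = 0
    VOICE = 1              # voice verification over the phone
    OUT_OF_BAND_FACE = 2   # face match with anti-spoofing on the customer's device


# Hypothetical mapping of actions to the minimum level they require.
REQUIRED = {
    "check_balance": Assurance.VOICE,
    "pay_bill": Assurance.VOICE,
    "wire_funds": Assurance.OUT_OF_BAND_FACE,
}


def authorize(session_level, action):
    """Allow the action only if the session already meets its required level."""
    if session_level >= REQUIRED[action]:
        return "allow"
    return "step_up"


# A caller verified by voice in the IVR and then transferred to an agent
# keeps the same level: the transfer itself must not raise it.
ivr_session = Assurance.VOICE
print(authorize(ivr_session, "check_balance"))
print(authorize(ivr_session, "wire_funds"))
```

The point of the design is that access never increases on its own: the only way to reach a higher-risk action is to pass a stronger check first.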
Conclusion
Although voice biometrics are not as accurate as fingerprint or face biometrics, this is not a useful comparison; biometrics are all very different and have advantages and disadvantages for any use case. The limitation in the call center use case is imposed by the telephony medium, not biometric technology. A fair comparison is KBA; voice verification is more convenient and more secure, providing an effective barrier to bad actors while making authentication faster and easier for legitimate customers.
Strong authentication is also an option when higher security is needed. Pushing an authentication request to a device and then performing facial recognition within the mobile app represents a true two-factor solution that requires relatively little effort from the customer and is arguably unmatched by other authentication methods in terms of security.