The challenge of voice recognition and the need for multiple modalities in authentication

A Good Mimic Can Bypass Voice Recognition Authentication, Research Suggests

Many see voice as one of the more interesting biometric solutions from an ergonomic perspective, and as something that can readily enhance the call center consumer experience and its related security.  The user simply needs to say something into a microphone (telephone) and, presto, they can be identified or authenticated.

But is it a safe and secure approach, or simply the starting point for identification, to be followed by additional authentication processes?

Personally, I am not convinced that voice alone is a good solution to the challenge of authentication.  Yes, as one element of a multi-factor, multimodal approach it is an excellent modality.  But not as the only biometric modality.  My fear emerged from a conversation with a sound engineer.  She told me they could, at the level of a single vowel, splice and change the intonation of a word in a movie soundtrack.

The above article clearly identifies real-world examples of voice biometrics being fooled and concludes by reminding us that a multimodal solution is essential.

Classic Multi-Factor Authentication aims to pair multiple unique and non-replicable elements together.

  • Something you have
  • Something you know
  • Something you are

When I think about multi-factor authentication, I wonder what would happen if the object, “what you have”, is stolen.  The second factor must then assure that only the legitimate user is presenting the object.  If a mimic can replicate a voice after stealing the object, then this combination of factors can be compromised.

EMV, when implemented as Chip and PIN, matches a unique chip card (what you have) with a PIN (what you know).  Apple Pay is EMV; it stores the secrets and executes the cryptographic functions inside hardware, the Secure Enclave (what you have), and combines this with a sensor to capture the biometric (what you are).  The ICAO electronic passport uses a similar chip and carries within it a facial image.  The US PIV & CAC cards use the same style of chip and are paired with a fingerprint; sometimes they also require the user to enter their PIN.

Yet are they truly secure?  We know the iPhone X’s facial recognition, as currently implemented, can be fooled.  We know that Touch ID was spoofed.  Without liveness testing, most if not all biometric systems will accept a clone or replica of the biometric they employ.

The challenge is establishing the appropriate benchmarks for the various biometric implementations such that enterprises, governments, merchants and corporations can select and implement a consumer experience that satisfies the needs of security and convenience.

Acronyms like FRR, FAR and PAD become critical to selecting the appropriate implementation of a biometric solution.

  • The False Reject Rate or FRR is all about convenience and not refusing the legitimate user. Perfection is a ratio of 0 in 1.
  • The False Accept Rate or FAR is all about not approving a transaction or event by an imposter. Perfection is a ratio of 0 in 1.
  • The Presentation Attack Detection or PAD is all about addressing the reality that anything can be duplicated; therefore it is essential to make sure the biometric presented is alive and genuine. Perfection is a ratio of 0 in 1.
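To make the first two ratios concrete, here is a minimal Python sketch of how FRR and FAR might be counted from a batch of comparison scores. Everything in it is an assumption for illustration: the function name error_rates, the rule that a presentation is accepted when its score is at or above the threshold, and the score lists, which are made up and not measurements of any real sensor. A PAD rate could be counted the same way from scores of spoofed presentations.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (frr, far) for a matcher that accepts when score >= threshold.

    genuine_scores:  scores from the legitimate user's own presentations
    impostor_scores: scores from other people's presentations
    """
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    frr = false_rejects / len(genuine_scores)
    far = false_accepts / len(impostor_scores)
    return frr, far


# Made-up similarity scores between 0 and 1 (illustrative only).
genuine = [0.91, 0.85, 0.78, 0.95, 0.66, 0.88, 0.72, 0.93]
impostor = [0.20, 0.35, 0.55, 0.41, 0.70, 0.15, 0.62, 0.30]

# A lenient threshold favors convenience (low FRR);
# a strict threshold favors security (low FAR).
for threshold in (0.50, 0.65, 0.80):
    frr, far = error_rates(genuine, impostor, threshold)
    print(f"threshold={threshold:.2f}  FRR={frr:.3f}  FAR={far:.3f}")
```

Moving the threshold trades one error for the other: with these made-up scores no single setting drives both ratios to zero, which is why the two numbers have to be read together.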

The challenge is establishing a balance between the cost and the acceptable FRR, FAR and PAD.

Measuring and establishing the test results of a particular element of a multi-factor solution is not cheap.  EMV, PIV, ICAO software and “Secure Enclave” / “Chip Card” / “Secure Element” suppliers spend hundreds of thousands of dollars developing and certifying the functional and security characteristics of the “what you have” element of these solutions.  We know that passwords and PINs can be, and have been, compromised with phishing attacks and hidden cameras.

When we think about biometrics, there is complexity in the read and match processes.  When the user establishes their identity and enrolls their biometric, the reference template is created.  This reference template is then used in the matching process to determine whether the template resulting from the biometric just presented is the same.  Unfortunately, reality dictates that each presentation of the user’s biometric will generate a unique result.  This unique result will never absolutely match the reference template.  Hence the need to understand and test the sensor and establish its FRR, FAR and PAD.  The more foolproof the match must be, the greater the complexity of the solution and the larger the number of different individuals needed during the test process to establish the sensor’s FRR, FAR and PAD.
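The two ideas in this paragraph, inexact matching and the size of the test population, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular sensor works: templates are treated as plain lists of numbers, cosine similarity stands in for whatever comparison a real matcher uses, the 0.95 threshold is arbitrary, and all names and values are hypothetical. The test-size helper uses the statistical "rule of three": if no errors are seen in n independent trials, the true error rate is below roughly 3/n at about 95% confidence.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two equal-length feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def matches(reference, sample, threshold=0.95):
    """Accept when the fresh sample is close enough to the reference template."""
    return cosine_similarity(reference, sample) >= threshold


def error_free_trials_needed(one_in):
    """Rule of three: error-free independent trials needed to support a claim
    of 'fewer than 1 error in `one_in`' at roughly 95% confidence."""
    return 3 * one_in


# Hypothetical templates: the same user never reproduces the reference exactly.
reference = [0.12, 0.80, 0.35, 0.44]
same_user = [0.10, 0.83, 0.33, 0.47]
other_user = [0.70, 0.10, 0.60, 0.05]

print(same_user == reference)          # exact equality fails for the real user
print(matches(reference, same_user))   # threshold match accepts the real user
print(matches(reference, other_user))  # threshold match rejects the other user
print(error_free_trials_needed(10_000))
```

The last line shows why certification gets expensive: supporting a claim of 1 in 10,000 under these assumptions already calls for tens of thousands of independent, error-free comparisons.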

Therefore, selecting the most appropriate solution means quantifying the risk of the event or transaction and measuring it against the cost and certified characteristics of the authentication mechanisms.

A layered approach that combines two or more factors, including multiple modalities for at least the “what you are” factor, is what we must consider.  Using cryptography and hardware to address what you have, passwords and demographic information to match what you know, and layering various elements like location, behavior and some set of biometrics to understand who you are, will offer the highest level of security with the lowest degree of inconvenience.
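A rough worked example shows why layering helps. The Python sketch below assumes the factors fail independently of one another and that every factor must pass (an "AND" rule); both are simplifying assumptions, and the rates are made up for illustration rather than taken from any certified product. Under those assumptions an impostor must fool every factor, so the false accept rates multiply, while the legitimate user must pass every factor, so the false reject rates accumulate.

```python
def combined_rates(factors):
    """Combine (far, frr) pairs for factors that must ALL pass.

    Assumes the factors fail independently of one another.
    Returns (combined_far, combined_frr).
    """
    all_fooled = 1.0      # probability an impostor passes every factor
    all_passed = 1.0      # probability the legitimate user passes every factor
    for far, frr in factors:
        all_fooled *= far
        all_passed *= (1.0 - frr)
    return all_fooled, 1.0 - all_passed


# Made-up rates, (FAR, FRR), for two hypothetical modalities.
voice = (0.01, 0.03)
face = (0.001, 0.02)

far, frr = combined_rates([voice, face])
print(f"voice alone: FAR={voice[0]:.5f}  FRR={voice[1]:.4f}")
print(f"voice + face: FAR={far:.5f}  FRR={frr:.4f}")
```

With these numbers the combined false accept rate falls far below either modality alone, while the false reject rate rises somewhat. That is the cost-versus-convenience balance discussed earlier, and it is one reason passive elements such as location and behavior are attractive as extra layers.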

Bottom line: multi-modal and multi-factor authentication of identification is what we must implement, always mindful that a modality will lose its ability to assure uniqueness over time.