Facial Recognition Accuracy: A Worked Example


Much has been written of late on facial recognition accuracy. Some laud the technology as highly accurate; others condemn it as highly inaccurate. Both opinions are often based on the same set of statistics.

There is always room for significant improvement. Also, there needs to be open, comprehensive and inclusive debate on the appropriate use of the technology.

However, those who claim the technology is wildly inaccurate, citing figures of 50-90% inaccuracy, often do so because they have assessed the statistics incorrectly.

Significantly, the false positive rate is a measure of the number of false positives relative to the total number of comparisons made, not a measure of the number of false positives relative to the total number of matches made.

An Example

There are 1,000 people in front of you. 6 dangerous criminals loiter amongst them:

  • You deploy an automated system to ASSIST.
  • The system instantly inspects all 1,000 people and selects 10 for you to manually assess.
  • 5 of the selected 10 people are from the 6 criminals. The other 5 are not criminals.
  • YOU (NOT the system) assess these 10 people to make a determination.
  • 994 of the 1,000 people are non-criminals, but 995 are deemed non-criminals: 1 criminal slipped through the net.
  • The system’s false reject rate was 1 / 1,000 (the missed criminal) = 0.1% *
  • The system’s false accept rate was 5 / 1,000 (the non-criminals it selected for you to assess) = 0.5% *
  • The accuracy of the system was NOT 50% (5 genuine matches out of 10 total matches).
  • You only needed to assess 10 people, not 1,000.
  • You caught 5 of 6 criminals, which you otherwise would not have. 
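The arithmetic in the list above can be checked with a short Python sketch. The variable names are illustrative only, and the rates follow this article's definition (errors relative to all 1,000 comparisons), so treat it as a sanity check under those assumptions rather than a formal evaluation method.

```python
# Counts taken from the worked example above.
total_people = 1_000   # everyone the system inspected
criminals = 6          # people who should have been selected
selected = 10          # people the system passed on for manual assessment
genuine_matches = 5    # selected people who really were criminals

false_accepts = selected - genuine_matches   # non-criminals selected
false_rejects = criminals - genuine_matches  # criminals missed

# Rates as defined in this article: relative to all comparisons made.
false_accept_rate = false_accepts / total_people
false_reject_rate = false_rejects / total_people

# The figure often misread as "inaccuracy": relative to the selected people only.
false_share_of_selected = false_accepts / selected

print(f"False accept rate: {false_accept_rate:.1%}")
print(f"False reject rate: {false_reject_rate:.1%}")
print(f"False matches among those selected: {false_share_of_selected:.0%}")
```

The last figure is the 50% that gets quoted as "inaccuracy", even though it is computed over the 10 selected people and not over the 1,000 people inspected.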

If the system was 81% inaccurate, as reported, it would have selected over 800 people from the 1,000 for you to manually assess.

Consider the case where there happen to be no criminals among the 1,000. The lack of genuine matches does not mean the system is 100% inaccurate.

So, with respect to facial recognition accuracy, is this an accurate or an inaccurate system?

* The false accept rate (FAR) and false reject rate (FRR) would actually be lower, as there would be multiple people in the watchlist. The total number of comparisons is 1,000 times the number of people in the watchlist.
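To see how the footnote changes the numbers, here is a minimal Python sketch. The function name and the watchlist size of 500 are assumptions made purely for illustration; the article does not state a watchlist size.

```python
def rates_per_comparison(faces_scanned, watchlist_size, false_accepts, false_rejects):
    """Return (FAR, FRR) measured against every face-to-watchlist comparison."""
    comparisons = faces_scanned * watchlist_size
    return false_accepts / comparisons, false_rejects / comparisons


# Hypothetical watchlist of 500 people; the other counts come from the example.
far, frr = rates_per_comparison(1_000, 500, false_accepts=5, false_rejects=1)
print(f"FAR: {far:.4%}, FRR: {frr:.4%}")
```

With a watchlist of one person the function gives back the 0.5% and 0.1% quoted above. Every additional watchlist entry increases the number of comparisons, so the same 5 false accepts and 1 false reject produce smaller rates.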

Same Example, Different Words

Let’s use exactly the same example, but let’s talk about marbles instead of people.

There are 1,000 marbles in a bucket. Amongst them are 6 red marbles. The other 994 are blue. You must find as many red marbles as quickly as you can.

  • You deploy an automated system to ASSIST.
  • The system instantly inspects all 1,000 marbles and selects 10 for you to manually assess.
  • 5 of the selected 10 marbles are red. The other 5 are blue.
  • YOU (NOT the system) assess these 10 marbles to make a determination.
  • 994 of the 1,000 marbles in the bucket are blue, but 995 are deemed to be blue: one red marble was missed.
  • The system’s false reject rate was 1 / 1,000 (the missed red marble) = 0.1%.
  • The system’s false accept rate was 5 / 1,000 (the blue marbles it selected for you to assess) = 0.5%.
  • The accuracy of the system was NOT 50% (5 genuine matches out of 10 total matches).
  • You only needed to assess 10 marbles, not 1,000.
  • You found 5 of the 6 red marbles, which would otherwise have taken you much longer.
  • The 5 blue marbles caught in your sweep you place back in the bucket.
  • There is no way of later distinguishing these blue marbles from the rest of the blue marbles. They are not tracked in any way. 

If the system was 81% inaccurate, it would have selected over 800 marbles from the 1,000 for you to manually assess, not saving you any time at all.
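The workload comparison can be made concrete with a few lines of Python. This sketch assumes that "81% inaccurate" is read as a rate over all 1,000 inspections, which is the reading the paragraph above argues against; the variable names are illustrative.

```python
total_marbles = 1_000
selected_by_system = 10  # marbles the system handed over in the example

# Reading "81% inaccurate" as a rate over all 1,000 inspections
# (integer arithmetic avoids floating-point rounding).
selected_if_81_percent_wrong = total_marbles * 81 // 100

manual_share = selected_by_system / total_marbles
print(f"Selected in the example: {selected_by_system} ({manual_share:.0%} of the bucket)")
print(f"Selected if 81% of inspections were wrong: {selected_if_81_percent_wrong}")
```

In the example you handle 1% of the bucket by hand. Under the 81% reading you would handle 810 marbles, which is most of the bucket.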

Consider the case where there happen to be no red marbles in the bucket. The fact that no red marbles were selected does not mean the system is 100% inaccurate.

Now imagine you have many, many of these buckets of 1,000 marbles, in which you need to find as many red marbles as possible in as little time as possible. Would you use this tool? Or would you inspect them all manually?

The main difference between these two examples is that, in the marble scenario, all red marbles are treated as equivalent. With people, you need to differentiate between the individual criminals, i.e. you have multiple shades of red, one for each distinct criminal you are looking for.

You can read further for a fuller analysis of facial recognition accuracy, within the context of live operational use by the South Wales Police Force in the United Kingdom.
