The Face Recognition Problem
Is a single image database sufficient for face recognition? A Psychological Experiment
Final project by
Liron Michaely Alon Grubshtein
Introduction
In recent years facial recognition has become a very popular area of research in computer vision. Its uses range from
home applications to criminal apprehension. Due to the nature of the face recognition problem, the field involves not
only computer vision researchers but also neuroscientists, cognitive scientists and psychologists. As such, progress in
computer vision may require a deep understanding of physiology, and may in turn advance certain theories in
neuroscience.
A general statement of the face recognition problem can be formulated as follows: Given an image of a scene, identify
or verify the person in the scene using a stored database of faces.
Since today's solutions are based on comparisons, efficiency depends on both the time required for computation and
the size of the database required. In our experiment we wanted to examine the minimum database size required
for successful face recognition. Our examination was based on the best hardware-software mechanism available
to us – the human visual system (HVS).
Our experiment challenged the human visual system to successfully recognize a face using the smallest possible database
– a single frontal image.
Our assumption: The human visual system is capable of successful facial recognition based on a
single image, even under extreme conditions.
Method
The experiment we conducted was based on an application we wrote, in which subjects were shown a single frontal
image of a face (the target) and were then tested on their ability to distinguish the target from other
faces (the distracters) at various angles. Each subject's responses were recorded together with the respective
response times.
We took several steps to increase the consistency of our experiment, such as angle consistency, a control image, image
order, and similar prominent features between the target and the distracters.
Results
We define four types of possible responses:
- A correct “yes” answer is called a “Hit” (H).
- A correct “no” answer is called a “Correct Rejection” (CR).
- A wrong “yes” answer is called a “False Alarm” (FA).
- A wrong “no” answer is called a “Miss” (M).
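The four response types above can be sketched in code as follows. This is only an illustrative sketch: the function name, the boolean arguments and the sample trial list are assumptions made for the example and are not taken from the application used in the experiment.

```python
from collections import Counter


def classify_response(is_target: bool, answered_yes: bool) -> str:
    """Map one trial to H, M, FA or CR, following the definitions above."""
    if is_target:
        # The target was shown: "yes" is correct (Hit), "no" is wrong (Miss).
        return "H" if answered_yes else "M"
    # A distracter was shown: "yes" is wrong (False Alarm),
    # "no" is correct (Correct Rejection).
    return "FA" if answered_yes else "CR"


# Hypothetical trials as (is_target, answered_yes) pairs, for illustration only.
trials = [(True, True), (False, False), (False, True), (True, False), (False, False)]
tally = Counter(classify_response(t, a) for t, a in trials)
```

Here `tally` holds the number of trials of each response type for the hypothetical list.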
We analyzed the results of 40 test subjects of various ages and genders. No test subject had seen our target
or our distracters prior to the experiment.
Subjects achieved an overall accuracy of 96%.
Of all wrong responses, 93% were FA mistakes and 7% were M mistakes.
Taking into account that there were 8 times more distracter images, and hence a greater chance of
FA mistakes, we normalized our results and obtained 66% FA mistakes against only 34% M mistakes.
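The exact normalization procedure is not spelled out here. One plausible reading is to divide each error count by the number of trials on which that error could occur, and then express the two resulting rates as shares of their sum. The sketch below follows that reading. The function name and all counts are hypothetical and chosen only to illustrate the 8:1 ratio; they are not the experiment's data.

```python
def normalized_error_shares(fa_count, miss_count, distracter_trials, target_trials):
    """Return (FA share, M share) after correcting for unequal trial counts.

    Assumed procedure: convert each raw count to a per-trial rate, then
    express the two rates as fractions of their sum.
    """
    if distracter_trials <= 0 or target_trials <= 0:
        raise ValueError("trial counts must be positive")
    fa_rate = fa_count / distracter_trials    # FA mistakes per distracter trial
    miss_rate = miss_count / target_trials    # M mistakes per target trial
    total = fa_rate + miss_rate
    if total == 0:
        raise ValueError("no mistakes to normalize")
    return fa_rate / total, miss_rate / total


# Hypothetical counts: 8 times more distracter trials than target trials.
fa_share, miss_share = normalized_error_shares(
    fa_count=16, miss_count=1, distracter_trials=80, target_trials=10
)
```

With these made-up counts the raw split is 16 FA to 1 M, while the per-trial rates are 0.2 and 0.1, so FA mistakes account for about two thirds of the normalized total.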
Surprisingly, no correlation was found between the angle of the image and the number of FA mistakes.
The average response time (RT) was 1.47838 seconds.
A correlation was found between the number of mistakes and the RT.
Another surprise is that no correlation was found between the RT and the angle of the image.
It should also be noted that no mistakes were made for our control image, and nearly all mistakes were made in 2 specific
images of the same distracter.
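The correlation statements above do not name the statistic that was used. As a minimal sketch, assuming a Pearson correlation coefficient computed per image, the check could look like this. The function name and the per-image values are hypothetical and serve only as an illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-image values: number of mistakes and mean RT in seconds.
mistakes_per_image = [0, 1, 0, 5, 2]
mean_rt_per_image = [1.2, 1.5, 1.3, 2.4, 1.7]
r = pearson_r(mistakes_per_image, mean_rt_per_image)
```

A value of `r` close to 1 or -1 indicates a strong linear relation, and a value close to 0 indicates none. The same function could be applied to image angle and RT, or to image angle and FA count.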
Conclusions
The human visual system can recognize - with a high degree of success - a new face, based on a database composed of
a single image.
We noticed that there were many more FA mistakes than M mistakes, even after normalizing. This may point to a certain
feature of the mechanism, i.e. that the HVS is better at accepting the correct stimulus than at rejecting a wrong one.
Another possible explanation is a psychological one: these results may be caused by response bias, that is, our
preference for a certain decision.
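The bias explanation can be made concrete with the standard signal detection theory measures, sensitivity d' and criterion c, which are computed from the hit rate and the false alarm rate. We did not compute these in the experiment; the sketch below only shows how such a measure is usually obtained. The function name and the rates are hypothetical.

```python
from statistics import NormalDist


def sdt_measures(hit_rate, fa_rate):
    """Return (d_prime, criterion) from a hit rate and a false alarm rate.

    Both rates must lie strictly between 0 and 1; NormalDist.inv_cdf
    raises StatisticsError otherwise.
    """
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(fa_rate)
    d_prime = z_hit - z_fa            # sensitivity: separation of target and distracter
    criterion = -(z_hit + z_fa) / 2   # bias: negative values mean a tendency to answer "yes"
    return d_prime, criterion


# Hypothetical rates, for illustration only.
d_prime, criterion = sdt_measures(hit_rate=0.9, fa_rate=0.2)
```

With these made-up rates the criterion is negative, which corresponds to a preference for answering "yes" and therefore to more FA mistakes than M mistakes at equal sensitivity.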
Another interesting conclusion is that if we consider mistakes and response time as indicators of image difficulty,
then the direction of the face does not necessarily predict the difficulty of the required analysis.
In an attempt to understand how information is processed in the HVS, we considered an existing working computer
algorithm. The algorithm uses a combination of component-based recognition and 3D morphable models. A 3D morphable
model is used to generate 3D face models from three input images. The 3D models are rendered under varying conditions
to build a large set of synthetic images. These images are then used to train a component-based face recognition
system. The resulting system achieved 90% accuracy, which is quite close to the 96% accuracy in our experiment.
Does the HVS create a large database of images based upon its single given image? Since we found no relation
between the performance level and the angle of the face, it seems plausible that this assumption
is correct: once we have a good 3D model from which to produce a larger database, it no longer matters from which
direction we view the face (or we should say it matters less). On the other hand, psychological research indicates
that there are other factors involved in our mechanism. For example, the HVS performs significantly worse on
upside-down faces. Had the HVS produced a 3D model and compared images derived from it with the upside-down target, we
would not have seen such a decrease in performance.
Additional Information
References
[1] Bernd Heisele, Purdy Ho, Jane Wu, and Tomaso Poggio (2003). “Face recognition: component-based versus global approaches”.
Computer Vision and Image Understanding 91, 6–21.
[2] Karl Haberlandt. “Cognitive Psychology”, 2nd ed., Trinity College.
[3] Jennifer Huang, Bernd Heisele, and Volker Blanz (2003). “Component-based Face Recognition with 3D Morphable Models”.
[4] Yin, R.K. (1969). “Looking at upside-down faces”. J. Exp. Psychology 81, 141–145.
[5] Pawan Sinha, Benjamin Balas, Yuri Ostrovsky, Richard Russell. “Face Recognition by Humans: 20 Results all Computer
Vision Researchers Should Know About”.