Face recognition in our time

Vicki Bruce

DOI:10.1093/acprof:oso/9780199228768.003.0005

Abstract and Keywords

This chapter describes developments in the field of face recognition during the past fifty years. It explains that an interest in face recognition was spurred by the increasing number of cases of mistaken identity in which the wrong person was convicted of a crime during the 1970s. In the 1990s, the issue of mistaken identity and face recognition became part of efforts to improve data protection and security. During this period, a number of deficits that affect face processing and interpersonal perception, including prosopagnosia, were recognized. This development helped theorists explore the logic of the neural systems that decipher facial messages.

This chapter is dedicated to the memory of Hadyn Ellis, father of this field.
When I was an undergraduate student in the early 1970s there was much talk and some surprise expressed at the apparently great capacity and accuracy of picture memory, recently rediscovered after years of neglect (e.g. Shepard 1967). Pictures of faces seemed particularly well remembered compared with other categories of similar objects such as houses (Yin 1969). At about the same time, many cases of wrongful conviction in courts—often from mistaken eye-witness identification—were attracting public attention. This seeming paradox—of good memory for pictures of faces and poor memory for faces in everyday life—set the scene for much subsequent work on memory for faces.
When visual memory was rediscovered, it seemed difficult to do good experiments using pictures in contrast to words, where for decades research had established the importance of dimensions such as imageability, frequency, and so forth. The important dimensions of variation in appearance and significance for picture memory were not known, and before the development of image processing tools it was extremely difficult to vary systematically the composition of pictures of natural scenes and real objects. So pictures in general, and faces in particular, seemed challenging from the perspective of cognitive psychologists such as Hadyn Ellis seeking theoretical insight into the processes of remembering such items. For psychophysicists, too, the face seemed a strange distraction. Soon after I started working with Alan Baddeley on my PhD, Fergus Campbell, FRS, a brilliant Scots physiologist who was in Cambridge at that time, looked at me curiously and said, ‘Well Vicki, the face is very interesting, but you can't do a whole PhD on it…’. Fergus, and most other visual scientists, studied how the mammalian visual system responded to grating patterns. More complex natural images such as faces were difficult to control.
Fergus was wrong. Although a search in Web of Science reveals only 12 articles with titles mentioning ‘face recognition’ or ‘face perception’ during the period of my PhD research between 1974 and 1977 (one of these was Hadyn Ellis's seminal review of the topic in 1975), this had risen roughly 70-fold to over 850 in the years from 2003 to 2006. In my research lifetime, the study of human faces and how we perceive and recognize them has become an extremely hot topic. In this chapter I try to describe why this happened, and a little about what we have learned and what remains to be discovered.

Why did faces become a hot topic?

In 1976, Lord Devlin reported on a number of cases of mistaken identity in which the wrong person had been convicted of a crime. It became clear that eye-witnesses, who might be very sincere and credible people, were quite often wrong when they selected suspects from line-ups or sets of photographs. The case of Laszlo Virag was striking. He was apprehended as the man responsible for armed robberies in Liverpool and Bristol after several independent witnesses picked him out from photo-spreads or live line-ups. A police witness testifying in court said, ‘His face is imprinted on my brain’. But later another man, George Payen, who bore some resemblance to Virag, confessed to these crimes along with others. So one impetus for research into face recognition was the need to find ways to ensure that eye-witness testimony could help more reliably with the apprehension of criminals. I will return to this later. In the years since the Devlin report many other application areas have influenced the kind of scientific questions people pose about faces.
The issue of mistaken identity is part of a broader theme to do with ‘security’. Computer scientists competed to develop programs that can recognize faces as well as (or preferably better than) humans can. In theory such programs could allow automatic screening for identity at work, at cash machines, at airports. In the late 1990s an international competition, the FERET (Face Recognition Technology) evaluation, was held to compare the different systems developed at that time. For example, Phillips and co-workers (1998) examined how well an automatic or semi-automatic face recognition algorithm could find matches in a large gallery of 1196 images of faces when probed with variant images of the same people. When the variants were photographs taken on the same day and with the same camera (but with some difference in expression), matching was only 80% accurate with the best of the systems. When probes were images of the same people taken under different lighting conditions or at different times, performance of all systems was extremely poor. Whatever the popular press would have us believe, we are still far from having camera systems at airports that can scan for known terrorists, but the challenge of doing so remains a strong driver behind some of the research.
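The scoring logic of such a gallery/probe evaluation is simple to sketch. The fragment below is a hypothetical illustration, not FERET code: each entrant used its own face representation and distance measure, so the random descriptor vectors and cosine-similarity matcher here are stand-ins; only the gallery size of 1196 is taken from the text.

```python
import numpy as np

def rank1_accuracy(gallery, gallery_ids, probes, probe_ids):
    """Score an identification test of the FERET kind: each probe is
    matched against every gallery image and counted correct when its
    nearest neighbour carries the same identity."""
    # Unit-normalise descriptors so the dot product is cosine similarity.
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    p = probes / np.linalg.norm(probes, axis=1, keepdims=True)
    nearest = (p @ g.T).argmax(axis=1)   # closest gallery item per probe
    return float(np.mean(gallery_ids[nearest] == probe_ids))

# Toy demonstration: probes are noisier copies of the gallery vectors,
# loosely mimicking same-day variant photographs of the same people.
rng = np.random.default_rng(0)
ids = np.arange(1196)
gallery = rng.normal(size=(1196, 64))
probes = gallery + rng.normal(scale=0.9, size=gallery.shape)
print(rank1_accuracy(gallery, ids, probes, ids))
```

Harder probe conditions (different lighting, different session) would correspond to larger or structured perturbations, and rank-1 accuracy drops accordingly, which is the pattern the evaluation found.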
Another growth area over the 30 years or so since I finished my PhD has been in the area I will loosely term ‘cosmetics’. Facial appearance is big business, and the effects on appearance of a change in hairstyle, lipstick, or even shape of nose can now be modelled in computer graphics and shown to clients before they submit to superficial or more radical treatments. Some of the most interesting and challenging work I was involved with was in the 1980s, when my own group in Nottingham worked in collaboration with Alf Linney's group at University College London. Alf is a medical physicist who had developed a laser scanning device to measure the three-dimensional (3D) shape of the face surface, so that such shapes could be pulled over 3D models of the skull to predict the effects of surgical operations on facial appearance. Alf's 3D scanning and visualization software, quite novel at that time, allowed us to investigate faces as 3D surfaces rather than as ‘flat’ patterns, and we made some modest progress in understanding aspects of face perception in this way (Bruce et al. 1993a,b; Bruce and Langton 1994; Burton et al. 1993; Hill et al. 1995).
The development of good ways to display 3D surfaces of faces was also relevant to the hugely important entertainments and communications industries. It is now possible to generate reasonably convincing human-like moving heads (avatars) which are used in video games or even automated customer service applications. There is still room for considerable improvement in the realism of such images, and it is particularly challenging to create detailed realistic speech-related movements (for a review, see Bailly et al. 2003). In terms of theory, the widespread uptake in recent years of mobile telephony, video, and even virtual conferencing leads us to ask new questions about nonverbal communication. In what ways is face-to-face communication different from telephone communication? Is a video-phone a good or a poor substitute for ‘real’ face-to-face communication?
Face recognition research did not develop just because of applications interest; it was theoretically timely too. In the early 1970s vision scientists had been investigating how human and animal brains perceived simple grating patterns, so developing our understanding of early cortical processing in vision. But discoveries by a number of scientists in the USA and UK of cells in monkey cortex apparently tuned to natural images such as monkey paws (Gross et al. 1972) and human and monkey faces (Gross et al. 1972; Perrett et al. 1982, 1984) were intriguing.
Rare but fascinating neuropsychological impairments in brain-injured humans were also important. ‘Prosopagnosic’ patients (Bodamer 1947) apparently lost their capacity to recognize individual faces, while still recognizing person identities by different routes such as voice or name recognition. Such patients could fail to recognize their friends or family; even their own faces in the mirror might appear unfamiliar (De Renzi 1986; Young 1998). Although prosopagnosic patients usually have other problems too, the apparent specificity of the face processing problems, coupled with the observation of selective responses to faces in monkey cortex, led to the idea that face processing relies on dedicated neural machinery.
Prosopagnosia is just one of a number of deficits that affect face processing and interpersonal perception, and that have helped theorists explore the logic of the neural systems that decipher facial messages. For example, recognition of the identity of a face and recognition of facial expressions often dissociate, suggesting logical and/or physical independence of the systems used for person recognition and emotion interpretation (Young et al. 1993). Campbell and colleagues (1986) described a fascinating example of a double dissociation between expression recognition and lip-reading that led to further theoretical differentiation. Using the ‘box and arrow’ kind of model imported to cognitive psychology from computer science in the 1960s and 1970s, fairly elaborate models of face perception and recognition were first developed on paper in the 1980s (e.g. Bruce and Young 1986) and then further refined and implemented with connectionist modelling techniques in the 1990s (e.g. Burton et al. 1990, 1999). But computer power influenced the field in other ways too, as we shall see.

New methodologies for investigating face perception

Just as the area of face recognition was getting increasingly interesting theoretically, so new methodologies allowed us systematically to vary facial appearance.
When I was working on my PhD, doing experiments with faces was rather tedious. I was investigating the process of visual search for specific target faces. In one set of studies I asked the photographer to copy lots of faces from books into inch-square prints. I mounted these into arrays five faces deep by three wide, had the arrays rephotographed onto cards, covered the cards with transparent film, and presented them manually in a tachistoscope, one at a time (Bruce 1979). To manipulate the appearance of an individual face in any way generally involved taking scissors to it. I'm sure that's what Peter Thompson was doing when he accidentally discovered the Thatcher illusion (Thompson 1980), though quite why an academic should have been taking scissors to a picture of Margaret Thatcher's face in the late 1970s of course remains mysterious.
Figure 4.1 This face looks normal until you turn the page upside down. Image created by Peter Hancock, University of Stirling, of himself.
The Thatcher illusion is an extremely neat demonstration of our inability to see the configuration of a face when it is turned upside down. It doesn't have to be Thatcher's face, of course. Figure 4.1 illustrates this using the face of my colleague, Peter Hancock. Printed as it is, upside down, his face looks quite normal. Turn the book around and you'll see what he's allowed Adobe Photoshop to do to him. When the face is upside down, individual features such as the eyes or mouth are processed rather independently, so the mismatch in their orientation does not become apparent. This kind of manipulation can be done more easily in Photoshop, but Thompson's work with scissors made the same point.
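The manipulation itself is mechanically simple, which is why scissors sufficed in 1980. A minimal sketch using the Pillow imaging library is below; the feature boxes are hypothetical placeholders, since the real coordinates depend entirely on the photograph.

```python
from PIL import Image

def thatcherise(face_path, feature_boxes):
    """Rotate the eye and mouth regions 180 degrees within an otherwise
    intact face, then present the whole image upside down. Viewed that
    way it looks roughly normal; re-invert it and the mismatch between
    the features and the face configuration becomes grotesquely obvious."""
    face = Image.open(face_path)
    for box in feature_boxes:                    # (left, upper, right, lower)
        face.paste(face.crop(box).rotate(180), box)
    return face.rotate(180)

# Placeholder boxes for the two eyes and the mouth of a 256 x 256 portrait:
# thatcherise("face.jpg", [(60, 85, 120, 115), (136, 85, 196, 115), (88, 175, 168, 215)])
```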
For my visual search experiments I wanted also to be able to merge different facial identities. Some kinds of merging were easy, using scissors again. That is how Andy Young and colleagues discovered the face composite effect. Young, Hellawell, and Hay (1987) found that if the top half of one celebrity's face was paired with the bottom half of another, as in Figure 4.2, it was extremely difficult to identify to whom each half-face belonged, unless the two halves were misaligned, when the individual identities became more readily identifiable. This, like the Thatcher illusion, again shows the critical importance of the relationship between individual features in an upright face: Tony Blair's eyes just don't look like Tony Blair when paired with George W. Bush's chin.

Figure 4.2 It is difficult to identify the top or bottom half of this face. Cover up one half and you'll find the other easier to recognize. Image created by Peter Hancock, University of Stirling.
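Constructing such stimuli is equally straightforward in code. This is a sketch under the assumption that the two portraits are the same size and roughly aligned; the function name and the halving at the mid-line are my own simplifications of how composite stimuli are typically made.

```python
from PIL import Image

def composite(top_face, bottom_face, misalign=0):
    """Pair the top half of one face with the bottom half of another
    (after Young et al. 1987). With misalign=0 the halves are aligned
    and fuse into a convincing 'new' face; a sideways offset of a few
    dozen pixels breaks the fusion, and each identity is easier to see."""
    w, h = top_face.size
    canvas = Image.new("RGB", (w + abs(misalign), h), "white")
    canvas.paste(top_face.crop((0, 0, w, h // 2)), (max(-misalign, 0), 0))
    canvas.paste(bottom_face.crop((0, h // 2, w, h)), (max(misalign, 0), h // 2))
    return canvas
```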
Other things were less easy to do without computer graphics and had to wait until the 1980s. Careful manipulations of the effects of displacing different face features on facial appearance could be done properly only with photographs of faces using computers (Haig 1984, 1986; Hosie et al. 1988). But other kinds of novel manipulation involving merging faces became possible with computer graphics. About 20 years ago, Dave Perrett and his colleagues at the University of St Andrews first developed means of producing photographic-quality caricatures of faces. This involved developing some of the first software for ‘morphing’ faces (e.g. Benson and Perrett 1991). Such techniques allow different images of faces to be blended without blurring. A set of control points is marked on each face image (such as the corners of the mouth, the tip of the nose, etc.) and the spatial locations of these points can be averaged. The points are joined into triangles and the colour/texture in each triangle can be averaged (see http://perception.st-and.ac.uk/Software/research.htm).
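In outline, the procedure can be reconstructed with modern library support. The sketch below is illustrative, not the St Andrews code: scikit-image's piecewise-affine warp performs the triangulate-and-warp step that Benson and Perrett implemented themselves, and the landmark arrays are assumed to be supplied in corresponding order for both faces.

```python
import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

def morph(img_a, pts_a, img_b, pts_b, alpha=0.5):
    """Blend two faces without ghosting: average the corresponding
    control-point locations, warp each image onto that average shape
    (piecewise-affine over a triangulation of the points), then mix the
    warped pixel values. alpha=0 returns face A, alpha=1 returns face B.
    pts_a, pts_b: (n, 2) arrays of (x, y) landmarks; in practice the
    image corners are added so the triangulation covers the whole frame."""
    pts_a, pts_b = np.asarray(pts_a, float), np.asarray(pts_b, float)
    target = (1 - alpha) * pts_a + alpha * pts_b   # averaged control points

    def to_target(img, pts):
        t = PiecewiseAffineTransform()
        t.estimate(target, pts)   # warp() wants the output-to-input mapping
        return warp(img, t)

    return (1 - alpha) * to_target(img_a, pts_a) + alpha * to_target(img_b, pts_b)
```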
It is possible to average together many different male faces and many different female faces to create the average male and average female face. Caricaturing techniques then make it possible for an individual face to be made more feminine or masculine in appearance. Similar techniques can be used to age faces, to predict what an individual might look like 10 or 20 years later (Burt and Perrett 1995). Although some of these applications are frivolous, such techniques also have practical application, for example to manipulate an image of a person missing for several years to show how they might appear today.
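The geometry behind caricaturing and feminising is a one-line extension of the same machinery: move every control point along the line joining it to the corresponding point on a norm (average) face, then render the new shape with a warp like the one above. A hypothetical sketch:

```python
import numpy as np

def shift_from_norm(pts_face, pts_norm, k):
    """Exaggerate or reduce a face's deviation from a norm face.
    k > 1 produces a caricature; 0 < k < 1 an anti-caricature that moves
    the face toward the norm, so a male face shifted toward a female
    average takes on more feminine proportions. The returned landmark
    set is then rendered with a piecewise warp such as morph() above."""
    pts_face, pts_norm = np.asarray(pts_face, float), np.asarray(pts_norm, float)
    return pts_norm + k * (pts_face - pts_norm)
```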
The capacity to blend different faces has also been used theoretically, to investigate whether expressions and identities are seen ‘categorically’. It is possible to produce an ordered series of images that vary in terms of how much of each face is present in the blend. For example, we can create faces with a 50% ‘happy’ expression and a 50% ‘sad’ expression, or a face that is 80% Paul Newman and 20% Robert Redford. Morphed continua of variations can be produced in this way, for example stepping in 10% increments from a face that is 100% happy to one that is 100% sad. By examining how detectable the differences between images along such series are, and how readily each variant can be categorized, evidence has been found that emotional expressions (Young et al. 1997) and familiar identities (Beale and Keil 1995) are seen categorically. That is, it is easier to categorize images at the extremes than nearer the centre of the continua, and harder to discriminate between small variants at the extremes than near the middle of these continua.
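Producing such a continuum is, in sketch form, just repeated blending at different mixing proportions. The fragment below assumes the two face images have already been shape-aligned (or warped with a routine like morph() above) so that a simple pixel mix suffices:

```python
import numpy as np

def continuum(img_a, img_b, steps=11):
    """An ordered series from 100% A to 100% B; steps=11 gives 10%
    increments. In a categorical perception study, identification of each
    step and discrimination of neighbouring pairs are then compared:
    categorical perception shows up as sharper discrimination near the
    category boundary than between equally spaced pairs at the extremes."""
    return [(1 - a) * img_a + a * img_b for a in np.linspace(0.0, 1.0, steps)]
```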
A practical application of morphing in the context of eye-witness testimony was published by my own group quite recently (Bruce et al. 2002). We have been working on ways to help witnesses produce better face ‘composites’: images of faces seen at crimes, reconstructed from memory. Even with today's very realistic electronic composite systems, such as E-Fit and Pro-fit, it is extremely rare that a recognizable composite can be produced from memory (Frowd et al. 2005a,b).
We reasoned that, although witnesses produce composites that are error prone, there is no reason to suppose that the errors produced by different witnesses working independently would be correlated. Thus, combining such memories should reinforce the correct aspects of the face and minimize the influence of the errors. So we hypothesized that where two or more witnesses had seen the same crime, and could each produce a composite image working independently, morphing their composites together should produce a better likeness and be more likely to trigger recognition. Laboratory experiments confirmed this prediction (Bruce et al. 2002) and this has led to a modification in the acceptable ways in which composites can be used by police investigators in the UK.
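The statistical intuition is the familiar one about averaging independent errors, and a toy simulation makes it concrete. Here a random vector stands in for the culprit's true appearance and each witness's composite is that vector plus independent noise; nothing about real composite systems is modelled.

```python
import numpy as np

rng = np.random.default_rng(1)
true_face = rng.normal(size=100)   # stand-in for the culprit's appearance

def mean_error(n_witnesses, noise=1.0, trials=2000):
    """Average distance from the truth of a 'morph' (here, the mean) of
    n independent, error-prone composites. Because the witnesses' errors
    are uncorrelated, the error shrinks roughly as 1/sqrt(n)."""
    errors = []
    for _ in range(trials):
        composites = true_face + rng.normal(scale=noise, size=(n_witnesses, 100))
        errors.append(np.linalg.norm(composites.mean(axis=0) - true_face))
    return float(np.mean(errors))

for n in (1, 2, 4):
    print(n, round(mean_error(n), 2))   # error falls as composites are combined
```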
The experience of translating a laboratory experimental result into real-world practice was also very instructive. It is easy to forget, within the synthetic and somewhat sterile laboratory context, the complexity of the situation within which a real eyewitness may encounter a criminal. For example, when my colleagues and I first urged those drafting the guidelines for police to allow more than one witness composite to be collected and combined, the question was asked, ‘How do you know each witness is describing the same face?’. In the laboratory, of course, you know: there is only a single target the ‘mock’ witness could be describing, so the issue had not occurred to us as a problem. The resulting guidelines are clear in allowing composite combination only where there is good reason to suppose that the witnesses are describing the same person.
Perhaps even more significant than the developments in computer graphics have been those enabled more recently by human brain imaging. In addition to the earlier neuropsychological and neurophysiological findings, there is now considerable evidence that there is an area in the human temporal cortex that responds particularly vigorously and selectively to faces. The properties and significance of the ‘fusiform face area’ (Kanwisher et al. 1997) currently remain hotly debated. Some suggest that the area is also involved in making fine (‘expert’) discriminations within other categories whose members share overall similar appearance, such as bird shapes or dog shapes when discriminated by ornithologists or dog judges (Gauthier et al. 1999, 2000; McKone et al. 2007). Others emphasize that many other areas of the brain are involved in face processing too, and so the fusiform face area can be only a component in a much more extended system for deriving meaning from faces (e.g. Gobbini and Haxby 2007; Haxby et al. 2000).
The complexity of the neurological underpinnings of face recognition and perception should not be surprising. More than 20 years ago, Andy Young and I (Bruce and Young 1986) described how any adequate theory of face processing would have to articulate properly what is meant by ‘face recognition’ (the face, or the picture?) and the relationships between the derivation of the many different kinds of meaning from faces. If we want to ask about whether face processing involves ‘special’ neural underpinnings we need to ask what we mean by ‘special’ (Young 1998), and also to be clear about what aspects of face processing we are trying to describe. In this respect, the work of Jim Haxby and his collaborators stands out as exemplary.

What have we learned?

Thirty years after the Devlin report, I think we understand more about the reasons why witnesses find it hard to recall and recognize faces, and we have found some ways to help them do it a little (though not much) better. We also have cameras in almost all our streets and other public places, recording the actions and appearances of passers-by. These camera images also provide new opportunities for mistaken identification, however. Bruce and co-workers (1999) showed that participants were surprisingly bad at deciding whether a high-quality video image matched one of the people shown in an array of photographs below it. Different images of the same person can look very different, and images of different people can look very similar. Apparent resemblance between a camera image and a suspect should not be used to prove identity, just as the remembered resemblance between Laszlo Virag and George Payen should not have been used to signal identity either.
So recent work on the matching of CCTV images really confirms the paradox that stimulated my entry to the field some 30 years ago: remembering or matching individual pictures is very good, but recognizing ‘real’ faces in the world can be very difficult. Recent theoretical and cognitive neuroscientific work suggests that robust representations of familiar faces must be built over time from flimsy image-specific descriptions of initially glimpsed items. Computer modellers are showing how this can be done (Burton et al. 2005). Because our initial internal representations of faces are so tied to specific image properties, we are bad at matching across different images, even if there is no memory load.
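A cartoon of the averaging idea in that modelling work: treat the stored representation of a familiar person as the mean of many shape-normalised images, so that image-specific accidents of lighting, pose, and expression wash out, and match new images against that average. Everything here (the pixel representation, the correlation matcher) is an illustrative simplification of the published models, not their implementation.

```python
import numpy as np

def build_representation(images):
    """A familiar face as the pixelwise mean of many aligned images of
    the same person; idiosyncrasies of individual snapshots average away
    as more encounters with the face accumulate."""
    return np.mean(np.stack(images), axis=0)

def similarity(probe, representation):
    """Normalised correlation between a new image and the stored mean."""
    p = probe.ravel() - probe.mean()
    r = representation.ravel() - representation.mean()
    return float(p @ r / (np.linalg.norm(p) * np.linalg.norm(r)))
```

On this view, unfamiliar face matching is hard precisely because no such average exists yet: all we have is one image-bound description to compare with another.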
The past 30 years have seen us make some progress at understanding some of the reasons why we can be both good at familiar face recognition but poor at dealing with unfamiliar ones. We have also made considerable progress at understanding how meaningful interpretations of personal identities are derived (e.g. Bruce and Young 1986; Burton et al. 1999a).
The next decade will, I think, see us spreading out from the simple interpretation of static face images. I would expect to see much more theoretical work in three areas: dynamic image processing (e.g. Lander and Bruce 2000, 2003; Pilz et al. 2006); the integration of information from faces, bodies, and voices into full ‘person’ recognition; and the interaction between different kinds of emotional, social, and cognitive systems involved in person perception and recognition. On these last two, interrelated, themes there has already been some excellent work in the new field of ‘cognitive neuropsychiatry’. This owes much to Hadyn Ellis's research on conditions such as Capgras syndrome (e.g. Ellis and Lewis 2001; Ellis and Young 1990). Capgras patients think their friends and relations have been replaced with imposters or cunningly disguised aliens or robots. They recognize the people correctly, but claim they are not really who they seem to be. Explaining such conditions requires an understanding of the interaction between perceptual-cognitive and affective-autonomic systems in the normally functioning brain, an understanding that may also illuminate the much older questions arising from eye-witness testimony. There's still plenty to occupy another generation or two of PhD students.


