
Visual Perception 1950–2000

John Ross

DOI:10.1093/acprof:oso/9780199228768.003.0019

Abstract and Keywords

This chapter looks at changes and developments in the subject of visual perception. The dominance of behaviourism in psychology in the 1950s started to fade as new ideas emerged from research into visual perception. Some of the most influential ideas include Claude Shannon's proposition that information could be quantified, Hermann von Helmholtz's doctrine of unconscious inference, and Richard Gregory's proposal that percepts are literally hypotheses about the world.

This chapter tells a tale of the study of visual perception during the second half of the twentieth century as seen by a somewhat peripheral observer, who came to perception late in his scientific career. It ignores or touches only lightly on many topics, including adaptation, attention, the perception of faces, and the mysteries of colour.

Back then

Behaviourism had a strong grip on psychology in 1950. This began to loosen only in about 1956, but the field of perception was spared its most stultifying effects because it is difficult to take the subjective out of perception. By 1956 a great deal of the phenomenology of vision was known (Helmholtz having single-handedly contributed much of it, and the Gestaltists adding a lot more); most of the now standard visual illusions had been discovered, and colour theory, though described as a ‘jungle’ in the 1950 Annual Review of Psychology, had developed to a level daunting to all but its adepts. Wartime work on what came to be called ‘human factors’ in the USA and ‘human engineering’ in the UK had revealed much about perceptual strengths and weaknesses. Its usefulness in war, and subsequently in the design of instruments such as radar screens, had added even more respectability to the study of perception.
The laws of colour mixture were well established, as was the fact that cones were effectively filters for wavelength, but there was deep controversy about the explanation of the appearance of colour. The trichromatic theory, championed by Helmholtz, explained it by the filtering properties of three types of cone (S, M, and L); the opponent process theory, first proposed by Hering, instead appealed to three opponent processes: black vs white, red vs green, and blue vs yellow, but no physiological basis had yet been proposed for opponent processes. Colour matching was better explained by the trichromatic theory, perceived colour differences better by opponent processes. The opponent process theory, which had languished somewhat, received a boost from work published in the 1950s (Hurvich and Jameson 1957). A true (if possibly not final) reconciliation between the two, with a plausible physiological basis for opponent processes, was not achieved until the brink of the twenty-first century, with the discovery that ganglion cells within the retina added and subtracted signals from the cones. For a summary, see Kandel et al. (2000).
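The gist of that reconciliation can be sketched in a few lines of arithmetic. The sketch below is my own illustration, not drawn from the chapter or from Kandel et al.; the function name and the numerical cone signals are invented, and real opponent weightings are more complicated.

```python
# A toy sketch of how ganglion cells could turn three cone signals into
# opponent channels simply by adding and subtracting them.
def opponent_channels(L, M, S):
    luminance   = L + M            # achromatic (black vs white) signal
    red_green   = L - M            # red vs green opponent signal
    blue_yellow = S - (L + M) / 2  # blue vs yellow opponent signal
    return luminance, red_green, blue_yellow

# cone responses to a reddish light and a bluish light (made-up numbers)
print(opponent_channels(L=0.9, M=0.4, S=0.1))
print(opponent_channels(L=0.3, M=0.3, S=0.8))
```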

New ideas

In the early 1950s it was still possible for vision scientists to believe (as some philosophers still do) that a perceiver had direct contact with the external world, that the brain was ‘equipotential’ (had no neuroanatomical centres specialized for particular tasks) and that the functional neuropsychology of perception required little more work than to invert the upside-down images from the eyes and combine them. J. J. Gibson (1950) proposed that there was a direct ‘pick-up’ of invariants in the flow of images on the retina. There was no structural or functional map of the visual system, no hint of its hierarchical organization, and no understanding of information processing. No information-processing devices (as distinct from information-transmitting devices such as telephones) were known, or could provide analogies for the function of the brain. It was not at all clear what neurones were needed for.
Some new ideas that emerged during, or shortly before, the 1950s shaped the course of research into visual perception over the rest of the century, and made belief in unmediated perception untenable for vision scientists. Preeminent in influence was Claude Shannon's idea that information could be quantified. His theory of information enabled it to be quantified in ‘bits’ on the assumption that signals registered by a receiver reduced uncertainty about the content of a message being sent. (One bit is the reduction in uncertainty when a signal decides the choice of one of two equally probable alternatives.) Within the community of vision scientists, the assumption that information reduced uncertainty resonated with Helmholtz's doctrine of unconscious inference, which carried with it the implication that the perceiver, like the receiver in information theory, has preconceptions about what might possibly be the case and uses visual information to resolve his or her uncertainty. It also paved the way for accepting the possibility of Bayesian inference as a guiding principle for visual analysis. Like Shannon's information theory, the Bayesian framework assumes the existence of a set of prior probabilities that guide perceptual processing. Surprisingly, it was not until the middle 1950s that Richard Gregory put the proposition that percepts are literally hypotheses about the world, to be abandoned if they fail further testing. It was even later that the Bayesian framework began to be applied formally to perception.
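A minimal worked sketch may make these two ideas concrete. It is mine, not the chapter's: the hypotheses, the prior, and the likelihoods are invented purely for illustration.

```python
import math

# Shannon's measure: the uncertainty (entropy, in bits) over a set of alternatives.
def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Two equally probable alternatives carry 1 bit of uncertainty; a signal that
# settles the choice therefore delivers exactly 1 bit of information.
before = entropy_bits([0.5, 0.5])    # 1.0 bit
after = entropy_bits([1.0, 0.0])     # 0.0 bits
print(before - after)                # 1.0 bit gained

# Unconscious inference cast in Bayesian terms: a prior over two hypotheses
# about the world is updated by how well each hypothesis predicts the image.
prior = {"convex": 0.8, "concave": 0.2}         # e.g. a light-from-above style prior
likelihood = {"convex": 0.3, "concave": 0.7}    # P(image | hypothesis), invented numbers
unnormalized = {h: prior[h] * likelihood[h] for h in prior}
total = sum(unnormalized.values())
posterior = {h: p / total for h, p in unnormalized.items()}
print(posterior)   # the percept corresponds to the hypothesis the evidence now favours
```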
Information theory introduced another theme that powerfully shaped subsequent research on visual perception: the theme of coding. Shannon's theory permits not only the measurement of the amount of information that may be transmitted but also the measurement of the efficiency with which transmission is effected. If more than the minimum necessary number of symbols is used, the message contains redundancy that can be removed by efficient recoding. It was not long before the vision community learnt to ask how the eye encoded information to send it to the brain, how efficient this encoding was, and how the brain decoded the messages it received. As early as 1959, Horace Barlow suggested that redundancy reduction resulting in sparse coding is an overriding organizing principle for visual perception (Barlow 1959), and he was (is?) still actively and fruitfully pursuing the theme more than forty years later.
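A toy sketch of redundancy reduction, in the spirit of Barlow's proposal though not his method: neighbouring samples of a smooth signal are highly correlated, so recoding each sample as its difference from the previous one leaves far fewer bits per sample to transmit. The signal, the quantization step, and the function names are invented for illustration.

```python
import numpy as np

def entropy_bits(values, step=0.05):
    # empirical entropy (bits per sample) after uniform quantization with a fixed step
    q = np.round(np.asarray(values) / step).astype(int)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
# a smooth 1-D 'image': neighbouring samples are highly correlated, hence redundant
x = np.cumsum(rng.normal(size=10_000) * 0.1)

raw_bits = entropy_bits(x)
# recode each sample as its difference from the previous one (a crude predictive code);
# the differences are far less redundant, so fewer bits per sample are needed
recoded_bits = entropy_bits(np.diff(x))
print(f"raw: {raw_bits:.2f} bits/sample, recoded: {recoded_bits:.2f} bits/sample")
```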
Ideas about how machines might process information began to emerge. Work on both sides of the Atlantic led to the development of what we now know as computers and, along with it, what we now know as computer science. Ideas about how the brain might undertake computations began to burgeon; in 1958, John von Neumann published a book comparing computers and brains. But explaining how the visual system might engage in computations was one thing; showing how it could do the computations necessary to locate edges, to identify figures and segregate them from ground, or to resolve ambiguities and organize scenes was another, and proved more difficult than anyone initially had imagined.

New techniques and big discoveries

As these ideas began to influence thinking about visual perception, a quiet, almost unnoticed, methodological shift began. Research in visual perception began to make increasing use of measurement techniques developed for the study of psychophysics, and to adapt these for its own purposes. (Psychophysics is the somewhat arcane, and so far never successful, pursuit of the laws connecting objective and subjective, like light intensity and brightness, or sound intensity and loudness.) Psychophysical techniques have provided visual science with something that is rare in psychology: measurements that are precise and replicable, on ratio scales such as area or time, and even on dimensionless scales such as contrast and sensitivity. Because these measures are precise and replicable, and because, when they are applied, individual differences between observers tend to be small, papers reporting careful measurements on a few individuals or even a single person are now perhaps more the rule than the exception in the perception literature. Such measurements freed vision scientists from the inconvenience of having to use large numbers of subjects and from the necessity to assess effect sizes in terms of individual variation. The use of precise models to explain or, even better, quantitatively predict the results of experiments became common. As these methods became more widely used and trusted, the term psychophysics lost its original meaning, coming to mean ‘the experimental study of perception by recording verbal reports and behavioural responses’ or ‘the direct, quantitative study of sensory performance’.
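As an illustration of the kind of precise, model-based measurement meant here (a sketch of my own; the contrast levels, response proportions, and the choice of a cumulative Gaussian are assumptions, not data from any study), a detection threshold can be estimated by fitting a psychometric function to an observer's responses:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import curve_fit

# Hypothetical contrast-detection data: proportion of 'seen' responses at each contrast.
contrast = np.array([0.01, 0.02, 0.04, 0.08, 0.16, 0.32])
p_seen   = np.array([0.05, 0.10, 0.35, 0.80, 0.95, 1.00])

# Model the psychometric function as a cumulative Gaussian over log contrast.
def psychometric(log_c, mu, sigma):
    return norm.cdf(log_c, loc=mu, scale=sigma)

params, _ = curve_fit(psychometric, np.log(contrast), p_seen, p0=[np.log(0.05), 1.0])
threshold = np.exp(params[0])   # contrast yielding 50% detection under this model
print(f"estimated detection threshold: {threshold:.3f}")
```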
After the 1960s, neurophysiological techniques for recording the responses of individual cells in the visual cortex became more widely used, and in the 1990s it became possible both to record from specific brain loci in awake animals and to image neural activity in humans and animals while they were exposed to visual stimuli. The new technique, functional magnetic resonance imaging (fMRI), measures blood oxygen level in the brain to index neural activity at given locations. The use of fMRI increased as knowledge of and questions about the functions performed at different sites increased. Toward the end of the twentieth century, studies combining psychophysics, electrophysiological recording, and brain imaging became increasingly common.
Two discoveries in the late 1950s and early 1960s had already entirely changed the field of discourse. They were originally independent, and at one time regarded as antagonistic, but later linked. One was that visual neurones had structured receptive fields enabling them to detect features or even events. Such receptive fields had originally been suggested by Hartline, and confirmed by Kuffler in the USA and Barlow in the UK for ganglion cells in the retina, showing that they detect and report to the brain the presence of or change in spots of light or dark. Thus it became clear that the retina analysed the images that it received and did not simply send copies of them to the brain. This discovery was extended in studies of frogs by Lettvin, Maturana, McCulloch, and Pitts, who, in a 1959 paper entitled ‘What the frog's eye tells the frog's brain’, dramatized the fact that the eye transmitted coded messages about features of retinal images after first subjecting them to a local analysis. They identified four operations that the eye performed and reported upon: sustained contrast detection, net convexity detection, moving edge detection, and net dimming detection. They argued that each of these identified a feature of the frog's environment useful to the frog in surviving and in catching prey; for example, a moving edge might indicate a predatory bird or an insect that might be eaten. The idea of a ‘moving edge’ detector was a precursor to the later idea of spatiotemporal receptive fields (Burr and Ross 1986).
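A minimal sketch of such a centre-surround receptive field, assuming the now standard difference-of-Gaussians description rather than anything specific to Kuffler's or Barlow's papers (the patch size and sigmas are invented), shows why a cell of this kind ignores uniform light but signals a small spot:

```python
import numpy as np

def dog_receptive_field(size=21, sigma_centre=1.5, sigma_surround=4.0):
    # an 'on-centre' receptive field modelled as a difference of two Gaussians
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    centre = np.exp(-r2 / (2 * sigma_centre**2)) / (2 * np.pi * sigma_centre**2)
    surround = np.exp(-r2 / (2 * sigma_surround**2)) / (2 * np.pi * sigma_surround**2)
    return centre - surround

def response(image_patch, rf):
    # a ganglion-cell-like unit: weighted sum of the light falling on its receptive field
    return float((image_patch * rf).sum())

rf = dog_receptive_field()
uniform = np.ones((21, 21))                           # uniform illumination
spot = np.zeros((21, 21)); spot[8:13, 8:13] = 1.0     # a small bright spot on the centre
print(response(uniform, rf), response(spot, rf))      # near zero for uniform light, positive for the spot
```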
The concept of what a receptive field might be developed greatly in complexity. Hubel and Wiesel (1968), in work that later earned them the Nobel Prize, showed that cells in the striate visual cortex responded not to spots of light, but to lines of a particular orientation. Some (simple) cells required lines to be in a particular position within their receptive fields, others (complex cells) did not; some required lines to be moving and others did not. Hubel and Wiesel's work revealed a highly regular, orientation-based, columnar organization of cells at the first stage of the visual system and thereafter a hierarchical arrangement seemingly abstracting progressively more complex information from visual images as signals from them ascended the visual system. Later discoveries unearthed cells at higher levels with receptive fields for detecting global pattern, and even paths of global motion. Speculation began as to whether there existed a ‘grandmother cell’, a cell that responds to the observer's grandmother. No such cell has been found, but there is a recent report of cells that fire only when the faces of Jennifer Aniston (!) or Bill Clinton (neither of them a grandmother) enter their respective receptive fields (Quiroga et al. 2005).
The other idea, emerging originally from the psychophysical studies of Campbell and Robson (1968), was that images were analysed by functionally independent spatial-frequency channels, with the controversial suggestion that the brain performed some kind of Fourier analysis of visual images. The implication of Hubel and Wiesel's discoveries had seemed to be that visual analysis broke down the image into local features, like lines. The implication of Campbell and Robson's work seemed to be that visual analysis started by describing the image in terms of global Fourier-like components. In some places, most notably on the Berkeley campus of the University of California, tensions ran high between those who favoured local lines and those who favoured global waves as the components of visual images.
Campbell and Robson, citing an early paper by Hubel, had already pointed out in their original paper that ‘receptive fields of ganglion cells might provide a physiological basis for the frequency-selective channels suggested by the psychophysical results’ (1968, p. 565). Tension was eased by a gradual shift to the idea that images were analysed in local patches at different scales and in different orientations (wavelet rather than Fourier analysis). A link between the two developments was made possible by the recognition that different receptive fields operated at different orientations and at different scales. The effect of the linkage was to alter radically ideas about the information available in an image, the processes by which information was extracted, and how visual perception could and should be studied.
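The 'local patches at different scales and orientations' idea is often illustrated with Gabor functions: a sinusoid windowed by a Gaussian. The sketch below is mine, with invented sizes, wavelengths, and orientations; it builds a small bank of such filters and shows that only the channel matched to a grating's scale and orientation responds strongly.

```python
import numpy as np

def gabor(size, wavelength, orientation, sigma):
    # an oriented, band-pass receptive field: a cosine carrier under a Gaussian envelope
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    xr = xx * np.cos(orientation) + yy * np.sin(orientation)
    envelope = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

# a small 'channel' bank: three scales (wavelengths) by four orientations
bank = [gabor(size=33, wavelength=w, orientation=o, sigma=w / 2)
        for w in (4, 8, 16)
        for o in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]

# responses of every channel to a vertical grating of wavelength 8
ax = np.arange(33) - 16
image = np.cos(2 * np.pi * np.meshgrid(ax, ax)[0] / 8)
responses = [float((image * rf).sum()) for rf in bank]
print(np.round(responses, 1))   # the channel matched in scale and orientation dominates
```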
In addition, vision scientists began better to understand and control visual stimuli. Fourier analysis provided an alternative description of stimuli, useful in calculating how visual mechanisms would respond to them. As computers became more available, affordable, and tractable, they allowed the construction of high-precision stimuli of great complexity and sophistication, like the random-dot stereograms that Julesz introduced in 1959. Computers enabled not only the construction of such stimuli, but also analyses such as two-dimensional Fourier analysis and the processing of images to restrict the information that they could provide. This capacity proved invaluable in dealing with natural images sampled from a wide variety of environments to determine how visual mechanisms may have evolved to extract information from them efficiently.
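A sketch of how such a random-dot stereogram can be constructed (a generic recipe, not Julesz's own code; the image size, square size, and disparity values are invented): each eye's image is pure noise, and only the lateral shift of a hidden square distinguishes them, so the shape in depth is available to binocular comparison alone.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_dot_stereogram(size=120, square=40, disparity=4):
    # left eye's image: pure random dots, carrying no hint of shape or depth
    left = rng.integers(0, 2, size=(size, size)).astype(float)
    right = left.copy()
    # in the right eye's image, shift a central square sideways by 'disparity' pixels;
    # only comparing the two images reveals the square standing out in depth
    top = (size - square) // 2
    region = left[top:top + square, top:top + square]
    right[top:top + square, top + disparity:top + disparity + square] = region
    # refill the strip uncovered by the shift with fresh random dots
    right[top:top + square, top:top + disparity] = rng.integers(0, 2, size=(square, disparity))
    return left, right

left, right = random_dot_stereogram()
print(left.shape, right.shape, float(np.mean(left != right)))   # the two images differ only subtly
```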
By common agreement, the most dazzling book of the half-century was Bela Julesz's Foundations of Cyclopean Perception (Julesz 1971), but the most influential was David Marr's Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, published in 1982, two years after his early death.
Julesz was a Hungarian, trained as an engineer, who disdained precise psychophysical measurements. He preferred razzle-dazzle—stunning demonstrations that produced, as he described them, ‘57 dB effects’. He worked at the then richly funded Bell Laboratories, and boasted that he had the funds and equipment to do experiments for which no other vision scientist could find resources. His greatest coup was the random-dot stereogram (used by the even more flamboyant Salvador Dali in one of his paintings), which demonstrated that the visual system could extract a startlingly vivid pattern in depth from a pair of images, neither of which conveyed any indication of pattern or depth. Some, who still clung to the belief that stereo vision depended upon matching features in the two images formed by the eyes, were outraged, to the point of denying (by refusing to look, in some cases) that they saw what was plain to everyone else.
Marr was a Cambridge mathematician, who decided early to pursue a career in neuroscience. His book Vision (Marr 1982), published after his untimely death in 1980 at the age of 35 years, articulated a theme suggested earlier by Kenneth Craik (who died in 1945, even younger than Marr): that the visual brain built itself an internal model of reality (Craik 1966). Marr called this internal model a representation, and set out to show in detail how such a representation could be computed by the brain given the mechanisms it was known to possess, and given the indeterminacies in going from 2D images to 3D representations. Marr's work set the agenda for much that was to follow in the field of computational vision, as well as in experimental studies. Marr himself (Marr and Poggio 1979) proposed a model for stereopsis to explain how the brain computed pattern-in-depth from Julesz's random-dot stereograms.

Specialized and distributed processing

By the end of the twentieth century it was almost universally agreed that the brain analysed different aspects of visual stimuli, such as colour, form, and motion, separately, and formed different maps, not all aligned, in which to register the results of its multiple analyses. This seemed to follow from what neurones in different parts of the brain responded to, and led to the so-called ‘binding problem’: how information about different aspects of stimuli was pulled together in our percepts. No plausible solutions have been proposed, and doubts have been expressed both about whether the visual system is organized quite as hierarchically as is commonly accepted and about whether the binding problem exists at all (Lennie 1998).
During the 1980s it began to be widely, but not universally, accepted that there were two distinct pathways in the visual system, anatomically separate and functionally independent: a ventral, parvocellular pathway and a dorsal, magnocellular pathway. Broadly speaking, the parvocellular is thought to handle form and colour, the magnocellular spatial relationships and motion. This hypothesis is consistent with a mass of anatomical, electrophysiological, psychophysical, clinical, and brain imaging evidence. Textbooks now present the existence of these two pathways and their functional independence as fact, dubbing them the ‘what’ and ‘where’ pathways; but as early as 1993 Merigan and Maunsell warned that the description of the two pathways as parallel and independent was likely to be only a rough approximation to the truth. At about the same time, Goodale and Milner (1992) suggested that the ventral pathway was responsible for (conscious) perception and the dorsal for action, which could be guided by vision unconsciously, as happens in ‘blindsight’ (Sanders et al. 1974).
Discovery of even greater specialization was in store. Two areas within the dorsal (magnocellular) pathway (known as MT and MST in the monkey brain) were found to combine information from local motion detectors in order to compute the direction and speed of motion over large distances, and to respond to the optic flow, described much earlier by Gibson, that results from motion of an observer walking, driving, or flying through the environment. In area V4 of the ventral pathway, neurones were found that responded only to particular types of global pattern (concentric circles or a fan of radial lines, for example). These can be considered as static analogues of optic flow detectors, and there is some evidence to suggest that, despite the supposed independence of the ventral and dorsal pathways, these ventral-stream neurones participate in the analysis of optic flow. In the early 1980s neurones were discovered in the temporal cortex of monkeys that responded selectively to faces (Perrett et al. 1982), and, as mentioned above, in 2005 it was claimed, on the basis of electrophysiological studies of epileptics, that some neurones in humans respond selectively to particular well-known faces (Quiroga et al. 2005).


How far have we got?

By 2000, the study of visual perception had moved toward the centre of mainstream science, with papers being accepted by general science journals such as Nature and Science, and by more specialized but still broad journals such as Current Biology, Neuron, the Journal of Neuroscience, and Nature Neuroscience. This trend accompanied a growth after 1960 in the number of good-quality international journals devoted specifically to visual perception, such as Perception, Vision Research, Spatial Vision, and (the electronic) Journal of Vision.
Vastly more was known about the visual system in 2000, anatomically and functionally, than was known in 1950. Vision scientists had tools, experimental, analytical, and theoretical, that had developed greatly in range of application and precision over the half-century. No stimulus was too complex to consider constructing, no variation in timing, contrast, texture, or colour too difficult to manage, and no aspect of perception too subtle to be measured. The number of people working on visual perception, and their level of skill, increased greatly.
And yet large gaps in our understanding remained, and still remain: How do we keep the perceptual world stable, despite the continual darting about of our eyes? Why does the world look so vividly complete, given the paucity of the information about it that the visual system contains (Sperling 1960) and given our blindness to change (O'Regan et al. 1999)? Why have visual prosthetics, the spectacles and contact lenses we wear, remained much the same as when first invented, and why have optometrists continued to use the Snellen chart to measure visual acuity? Why have improvements in the quality of television, and in animation techniques, been made with such little influence from the findings of late-twentieth-century vision scientists? Why has the age-old moon illusion, along with old warhorses such as the Mueller-Lyer and the Poggendorff illusions, so stubbornly resisted explanation?
The explanatory gap between the brain activities set in train by visual stimulation and our visual experience of the world yawns as wide as ever, perhaps wider. It is not just that qualia seem so out of reach: it is the sheer externality of the world, the richness with which we see it, and its compelling reality that seem so far from what we understand about the calculations the brain makes and the hypotheses, representations, or models it constructs.
