Saturday, March 11, 2017


The ups and downs of cognitive psychology: attention and other ‘executive functions’

Alan Allport

DOI:10.1093/acprof:oso/9780199228768.003.0002

Abstract and Keywords

This chapter examines developments in cognitive psychology during the past fifty years, especially concerning the issues of attention and other executive functions. It describes the emergence of information processing psychology, cognitive neuropsychology, and the concept of parallel distributed processing during the 1970s. In the field of executive control, certain mindsets dominated. These included the treatment of discrete cognitive operations as the basic units of analysis for cognitive psychology, and the belief that control was something intrinsically centralized, imposed on the basic cognitive operations from outside.
A few books I read as a student shaped my youthful idea of what a new experimental science of mind might be—or ought to be—like. Arthur Koestler's The Sleepwalkers, about Tycho Brahe and Kepler, offered a delightful image of obsessive amateurs, groping their way into inventing a science of the solar system. Could psychology ever be like that? A new Copernican revolution, sweeping away not the old Ptolemaic epicycles, but the old folk-psychology conceptions such as perception, memory, will, consciousness, self—and transforming them into radically new ways of understanding mind? The thought was dizzying! In 1959 even to have a second-row seat in the gallery while such a conceptual revolution was going on seemed an exciting prospect. Was it possible? Ross Ashby's Cybernetics (1956), and that extraordinary symposium published in cardboard covers, Mechanisation of Thought Processes (National Physical Laboratory 1959), suggested that it was. In contrast, the experimental psychology that I was mostly supposed to study as an undergraduate (Woodworth and Schlosberg 1954) seemed a bizarre, atheoretical, and almost totally shapeless subject, an arbitrary collection of behavioural phenomena tagged to experimental paradigms: serial position curves and rate-of-decay of spiral after-effects. It was a big disappointment. I almost gave up and went back to South America.
I'll explain. Before going to university (in 1958) I had done my two years of National Service, ending up in Guyana (‘British Guiana’, in those days). My conscript sentence over, I stayed on for a six-month walkabout through northern Amazonia (Venezuela, Brazil, finally Peru), travelling with Akawaio Indians, hunting, fishing, living with them in the bush. I climbed Mount Roraima, contracted malaria, got comprehensively lost in the Pakaraima highlands. Eventually I re-emerged, got to hospital, recovered, and found myself with a life-long passion for biodiversity. I wanted to be a biologist. But arriving at Oxford with a scholarship to read modern languages did not make it easy. My profoundly traditional school had encouraged me to drop science at the age of 13. A transfer to biology without chemistry or mathematics? No chance! Philosophy and Psychology (‘PPP’), on the other hand, had no apparent academic prerequisites whatever. So I slipped in. (Jeffrey Gray had found his way into psychology by the identical route, a couple of years earlier.)
Regretfully, after graduation, I still couldn't figure out how to finance a life as a nineteenth-century-style naturalist in Amazonia. Instead, not knowing what else to do, I took a job as a ‘probationer’ clinical psychologist, at £512 per annum. I was given singularly little to do aside from administering occasional IQ tests, so I started doing experiments. It was fun. Surprising fun! Perhaps experimental psychology was what I should do, after all, rather than studying orchids or orthoptera.
I embarked on a PhD at Cambridge, with Richard Gregory as my supervisor, which in practice meant a free rein to do whatever I liked. I spent a few weeks wandering around the science area wearing a crash helmet with a steel rod bolted across the top, and a set of sliding mirrors that could give me an effective 2-, 4-, or 6-foot spacing between the eyes. (That felt the sort of thing to be doing in a lab run by Richard. The idea was to study Ivo Kohler-style visuomotor adaptations.) But after trying out my apparatus on a bike, with seriously injurious results, I decided a shift of research track might lead to a longer-lived career.
I had just discovered Karl Popper's Logic of Scientific Discovery (1959). It was a revelation. I decided that, for me, the primary goal of research should be to do my damnedest to falsify theories (other people's theories, mostly, but my own too, just as enthusiastically—if I ever had any). For this purpose, the broader the scope of the theory the better. The ideal, if it could be achieved, would be to group possible theories of a given domain into two (or more) broad categories, and then to find a way to falsify—experimentally to rule out—one whole category. Donald Broadbent wrote a paper to this effect. I was impressed. At this ideal, or idealized, level, the falsificationist strategy becomes a kind of experimental conceptual surgery (‘natural philosophy’ in its truest form), which I continue to find hugely attractive.
Obviously, there are at least two major difficulties before the conceptual surgeon can bring her scalpel effectively to bear. First of all, the question (or theoretical dichotomy) has to be well posed. Supposing it is not (as in questions like: ‘Is attentional selection early or late?’ aka ‘Where is the attentional bottleneck?’),1 then the attempts to falsify one or other alternative are doomed to go round and round without ever reaching closure (see Allport 1993). Usually the problem lies in the underlying assumptions—the sort that may not even be recognized as such, hence seldom, if ever, seriously questioned. A typical example, related to the above, is the idea that ‘attention’ is (must be?) the name of an identifiable subset of causal, neurocognitive control operations (‘shift’, ‘select’, ‘disengage’, etc.). If so, it could make sense to try to localize these control functions in the brain (as in so many recent publications in Nature Neuroscience). But what if ‘attention’ is properly an outcome state—a resultant—rather than a causal process, as first mooted by William James (1890), and by Johnston and Dark (1986) a century later? What if, for example, the reaction-time costs attributed to these supposed control operations (‘disengage’, etc., etc.) are simply indicators of conflict anywhere in the system? In that case, what one is localizing (e.g. on the basis of increased parietal or frontal activations following an ‘invalid’ spatial pre-cue, etc.) might be simply the sites of maximum (here, spatial) conflict. Nothing more. It is clearly these ‘bedrock’ assumptions that the conceptual surgeon needs to examine first of all.
The second major difficulty for the would-be falsificationist is one that is common to experimentalists in practically every field, at least in the nonstandard or exploratory phases of research. It is how to invent or design an experimental procedure that can do the job. All too often, alas, experimental invention is just not up to it. In other cases, however, it is because the theory is so ambiguously specified as to be genuinely unfalsifiable. An example, in my opinion, is Alan Baddeley's ‘central executive’, aka Norman and Shallice's ‘supervisory attentional system’, aka ‘the Will’, which is supposedly called in, like some obliging Auntie, whenever the children (i.e. the lower-level systems) are engaged in ‘non-routine’ (or ‘dangerous’) activities. Given that we are told essentially nothing about what Auntie actually does, still less how she does it, nor any other of her properties besides ‘limited capacity’, I am led to infer that claims to her existence, like that of fairies, do not belong at the present time within the domain of science.
But I digress. (Glimpse a passing hobby-horse and hitch a ride. As Ogden Nash said, ‘Shake and shake the ketchup bottle: first none will come and then a lot'll.’)
I did my PhD work on John Stroud's (1955) theory of the discrete ‘psychological moment’, essentially the hypothesis that psychological time is discontinuous. And I was lucky. Besides a series of largely descriptive experiments, I stumbled on a method that provided essentially a disproof of the hypothesis (Allport 1968). Since then, so far as I know, the theory has not mounted a come-back.
I thought I had a post-doc job in Edinburgh from September 1965, to work on schizophrenia, but at the last minute the funding fell through. I had just got married to my beautiful Virginia, and suddenly had no job and no prospects. I hurriedly scanned the job adverts in the Edinburgh municipal library, and spotted one for an assistant lecturer in psychology, in Aberdeen. Again, I was lucky. Aberdeen is a wonderful place to live if you enjoy the grey North Sea and the wild, sea-bird-haunted cliffs, crocuses in May, and skiing on wet porridge in the Cairngorms. I was assigned to lecture on developmental psychology, which surprised me as I knew nothing about the subject. But I was happy to learn—I did so mostly a day or so ahead of my students.
By some quirk of economics the University had one or two ‘professorial’ houses to rent at that time, but no professorial takers for them; so we became the occupants of a draughty eighteenth-century mansion, Tillydrone House, right opposite St Machar's Cathedral, complete with a moat and four-acre garden. The rent was £4 a month.
The Aberdeen department (then) was not particularly encouraging of research, but I managed to acquire a three-field tachistoscope and started a series of experiments, inspired by George Sperling's (1963) masking paradigm. There was not the slightest pressure to publish, so I didn't, aside from a couple of short reports. (Several of those experiments, I confess, remain unpublished to this day.) Mowing the grass on our four-acre estate, watching the sea-birds, playing with our first child, and skiing the Devil's Elbow took up too much of my time. We had four extraordinarily happy (and unstressed) years there. But by then it seemed time to move, and I found a job at Reading University with some stimulating company: Leslie Henderson, Max Coltheart, Lizanne Bainbridge, and over the years a series of marvellous students, who quickly became colleagues, such as Derek Besner, Ruth Campbell, and Elaine Funnell. Under their stimulus, and some years past my thirtieth birthday, I began to be almost hopeful about psychology.
I had ten pleasant years in Reading, before moving to Oxford in 1979—a rash move in several respects. Pat Rabbitt left more or less the moment I got there, and Jerry Bruner had departed in clouds of smoke just before: there was still smoke everywhere. I found myself the sole lecturer in the department teaching the experimental psychology of cognition—language, memory, skill, attention, word recognition, AI, etc.—constituting four full undergraduate courses. It was hard going, and remained so for several years. On the other hand, of all the cities I know well, Oxford is for me the most liveable, the quirkiest, the richest in strange and unexpected talents. I have spent more than thirty years of my life there and would not now wish to live anywhere else.2 As well as many, rewardingly challenging, Oxford undergraduates (Felix Wichmann, now a Professor in Berlin, perhaps the cleverest of the lot), I had some wonderful graduate students from whom I learnt immeasurably, many of them much abler and more professional than me: people like Liz Styles, Shulan Hsieh, Ian Dennis, Jon Driver, Steve Tipper, Geoff Ward, Renata Meuter, and Glenn Wylie. I am grateful to them all.
But back to Reading, 1970. Pursuing my Popperian agenda I recruited a couple of bright undergraduates, Barbara Antonis and Pat Reynolds, to run the experiments we subsequently published together as a ‘disproof of the single channel hypothesis’ (Allport et al. 1972). What (if anything) did it ‘disprove’? (There's a lesson here, somewhere.) The core idea of Broadbent's famous (1958, 1971) model was that, in the sensory control of action, and as regards all access to long-term memory, the brain constituted a unitary processing channel, defined by its limited informational capacity, i.e. in terms of ‘bits per second’. The postulated capacity limit was, explicitly, independent of the type, modality, or ‘content’ of the information involved—visual, auditory, linguistic, spatial…what you will. In other words, the single (or ‘central’) channel was a ‘general purpose’ processor, with unmistakeable resemblance to the von Neumann architecture of the general-purpose digital computer. (Why anyone might suppose such a thing perplexed me even then, given what classical neurology had long since discovered about the degree of specialization for different high-level cognitive processes in different cerebral areas.)
The canonical evidence put forward in support of Broadbent's hypothesis was the (apparent) inability of subjects to follow (to remember or respond to) more than one verbal message at a time, as first reported by Cherry (1953). Because the information rate of speech is so high, the argument went, letting in more than one speech stream would seriously overload the informational capacity of the central channel; hence all but one input stream must be ‘filtered out’ at an earlier stage. But (I wondered) what if, while ‘shadowing’ one speech stream, you tried to follow a second high-bit input in a different processing domain—pictorial, musical, etc.? The answer we reported was simple: even after minimal practice people could carry out both tasks concurrently with almost no loss of information transmission in either task. Broadbent's response to this was completely disarming. Rather than fully loading the ‘central channel’, as previously supposed, he now suggested that speech shadowing actually bypassed it altogether, via its own domain-specific pathway. Experimental falsification is clearly not a one-stop shop. One ‘non-central’ pathway down, how many more to go?
Meanwhile, the explanation for Cherry's ‘dichotic listening’ limitation presumably had to be sought somewhere other than the supposed capacity limit of a ‘single channel’, such as in the eminently special-purpose constraints of sentence processing? (There is a lesson here.)
In that case (you may well ask) why is attention—or awareness—so narrowly focused, so selective? (What is ‘selective attention’, anyway, and what purpose does it serve?) It's all very well rejecting one type of model: give us a better one!
Fair enough. Here are a few ideas that I and others have explored. I like to think of them as preliminary steps toward a ‘theory of consciousness’. (But they can be sketched here only briefly; for more argument and experimental evidence—still very incomplete—see Allport 1987, 1989.)
  (i) Consciousness is inseparable from ‘very short-term memory’, that is, from cognitive states that persist (at least for some hundreds of milliseconds), which in turn is a precondition for longer-lasting memory encoding.
  (ii) Only coherent cognitive states can persist. Non-coherent (e.g. mutually incompatible or conflicting) states are intrinsically unstable and are swiftly modified or suppressed. (This is a basic property of brain function; see ‘parallel distributed processing’, below.)
  (iii) Which cognitive states dominate at any moment—hence can persist, hence enter awareness—is of course the central explanandum of psychology as the ‘science of mental life’. Innumerable different factors can contribute to this cognitive ‘control’. (More on this, below, as well.)
However, in most behavioural experiments, for example where a speeded sensory–motor response is required, the physical constraints of action itself are a major controlling factor. It works like this. At any one moment I can speak only one word, grasp one object, foveate one point in space, etc. Thus, if one set of stimulus parameters (among other competing ones) has to guide or specify that unique action—as in selective naming, grasping, eye movement, yes/no monitoring tasks, etc.—then the neural coding of those stimulus parameters must be enhanced, relative to any others potentially competing for control of the same class of action. (I call this ‘selection-for-action’, Allport 1987; for a particularly elegant illustration, see Deubel and Schneider 1996.) In all such cases, the need for selection (selective prioritization) among sensory inputs is a direct consequence of the demands of action—nothing to do with limited processing capacity. On the other hand, where different effectors (e.g. eyes, hands, voice, locomotion) can be guided by different, appropriately compatible, sense inputs (hence without significant cross-talk between them), ‘dividing attention’ between ongoing tasks shows no such limitation. Similar constraints, I believe, apply to ‘selection-for-memory’, but I have not done nearly enough work on this.
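The logic can be made concrete with a toy simulation. The sketch below is an illustrative caricature only (arbitrary numbers and generic mutual-inhibition dynamics, not drawn from any published model): two equally strong stimulus representations compete for control of a single effector, and a small bias reflecting relevance to the required action, rather than any capacity limit, is enough to make one of them dominate.

```python
import numpy as np

# Toy 'selection-for-action' sketch (illustrative only): two stimulus
# representations receive equal bottom-up drive and inhibit one another;
# a small bias reflecting relevance to the required action decides which
# representation comes to dominate. No capacity limit is involved.

def compete(drive, bias, inhibition=0.8, rate=0.2, steps=200):
    """Leaky mutual-inhibition dynamics; returns the settled activations."""
    a = np.zeros_like(drive, dtype=float)
    for _ in range(steps):
        lateral = inhibition * (a.sum() - a)   # inhibition from all competitors
        net = drive + bias - lateral           # bottom-up drive plus action bias
        a += rate * (net - a)                  # integrate toward the net input
        a = np.clip(a, 0.0, None)              # activations cannot go negative
    return a

drive = np.array([1.0, 1.0])                          # two equally strong stimuli
print(compete(drive, bias=np.array([0.2, 0.0])))      # action-relevant one dominates
print(compete(drive, bias=np.array([0.0, 0.0])))      # no action constraint: stalemate
```

The point of the sketch is only that a modest action-derived bias is amplified by the competition itself; nothing in it appeals to a limited-capacity channel.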
Thinking about contemporary cognitive science, it is interesting to reflect on where it has come from over the past forty-plus years. The 1970s (give or take a few years) were the hey-day of ‘information processing’ psychology. Dozens of new behavioural phenomena were reported and canonized, each one linked to its particular experimental paradigm. RT (reaction time) methods predominated, many of elegant ingenuity. (Sternberg's high-speed memory scanning, from the 1960s, is perhaps the prototype; Shepard and Metzler's linear rate of ‘mental image rotation’ is another, as is Anne Treisman's ‘conjunction search’.) And research became increasingly phenomenon driven. Uncomfortably, however, as time went by and countless (small or large) variations on the experimental paradigms were explored, the underlying phenomena came to look less and less robust, their original interpretations less and less convincing. Very few of these micro-worlds, if any, seemed genuinely to consolidate, still less to link up convincingly with other research lines, based on other ‘phenomena’. The topic of ‘attention’ was certainly no exception.
Around that time two new developments rescued me from a gathering disillusion. The first was the emergence of cognitive neuropsychology, combining the traditions of classical neurology with these newer information-processing concepts and methods. Marshall and Newcombe, and Warrington and Shallice, were already busily inventing the approach in the 1960s. For me, the conviction that I wanted to join in dates from a paper in 1975 by Marin, Saffran and Schwartz. And by 1980, with ‘Deep Dyslexia’, cognitive neuropsychology seemed to me clearly the way to go. Hot on its heels, however, came the connectionist revolution, with a swirl of publications from 1981 onwards, emerging from San Diego and elsewhere. Earlier AI approaches—particularly Production Systems—had seemed to offer a promising language for cognitive theory but remained brittle, rule based, incapable of spontaneous generalization, generally failing to match behavioural data at any real level of detail. Suddenly the prospect looked different: it was the most optimistic moment of my career. Was this—at last—the beginning of a conceptual revolution for psychology, the one of which I had dreamed when I first embarked on the subject? Well, in part, yes, I still believe, but with narrower scope than I had hoped—still hope. For another twenty years, at least, the emergence of connectionism had surprisingly little impact on how cognitive psychologists—for the most part—continued to think about (or not think about) the ‘central’ problems of psychology: attention, consciousness, voluntary decision, ‘executive control’; nor on the traditional boundaries between specialisms (e.g. ‘attention’ vs ‘memory’; ‘cognition’ vs ‘motivation’).
Parallel distributed processing (PDP) brought three conceptual leaps forward, all closely interlinked: ‘memory’ and ‘processing’ were no longer separate—or separable—components; memory itself was no longer a filing system, a passive store-place of declarative data, to be ‘searched’ or ‘scanned’ by the processor, but a (purely dispositional) ‘processing landscape’ embodied in a network of local connection-weights or biases; and ‘processing’ was radically distributed throughout these memory structures, as a process of integrated ‘constraint satisfaction’. This last property also had important implications for thinking about ‘control’. According to the traditional information-processing approach, some kind of additional, supervisory control was necessary to ensure that all ongoing processing and decision-making remained mutually consistent and coherent. From the PDP perspective, however, mutual consistency was the essence of what was computed, and ‘integration’ was a byproduct.
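For concreteness, here is a minimal Hopfield-style toy network (an arbitrary illustration of the general PDP idea, with made-up random patterns, not a reconstruction of any particular published model). It shows the three leaps in miniature: the ‘memories’ exist only as connection weights, processing is nothing but settling into a mutually consistent state, and a degraded input is completed by that constraint satisfaction.

```python
import numpy as np

# Minimal Hopfield-style sketch of PDP constraint satisfaction (toy example).
# The three 'memories' are arbitrary random +/-1 patterns; they are stored
# nowhere except as the matrix of pairwise connection weights.

rng = np.random.default_rng(1)
patterns = rng.choice([-1, 1], size=(3, 32))      # three arbitrary 32-unit patterns

W = (patterns.T @ patterns) / patterns.shape[1]   # Hebbian weights: memory as biases
np.fill_diagonal(W, 0)                            # no self-connections

def settle(state, sweeps=5):
    """Asynchronously update each unit to agree with its net input until stable."""
    s = state.copy()
    for _ in range(sweeps):
        for i in range(len(s)):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

probe = patterns[0].copy()                        # start from a degraded 'memory'
flipped = rng.choice(32, size=6, replace=False)
probe[flipped] *= -1                              # corrupt 6 of the 32 units

recovered = settle(probe)
overlap = (recovered == patterns[0]).mean()
print(f"overlap with the original pattern: {overlap:.2f}")   # typically 1.00
```

The settled pattern is an attractor state: a property of the network as a whole, not of any stored item, and no supervisory process is needed to make the result coherent.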
Perhaps understandably, PDP modellers tended to concentrate on particular micro-domains, chosen to be broadly consistent with the kind of functional modularity postulated by cognitive neuropsychology (visual word recognition, face recognition, past-tense morphology…). Ironically, the results of experimentally ‘lesioning’ these models frequently challenged the ultra-modularity inferred (hitherto) from neuropsychological deficits (in the caricature, a new ‘box’ to be added to a box-and-arrow schema, for each new deficit reported). A lot of ‘boxes’ had to go. But the conceptual challenge from PDP ran a lot deeper than that.
A dominant concept (perhaps the dominant concept) in the neuropsychology of language at that time was that of a ‘mental lexicon’ (logogen system, in John Morton's theoretical coinage). However, in the new PDP models the idea of explicit word-form representations (‘dictionary units’) disappeared entirely, leaving only systems of connection-weights linking supralexical ‘semantic features’ with sublexical phonological and/or orthographic features. The ‘word forms’ themselves had become attractor states, purely emergent properties of the action of the network as a whole. An essay by Stephen Monsell (1987) on ‘lexical input and output pathways’ is one of the monuments of the period: an iconic Laokoon struggle with the entangling serpents of connectionism and the neuropsychologists’ concept of lexicon. (Stephen protested that his essay provoked more words of commentary from me, as its editor, and the other referees than were contained in the final chapter. Well, Stephen, they were—and are—cunning serpents. And the wrestling still goes on: see Max Coltheart's chapter here.)
And what about ‘executive control’? The way any field of enquiry starts out, early in its history, inevitably has profound effects on how that field subsequently develops. The ‘information processing’ approach, rooted in the concept world of sequential computational operations and routines, with separate ‘processor’ and ‘memory’, imposed a distinctive and instantly recognizable mind-set on the whole of cognitive psychology, the legacy of which has persisted for more than half a century. Three of the key elements of this mindset were the following:
  (a) The basic units of analysis for cognitive psychology were specifiable, discrete ‘cognitive operations’, whose duration (and perhaps other properties) were to be inferred through appropriately cunning RT methodology.
  (b) These cognitive operations were assumed to run sequentially, and generally unidirectionally, from ‘stimulus’ to ‘response’.
  (c) ‘Control’ was something intrinsically centralized (as in the ‘central executive’), and imposed on the basic cognitive operations from outside—from ‘above’. (What exactly was meant by ‘central’ has never been clear to me; the fundamental idea, in any case, was that such ‘control’ emanated from a control system (or systems?) separate from the lower-level (‘slave’) processing systems; control was thus mysterious, and came from outside—from ‘elsewhere’.)
Why elsewhere?
From the earliest days, the heartland topics of information-processing psychology were speeded performance, attention, working memory, ‘executive control’. By contrast, learning and memory (‘long-term memory’) was a separate specialism. Researchers who studied attention and speeded performance seldom also worked on learning or long-term memory, and vice versa. (There were of course honourable exceptions: students of priming and ‘automaticity’, for example, have formed something of a bridge between these otherwise surprisingly divergent fields.) This separation was evident half a century ago, back in the 1950s and 1960s, and remains enshrined to this day in separate scientific journals, separate research groups and symposia, separate textbooks, separate undergraduate courses. Even the connectionist revolution has failed to overturn it altogether. And over the same half century, until very recently, an even deeper gulf divided the ‘information processing’ field from the field of emotion and motivation. (Indeed, the very idea of ‘cognitive’ psychology was defined by this latter separation.)
Suppose, as I do, that ‘control’ (like attention, like consciousness) is an emergent property, a resultant rather than a cause. Suppose that what is causal are the multitudinous constraints (biases) within the system, constraints arising, that is, from anywhere and everywhere: from the sensory environment; from ongoing and intended action; from the entire processing history of the organism encoded in memory; from current and long-term goals in so-called ‘working’ memory (long-term working memory, for sure); and from all the rest of the immensely powerful, inbuilt emotional/motivational biases woven throughout the system. It is curious to reflect that, from the perspective of (early) information-processing psychology, all of these sources of control were conceptually outside of what was originally thought of as basic information processing—that is, memory-less, motive-less, and certainly emotion-less ‘cognitive operations’! No wonder, then, that ‘control’ had to come from outside—from ‘elsewhere’!
You may say that this is a caricature. It is. All a caricature does is to exaggerate (slightly) the true features of the object.
It is undeniable that, for over fifty years, the search has been on to identify (and measure the time-course of) discrete ‘control operations’ (spatial attention shifts, shifts of set, etc.), imposed from above upon the stupid and ignorant (inevitably, because memory-less) ‘basic processing systems’. And as cognitive psychology morphed into cognitive neuroscience, the search has continued, to try to localize the source of the postulated control operations—the ‘attentional control systems’—in parietal or prefrontal cortex, the anterior cingulate, or wherever.
Just consider the research effort expended over the past three decades in measuring the supposed duration (the ‘time cost’) of a shift of spatial attention; or the, by now, almost comparable research effort expended on attempts to measure the duration of a shift of ‘set’, using the various task-switching paradigms. Belief in the reality of discrete ‘control operations’, whose endogenous time-course can be measured, seems as deep-rooted as the corresponding belief in the reality of ‘basic’ cognitive operations. As one small but revealing symptom of this belief bias: in the task-switching literature, the difference between RTs on ‘task-switch’ and ‘task-repetition’ trials is almost invariably described as a task-switch cost, rather than a task-repetition benefit, although the two are logically equivalent.
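To spell out the arithmetic of that relabelling (the reaction times below are invented, purely for illustration):

```python
import numpy as np

# Hypothetical RTs (ms), invented for illustration only.
rt_switch = np.array([720, 705, 690, 730])   # trials on which the task changed
rt_repeat = np.array([640, 655, 620, 635])   # trials on which the task repeated

difference = rt_switch.mean() - rt_repeat.mean()

# The same number under two labels: which framing you choose is a theoretical
# decision, not something the data themselves dictate.
print(f"task-switch cost        = {difference:.0f} ms")
print(f"task-repetition benefit = {difference:.0f} ms")
```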
So can experimental falsification work, in this sort of case?
One thing we discovered about these RT ‘switch costs’ is their dependence on learned associations between individual stimuli and ‘tasks’. For example, even one previous experience of a given stimulus—in the context of the other RT task (the task to be switched from)—can be enough to double the so-called switch cost in some cases, even after hundreds of intervening RT trials (Allport and Wylie 2000; Waszak et al. 2003)—a good example of ‘bottom-up’ rather than ‘top-down’ control, and of the inseparability of processing and implicit memory.
Another thing we discovered about task-switching, using an ‘RSVP’ (rapid serial visual presentation) method for visual search, is that the control operation supposedly needed to shift ‘set’ (e.g. from one type of search target to another) does not actually have an intrinsic time-course of its own, but is simply paced by the presentation rate of the RSVP stimuli (Allport and Hsieh 2001). Many other observations about RT switch costs, likewise, run strongly counter to the popular intuition that they reflect the time taken—by a central executive: the ego, the self, the will?—to reconfigure the subordinate-level processing pathways from one task ‘set’ to another. Here's just one. When naming ‘Stroop’ stimuli (e.g. the word ‘RED’ written in green ink), you can name either the word (an easy task) or the colour (a much harder task; see MacLeod 1991). Intriguingly, switching to the easy task turns out here to have a very big time-cost (or is it a big repetition-benefit?), whereas switching to the harder task has a small (sometimes even zero) ‘switch cost’ (Allport and Wylie 1999, 2000; Allport et al. 1994). Does all this amount to a Popperian falsification of the executive control model, as regards RT ‘switch costs’? It is hard to see why not. But I suspect the folk intuitions about the nature of volition are too deeply entrenched. Even if their time-course can't be measured by RT switch-costs (and it certainly can't), executive control operations must surely exist, mustn't they?
Well—I wonder.
