4 Cognitive Architectures
The information processing hypothesis leads cognitive psychologists to conduct functional analysis and face Ryle’s regress. To escape Ryle’s regress, they must discover a cognitive architecture. However, not all cognitive psychologists propose the same architecture, and many different architectures appear in cognitive psychology. In Chapter 4, I explore architectural variety and its causes by examining different architectural properties. Each property can take on different forms. Cognitive psychologists generate competing architectures when they make different decisions about the forms that these properties take.
4.1 The Variety of Cognitive Psychology
Cognitive psychologists hypothesize that cognition emerges from the rule-governed manipulation of mental representations. Cognitive psychology aims to explain such processing by conducting functional analysis. Cognitive psychologists collect data to infer processes that they cannot observe directly. They intend to make functional analysis explanatory by discovering primitive functions, the cognitive architecture.
One might expect that, if all cognitive psychologists perform functional analysis, then they must all discover the same architecture. However, cognitive psychology hosts many competing theories and rival architectures. How can such variety arise if cognitive psychologists embrace the same general research strategy? At least three answers exist.
First, we can infer different information processes from the same results. For example, consider memory scanning experiments (Sternberg, 1969b). Earlier we saw graphs of a linearly increasing relationship between reaction time and list length (Figure 3-5). Sternberg predicted such functions by assuming that we scan list items in serial fashion (i.e., one at a time). However, such graphs also conform to theories based upon parallel scanning of memory (i.e., scanning all items at once) (Townsend, 1971, 1990). Consider a parallel scanning process that slows down as the list length increases. Such a process also predicts the Figure 3-5 graphs. Thus, completely opposite proposals—serial versus parallel processing—can produce identical predictions.
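To make the Sternberg example concrete, here is a minimal sketch (with made-up parameters, not values fitted by Sternberg or Townsend) showing how an exhaustive serial scanner and a capacity-limited parallel scanner predict exactly the same linear reaction time function:

```python
def serial_rt(list_length, base=400.0, ms_per_item=38.0):
    """Exhaustive serial scan: items compared one at a time."""
    return base + ms_per_item * list_length

def parallel_rt(list_length, base=400.0, total_capacity=1.0 / 38.0):
    """Parallel scan: all items compared at once, but each comparison
    runs at rate total_capacity / list_length, so scanning slows as
    the list grows."""
    rate_per_item = total_capacity / list_length
    return base + 1.0 / rate_per_item

for n in (1, 2, 4, 6):
    print(n, serial_rt(n), parallel_rt(n))  # identical predictions
```

Because the two models are behaviourally equivalent here, reaction time data alone cannot decide between them.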
Second, the ideas explored by cognitive psychology’s general approach do not arise in a theoretical vacuum. Cognitive psychologists explore predictions emerging from interesting hypotheses. But contrasting hypotheses about the same phenomenon lead different researchers in different directions, producing results supporting different theories. Consider the visual search research (Treisman, 1986, 1988; Treisman & Gelade, 1980) introduced in Section 3.9. Treisman motivated her research by hypothesizing a single attentional spotlight that shifts from one location to another. As a result, she studied visual search in tasks requiring participants to locate a single target, discovering results that support feature integration theory.
However, different hypotheses about attention lead to very different studies. Pylyshyn (2001, 2003a, 2007) rejects the attentional spotlight hypothesis and instead proposes multiple attentional tags that attach themselves to different targets at the same time. As a result, in Pylyshyn’s studies, participants track multiple targets simultaneously (Pylyshyn et al., 2008). Pylyshyn’s results support a theory quite different from feature integration theory. In short, different hypotheses inspire different investigations. In turn, different investigations produce results supporting different theories of the same phenomenon.
Third, cognitive psychology does not restrict ideas, because it permits deliberate rebellions against established theories, rebellions that produce new ideas. Cognitive psychologists explain many well-studied topics by using widely accepted theories. We can move research in new directions by rejecting the established theory’s assumptions. Roboticist Rodney Brooks promoted such scientific rebellion:
During my earlier years as a postdoc at MIT, and as a junior faculty member at Stanford, I had developed a heuristic in carrying out research. I would look at how everyone else was tackling a certain problem and find the core central thing that they all agreed on so much that they never even talked about it. I would negate the central implicit belief and see where it led. This often turned out to be quite useful. (2002, p. 37)
Cognitive psychology provides many examples of rebelling against established theory. Established theories assume that memory involves different storage systems (Shiffrin & Atkinson, 1969; Waugh & Norman, 1965). Very different theories arise if we abandon the assumption and instead assume that different memories reflect differences in control (Baddeley, 1986) or differences in processing (Craik & Lockhart, 1972). Established theories assume that explicit symbols and processes exist (Newell & Simon, 1972). Very different theories arise when we rebel by assuming that cognition does not require symbols or rules (McClelland & Rumelhart, 1986; Rumelhart & McClelland, 1986b). Established theories assume that the skull completely contains the mind (Adams & Aizawa, 2008; Fodor, 1968). Very different ideas emerge when we assume that the mind extends into the world, making the world part of cognition (Brooks, 2002; Shapiro, 2011).
Thus, cognitive psychologists can share a general research strategy but still produce widely varying theories. Chapter 4 describes how different models arise when researchers make different assumptions about the cognitive architecture. Some propose serial processing, whereas others propose parallel processing. Some propose data-driven processing, yet others propose theory-driven processing. Some propose automatic processing, but others propose controlled processing. Some propose innate processes, whereas others focus on learning. Some propose isotropic processing, yet others propose modular processing. Different cognitive psychologists propose different structure-process pairings or different kinds of control. Chapter 4 shows how different assumptions produce radically different cognitive theories.
4.2 Serial and Parallel Processing
Chapter 4 illustrates that theoretical variety emerges in cognitive psychology when different cognitive psychologists make different assumptions about the cognitive architecture. To begin, I explore one architectural property: does the architecture execute one rule at a time (serial processing) or several rules at a time (parallel processing)? Many different theories begin when cognitive psychologists make different assumptions about serial versus parallel processing.
Mental chronometry, pioneered by Donders’s (1869/1969) subtractive method (see Section 3.9), measures the time taken by mental processes (Luce, 1986; Posner, 1978). If Task B requires one more processing stage than Task A does, then we measure the additional stage’s duration by subtracting the time required to perform Task A from the time required to perform Task B.
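As a toy illustration, with hypothetical reaction times rather than Donders’s data:

```python
# Subtractive method sketch with made-up reaction times (ms); the tasks
# are hypothetical examples, not Donders's original conditions.
rt_task_a = 450.0  # Task A: respond to a stimulus
rt_task_b = 530.0  # Task B: respond only after identifying the stimulus

extra_stage = rt_task_b - rt_task_a
print(f"Inferred duration of the extra stage: {extra_stage} ms")  # 80.0 ms
```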
The subtractive method assumes that mental operations involve serial processing, which executes only one process at any given moment (Figure 4-1). Figure 4-1 illustrates four different processes carried out in serial fashion. Process 1 occurs first, then Process 2, and so on. However, results like the famous Stroop effect indicate that cognitive processing is not always serial.
Figure 4-1 Serial processing.
Stroop (1935) studied interference between different sources of information available at the same time. He presented participants with a list of colour names. In one condition, the ink colour for each word differed from the colour named by the word. For instance, Stroop printed the word red in blue, green, brown, or purple ink but never in red ink. He asked some participants to read the printed words out loud (ignoring the colour of the ink) and found that ink colour did not interfere with performance. Stroop found no difference between the time to read coloured words and the time to read the same words printed in black ink. However, he found a very different result when participants named each word’s ink colour (ignoring the colour named by the word). Colour words interfered with naming ink colour; participants required about 50 seconds more to name the ink colours of colour words than to name the ink colours of coloured squares.
Stroop’s result illustrates parallel processing, which occurs when more than one process occurs at the same time. If words slow down naming ink colour, then two different processes operate simultaneously: processing the word and processing the ink colour. Researchers have proposed numerous explanations for the Stroop effect (Dyer, 1973; Jensen & Rohwer, 1966; Macleod, 1991, 2015). All explanations share the idea that we process words and ink colours in parallel.
For example, consider the “horse race model” (Posner & Snyder, 1975) illustrated in Figure 4-2. That model has two processing streams—one for words, the other for ink colours—operating in parallel, as illustrated by the vertical overlap of the boxes in the figure. In particular, Process 1 and Process A start at the same time, because the tops of their boxes align vertically in Figure 4-2.
The horse race model proposes that we process words faster than we process colours. Figure 4-2 illustrates faster word processing by shortening the height of word process boxes compared with the height of colour process boxes. Because the model processes words faster, the word stream will finish first, so colour processing never gets the chance to interfere with reading words. But, by finishing first, the word stream can interfere with naming ink colour by delivering a competing colour word that participants must ignore to name ink colour.
Figure 4-2 The Stroop effect indicates that different processes can operate in parallel. The “Word” processing stream finishes before the “Colour” processing stream: Process 3 finishes before Processes C or D.
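The horse race logic lends itself to a simple sketch. The finishing times and the interference cost below are invented for illustration; they are not parameters from Posner and Snyder’s model:

```python
# A minimal horse race sketch (hypothetical times, in ms). Two streams
# start together; the word stream finishes first and, on incongruent
# trials, delivers a competing name that delays the colour-naming
# response.

def colour_naming_rt(word_finish=350.0, colour_finish=500.0,
                     congruent=True, interference_cost=100.0):
    if word_finish < colour_finish and not congruent:
        # The faster word stream must be suppressed before responding.
        return colour_finish + interference_cost
    return colour_finish

print(colour_naming_rt(congruent=True))   # 500.0: no conflict
print(colour_naming_rt(congruent=False))  # 600.0: Stroop interference
```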
Figure 4-2 combines parallel and serial processing. The two different processing streams run in parallel. However, each stream operates in serial. Feature integration theory provides another example of combining both serial and parallel processing in the same theory (Treisman, 1986, 1988; Treisman & Gelade, 1980). Feature integration theory (Figure 4-3) begins when specialized processors detect different features such as colour, motion, and so on. Feature detection occurs in parallel; Figure 4-3 illustrates parallel processing by vertically aligning feature detection processors.
Figure 4-3 Feature integration begins with parallel processing. This is then followed by processing stages operating in serial.
Once parallel processes detect various features, serial processing begins. First, attention combines different features together to create object representations called object files (Kahneman et al., 1992). Next, object recognition (object classification or object naming) occurs by linking object files to semantic memory. Note that the final three stages in Figure 4-3—combining features, building object files, and accessing semantic memory—operate in serial. Figure 4-3 illustrates serial processing by placing the three final stages at different vertical positions.
Researchers can propose other combinations of parallel and serial processing. Cascaded processing provides one example (McClelland, 1979). In cascaded processing, a second (serial) process begins before the preceding (serial) process finishes. Cascaded processing (Figure 4-4) permits incomplete information to be passed from one process to the next, giving the second process a head start before the first process ends. Figure 4-4 illustrates mostly serial processing: Process 1 occurs first, Process 2 second, and so on. However, the vertical overlap between Process 1 and Process 2 in the figure indicates that Process 2 begins before Process 1 finishes. Similarly, Process 2 is cascaded with Process 3, and Process 3 is cascaded with Process 4. The dual route cascaded model of reading uses cascaded processing (Coltheart et al., 2001).
Figure 4-4 Four processes in cascaded processing.
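The following sketch captures the spirit of cascaded processing; the number of stages, the rate, and the time steps are illustrative assumptions, not McClelland’s model:

```python
# Each stage is a leaky integrator that continuously passes its partial
# output downstream, so later stages begin changing long before earlier
# stages finish.
N_STAGES, RATE = 4, 0.1
activations = [0.0] * N_STAGES

for t in range(50):
    inputs = [1.0] + activations[:-1]  # stage i receives stage i-1's output
    activations = [a + RATE * (x - a) for a, x in zip(activations, inputs)]
    if t in (1, 10, 49):
        print(t, [round(a, 3) for a in activations])
# By step 10, stage 4 is already accumulating signal even though
# stage 1 remains far from its asymptote of 1.0.
```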
Still other theories use only parallel processing. Figure 4-5 illustrates four processes running at the same time, continually sending information to each other, in a system called an auto-associative network (Ackley et al., 1985; Hopfield, 1982; Kohonen, 1977). Auto-associative networks can model many cognitive phenomena, such as tracking moving objects (Dawson, 1991), paired-associate learning (Rizzuto & Kahana, 2001), visual search (Fukushima, 1986; Gerrissen, 1991), and concept categorization (Anderson et al., 1977).
Figure 4-5 A system in which all processing is parallel.
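A minimal Hopfield-style sketch shows the basic auto-associative idea: store a pattern in the connection weights, then recover it from a corrupted probe with a single parallel update. The pattern and network size are invented for illustration, and the sketch assumes the numpy library:

```python
import numpy as np

pattern = np.array([1, -1, 1, -1, 1, -1])        # stored memory (+1/-1)
W = np.outer(pattern, pattern).astype(float)     # Hebbian weights
np.fill_diagonal(W, 0.0)                         # no self-connections

probe = pattern.copy()
probe[0] = -probe[0]                             # corrupt one element
recalled = np.sign(W @ probe)                    # one parallel update
print(np.array_equal(recalled, pattern))         # True: memory restored
```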
Auto-associative networks illustrate connectionism (Bechtel & Abrahamsen, 2002; Churchland et al., 1990; Dawson, 2004; McClelland & Rumelhart, 1986; Rumelhart & McClelland, 1986b). Connectionists believe in dramatic differences between information processing in brains and information processing in digital computers. We can treat each process in Figure 4-5 as a model neuron, each operating in parallel, continually sending signals back and forth to one another.
Importantly, many connectionist models combine parallel and serial processing. Figure 4-6 illustrates one network, the multi-layer perceptron. Processors in the multi-layer perceptron represent neurons, and therefore connectionists describe such networks as using parallel processing. However, multi-layer perceptrons also include serial processing: input units must first send signals to hidden units before the hidden units activate. Similarly, output units cannot activate until they receive signals from the hidden units.
Figure 4-6 The multi-layer perceptron is a typical connectionist network. Circles represent neuron-like processors.
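A forward-pass sketch makes the mixture concrete: within a layer, all units compute at once (one matrix operation), but layers must act in sequence. The weights below are random placeholders rather than a trained network, and the sketch assumes numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
W_in_hid = rng.normal(size=(3, 4))    # 3 input units -> 4 hidden units
W_hid_out = rng.normal(size=(4, 2))   # 4 hidden units -> 2 output units

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

stimulus = np.array([1.0, 0.0, 1.0])
hidden = sigmoid(stimulus @ W_in_hid)   # step 1: all hidden units at once
output = sigmoid(hidden @ W_hid_out)    # step 2: only after step 1
print(output)
```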
Connectionism arises from adopting a rebellious counter-assumption to conventional theory: what if cognition differs from the serial digital computer? Connectionists produce theories quite different from other approaches (Bechtel & Abrahamsen, 2002; Dawson, 1998, 2013). We will encounter connectionism again in Chapter 5. In Chapter 4, I only need to emphasize connectionism’s preference for parallel processing over serial processing.
By making different assumptions about serial versus parallel processing, or by combining both serial and parallel processing in the same theory, cognitive psychologists can create diverse theories. Such diversity emerges from making different assumptions about one architectural property (temporal relations between processes). Other architectural properties also permit different assumptions, generating theoretical diversity. The next section explores a second architectural property, whether a theory proposes data-driven or theory-driven processing.
4.3 Data-Driven and Theory-Driven Processing
Researchers produce different theories by proposing different combinations of parallel and serial processing (Section 4.2). The direction in which information flows through a system provides another architectural property that researchers can vary to create diverse cognitive theories.
We define the direction in which information flows by distinguishing between peripheral processing and central processing. Peripheral processing occurs early (at cognition’s start), has direct contact with the world, and involves detecting information. In contrast, central processing occurs later (after we detect and represent information), has no direct contact with the world, and manipulates information already represented. Information can flow from peripheral to central processes, or in the opposite direction, from central to peripheral processes.
Data-driven processing, or bottom-up processing, occurs when information flows from peripheral to central processing (Figure 4-7). In Figure 4-7, sensation is the most peripheral processing and involves detecting information from the world. Awareness is more central and involves being consciously aware of some detected information. Thought is most central and involves reasoning about detected information (e.g., classifying conscious information as an object, such as “scruffy brown dog”). Figure 4-7 illustrates data-driven processing because the arrows between the boxes point in the direction from peripheral processes toward central processes.
Figure 4-7 In data-driven processing, information flows from peripheral processes toward central processes.
The multi-layer perceptron presented in Figure 4-6 illustrates data-driven processing because input units first detect environmental information. Input units send signals to hidden units, which detect more complex features. Finally, hidden units send activity to output units, which generate a complex response. We describe the network’s processing as data-driven processing because information always flows from input units (peripheral) toward output units (central).
Theory-driven processing, or top-down processing, occurs when information flows from central processes toward peripheral processes (Figure 4-8). Note that the arrows between boxes in Figure 4-8 point in the opposite direction when compared with the arrows in Figure 4-7. In theory-driven processing, results from central processes influence or guide more peripheral processing.
We find theory-driven processes in many cognitive theories of perception (Bruner, 1957, 1992; Bruner et al., 1951; Gregory, 1970, 1978; Rock, 1983). Most perceptual theories recognize that data-driven processing does not deliver all of the information that we need to experience the world (Marr, 1982). Theory-driven processes use our beliefs, knowledge, or expectations to fill in missing information. “We not only believe what we see: to some extent we see what we believe” (Gregory, 1970, p. 15). For example, my data-driven processes provide me with information indicating that I see a small black-and-white animal. Top-down processing permits a more sophisticated experience. When I am in my house, my expectations lead me to recognize my cat Phoebe. In contrast, when I am in the ravine, my expectations lead me to recognize, and avoid, a skunk. Top-down processing enables different expectations to add different information to the same representation delivered by data-driven processes.
Figure 4-8 In theory-driven processing, information flows from central processes toward peripheral processes.
Figures 4-7 and 4-8 illustrate models involving only data-driven or only theory-driven processing. However, information can flow in both directions in the same theory, as illustrated in Figure 4-9.
Treisman’s feature integration theory combines both data-driven and theory-driven processing (Treisman, 1986, 1988; Treisman & Gelade, 1980). The first stage, feature detection, uses data-driven processing. Different feature maps represent locations of different features. We recognize objects by combining features at the same location in different maps. However, data-driven processes do not combine features. In feature integration theory, an attentional spotlight provides the “glue” to hold different feature maps together. Higher-level processes position the attentional spotlight: we deliberately direct our attention to a location of interest. Thus, theory-driven processing moves the attentional spotlight from location to location.
Feature integration theory combines not only data-driven with theory-driven processing but also serial processing with parallel processing. Thus, feature integration theory illustrates that making different assumptions about architectural properties permits diversity within cognitive theory. Other architectural properties also support theoretical diversity. Section 4.4 introduces another example: whether processing is automatic or controlled.
Figure 4-9 Many modern theories, such as Treisman’s feature integration theory, include both data-driven and theory-driven processing.
4.4 Automatic and Controlled Processing
In Sternberg’s (1969b) memory scanning task (see Section 3.9), participants determine whether a probe belongs to a memorized list. To study how we search primary memory, Sternberg varied list size and measured participants’ response time. Sternberg argued that we scan memory with an exhaustive serial search. Later researchers varied Sternberg’s task to discover additional properties of search processes (Schneider & Shiffrin, 1977; Shiffrin & Schneider, 1977). Schneider and Shiffrin varied not only how many items were memorized but also how many probes participants searched for in memory.
Schneider and Shiffrin also manipulated the number of elements used to create memory sets and stimulus sets. A target belongs both to the stimulus set and to the memory set. A distractor belongs to the stimulus set but not to the memory set. In Schneider and Shiffrin’s varied mapping condition, items served as targets in some trials but as distractors in others. The varied mapping condition increases task difficulty: because targets change from trial to trial, participants must constantly keep track of which items currently count as targets. In contrast, in Schneider and Shiffrin’s consistent mapping condition, one set of items always served as targets, and a different set of items always served as distractors. In the consistent mapping condition, targets never served as distractors.
Schneider and Shiffrin discovered that the two conditions produced strikingly different results. Participants found the varied mapping condition much more difficult than the consistent mapping condition. In the former condition, performance became poorer (slower, less accurate) with increases in the size of the memorized lists or in the number of probes. Performance did not improve with training. In contrast, participants performed faster in the consistent mapping condition, and varying the number of memorized items or the number of probes did not affect performance. Performance in the consistent mapping condition improved with training: participants reported that early trials demanded effort but experienced less effort after performing several trials (Shiffrin & Schneider, 1977).
Schneider and Shiffrin used their results to argue for two qualitatively different types of processes. Automatic processes are fast, automatically activated by stimuli, place few demands on cognitive resources such as attention, and do not require top-down control. In contrast, controlled processes are slow, initiated by higher-order processes, place high demands on attentional resources, and require top-down control. Although the architectural distinction between automatic and controlled processing differs from the architectural proposals introduced earlier in this chapter, strong relationships do exist. Automatic processing is more likely to be data driven and parallel, whereas controlled processing is more likely to be theory driven and serial.
Feature integration theory, which contains both parallel and serial processing, and both data-driven and top-down processing, also contains both automatic and controlled processing. For instance, pop-out results from automatic processing, whereas searching for unique feature combinations requires controlled processing. We could create variations of feature integration theory by changing how all of the different processing types combine in the model. We could make attentional scanning parallel instead of serial but slow its processing as the number of scanned objects increases. We could make parallel scanning of objects data driven. Such changes involve modifying architectural properties, and each modification produces a different theory.
We have seen three architectural properties in Chapter 4 (serial versus parallel processing, data-driven versus theory-driven processing, and automatic versus controlled processing). Varieties of cognitive theories emerge when researchers assign different values to any of these properties. I now turn to another important architectural property that permits different design decisions, the format of symbols and the nature of rules to manipulate them.
4.5 Structures and Processes
Cognitive psychologists believe that a physical symbol system produces cognition (Newell, 1980; Newell & Simon, 1976). The physical symbol system describes a class of devices “capable of having and manipulating symbols, yet is also realizable within our physical universe” (Newell, 1980, p. 136). Digital computers belong to the class of physical symbol systems. Cognitive psychologists believe that the brain also belongs to the same class. However, we must do more than merely claim that a physical symbol system causes cognition. Cognitive psychologists must also provide many architectural details. If cognition emerges from a physical symbol system, then which symbols does the system manipulate, and which processes perform the manipulating?
The many-to-one relationship between physical and functional properties (Section 1.2) means that any physical entity is a possible symbol. Because of the many-to-one relationship, we can construct universal machines (Section 1.5) from gears (Swade, 1993), LEGO (Agullo et al., 2003), electric train sets (Stewart, 1994), hydraulic valves, or silicon chips (Hillis, 1998). As a result, cognitive psychologists can consider many options for the nature of mental symbols and processes. Fortunately, the relationship between symbols and the processes for manipulating them is not arbitrary. Symbol properties—a symbol’s format or structure—determine which processes can manipulate them. Symbols of one format can be manipulated only by certain processes, making some problems easier to solve than others. Changing the format means that symbols can be manipulated only by different processes, making different problems easier to solve. The close relationship between symbols and processes helps cognitive psychologists to define a cognitive architecture.
Let me illustrate how different structure-process pairings affect performance. Consider using a roadmap to represent spatial locations of places. A roadmap’s format—its spatial layout—permits us to execute specific operations easily, such as visual scanning. We scan a roadmap to determine quickly which city will be encountered next along a route. However, a roadmap’s spatial layout makes other operations more difficult to execute, making some questions more difficult to answer. For instance, we cannot determine the precise distance between two cities simply by scanning the roadmap. Instead, we must measure the distance on the map and then use a scale to convert the measured distance into kilometres.
If we represent the same information in a different format, then we can more easily execute different operations and therefore answer different questions more quickly. For instance, we can use a table of distances as a different format for representing the spatial relationships between cities. Each row or column of the table corresponds to a particular city. Each table number represents the distance between the row city and the column city. A table easily permits one operation called table lookup. We perform table lookup when we quickly read a table to retrieve information from the intersection between a row and a column. Table lookup permits us to use the table to quickly find the distance between two cities, a question more difficult to answer with the roadmap. However, table lookup does not permit us to easily determine the next city along our route, a question that we can answer quickly using a roadmap.
The roadmap versus distance table example illustrates the structure-process relationship (Dawson, 1998, 2013). Structure refers to the symbols’ format (e.g., spatial map versus distance table). Process refers to the operations for manipulating structure (e.g., scanning versus table lookup). According to the structure-process relationship, when one chooses a particular structure, one also chooses which operations can easily manipulate that structure. The structure-process relationship determines the questions that we can easily answer because of a particular pairing between structure and process. Roadmaps permit scanning, making questions about routes easy but questions about distances difficult. In contrast, a different structure-process pairing—distance tables and table lookup—makes questions about distance easy but questions about routes difficult.
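A small sketch can make the trade-off concrete. The cities and distances below are invented, and the two data structures stand in for the roadmap and the distance table:

```python
route = ["Edmonton", "Red Deer", "Calgary", "Lethbridge"]   # map-like structure
distances = {("Edmonton", "Calgary"): 300,                  # table-like structure
             ("Edmonton", "Red Deer"): 150,
             ("Red Deer", "Calgary"): 150,
             ("Calgary", "Lethbridge"): 215}

# Easy with the route structure, awkward with the table
# (assumes current is not the last city on the route):
def next_city(current):
    return route[route.index(current) + 1]

# Easy with the table structure (one lookup), awkward with the route:
def distance(a, b):
    return distances.get((a, b)) or distances.get((b, a))

print(next_city("Red Deer"))            # Calgary
print(distance("Edmonton", "Calgary"))  # 300
```

Each structure makes one question a single easy operation and leaves the other question requiring extra work, which is the structure-process relationship in miniature.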
The practice of cognitive psychology depends on the structure-process relationship, which indicates that we can easily answer certain questions but cannot easily answer others. Thus, choosing a particular architecture for a theory, or choosing a particular combination of structure and process, generates hypotheses to test by collecting relative complexity evidence, error evidence, or intermediate state evidence (Sections 3.9–3.11). To illustrate, let us consider one theory about mental imagery.
We experience mental imagery as mental pictures, which we often use to solve spatial problems (Kosslyn, 1980). For instance, to remember how many windows a building has, we might create a mental image of the building, scan the image, and count how many windows we see with our “mind’s eye.”
Which mental structure produces mental imagery? Which format do mental images take in the cognitive architecture? Cognitive psychologist Stephen Kosslyn answers such questions with his depictive theory of mental imagery (Kosslyn, 1980, 1994; Kosslyn et al., 2006). According to the depictive theory, mental images literally depict spatial information with a picture-like format: mental images have a picture-like spatial layout.
According to the depictive theory, the picture-like characteristics of mental images result from a small number of privileged properties (Kosslyn, 1980). First, images occur in a spatial medium functionally equivalent to a coordinate space: images are analog representations possessing spatial extent. Second, images visually resemble the things that they represent: there is an “abstract spatial isomorphism” between mental images and the world (Kosslyn, 1980, p. 33). Mental images represent visible properties such as colour and texture.
The privileged, architectural, properties of mental imagery define its structure. In turn, the structure permits certain visual processes to manipulate images easily. We can scan mental images, inspect images at different apparent sizes, or rotate images to new orientations. By coupling visual processing with the depictive structure of images, we can easily solve visuospatial problems. Furthermore, the privileged properties generate strong predictions about the time required to use mental images to answer specific questions.
For example, in the mental rotation task, Roger Shepard presented participants with two side-by-side images, each rotated to a different orientation (Cooper & Shepard, 1973a, 1973b; Shepard & Metzler, 1971). Participants decided whether both images represented the same object. Shepard measured the angular disparity between the two images (the difference between the images’ orientations) and how long it took participants to decide.
The depictive theory proposes that participants perform the mental rotation task by creating a mental image of one stimulus and then rotating the image to a new orientation. In the new orientation, participants can compare the rotated image to the other stimulus and decide whether the two represent the same object. The mental rotation task reveals that response time increases with increases in the amount of required mental rotation (angular disparity between stimuli). The result supports the depictive theory, which proposes that we rotate mental images holistically (as whole pictures), through intermediate orientations, because images have a picture-like format. The greater the angular disparity, the greater the time we need to rotate an image from the starting orientation to the ending orientation.
The image scanning task provides another example of testing the privileged properties of the depictive theory (Kosslyn, 1980; Kosslyn et al., 1978). In the image scanning task, participants create a mental image of a map. Kosslyn asked participants to scan the image from one location to another and to press a button when they arrived at the second location. Researchers manipulate the distance between the two locations and measure the time required for participants to respond. The image scanning task produces a linear relationship between response time and distance between locations (Kosslyn et al., 1978). Increasing the distance produces a corresponding increase in reaction time. The result supports the depictive theory, which claims that we scan the spatial extent of mental images at a constant rate. The distance-time relationship arises from an image being extended in space.
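Both tasks yield the same kind of prediction, which we can summarize as linear reaction-time functions. This is a hedged sketch using illustrative symbols (baseline times a and a′, rotation rate r, scanning speed v), not Kosslyn’s own notation:

```latex
\mathrm{RT}_{\text{rotation}}(\theta) = a + \frac{\theta}{r},
\qquad
\mathrm{RT}_{\text{scanning}}(d) = a' + \frac{d}{v}
```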
Researchers have also used computer simulations to study the depictive theory’s privileged properties (Kosslyn, 1980, 1987, 1994; Kosslyn et al., 1984; Kosslyn et al., 1985; Kosslyn & Shwartz, 1977). The simulations demonstrate that the hypothesized properties of mental images produce many regularities observed in experimental studies of mental imagery.
The mental imagery example shows that detailed proposals about the structure-process relationship generate hypotheses for cognitive psychologists to test. Importantly, results obtained from mental imagery experiments do not restrict the variety of architectures that cognitive psychologists can explore. They can create alternative theories by proposing alternative structure-process relationships or by taking issue with one proposed by others.
For instance, some researchers challenged the depictive theory in a decades-long imagery debate (Block, 1981; Tye, 1991). The imagery debate (Section 3.12) examines one basic question: do Kosslyn’s privileged properties belong to the architecture? Pylyshyn (1973, 1981a, 1981b, 1984, 2003a, 2003b, 2003c, 2007) argues that the depictive properties of mental images do not belong to the architecture; instead, primitive, non-spatial elements give rise to our spatial experience of mental images. Pylyshyn’s argument led to variations of the image scanning task that produced results inconsistent with the depictive theory. Cognitive psychology uses experimental evidence to resolve such debates about the architecture.
4.6 Structure, Process, and Control
The preceding sections described many possible architectural choices, which in turn produce a variety of cognitive theories. I now explore how cognitive psychologists can create radically different accounts of the same phenomena by making different architectural decisions.
Many examples in Chapter 2 involved the modal memory model (Shiffrin & Atkinson, 1969; Waugh & Norman, 1965). That model depicts memory as a sequence of different stores (iconic memory, primary memory, and secondary memory) with different architectural properties (symbols, processes, durations, and capacities). The architectural differences between stores in the modal memory model emphasize different assumptions about structure and process (Section 4.5).
However, we do not define information processing using only structure and process. Information processors must also incorporate control (Section 1.1). Control determines which process manipulates a data structure at any given time. Cognitive psychologists who emphasize control over structure and process produce very different theories from the ones produced by cognitive psychologists who emphasize structure and process over control.
The levels of processing theory of memory provides one example of a control-based memory theory (Cermak & Craik, 1979; Craik, 2002; Craik & Lockhart, 1972; Lockhart & Craik, 1990). That theory replaces a structural account of memory with a procedural account. Lockhart and Craik (1990, p. 88) sought to displace “the idea (a) that memory could be understood in terms of elements (‘items’) held in structural entities called memory stores, (b) that the fate of an item so stored was determined by the properties of this store.”
Craik and Lockhart used depth of processing to displace structural accounts of memory. Depth of processing reflects the degree to which we analyze a stimulus. Deep processing involves a semantic analysis of an item. For example, participants might determine whether each word in a list belongs to the category flower. Shallower processing involves analyzing non-semantic properties. For instance, participants might determine whether each word in a list rhymes with train.
Participants who perform deeper processing of a list also perform better in a surprise memory test, supporting levels of processing as an alternative account of the memory phenomena introduced in Chapter 2. Many view Craik and Lockhart’s theory as attacking the distinctions between memory stores in the modal memory model. Craik and Lockhart believe that this view is overstated (Craik, 2002; Lockhart & Craik, 1990).
Importantly, depth of processing is under conscious control. We can deliberately decide to pay attention to stimulus meanings and therefore determine how well we remember items. Improving memory by performing deeper analysis offers another perspective on the mnemonic techniques introduced in Chapter 2.
4.7 Nativism and Empiricism
Chapter 4 demonstrates that various cognitive theories emerge when cognitive psychologists make different assumptions about the cognitive architecture. We have considered many different assumptions, ranging from serial versus parallel processing to emphasizing structure and process over control. Section 4.7 introduces yet another architectural property, whether information is innate or learned, by briefly considering the psychology of language.
Symbols make particular information explicit. For example, a grammatical sentence contains words in a linear order. In addition, a sentence’s words belong to various parts of speech, and parts of speech are hierarchically organized. A sentence’s representation must make explicit linear order, parts of speech, and hierarchical organization.
One representation, the phrase marker, makes the three properties explicit (Figure 4-10). The words at the bottom of a phrase marker are in linear order. The nodes of a phrase marker represent different parts of speech: determiner (“Det”), adjective (“Adj”), noun (“N”), and verb (“V”). Links between nodes show the hierarchical organization of parts of speech. For instance, a noun phrase (“NP”) can combine a determiner, an adjective, and a noun in a particular order.
Figure 4-10 An example phrase marker.
To define an architecture, we must also specify the rules for manipulating symbols and not just specify the symbols themselves. Two different kinds of rules manipulate phrase markers. A context-free grammar consists of rewrite rules for creating a phrase marker. For Figure 4-10, one rule, “S → NP VP,” creates the figure’s top two branches. The rule “NP → Det Adj N” and the rule “VP → V NP” create the figure’s next level of branches. Other rules, called transformations, belong to a different grammar called a transformational grammar. Such a grammar converts one phrase marker into a different one. For instance, one transformation could convert the phrase marker for the Figure 4-10 sentence “The cognitive psychologist seeks the architecture” into a different phrase marker to represent the question “Does the cognitive psychologist seek the architecture?”
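A minimal sketch shows how rewrite rules generate a sentence from the symbol S. The word lists are invented, and this toy grammar forces an adjective into every noun phrase, a simplification of the chapter’s example:

```python
import random

rules = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "Adj", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"]],
    "Adj": [["cognitive"]],                    # illustrative single choice
    "N":   [["psychologist"], ["architecture"]],
    "V":   [["seeks"]],
}

def expand(symbol):
    """Apply rewrite rules until only terminal words remain."""
    if symbol not in rules:                    # terminal word
        return [symbol]
    expansion = random.choice(rules[symbol])
    return [word for part in expansion for word in expand(part)]

print(" ".join(expand("S")))
# e.g., "the cognitive psychologist seeks the cognitive architecture"
```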
Phrase markers, and the rules for creating and manipulating them, belong to generative grammar (Chomsky, 1957, 1965, 1966, 1995; Chomsky et al., 2000). A generative grammar consists of explicit, well-defined rules for assigning structural descriptions (e.g., phrase markers) to sentences or for manipulating such descriptions (e.g., converting one phrase marker into another). We achieve language competence when we master and internalize a generative grammar. How do we do that? Obviously, we learn languages: children learn to speak the languages of the households in which they are raised. However, though we learn some aspects of language, researchers believe in the innateness of other important aspects of generative grammars.
Consider Gold’s paradox (Gold, 1967; Pinker, 1979). Gold proved that generative grammars are too complex to learn under the conditions that children experience during typical language learning. Paradoxically, children clearly learn a language under these conditions! How do children avoid Gold’s paradox? We can avoid it if much of a generative grammar is innate and therefore does not require learning.
We find one example of language innateness in a transformational grammar theory called principles and parameters (Chomsky, 1995). That theory includes phrase markers and rules for manipulating them but allows certain properties to vary from language to language. For instance, languages such as English are head-initial (noun phrases precede verb phrases), whereas others such as Japanese are head-final (verb phrases precede noun phrases). The head-directional property is an example of a parameter, a property that can adopt different values. A parameter’s value determines certain phrase marker properties (e.g., head-initial versus head-final).
Linguistic experience determines parameter values. For instance, the head-directional property will adopt one value for children raised in an English-speaking household but adopt a different value for children raised in a Japanese-speaking household. However, a grammar’s other properties (i.e., phrase markers and rules for manipulating them) are innate. We avoid Gold’s paradox because we can learn parameter settings but cannot learn an entire transformational grammar. A grammar’s partial innateness permits mastery of the whole grammar.
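A toy sketch of parameter setting, following the chapter’s simplified gloss of the head-direction property (all names are illustrative): the rule schema itself is fixed in advance, and experience only selects the order of its parts.

```python
# Hypothetical sketch: an innate rule schema with one open parameter.
# The child's linguistic environment fixes the parameter; everything
# else about the rule is given in advance.
def sentence_rule(head_initial):
    return ["NP", "VP"] if head_initial else ["VP", "NP"]

print(sentence_rule(head_initial=True))   # English-like setting
print(sentence_rule(head_initial=False))  # Japanese-like setting
```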
A transformational grammar’s innate components define what Chomsky calls the universal grammar: “The study of innate mechanisms leads us to universal grammar” (1980, p. 206). Proposing an innate universal grammar illustrates nativism. We often associate nativism with the 17th-century philosophy of Rene Descartes. Cartesian philosophy (sometimes called rationalism) asserted that we derive new knowledge from the logic-like manipulation of innate ideas. When cognitive psychologists hypothesize that cognition is the rule-governed manipulation of symbols, and relate rules and symbols to a biological architecture, they adopt a modern version of Cartesian rationalism (Dawson, 2013). Indeed, Chomsky (1966) titled one of his books Cartesian Linguistics.
Cartesian psychology did not go unchallenged. The 17th-century philosopher John Locke (1706/1977) rejected the Cartesian claim of innate knowledge. For Locke, we acquire all of our ideas through experience: “Let us then suppose the mind to be, as we say, white paper, void of all characters, without any idea, how comes it to be furnished? . . . To this I answer, in one word, from experience” (p. 54). Locke’s philosophy is known as empiricism.
Like nativism, empiricism also provides the philosophical foundation for many cognitive theories (Dawson, 2013), such as connectionist networks like the multi-layer perceptron in Figure 4-6. Such a network’s connection weights usually begin as small, randomly selected values, making the weights analogous to “white paper, void of all characters.” Networks learn by responding to stimuli; mistakes cause connection weight changes to make response errors smaller. Networks illustrate empiricism, learning to solve problems from experienced mistakes.
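A small error-correction sketch illustrates the empiricist picture. The task (learning logical AND), the learning rate, and the number of training epochs are arbitrary choices for illustration, and the sketch assumes numpy:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(scale=0.01, size=3)            # 2 inputs + bias: near-blank slate
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
t = np.array([0.0, 0.0, 0.0, 1.0])            # targets for logical AND

for _ in range(200):                          # experience, trial by trial
    for x, target in zip(X, t):
        y = 1.0 / (1.0 + np.exp(-x @ w))      # network response
        w += 0.5 * (target - y) * x           # adjust weights to shrink error

print(np.round(1.0 / (1.0 + np.exp(-X @ w))))  # approximately [0. 0. 0. 1.]
```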
Many modern cognitive theories combine nativism and empiricism. For instance, though principle and parameters theory (Chomsky, 1995) is nativist in its appeals to an innate transformational grammar, it is empiricist in its appeals to experience-based parameter setting. Competing theories weight empiricism and nativism differently. For instance, connectionist networks play an important role in another theory of language development emphasizing empiricism (Elman et al., 1996). However, Elman et al. also invoke nativism by assuming that the biology of neural development constrains connectionist learning mechanisms.
Our understanding of the architecture is affected by cognitive psychology’s tendency to combine nativism and empiricism. The architecture consists of primitive information processing capabilities (Chapter 3). Since primitives are built into the system and are cognitively impenetrable, we cannot functionally decompose them. The brain must instantiate the architecture, suggesting that the architecture is innate. However, this suggestion need not be true.
Cognitive psychologists believe that brains cause cognition. But the brain brings other, non-architectural, cognitive characteristics to life. For instance, when we change our beliefs, or when we learn new information, our brains change: learning and experience modify neural connections (Dudai, 1989; Eichenbaum, 2002; Gluck & Myers, 2001; Lynch, 1986; Squire, 1987). The brain must bring such non-architectural properties to life.
If the brain causes the architecture, but also stores (non-innate) information, then how can we distinguish the architecture from information? Importantly, brain structures change over time but not at the same rate (Newell, 1990). For Newell, the architecture is a relatively fixed structure and changes very slowly. In contrast, other structures change much more rapidly, such as the memories holding the information for the architecture to manipulate. Newell does not say why the architecture might (slowly) change. Architectural changes could be innate (e.g., neural development), or they could be caused by experience (e.g., when practice makes object recognition automatic). Explaining the architecture might require appealing to both nativism and empiricism. Appealing to both nativism and empiricism arises in another varying architectural property, isotropic processing versus modular processing.
4.8 Isotropic and Modular Processing
Karl Duncker (1945) pioneered the experimental study of problem solving and profoundly influenced the development of cognitive psychology (Simon, 2007). Duncker famously studied the radiation problem: “Given a human being with an inoperable stomach tumor, and rays which destroy organic tissue at sufficient intensity, by what procedure can we free him of the tumor by these rays and at the same time avoid destroying the healthy tissue which surrounds it?” (1945, p. 1). Duncker discovered fundamental characteristics of problem solving by having participants think out loud when solving the radiation problem.
Modern researchers studied whether participants find the radiation problem easier to solve after reading seemingly unrelated stories (Gick & Holyoak, 1980). Gick and Holyoak presented some participants with the Attack-Dispersion story, in which a general wants to capture a fortress. Many different roads lead to the fortress, all protected by explosive mines. A small squad of soldiers, because of their light weight, can cross the mines, but the weight of a large army causes the mines to explode. The general captures the fortress by dividing her army into many small groups, each taking a different road and safely reaching the fortress at the same time.
We can relate, via analogy, the general’s solution in the Attack-Dispersion story to the solution for the radiation problem. We can destroy the tumor, but preserve the surrounding tissue, if we aim lower-intensity rays at the tumor from many different directions. As a result, the tumor receives a massive dose of radiation not received by the surrounding tissue. Gick and Holyoak (1980) found that the Attack-Dispersion story helped participants to solve the radiation problem faster than participants who did not read the story.
Results like Gick and Holyoak’s (1980) inform cognitive psychologists who believe in the centrality of analogical thinking to problem solving (Gentner et al., 2001; Holyoak & Thagard, 1995). Analogical thinking finds insightful relationships between very different domains. Famous scientific analogies include Kepler’s comparison of the motion of the planets to the motion of a clock and Huygens’s hypothesis, inspired by waves on water, that light is wavelike.
Analogical thinking requires cognition to relate disparate domains, to access information about clocks and planets, or about military strategy and cancer treatment, at the same time. We call the wide-ranging access to very different kinds of information isotropic processing (Fodor, 1983). As Fodor notes, isotropic scientific reasoning occurs when “everything that the scientist knows is, in principle, relevant to determining what else he ought to believe. In principle, our botany constrains our astronomy, if only we could think of ways to make them connect” (p. 105). For Fodor, central processes—thinking and problem solving—are necessarily isotropic.
However, Fodor (1983) also argues that many cognitive processes are neither central nor isotropic, processes that Fodor calls modules, specialized devices for solving specific information processing problems. A module receives information from sensors, manipulates information to solve a problem, and sends the solution on to central processes. A module uses parallel processing, is data driven, and is automatic. These characteristics arise because localized neural functions instantiate a module.
Being associated with localized neural functions makes modules domain specific or informationally encapsulated. We achieve modular processing by “wiring” modules only to necessary information. “The intimate association of modular systems with neural hardwiring is pretty much what you would expect given the assumption that the key to modularity is informational encapsulation” (Fodor, 1983, p. 98). Modules are not isotropic; they cannot access information irrelevant to their specialized function.
Neuroscience provides evidence for the existence of modules. Results from anatomy, physiology, and clinical neuroscience reveal the modularity of visual perception. Two distinct pathways exist in the human visual system (Livingstone & Hubel, 1988; Maunsell & Newsome, 1987; Ungerleider & Mishkin, 1982). One pathway processes the appearances of objects, while the other processes object locations. We can describe the two pathways as modules because neither processes the information (features versus location) handled by the other. Furthermore, each pathway consists of smaller modules. Researchers have identified over 30 distinct visual processing modules, each responsible for detecting a very specific kind of information (van Essen et al., 1992).
For Fodor (1983), modules are informationally encapsulated, domain specific, fast, and automatic because localized neural processes implement each module. Fodor also argues that the same properties cannot be true of central or isotropic processing and concludes that cognitive psychologists cannot explain isotropic processes: “The more global (e.g., the more isotropic) a cognitive process is, the less anybody understands it” (p. 107).
We can treat that pessimistic conclusion skeptically. For instance, the memory systems introduced in earlier chapters are isotropic because they can store many different kinds of information. Nevertheless, cognitive psychologists have acquired a deep understanding of these systems.
4.9 An Example Architecture
In this chapter, I have introduced several different architectural properties for cognitive psychology. Cognitive psychologists create different theories by choosing different values for different architectural properties. I now take a different approach to architectural questions by describing one important candidate for the cognitive architecture, the production system. I then examine this example architecture in the context of the various properties discussed above.
A production system is a computer simulation used to model problem solving (Anderson, 1983; Newell, 1973; Newell & Simon, 1972). The simplest production system (Figure 4-11) has a working memory for holding strings of symbols and a set of rules, called productions, for manipulating memory contents.
In a production system, each rule or production is a condition-action pair. A production searches memory for symbols matching its condition. When a production finds a match, the production performs its action, which manipulates memory contents. A production system starts with a to-be-solved problem in memory. All productions simultaneously search memory for conditions. When one production finds its condition, it first disables the other productions, and then it performs its action. After performing the action, all of the productions scan memory in parallel again.
Figure 4-11 A simple production system consisting of four different productions.
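A minimal working sketch may help. The problem, the memory notation, and the productions below are all invented for illustration; real production system architectures such as ACT-R are far more elaborate:

```python
# Working memory holds symbols; each production is a condition-action
# pair; on every cycle all productions scan memory "in parallel," but
# only one matching production fires.
working_memory = {"goal:add", "a:3", "b:4"}

def add_condition(wm):
    return "goal:add" in wm

def add_action(wm):
    a = int(next(s for s in wm if s.startswith("a:")).split(":")[1])
    b = int(next(s for s in wm if s.startswith("b:")).split(":")[1])
    wm.discard("goal:add")
    wm.update({f"sum:{a + b}", "goal:halt"})

def halt_condition(wm):
    return "goal:halt" in wm

productions = [(add_condition, add_action), (halt_condition, None)]

while True:
    # Recognize: test every production's condition against memory.
    matched = [(c, a) for c, a in productions if c(working_memory)]
    condition, action = matched[0]   # conflict resolution: first match wins
    if action is None:               # the halt production fires
        break
    action(working_memory)           # Act: rewrite the contents of memory

print(sorted(working_memory))        # ['a:3', 'b:4', 'goal:halt', 'sum:7']
```

The recognize-act cycle at the bottom mirrors the description above: all conditions are tested against memory, one matching production fires, and the cycle repeats.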
We can represent a production system’s behaviour over time using a problem behaviour graph (Newell & Simon, 1972). Such a graph (Figure 4-12) consists of a set of linked nodes. Each node represents the current memory contents. One production’s action links two nodes together. For instance, the top left of Figure 4-12 shows that, when production P1 acts on memory State 1, the memory changes into State 2.
Problem behaviour graphs also depict changes in knowledge states not directly related to productions. In some instances, a participant might recall a previous state of knowledge during problem solving. A problem behaviour graph depicts such backtracking by copying the recalled state and placing the copy below the original in the graph. Figure 4-12 illustrates two examples of backtracking, one for State 1, the other for State 2. The examples indicate that time proceeds both horizontally and vertically in a problem behaviour graph.
Production systems can model human problem solving. How do we create such a model? Production systems emerge from a methodology called protocol analysis (Ericsson & Simon, 1993), which involves a detailed analysis of what participants say when they think aloud during problem solving (Section 3.11). Protocol analysis produces a problem space for representing a participant’s knowledge at each moment during problem solving. Researchers use a problem space to create a problem behaviour graph. The graph represents the rule-governed transition from one knowledge state to the next. The graph also helps researchers to create a production system to simulate the participant’s problem solving.
Figure 4-12 An example problem behaviour graph. States are different memory contents, and links are created by a production’s action.
Newell and Simon (1972) demonstrated their method’s utility for a variety of problems. They found a high degree of correspondence between the problem behaviour graph created using protocol analysis and the problem behaviour graph generated by the production system. In general, production systems can generate a very accurate step-by-step account of a participant’s problem-solving operations. Because production systems can successfully simulate many psychological phenomena (Anderson, 1983; Anderson et al., 2004; Anderson & Matessa, 1997; Meyer et al., 2001; Meyer & Kieras, 1997a, 1997b; Newell, 1990; Newell & Simon, 1972), numerous researchers treat the production system as a plausible cognitive architecture for providing a unified theory of cognition (Anderson, 1983; Anderson et al., 2004; Newell, 1990).
The simple production system presented in Figure 4-11 has evolved into more complex models (Anderson, 1983; Anderson, 1990; Meyer & Kieras, 1997a, 1997b). Different production systems emerge from different combinations of the architectural properties discussed in this chapter.
Simple production systems scan memory in parallel, but act on memory in serial, because only one production operates at any time. However, other production systems permit productions to act in parallel (Meyer & Kieras, 1997a, 1997b).
Simple production systems model problem solving using data-driven, automatic processing, because patterns held in working memory trigger productions. However, other production systems permit theory-driven, controlled processing. For instance, the adaptive control of thought-rational (ACT-R) architecture includes components for directing the system to accomplish a desired goal (Anderson et al., 2004).
Memory contents do not perfectly control simple production systems. Sometimes multiple productions discover their conditions at the same time, or one production discovers its condition at more than one memory location. Additional control mechanisms can resolve conflicts between productions or conditions. Production systems can differ from one another in terms of the control mechanisms used to deal with such conflicts.
When Newell and Simon (1972) described modelling problem solving with production systems, they said little about nativism and empiricism. Although not claiming innateness, their simple production systems did not learn. Later production systems included learning mechanisms. For example, ACT-R uses learning mechanisms to modify existing productions and uses learning mechanisms to add new knowledge to memory (Anderson et al., 2004).
We can describe simple production systems as modules because such systems accomplish particular tasks, such as solving a specific cryptarithmetic problem (Newell & Simon, 1972). However, later production systems seem to be more isotropic (Anderson, 1983; Anderson et al., 2004; Newell, 1990). These systems include general knowledge of the world and goal-directed problem solving, which make them general purpose problem solvers.
In short, even when researchers view production systems as a plausible cognitive architecture, they need not agree on various specific details. Many different production systems emerge when researchers make different decisions about architectural properties.
4.10 Chapter Summary
To overcome Ryle’s regress, cognitive psychologists must discover a cognitive architecture. However, not all cognitive psychologists will converge on the same architecture. Many different architectures appear in cognitive psychology because the same results can support different models, because different researchers explore different ideas, and because researchers can negate the architectural assumptions held by others. Cognitive psychologists frequently explore competing architectures.
Furthermore, potential cognitive architectures exhibit many possible properties. Are processes serial or parallel? Are they data driven, or theory driven? Are they automatic or controlled? Which particular combination of structure and process is involved? How are processes controlled? Are they innate or learned? Are they isotropic or modular? Different cognitive psychologists generate different answers to such questions, producing architectural variety. In addition, we can answer some questions in many different ways. For instance, we can propose many different combinations of symbolic structures and the rules that manipulate them (Dawson, 1998, Table 6.1). Again, different answers to architectural questions produce architectural variety.
Thus, cognitive psychology hosts a number of competing models. Is language embedded in the innate structure of a universal grammar (Cook & Newson, 1996), or are its regularities learned by connectionist networks (Joanisse & McClelland, 2015)? Is memory a set of different stores (Shiffrin & Atkinson, 1969), or do different memories reflect different levels of processing (Craik & Lockhart, 1972)? Is attention a serial spotlight (Treisman & Gelade, 1980), or is it a set of indices deployed in parallel (Pylyshyn, 2003a)? Radically different theories arise when researchers adopt radically different hypotheses about the architecture.
Further variety emerges because different answers to the same architectural question can coexist in a single theory. For instance, feature integration theory includes feature detectors operating in parallel but then uses serial processing to combine features into objects (Treisman & Gelade, 1980). One model exhibiting multiple architectural properties (e.g., both parallel and serial processing) illustrates another source of theoretical variety.
Theoretical variety also arises when models differ in their emphasis on architectural properties. For example, where does data-driven processing stop? In feature integration theory, it ends early, delivering fairly simple visual features (Treisman & Gelade, 1980). In natural computation theories, data-driven processing ends later, delivering more complex representations of surfaces and objects (Biederman, 1987; Marr, 1982). It is little wonder that cognitive psychology hosts so many different models.
In short, even when cognitive psychologists share foundational assumptions, they can still propose and explore different architectures. Cognitive psychology possesses notable theoretical diversity. Nevertheless, its diversity faces a common fate: empirical testing. Cognitive psychologists must collect data to support architectural proposals.
Of course, finding a single, unifying cognitive architecture requires that such an architecture exists, which many researchers assume. Allen Newell defines a unified theory of cognition as “a single set of mechanisms for all of cognitive behavior” (1990, p. 15) and proposes that an elaborate production system called SOAR (“state, operator and result”) can provide such a theory (Laird et al., 1987). However, not everyone agrees that a unifying cognitive architecture exists. Instead, some believe that diverse arrays of processes carry out cognition, each possessing distinct architectural properties. The society of mind provides one example, producing cognition with a large number of simple, distinct processes called agents (Minsky, 1985, 2006). A related idea is massive modularity, which proposes that many specialized modules produce human cognition (Carruthers, 2006; Pinker, 1997).
The architecture of one agent or of one module need not be identical to the architecture of another. When two cognitive psychologists propose different architectures, both can be correct, because different architectures can exist in different agents or modules. Questioning the existence of a unifying cognitive architecture illustrates a foundational debate. Cognitive psychology requires such debates to develop. In the next and final chapter, I explore some example debates. Different positions in such debates produce radically different theories. Ultimately, different theories reflect different ideas about the fundamental nature of cognition.