4 The Scale Mode Network

4.1 The Multilayer Perceptron

4.1.1 Carving Pattern Spaces

How can researchers overcome the limitations of perceptrons? Perceptron power increases when one or more layers of intermediate processors are added between the input and the output units of the perceptron. These hidden units do not have any direct connection to the external world; they reside completely within the network. An artificial neural network that incorporates hidden units is called a multilayer perceptron. This chapter illustrates the use of a multilayer perceptron by training and interpreting the one shown in Figure 4-1, a network designed to detect whether an input scale is major or minor.

Figure 4-1 A multilayer perceptron, with two hidden units, that detects whether a presented scale is major or minor.

4.1.2 What Do Hidden Units Do?

Why do hidden units make a multilayer perceptron capable of solving problems too complex for a simpler perceptron? One answer to this question is that hidden units detect higher-order features. A higher-order feature involves relationships among different input units considered simultaneously. By including units that can detect such higher-order features, networks become capable of solving much more complicated problems (Lippmann, 1989). Indeed, adding hidden units makes artificial neural networks as powerful as any universal computing machine of interest to cognitive science (Dawson, 2013; McClelland & Rumelhart, 1986; Rumelhart & McClelland, 1986b). Thus, we should never be surprised when such a network learns to solve a musical problem. Surprises must emerge instead from our investigation of the solution that the network has discovered, and from what it might tell us about music (Dawson, 2004).

4.1.3 Training Multilayer Networks

As was the case for the perceptron introduced earlier, I will train a multilayer perceptron using a supervised learning rule that minimizes output unit error. Such learning rules modify a connection weight on the basis of two values: the activity from the unit at the input end of the connection, and the error computed for the unit at the output end of the connection (Dawson, 2004).

However, researchers encounter a problem when they attempt to define weight changes for any of the connections from an input unit in Figure 4-1 to either of the hidden units in this network. The activity at the input end of one of these connections is input unit activity. However, researchers have no idea what the error at the hidden unit end of the connection is. This is because they do not know in advance the responses that hidden units should make, and therefore they cannot define error in the same fashion as they do for the output unit.

The connectionist revolution that occurred in cognitive science in the 1980s (McClelland & Rumelhart, 1986; Rumelhart & McClelland, 1986b) began when researchers discovered a procedure for determining the error for any hidden unit in a multilayer perceptron. They found that a hidden unit’s error can be specified as the sum of error signals sent to it by each of the output units to which it is connected (Rumelhart et al., 1986). Each of these signals is an output unit’s error, scaled by the weight of the connection between the output and the hidden unit. In other words, output units send error signals backward through the network, and these signals determine hidden unit error, solving the credit assignment problem. Not surprisingly, the learning rule for multilayer perceptrons is often called backpropagation of error, or backprop for short. Because this rule is a generalization of the supervised learning rules for perceptrons, it is also known as the generalized delta rule.

4.1.4 Error Backpropagation

The generalized delta rule trains a multilayer perceptron to mediate a desired input/output mapping. Before training begins, a network is a “blank slate”; all of its connection weights, and all of the biases of its activation functions, start as small, random numbers. The generalized delta rule involves repeatedly presenting training patterns—input/output pairs—and then modifying weights; this was the case for perceptron training in Chapter 3.

In the generalized delta rule, a single presentation of an input/output pair proceeds as follows: First, signals are fed forward through the network. The multilayer perceptron’s input units are activated to present an input pattern. These units send signals to the hidden units, which compute their net input and then their activity. Next, the hidden units send signals to the network’s output units. The output units then compute their net input and activity. Output unit activities represent the network’s response to the input pattern.

Second, now that output units have been activated, it is possible to measure their response error by taking the difference between the desired activity and the observed activity for each output unit. This procedure is identical to that used to train perceptrons.

Third, an output unit’s error is used to modify the weights of its immediate connections. Up to this point, there is no essential difference between the supervised learning of a multilayer perceptron and the supervised learning of a perceptron.

The fourth step differentiates backprop from the training of a perceptron. In this step, each hidden unit’s error is determined. This is accomplished by treating an output unit’s error as if it were activity, and sending it backward as a signal through a connection to hidden units. Each such signal is multiplied by the weight of the connection through which it travels. Each hidden unit computes its error by summing together all of the error signals that it receives from all of the output units to which it is connected.

Fifth, once hidden unit error has been computed, the weights that feed into the hidden units can be modified using the same equation that was used to alter the weights of each of the output units. Training continues by presenting the next input/output pair in the training set, and repeating the error backpropagation procedure.
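The five steps above can be sketched in code. The following Python sketch is illustrative only and is not the Rumelhart program used later in this chapter. It assumes logistic units, the standard form of the rule in which each error term is also scaled by the derivative of the logistic function, and an arbitrary learning rate of 0.5. The function name `present_pattern` and the tiny 3-2-1 network are hypothetical.

```python
import math
import random

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

def present_pattern(x, target, w_hid, b_hid, w_out, b_out, rate=0.5):
    """One presentation of an input/output pair; weights are changed in place.

    Returns the summed squared output error measured before the update.
    """
    # Step 1: feed signals forward through the hidden units to the output units.
    hid = [logistic(sum(w * xi for w, xi in zip(ws, x)) + b)
           for ws, b in zip(w_hid, b_hid)]
    out = [logistic(sum(w * h for w, h in zip(ws, hid)) + b)
           for ws, b in zip(w_out, b_out)]

    # Step 2: output error (desired minus observed), scaled by the
    # logistic derivative o * (1 - o).
    d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]

    # Step 4: send each output error backward through its connection weight
    # and sum the signals at each hidden unit. This is computed before the
    # output weights change so that the old weights carry the error.
    d_hid = [h * (1.0 - h) * sum(d_out[k] * w_out[k][j] for k in range(len(out)))
             for j, h in enumerate(hid)]

    # Step 3: modify the weights feeding the output units.
    for k, dk in enumerate(d_out):
        for j, h in enumerate(hid):
            w_out[k][j] += rate * dk * h
        b_out[k] += rate * dk

    # Step 5: modify the weights feeding the hidden units with the same rule.
    for j, dj in enumerate(d_hid):
        for i, xi in enumerate(x):
            w_hid[j][i] += rate * dj * xi
        b_hid[j] += rate * dj

    return sum((t - o) ** 2 for t, o in zip(target, out))

# Hypothetical 3-2-1 network that starts from small random weights.
rng = random.Random(0)
w_hid = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(2)]
b_hid = [rng.uniform(-0.1, 0.1) for _ in range(2)]
w_out = [[rng.uniform(-0.1, 0.1) for _ in range(2)]]
b_out = [rng.uniform(-0.1, 0.1)]

pattern, desired = [1.0, 0.0, 1.0], [1.0]
first_error = present_pattern(pattern, desired, w_hid, b_hid, w_out, b_out)
last_error = first_error
for _ in range(199):
    last_error = present_pattern(pattern, desired, w_hid, b_hid, w_out, b_out)
```

Repeating a single pattern, as the demonstration does, only shows that error shrinks with each presentation. Actual training cycles through every pattern in the training set.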

The generalized delta rule for multilayer perceptrons was initially defined for processors that use the logistic activation function (Rumelhart et al., 1986). However, variations of the algorithm exist for training multilayer perceptrons that use value units (Dawson & Schopflocher, 1992).

4.1.5 Design Decisions

One theme of this book is that artificial neural networks can only inform the cognitive science of music after their internal structure is interpreted. With this goal in mind, I gear the design decisions toward discovering the simplest network capable of solving a musical problem. A simpler network is easier to interpret than a more complex network.

My first step in achieving this goal is to determine whether the simplest network, the perceptron, can solve the problem. In Chapter 3, we found that a value unit perceptron with µs held to zero could identify scale tonics. However, at the end of that chapter we also noted that a perceptron could not learn to classify an input scale as being major or minor.

In this second case, when a perceptron is not up to the task, I next proceed to explore the more powerful multilayer perceptron. This exploration involves the same sort of design decisions used when the perceptron was investigated (learning rate, initialization, activation function, etc.). However, multilayer perceptrons require additional design decisions, such as deciding how many hidden units to include in the network.

Determining the number of hidden units requires exploring the behaviour of a number of different-sized networks. Typically, one begins with an educated guess about how many hidden units should be included. If the multilayer perceptron performs poorly on the problem (e.g., fails to learn it, stabilizes to very high error, etc.), then a different network that uses more hidden units is explored. If the network solves the problem quickly, then a network that uses fewer hidden units is trained. I may run many different simulations as I search for the simplest multilayer perceptron (that is, the one with the smallest number of hidden units) that I can reliably train to solve the problem of interest. Once I discover the simplest network for solving a musical problem of interest, I proceed to analyze its internal structure.
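The search just described can be written as a short loop. The sketch below is one possible reading of the procedure, not a prescribed algorithm. The callable `trains_reliably` is hypothetical: it stands for whatever set of simulations decides whether a network with a given number of hidden units can be trained reliably. A count of zero hidden units stands for the plain perceptron.

```python
def smallest_reliable_size(trains_reliably, first_guess, max_hidden=20):
    """Return the smallest hidden unit count that trains reliably, or None.

    trains_reliably(n) is a hypothetical callable that returns True when a
    network with n hidden units reliably learns the problem.
    """
    n = first_guess
    if trains_reliably(n):
        # The guess works, so try progressively smaller networks.
        while n > 0 and trains_reliably(n - 1):
            n -= 1
        return n
    # The guess fails, so try progressively larger networks.
    while n < max_hidden:
        n += 1
        if trains_reliably(n):
            return n
    return None
```

For example, if only networks with two or more hidden units can learn a problem, the search returns 2 whether the first guess is too large or too small.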

4.2 Identifying Scale Mode

4.2.1 Task

My goal is to train an artificial neural network to distinguish major scales from harmonic minor scales—to identify scale mode. I present the network the same stimuli used to train the scale tonic perceptron in Chapter 3: either a major scale or a harmonic minor scale encoded using a pitch-class representation. I train the network to turn its single output unit “on” if the input pattern defines a major key, and to turn it “off” if the input pattern defines a minor key.

4.2.2 Network Architecture

Seeking the simplest network for accomplishing this musical task proceeds along the lines described in Section 4.1.5. As mentioned at the end of Chapter 3, we cannot successfully train a perceptron to identify key type for these different input patterns. In accordance with Section 4.1.5, the next step is to discover the simplest multilayer perceptron that is capable of identifying scale mode. After exploring a variety of different networks, the network settled upon is the multilayer perceptron illustrated in Figure 4-1.

This multilayer perceptron uses 12 input units to represent the presence or absence of pitch-classes using the same encoding described in Chapter 3 for the scale tonic perceptron. (This is because I train both networks with exactly the same set of input patterns.) The network uses a value unit as its single output unit. The output unit is trained to turn “on” if a stimulus represents a major scale, and to turn “off” if a stimulus represents a harmonic minor scale. Finally, the network uses two hidden value units as intermediate processors between its input and output units. There are no direct connections between the input units and the output unit.

4.2.3 Training Set

The training set consists of 12 different major and 12 different harmonic minor scales represented in pitch-class format (i.e., each input stimulus is one of the rows of numbers presented in Chapter 3 as Table 3-1). As was the case for the scale tonic perceptron, each input pattern is used to turn the network’s 12 different input units either “on” or “off” to indicate the presence or absence of the various pitch-classes. The desired response for an input pattern that represents a major scale is one, and the desired response for an input pattern that represents a harmonic minor scale is zero.
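A training set of this kind can be generated in a few lines. The sketch below assumes that the 12 input units are ordered by semitone starting at A; the ordering used in Table 3-1 may differ. It also assumes the standard semitone patterns for the major and harmonic minor scales. All names are illustrative.

```python
# Assumed ordering of the 12 input units (one per pitch-class).
PITCH_CLASSES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

# Semitone offsets from the tonic for each scale type.
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
HARMONIC_MINOR_STEPS = (0, 2, 3, 5, 7, 8, 11)

def encode_scale(tonic, steps):
    """Return a 12-element on/off pattern for the scale built on tonic."""
    pattern = [0] * 12
    for step in steps:
        pattern[(tonic + step) % 12] = 1
    return pattern

def build_training_set():
    """Return 24 (input pattern, desired response) pairs.

    Major scales have a desired response of 1; harmonic minor scales, 0.
    """
    pairs = []
    for tonic in range(12):
        pairs.append((encode_scale(tonic, MAJOR_STEPS), 1))
        pairs.append((encode_scale(tonic, HARMONIC_MINOR_STEPS), 0))
    return pairs

training_set = build_training_set()
a_major = encode_scale(PITCH_CLASSES.index("A"), MAJOR_STEPS)
a_minor = encode_scale(PITCH_CLASSES.index("A"), HARMONIC_MINOR_STEPS)
```

Every pattern turns on exactly seven input units, and transposing a scale amounts to rotating its pattern.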

4.2.4 Training

The network is trained using the generalized delta rule developed for networks of value units (Dawson & Schopflocher, 1992), as implemented in the Rumelhart software program (Dawson, 2005). This program is available as freeware from the author’s website: http://www.bcp.psych.ualberta.ca/~mike/AlienMusic/

During a single epoch of training each pattern is presented to the network once; the order of pattern presentation is randomized before each epoch.

All connection weights in the network are set to random values between −0.1 and 0.1 before training begins. The µs of the output and hidden units are set to zero throughout training. I employ a learning rate of 0.01. Training proceeds until the network generates a “hit” for each of the patterns in the training set. I define a “hit” as activity of 0.9 or higher when the desired response is one or as activity of 0.1 or lower when the desired response is zero.
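The training regime described here (random starting weights, a new random order on each epoch, and a stopping rule based on hits) can be outlined as follows. This is only a skeleton under stated assumptions. The callables `respond` and `update` are hypothetical stand-ins for a network's forward pass and its weight change, and the sketch assumes that the hit test is applied once at the end of each epoch.

```python
import random

def is_hit(activity, desired):
    """Apply the hit criterion: 0.9 or higher for 1, 0.1 or lower for 0."""
    if desired == 1:
        return activity >= 0.9
    return activity <= 0.1

def initial_weights(n_from, n_to, rng):
    """Return n_to lists of n_from weights drawn from [-0.1, 0.1]."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_from)] for _ in range(n_to)]

def train_until_all_hits(patterns, respond, update, rng, max_epochs=10000):
    """Return the number of epochs needed for every pattern to be a hit.

    respond(x) and update(x, desired) are hypothetical callables supplied by
    the network being trained. Returns None if max_epochs is reached.
    """
    for epoch in range(1, max_epochs + 1):
        order = list(patterns)
        rng.shuffle(order)              # new random order for each epoch
        for x, desired in order:
            update(x, desired)          # each pattern is presented once
        if all(is_hit(respond(x), desired) for x, desired in patterns):
            return epoch
    return None
```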

The multilayer perceptron in Figure 4-1 quickly learns to identify scale mode, typically converging after between 350 and 475 epochs of training. The network described in more detail in the next section learns to solve the problem after 390 epochs of training.

4.3 Interpreting the Scale Mode Network

How does the multilayer perceptron in Figure 4-1 detect the difference between a major scale and a harmonic minor scale, regardless of the scale’s tonic? In order to answer this question let us examine the hidden unit space that confronts the output unit, as well as the input pattern features detected by the two hidden units.

4.3.1 Hidden Unit Space

The two hidden units in the network must detect stimulus properties that permit the output unit to distinguish major keys from minor keys. In order to discover what features the hidden units detect, let us plot the hidden unit space for the network. Because the multilayer perceptron uses only two hidden units, its hidden unit space can be illustrated with a two-dimensional scatterplot as is shown in Figure 4-2. In a hidden unit space, each input pattern is plotted as a point on a graph. The hidden unit activities provide the coordinates of the point. For instance, in Figure 4-2 the activity the pattern produces in Hidden Unit 1 provides a point’s x-coordinate, and the activity the pattern produces in Hidden Unit 2 provides its y-coordinate. Hidden unit spaces are useful in interpreting network structure because output units can be viewed as solving a classification problem by “carving” the hidden unit space into different decision regions that separate one type of pattern (e.g., major scales in Figure 4-2) from another (e.g., minor scales in Figure 4-2) (Pao, 1989; Ripley, 1996).

A number of regularities are evident in Figure 4-2. First, all of the patterns that represent major scales fall near the origin of this space. This means that major key stimuli tend to produce near-zero activity in both hidden units. This further suggests that the role of each hidden unit is to detect (to turn on to) some property that indicates that a stimulus is not related to a major key.

Second, all of the stimuli related to major scales appear to be highly similar to one another, because all cluster closely together in Figure 4-2. In contrast, the stimuli related to minor scales spread themselves a wide distance apart. The minor scale stimuli seem to all fall roughly in a diagonal line that falls downward from left to right; there are distinct clusters of different stimuli along this line. An account of the features detected by the two hidden units should explain such regularities.

Third, in many instances different scales fall so close together in the hidden unit space that their plotted symbols fall on top of one another. For example, this is true for Dm and G♯m, and for Am and D♯m at the top left of Figure 4-2. This overlapping of symbols can make the graph harder to inspect. However, it is actually a visual property that reveals a key property of this hidden unit space: scales that seem quite different to us (like Am and D♯m) are actually nearly identical to one another as far as this network is concerned, which is why their coordinates in the hidden unit space are so close together.

Figure 4-2 The hidden unit space for the scale mode network.

4.3.2 Hidden Unit Weights

What features do the hidden units detect to determine that a stimulus represents a minor key and not a major key? In other words, what are the two hidden units detecting that permits them to position the different stimuli in the hidden unit space of Figure 4-2? In order to explore this issue let us examine the values of the weights connecting each of the input units to the hidden units.

Figure 4-3 The connection weights between the 12 input units and each hidden unit in the scale mode network.

Figure 4-3 presents two bar plots, one illustrating the weights of the connections between the 12 input units and Hidden Unit 1, the other illustrating the same information for Hidden Unit 2. These two patterns of connectivity determine when particular input patterns will cause either hidden unit to generate high activity, representing the presence of a minor key (as revealed by the Figure 4-2 hidden unit space). Interpreting these connection weights should reveal what “minor scale features” are being detected by each hidden unit.

There is a high degree of regularity in both of the graphs illustrated in Figure 4-3. Exactly half of the pitch-classes in each plot have negative connection weights, and the other half have positive connection weights. Furthermore, the overall shape of each plot is identical, but the pattern for Hidden Unit 1 is “phase shifted” three pitch-classes to the left in the plot for Hidden Unit 2. Consider the pattern of bars for Hidden Unit 1 from D♯ onward to the right. The identical pattern is evident for Hidden Unit 2, but only if one begins with pitch-class C instead of D♯.

Relationships between weights in the same graph in Figure 4-3 reveal a striking property. Consider the two most extreme weights for Hidden Unit 1, the ones connecting this unit to incoming signals from A and D♯. Not only are the two weights the most extreme, but they seem almost equal in magnitude (although one weight is positive while the other is negative). These two pitch-classes are a musical interval of a tritone (six semitones) apart. If we compare other weights coming from input pitch-classes that are a tritone apart, then we see that this pattern repeats: these pairs of weights are roughly equal in magnitude but opposite in sign.

This same regularity is also evident in the Figure 4-3 plot of Hidden Unit 2 weights. For instance, the two extreme weights (from C and from F♯) are roughly equal in magnitude but point in opposite directions, and again are separated by a tritone.

Figure 4-4 presents exactly the same connection weights shown in Figure 4-3, but aligns weights from pitch-classes a tritone apart. Plotting the weights in this fashion produces two graphs in which the upper bars are a mirror image of the lower bars. This clearly indicates that weights from any two pitch-classes that are a tritone apart have the same magnitude, but are opposite in sign. This means that, for either hidden unit, if two pitch-classes a tritone apart are both turned on then their signals will cancel each other out, producing a net signal of zero.

The notion that signals from different input units cancel each other out when received by a hidden unit is of particular importance given that the multilayer perceptron in Figure 4-1 uses value units for hidden units. Recall that we define a value unit’s Gaussian activation function with one parameter, µ, and that for a value unit to generate maximum activity its net input must equal µ.

In the multilayer perceptron trained to distinguish major from harmonic minor scales, the value of µ for each hidden unit, and for the output unit, is zero. Thus for either hidden unit to generate a maximum response, the incoming signal from the input units must also equal zero. This is why the notion of two signals that cancel one another is of such import.

Figure 4-4 Connection weights between input units and each hidden unit.

The connection weight pattern evident in Figure 4-4 suggests that both hidden units detect tritone balance. That is, the ideal stimulus (i.e., the input pattern that produces a maximum response) is one in which pairs of pitch-classes a tritone apart are balanced. In such a stimulus, two pitch-classes a tritone apart are in the same state: they are either both off or both on. When both pitch-classes are off, their input units send a zero signal through the two connections. When both pitch-classes are on, both send signals through the two connections. However, because the weights of the two connections are equal in magnitude but opposite in sign, their signals will again cancel out, contributing little to a hidden unit’s net input. Of course, it is possible for pairs of pitch-classes a tritone apart to be unbalanced. This occurs when one pitch-class is present in a stimulus but the corresponding pitch-class is not. In this situation, a definite positive or negative contribution will be added to a hidden unit’s net input. This is because when the tritone is not balanced, the signals from the two pitch-classes do not cancel out. As a result, the unbalanced tritone will shift net input away from µ, making it much more likely that a hidden unit will not respond.
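The cancellation argument can be made concrete with a small numerical sketch. It assumes the Gaussian activation function for value units takes the form G(net) = exp(−π(net − µ)²), following Dawson and Schopflocher (1992). The weights below are invented for illustration and are not the trained weights of Figure 4-4. Their only relevant property is that weights a tritone (six positions) apart are equal in magnitude and opposite in sign.

```python
import math

def value_unit(net, mu=0.0):
    """Gaussian activation (assumed form): maximum of 1 when net equals mu."""
    return math.exp(-math.pi * (net - mu) ** 2)

# Hypothetical weights for pitch-classes A through D, mirrored with the
# opposite sign for their tritone partners D# through G#.
HALF = [2.0, -0.5, 1.5, -0.25, 0.75, -1.0]
WEIGHTS = HALF + [-w for w in HALF]

def net_input(pattern, weights=WEIGHTS):
    return sum(w * p for w, p in zip(weights, pattern))

# A and D# both on: a balanced tritone whose signals cancel.
balanced = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
# A on, D# off: an unbalanced tritone.
unbalanced = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

balanced_activity = value_unit(net_input(balanced))      # net input is 0
unbalanced_activity = value_unit(net_input(unbalanced))  # net input is 2
```

With µ equal to zero, the balanced pattern drives the unit to its maximum activity, while the unbalanced pattern pushes net input away from µ and activity falls toward zero.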

Figure 4-2 indicated that the two hidden units detect features true of minor keys and not true of major keys. Our analysis of connection weights indicates that this feature is “tritone balance.” How do we relate this feature to the musical definition of major or minor keys?

4.4 Tritone Imbalance and Key Mode

Figure 4-5 Spokes in a circle of minor seconds used to represent the pitch-classes that define major and minor keys.

In Western music, there are six possible pairs of pitch-classes separated by a tritone: [A, D♯], [A♯, E], [B, F], [C, F♯], [C♯, G], and [D, G♯]. In any stimulus presented to the network analyzed in the previous section, the more of these pairs that are balanced (i.e., their pitch-classes are both in the same state, on or off), the greater is the response of either hidden unit.

Why does the network use tritone balance to distinguish harmonic minor scales from major scales? Figure 4-5 provides an answer to this question. This figure is an alternative version of Figure 3-2, and depicts the pitch-class content of the A major and the A harmonic minor scales. Figure 4-5 differs from Figure 3-2 by indicating not only which pitch-classes are present in a scale (solid spokes) but also which pitch-classes are absent from a scale (dashed spokes).

Although Figure 4-5 provides only two example scales (A major and A minor), we saw in Chapter 3 that the two spoke patterns that it depicts can represent any major or harmonic minor scale respectively. This is because one can transpose one of the depicted scales into any other musical key by rigidly rotating the spoke pattern to a new orientation within the circle.

Figure 4-5 highlights the tritone relationships between corresponding pairs of pitch-classes. Two pitch-classes that are a tritone apart are directly opposite one another in the circle of pitch-classes. As a result, one can quickly inspect Figure 4-5 for tritone balance. If a tritone pair is balanced, then the diameter through the wheel that connects its two pitch-classes will be constant in appearance. For instance, in the A major diagram, the pair [D, G♯] balances because there is a single solid line connecting these two pitch-classes. In the A minor diagram, [D, G♯] and [B, F] are both balanced with a solid line, while [C♯, G] is balanced with a dashed line.

Recognizing that the two spoke diagrams in Figure 4-5 apply to any major or harmonic minor key respectively, two general properties are now apparent. First, major scales have almost no tritone balance. For any major scale, there will be one and only one balanced tritone. Second, for any harmonic minor scale, there will be three and only three balanced tritones. Two of these involve pairs of pitch-classes being present, while the third requires both members of a pitch-class pair to be absent. That this system requires a particular pair of pitch-classes a tritone apart to be missing is an interesting property discussed below in more detail in Section 4.5.2.
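Both counts can be checked by enumerating every scale. The sketch below assumes that pitch-classes are numbered 0 to 11 in semitone order, so that tritone partners differ by six, and uses the standard semitone patterns for the two scale types. The function names are illustrative.

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
HARMONIC_MINOR_STEPS = (0, 2, 3, 5, 7, 8, 11)

def scale_set(tonic, steps):
    """Return the set of pitch-class numbers in the scale built on tonic."""
    return {(tonic + step) % 12 for step in steps}

def balanced_tritones(pcs):
    """Return (present, absent) lists of balanced tritone pairs.

    A pair (pc, pc + 6) is balanced when both members are in the same state:
    both present in the scale, or both absent from it.
    """
    present, absent = [], []
    for pc in range(6):
        partner = pc + 6
        if pc in pcs and partner in pcs:
            present.append((pc, partner))
        elif pc not in pcs and partner not in pcs:
            absent.append((pc, partner))
    return present, absent

major_counts = set()
minor_counts = set()
for tonic in range(12):
    present, absent = balanced_tritones(scale_set(tonic, MAJOR_STEPS))
    major_counts.add((len(present), len(absent)))
    present, absent = balanced_tritones(scale_set(tonic, HARMONIC_MINOR_STEPS))
    minor_counts.add((len(present), len(absent)))
```

Because the counts are collected over all 12 tonics, a single entry in each set confirms that the property does not depend on the key.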

4.5 Further Network Analysis

4.5.1 Comparing Scale Geometries

Now that the hidden units of the scale mode network have been interpreted as tritone balance detectors, let us return to understanding the hidden unit space of Figure 4-2. In particular, let us explore the arrangement of the 12 harmonic minor scales in this space.

There is a long history of using spatial or geometric representations to highlight the relationships between musical entities (Hook, 2006; Krumhansl, 2005; Schoenberg, 1969; Tymoczko, 2006, 2011, 2012). When these techniques are used to map relationships between complex entities like musical scales, one feature that emerges is that major scales and minor scales tend to be mixed among one another. This is because similar scales will lie closer to one another in a musical space, and different kinds of scales can be similar to one another because they share many pitch-classes. For instance, the set of pitch-classes that defines the C major scale differs from the set that defines the A harmonic minor scale by only a single pitch-class. The C major scale set differs from those that define the C, D, and E harmonic minor scales by only two pitch-classes. Thus, one would expect that in a typical spatial representation C major would be near to these similar harmonic minor scales.

Figure 4-2 is interesting because it differs from this expected arrangement. Instead of surrounding minor scales with major scales as is seen in other spatial depictions (Schoenberg, 1969), the hidden unit space pulls minor scales away from the major scales at the graph’s origin. Such a clear difference between the hidden unit space and other traditional geometric representations is what makes the hidden unit space interesting. A second regularity in Figure 4-2 is not only that harmonic minor scales spread along a line through the hidden unit space but also that there is a definite grouping of points along this line. In particular, points cluster together in pairs. This grouping is less evident in Figure 4-2 because in some cases paired points fall exactly in the same position. However, the pairing of points is obvious if one examines the coordinates of the minor scales in hidden unit space; Table 4-1 provides these coordinates.

Table 4-1 Properties of the 12 harmonic minor scales and their position in hidden unit space.

Scale tonic	Hidden Unit 1 activity	Hidden Unit 2 activity	Balanced tritone 1	Balanced tritone 2	Balanced tritone 3	Unbalanced pitch-classes	Hidden Unit 1 unbalanced signal	Hidden Unit 2 unbalanced signal
G#	0	0.93	[A#, E]	[C#, G]	[C, F#]	B, D#, G#	1.58	−0.16
D	0	0.92	[A#, E]	[C#, G]	[C, F#]	A, D, F	−1.59	0.16
A	0	0.84	[B, F]	[D, G#]	[C#, G]	A, C, E	−2.03	0.22
D#	0	0.84	[B, F]	[D, G#]	[C#, G]	A#, D#, F#	2.06	−0.25
A#	0.19	0.76	[C, F#]	[A, D#]	[D, G#]	A#, C#, F	0.69	0.31
E	0.24	0.72	[C, F#]	[A, D#]	[D, G#]	B, E, G	−0.70	−0.31
C#	0.72	0.23	[A, D#]	[C, F#]	[B, F]	C#, E, G#	0.29	−0.67
G	0.76	0.22	[A, D#]	[C, F#]	[B, F]	A#, D, G	−0.32	0.70
C	0.84	0	[D, G#]	[B, F]	[A#, E]	C, D#, G	0.24	2.05
F#	0.84	0	[D, G#]	[B, F]	[A#, E]	A, C#, F#	−0.24	−2.05
B	0.92	0	[C#, G]	[A#, E]	[A, D#]	B, D, F#	−0.14	−1.55
F	0.92	0	[C#, G]	[A#, E]	[A, D#]	C, F, G#	0.18	1.57

Note. Each row provides details for each scale, including the activity the scale produces in each hidden unit, the three balanced tritones, and the unbalanced pitch-classes. The final two columns provide the net input delivered to each hidden unit by the unbalanced pitch-classes.

Table 4-1 provides some additional information concerning the pairing of minor scales. It provides three columns (Balanced Tritone 1, 2, and 3) that list the three balanced tritones for each minor scale. (Note that balanced tritones 1 and 2 are pairs of pitch-classes that are both present in a stimulus, while balanced tritone 3 is a pair of pitch-classes that are both missing from the stimulus.) Examining these three columns in Table 4-1 reveals that pairs of points with identical or near-identical coordinates in Figure 4-2 represent two minor scales that have three identical balanced tritones. For instance, in the hidden unit space the inputs for the F harmonic minor scale and the B harmonic minor scale both are located at position (0.92, 0). Both of these minor scales have the same balanced tritones: [C♯, G], [A♯, E], and [A, D♯].

The row shading in Table 4-1 highlights the pairing of minor scales. If two minor scales are adjacent in the table, and have the same shading, then they are paired in the sense that they are located nearly on top of each other in Figure 4-2. Examining the scale roots (Column 1) of rows paired in this way reveals an interesting musical property that emerges from hidden unit activities: paired minor scales have roots that are a tritone apart. For instance, the harmonic minor scales for G♯ and D are both located near (0, 0.93) in the hidden unit space, and G♯ and D are a tritone apart (i.e., opposite one another in Figure 4-6).
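The pairing follows directly from the pitch-class content of the scales and can be reproduced without the trained network. The sketch below groups the 12 harmonic minor scales by their sets of balanced tritones. It assumes pitch-classes ordered by semitone starting at A and the standard harmonic minor pattern. The names are illustrative.

```python
NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
HARMONIC_MINOR_STEPS = (0, 2, 3, 5, 7, 8, 11)

def harmonic_minor(tonic):
    return {(tonic + step) % 12 for step in HARMONIC_MINOR_STEPS}

def balanced_pairs(pcs):
    """Return the tritone pairs whose members are both present or both absent."""
    return frozenset((pc, pc + 6) for pc in range(6)
                     if (pc in pcs) == ((pc + 6) in pcs))

def unbalanced_pitch_classes(pcs):
    """Return names of scale members whose tritone partner is missing."""
    return [NAMES[pc] for pc in sorted(pcs) if (pc + 6) % 12 not in pcs]

# Group scale tonics by the set of balanced tritones their scales contain.
groups = {}
for tonic in range(12):
    key = balanced_pairs(harmonic_minor(tonic))
    groups.setdefault(key, []).append(NAMES[tonic])

pairs = sorted(groups.values())
a_minor_unbalanced = unbalanced_pitch_classes(harmonic_minor(NAMES.index("A")))
```

The grouping yields six pairs of tonics, each a tritone apart. The unbalanced pitch-classes of each scale are its tonic, its minor third, and its fifth, which agrees with the Unbalanced pitch-classes column of Table 4-1.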

The proximity relations among minor scales in the hidden unit space are based on a definite musical property (shared balanced tritones), and produce a very regular pairing of scales. However, the balanced tritones property is an atypical musical regularity in comparison to more traditional spatial representations. As a result, the proximity relations in the hidden unit space are markedly different from proximity relations in more traditional spaces.

Consider the following set of harmonic minor scales: G♯m, Dm, Bm, and Fm. Traditional spatial representations would plot G♯m closer to Bm and Fm than to Dm. This is because it differs from the former two scales by only two pitch-classes but differs from the last one by three. Another way to consider the similarity is that the tonic (G♯) of G♯m is a minor third or three semitones away from the tonic of either Bm or Fm. However, the tonic of G♯m is a full tritone or six semitones away from the tonic of Dm. In a traditional spatial depiction of these four scales or keys, Dm therefore lies farther from G♯m than either Bm or Fm does.
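Both measures mentioned in this paragraph are easy to compute. The sketch below assumes pitch-classes ordered by semitone starting at A and the standard harmonic minor pattern. The function names are illustrative. The difference between two seven-note scales is taken to be the number of pitch-classes found in one scale but not in the other.

```python
NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
HARMONIC_MINOR_STEPS = (0, 2, 3, 5, 7, 8, 11)

def harmonic_minor(tonic_name):
    tonic = NAMES.index(tonic_name)
    return {(tonic + step) % 12 for step in HARMONIC_MINOR_STEPS}

def pitch_class_difference(scale_a, scale_b):
    """Count the pitch-classes of scale_a that are missing from scale_b."""
    return len(scale_a - scale_b)

def tonic_interval(name_a, name_b):
    """Return the smallest distance in semitones between two tonics."""
    distance = (NAMES.index(name_a) - NAMES.index(name_b)) % 12
    return min(distance, 12 - distance)

g_sharp_minor = harmonic_minor("G#")
differences = {other: pitch_class_difference(g_sharp_minor, harmonic_minor(other))
               for other in ("B", "F", "D")}
intervals = {other: tonic_interval("G#", other) for other in ("B", "F", "D")}
```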

Figure 4-6 provides two examples of spatial arrangements based on measures from traditional music theory. Multidimensional scaling (MDS) was used to analyze distances between scales (where distance is a function of the number of pitch-classes shared by two scales) into a spatial map in which scales that are similar to one another are located closer together. The left part of Figure 4-6 provides the two-dimensional MDS solution; it accounts for 58.1% of the variance in the original distance matrix. It places all of the major scales in a well-known musical pattern: the circle of perfect fifths. It also arranges the minor scales around a separate circle of perfect fifths, and places this circle inside the circle of major scales. The two different circles of perfect fifths in this MDS solution have different orientations; for instance, the inner circle of harmonic minor scales is oriented so that the two scales closest to A major are F♯ minor and B minor. Note that within this MDS solution G♯m is closer to Bm and Fm than it is to Dm.

Figure 4-6 A two-dimensional and a three-dimensional multidimensional scaling solution that arranges scales associated with different keys in a spatial map.

The right side of Figure 4-6 presents the three-dimensional MDS solution for the scale distances matrix. The added dimension improves the fit to the data; it accounts for 69.2% of the variance in the original distance matrix. The third dimension pulls different sets of minor scales away from one another in the vertical direction. At the top of the cube one finds D♯m, Cm, Am, and F♯m; in the middle of the cube G♯m, Fm, Dm, and Bm; at the bottom of the cube C♯m, A♯m, Gm, and Em. Once again, G♯m is closer to Bm and Fm than to Dm. There is also some vertical separation between major scales in this solution, but all are generally positioned outside and around the middle of the cube. What is the relationship between the two MDS solutions in Figure 4-6? It appears that if one were to project the positions of the scales in the three-dimensional MDS solution down to the bottom plane of the cube, the result would be the two circles of perfect fifths that are apparent on the left of Figure 4-6.

Now let us compare the spatial arrangements of scales based on traditional theory (Figure 4-6) with the spatial arrangements of scales in the hidden unit space (Figure 4-2). In particular, let us continue to focus upon the four minor scales G♯m, Bm, Fm, and Dm. The spatial relationships between these four harmonic minor scales are quite different in the hidden unit space. First, rather than being farther apart, scales whose roots are a tritone apart (G♯m and Dm, or Bm and Fm) have nearly identical locations in Figure 4-2. Second, rather than being closest to scales a minor third away, scales that differ by this amount are very far apart in the hidden unit space. In particular, the points for G♯m and Dm are the two that are farthest away from the points for Fm and Bm in Figure 4-2. Table 4-1 shows the coordinates for the latter two scales are the reflection of the coordinates of the former [(0.92, 0) vs. (0, 0.92)]. Why are different scales with similar balanced tritone structure so far apart in Figure 4-2? The three pitch-classes that are not part of a balanced tritone must be responsible.

When we present a harmonic minor scale to our multilayer perceptron, the three unbalanced pitch-classes are in essence the only source of net input to either hidden unit. This is because all other pitch-classes are balanced, and therefore contribute near zero to net input. The final three columns of Table 4-1 provide information regarding the effects of unbalanced pitch-classes on a scale’s position in the hidden unit space.

The first of these three columns simply lists, for each harmonic minor scale, the three pitch-classes that are unbalanced. The remaining two columns provide the net input to each hidden unit that is only due to the three unbalanced pitch-classes. These columns reveal some interesting properties concerning the spatial arrangement of the hidden unit space.

First, consider two scales that have identical balanced tritones (e.g., Fm and Bm). These two scales differ from one another in terms of their three unbalanced pitch-classes. For Fm these are F, G♯, and C, and for Bm these are B, D, and F♯. Look in Table 4-1 at the net inputs that each of these unbalanced sets produces in each hidden unit. One of these sets produces net inputs that are essentially equal in magnitude, but opposite in sign, to the net inputs produced by the other set. Remember that the symmetric form of the Gaussian activation function means that it in essence ignores the sign of net inputs. Thus, while these two different sets of unbalanced pitch-classes send different net inputs to the hidden units, the hidden units generate the same activity for either set, so the two patterns wind up in the same position in the hidden unit space. This account holds for any of the paired scales in Table 4-1.
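The sign-cancelling argument above can be sketched in a few lines of Python. The weights below are hypothetical, not the trained values in Figure 4-4; the sketch assumes only that pitch-classes a tritone apart carry equal and opposite weights, and that the Gaussian has the form exp(−π(net − µ)²) with µ = 0. Under those assumptions, the net input for a scale reduces to the contribution of its three unbalanced pitch-classes, and Fm and Bm receive net inputs of opposite sign but produce identical activity.

```python
import math

# Hypothetical weights for ONE hidden unit, indexed by pitch-class (C=0 ... B=11).
# ASSUMPTION: pitch-classes a tritone apart receive equal and opposite weights.
# These are illustrative numbers, not the trained weights of the network.
HALF = [0.31, -0.12, 0.48, -0.27, 0.09, -0.40]
WEIGHTS = HALF + [-w for w in HALF]

HARMONIC_MINOR = (0, 2, 3, 5, 7, 8, 11)  # semitones above the root


def scale(root):
    """Pitch-class set of the harmonic minor scale built on `root`."""
    return {(root + step) % 12 for step in HARMONIC_MINOR}


def net_input(pitch_classes, weights=WEIGHTS):
    """Sum of the weights attached to the active pitch-classes."""
    return sum(weights[pc] for pc in pitch_classes)


def gaussian(net, mu=0.0):
    """ASSUMED Gaussian activation: exp(-pi * (net - mu)^2)."""
    return math.exp(-math.pi * (net - mu) ** 2)


f_minor, b_minor = scale(5), scale(11)
net_f, net_b = net_input(f_minor), net_input(b_minor)

# Only the unbalanced pitch-classes (F, G-sharp, C for Fm) contribute.
net_f_unbalanced = net_input({5, 8, 0})

print(net_f, net_b)                      # equal magnitude, opposite sign
print(gaussian(net_f), gaussian(net_b))  # identical activity
```

Because the activity depends on the net input only through its square, any pair of scales whose net inputs differ only in sign lands on the same hidden unit coordinate.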

Now consider a different pair of scales, G♯m and Dm, whose balanced tritone structure is similar to that of Fm and Bm. The unbalanced pitch-classes for G♯m and Dm produce the same magnitude of net input (ignoring sign) to the hidden units as do the unbalanced pitch-classes for Fm and Bm. However, they send this magnitude to the opposite hidden units! As a result, the position of these two scales is reflected in the hidden unit space (i.e., the coordinates [x, y] of Fm and Bm become the coordinates [y, x] of G♯m and Dm).

This analysis leads to two general musical statements about the positioning of harmonic minor scales along the line through hidden unit space. First, two scales whose roots are a tritone apart will have the same position on this line. Second, two scales whose roots are a minor third apart will fall at positions reflected across the centre of this line.

The discussion of the geometry of Figure 4-2 in this section indicates that while the proximity relations in Figure 4-2 (made clear in the Table 4-1 coordinates) are quite different from traditional ones (e.g., Figure 4-6), they are based upon musical structure. The atypical arrangement of scales in the hidden unit space is a consequence of the multilayer perceptron detecting a particular musical feature, balanced tritones. Because of this musical regularity, scales that typically are viewed as being distantly related become highly similar, and scales that are typically viewed as being highly similar become distantly related. In short, paying attention to the hidden unit space can reveal novel properties that are still completely consistent with formal music theory.

4.5.2 The Missing Balanced Tritone

The hidden units of the scale mode network detect tritone balance. This property, reflected in the connection weights from input units to hidden units (Figure 4-4), recognizes that a harmonic minor scale contains three sets of balanced tritones, two of which are present in the scale; the third pair is balanced because its two pitch-classes are both absent. This balance is critical for the function of the hidden units: balanced tritones contribute little to hidden unit net input, and therefore cause high hidden unit activity because the µ of each Gaussian activation function is zero. This in turn causes the arrangement of minor scales in the Figure 4-2 hidden unit space.

The tritone that balances because both of its pitch-classes are absent is critical to the ability of hidden units to separate harmonic minor scales from major scales in the hidden unit space. It also provides an interesting twist on music theory. This is because the necessary absence of tritones is not a traditional component of music theory. For instance, consider using mathematical set theory to explore musical formalisms (Forte, 1973; Roig-Francolí, 2008; Straus, 2005). One contribution of set theory is the ability to generate, for some set of pitch-classes, a six-dimensional vector called an interval class vector or an ic vector. An ic vector provides information about the frequency of occurrence of different musical intervals in a musical entity, where a musical interval is the distance between two pitch-classes measured in semitones. In Forte’s system, the set of pitch-classes that define any harmonic minor scale produces the ic vector 335442. The last digit in this ic vector indicates the presence of two tritone intervals (i.e., in terms of Figure 4-5, two balanced pairs of tones present in the scale). Similarly, the ic vector for any of the major scale stimuli presented to the scale mode network is 254361. Its last digit indicates that a major scale includes only a single pair of pitch-classes a tritone apart, as was earlier illustrated in Figure 4-5. In other words, ic vectors make explicit the well-known property that a major scale includes only a single tritone interval. It might be tempting to conclude that the scale mode perceptron simply uses the number of present tritones to distinguish the major mode from the harmonic minor mode.
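As a check on these counts, an ic vector can be computed directly from a pitch-class set. The following is a minimal sketch; the function name `ic_vector` and the scale encodings are illustrative choices, not part of Forte's notation. It tallies, for every pair of pitch-classes, the shorter of the two distances around the twelve-tone circle.

```python
from itertools import combinations

# Scale patterns as semitones above the root (root = 0).
MAJOR = (0, 2, 4, 5, 7, 9, 11)
HARMONIC_MINOR = (0, 2, 3, 5, 7, 8, 11)


def ic_vector(pitch_classes):
    """Count interval classes 1..6 over all unordered pairs of pitch-classes."""
    counts = [0] * 6
    unique = sorted({pc % 12 for pc in pitch_classes})
    for a, b in combinations(unique, 2):
        interval = (b - a) % 12
        interval_class = min(interval, 12 - interval)  # fold 7..11 onto 5..1
        counts[interval_class - 1] += 1
    return counts


minor_vector = ic_vector(HARMONIC_MINOR)
major_vector = ic_vector(MAJOR)

print(minor_vector)      # [3, 3, 5, 4, 4, 2], i.e. 335442
print(major_vector[-1])  # 1: a single tritone in the major scale
```

Because interval classes are unchanged by transposition, the same vectors result for a scale built on any root. Note that the final entry counts only tritones whose two pitch-classes are both present; a pair that is jointly absent leaves no trace in the vector.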

Importantly, what the ic vector for a harmonic minor scale fails to make explicit is the absence of the two additional pitch-classes that represent the third balanced tritone (e.g., the absence of both C♯ and G in the A harmonic minor diagram in Figure 4-5). Identifying which pitch-classes are absent from a musical object might be an odd way for a human musical analyst to think about musical scales, but it arises naturally in this artificial neural network. Indeed, when the nature of the absent balanced tritone is considered, we encounter some interesting music theory insights.

In Section 4.5.1, we noted that the arrangement of minor scales in Figure 4-2 was affected by the three pitch-classes that are not part of a balanced tritone. For each harmonic minor scale, Table 4-1 lists these three unbalanced pitch-classes in the first of its final three columns. From the perspective of music theory, the unbalanced pitch-classes suggest an elegant and simple musical interpretation: they are the three pitch-classes that define the tonic minor triad of the harmonic minor scale. This means that for the minor stimuli, Figure 4-2 can also be read as a spatial arrangement of minor triads rather than of harmonic minor scales.
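The claim that the unbalanced pitch-classes form a minor triad can be verified by enumeration. The sketch below is illustrative only (the helper names `tritone_balance` and `minor_triad` are hypothetical): it sorts the six tritone pairs of each harmonic minor scale into pairs that are both present, pairs that are both absent, and pairs with only one member present, and collects the lone members of the last group.

```python
NAMES = ["C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"]
HARMONIC_MINOR = (0, 2, 3, 5, 7, 8, 11)  # semitones above the root


def tritone_balance(root):
    """Return (present pairs, absent pairs, unbalanced pitch-classes)."""
    scale = {(root + step) % 12 for step in HARMONIC_MINOR}
    present, absent, unbalanced = [], [], set()
    for low in range(6):
        pair = (low, low + 6)  # two pitch-classes a tritone apart
        members = [pc for pc in pair if pc in scale]
        if len(members) == 2:
            present.append(pair)      # balanced: both in the scale
        elif not members:
            absent.append(pair)       # balanced: both missing
        else:
            unbalanced.update(members)
    return present, absent, unbalanced


def minor_triad(root):
    """Root, minor third, and perfect fifth above `root`."""
    return {root % 12, (root + 3) % 12, (root + 7) % 12}


for root in range(12):
    present, absent, unbalanced = tritone_balance(root)
    label = ", ".join(NAMES[pc] for pc in sorted(unbalanced))
    print(f"{NAMES[root]}m: unbalanced = {label}; "
          f"absent pair = {[NAMES[pc] for pc in absent[0]]}")
```

For A harmonic minor this reports the absent pair C♯ and G and the unbalanced pitch-classes C, E, and A. It also shows that roots a tritone apart (such as F and B) share exactly the same present and absent pairs, differing only in their unbalanced triads.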

Other spatial representations have been developed to represent the relationships between triads or chords (Hook, 2006). In Hook’s Tonnetz, each node represents a triad; connections between nodes represent a possible progression from one triad to another. Furthermore, each connection between nodes represents an efficient voice leading. This is because linked triads share two pitch-classes, even though they are opposite in mode (i.e., minor vs. major). In other words, Hook’s (2006) representation is similar to the scale relations that we considered earlier to create Figure 4-6.
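The shared pitch-class criterion behind such links is easy to illustrate. The following sketch does not reconstruct Hook's (2006) Tonnetz; it simply assumes that a minor triad is linked to every major triad with which it has exactly two pitch-classes in common, and lists those major triads. The function names are hypothetical.

```python
NAMES = ["C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"]


def major_triad(root):
    """Root, major third, and perfect fifth above `root`."""
    return {root % 12, (root + 4) % 12, (root + 7) % 12}


def minor_triad(root):
    """Root, minor third, and perfect fifth above `root`."""
    return {root % 12, (root + 3) % 12, (root + 7) % 12}


def major_neighbours(minor_root):
    """Roots of the major triads sharing exactly two pitch-classes
    with the minor triad built on `minor_root`."""
    triad = minor_triad(minor_root)
    return [root for root in range(12)
            if len(triad & major_triad(root)) == 2]


b_minor_links = major_neighbours(11)
print([NAMES[root] for root in b_minor_links])  # ['D', 'G', 'B']
```

Each minor triad has three such major neighbours, so moving between linked triads changes only one pitch-class at a time.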

Figure 4-7

Figure 4-7 The structure of Hook’s (2006) Tonnetz for triads.

The nearest neighbours of triads aligned in the same row in Figure 4-7 are related by a short musical interval (a minor third); the nearest neighbours of Bm are G♯m and Dm. Furthermore, triads related by a longer musical interval are not nearest neighbours. For instance, Bm and Fm are farther apart in Figure 4-7.

We are now in a position to contrast the spatial arrangement of triads in Figure 4-7 with the hidden unit space of Figure 4-2. It was noted above that a consequence of the missing balanced tritone is that the positions of harmonic minor scales in the hidden unit space also provide the positions of minor triads. The proximity relations among minor triads in the hidden unit space are quite different from those in the triad Tonnetz. In particular, minor triads that are farther apart in Figure 4-7 occupy identical locations in the hidden unit space (e.g., Bm and Fm). In addition, nearest neighbours in the triad Tonnetz are quite far apart in the hidden unit space. For instance, G♯m and Dm, the two triads closest to Bm in Hook’s space, are the farthest away from Bm in the hidden unit space. Of course, the fact that the hidden unit space arranges minor triads in a line, while Hook’s Tonnetz arranges them in a grid, is another crucial difference between the two. None of these differences between the two spaces should be particularly surprising: hidden unit space locations reflect tritone structure, while Hook’s (2006) triad Tonnetz bases proximity on shared pitch-classes.

If we interpret the hidden unit space as establishing certain proximity relations among minor triads, and these differ from those in traditional geometric models of harmonic progression or voice leading, then what does the hidden unit space imply about chord progressions? The key feature of the hidden unit space is that two minor triads that are a tritone apart have the same location, and therefore are equivalent. In terms of progressions, this suggests that one can easily change from one triad to another that is a tritone away. Interestingly, this implication of the hidden unit space parallels the tritone’s relevance to jazz.

One common technique to add variety to jazz chord progressions is the use of chord substitutions. In this practice, one chord in the progression is replaced with another musically related chord. For example, in tritone substitution a dominant seventh chord in one key is replaced with the dominant seventh chord from a key that is a tritone away. This is possible because both dominant seventh chords contain the same two notes a tritone apart, making the original changes harmonically similar to the changes created by the tritone substitution (Tymoczko, 2008).
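The shared tritone can be confirmed with a short sketch. The example assumes the usual spelling of a dominant seventh chord (root, major third, perfect fifth, minor seventh) and uses G7 and its tritone substitute D♭7 as an illustration; the function name is hypothetical.

```python
NAMES = ["C", "D♭", "D", "E♭", "E", "F", "G♭", "G", "A♭", "A", "B♭", "B"]


def dominant_seventh(root):
    """Root, major third, perfect fifth, and minor seventh above `root`."""
    return {(root + step) % 12 for step in (0, 4, 7, 10)}


g7 = dominant_seventh(7)       # G, B, D, F
d_flat7 = dominant_seventh(1)  # D-flat, F, A-flat, C-flat (= B)

shared = g7 & d_flat7
print(sorted(NAMES[pc] for pc in shared))  # ['B', 'F']
```

The third and seventh of one chord become the seventh and third of the other, so the two chords always share exactly the two pitch-classes that lie a tritone apart.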

Tritone substitution is possible (i.e., sounds musically correct) in jazz because the two dominant seventh chords contain exactly the same tritone, and the two keys a tritone apart share enough pitch-classes to make them harmonically similar (Tymoczko, 2008). In certain respects, the multilayer perceptron is detecting analogous structure in harmonic minor scales, organizing them in such a way that the nearest neighbour to a minor scale in the hidden unit space will be a scale with identical underlying tritone structure. Interpreting these scale positions as being the positions of minor triads reveals that the hidden unit space of the multilayer perceptron may have a special affinity for tritone substitution.

4.6 Summary and Implications

This chapter explored the properties of a multilayer perceptron trained to identify the mode of a stimulus scale (major or harmonic minor). A network that employed two hidden value units accomplished this task. Further investigation of the internal structure of this network revealed that both hidden units detected a particular property: tritone balance. The network learned that harmonic minor scales have three balanced tritones while a major scale has only one.

The scale mode network’s musical theory departs from traditional theory. In particular, the functioning of this network depends upon an absent balanced tritone: two pitch-classes, separated by a tritone, which are both missing from a harmonic minor scale. Furthermore, by considering this tritone in combination with the two present balanced tritones we discovered that the only unbalanced pitch-classes in a harmonic minor scale define a minor triad. This observation is a new addition to music theory.

The clearest indication that the network’s emphasis on balanced tritones departs from traditional music theory is that it leads to a spatial arrangement of scales (and of minor triads) unlike those based on traditional notions of musical relationships such as shared pitch-classes. Figures 4-6 and 4-7 arrange musical entities in maps whose distances reflect the number of shared pitch-classes. Both of these maps interweave major and minor musical elements. In contrast, the hidden unit space illustrated in Figure 4-2 does not interweave major and minor musical elements; instead, a linear arrangement of harmonic minor scales reveals distance relationships based on shared tritone structure rather than shared pitch-classes.

The novel spatial arrangement of scales that the network reveals provides some interesting suggestions regarding composing music. One fundamental principle of composition is modulation, in which a rational structure is used to change from one musical key to another midway through a piece. Schoenberg’s (1969) spatial map of keys provides a spatial guide to such modulation. One can modulate from the current musical key to another nearby in the map without musical disruption because keys that are near one another in the map have shared properties (e.g., they have many pitch-classes in common).

The proximity relationships between minor scales in the hidden unit space of Figure 4-2 suggest an alternative approach to modulating between minor keys. In particular, the hidden unit space suggests that one may be able to modulate between one minor key and another that is a full tritone away, not because of shared pitch-classes but instead because of shared balanced tritones. Furthermore, the hidden unit space suggests the possibility of successful modulation between keys at other distances, because these scales are close to one another in the hidden unit space. For instance, G♯m and Dm are closest to D♯m and Am in the space. Modulating from G♯m to D♯m is part of common practice (these two keys are a perfect fifth apart), as is modulating from Dm to Am (these two keys are a perfect fourth apart). However, the same geometry suggests less typical modulations a minor second apart: from G♯m to Am, or from Dm to D♯m. The hidden unit space, like more traditional spatial representations, is a source of alternative compositional ideas.

Chapter 5: Networks for Key-Finding