3 The Scale Tonic Perceptron
3.1 Pitch-Class Representations of Scales
In this chapter, I begin our synthetic psychology of music by training networks on a very basic task: identifying the tonic note of a stimulus scale. I first describe some basic properties of scales and consider how to present particular scales to a simple network and how to represent network responses to these stimuli. I then provide details about the kind of network that I use and the method used to train it. I end by interpreting the internal structure of a trained network, examining the connection weights produced by training the network to generate the tonic of each scale in the training set.
Figure 3-1 An example major scale, and an example harmonic minor scale, represented using multiple staves.
3.1.1 Musical Scales
Musical scales provide the foundation of Western tonal music (Laitz, 2008). The current chapter is concerned with two of these scales: the major and the harmonic minor. Figure 3-1 provides an example of a major scale (ascending and descending) and an example of a harmonic minor scale (ascending and descending) in musical staff notation. Both of these scales are in the key of A. These two types of scales play a central role in defining the musical stimuli that I will train a simple artificial neural network to classify in this chapter.
When discussing a particular musical scale, it is typical to identify each note in terms of its position or degree within the scale. The tonic of the scale (its first note) is assigned the value 1, its second note is assigned the value 2, and so on (Laitz, 2008). Figure 3-1 labels each note in the two scales in this fashion. Each scale degree also has a name that is sometimes used instead of the note's number (Laitz, 2008). Note 1 is the tonic (or the root) of the scale, 2 is the supertonic, 3 is the mediant, 4 is the subdominant, 5 is the dominant, 6 is the submediant, and 7 is the leading-tone.
3.1.2 Scales and Pitch-Classes
What is the difference between a major scale and a harmonic minor scale? In Western tonal music, these scales differ in their patterns of distances or musical intervals between adjacent notes (Laitz, 2008). These different patterns in turn produce different musical experiences, sonorities, or emotional effects. In order to consider the patterns that define each of these scale types, let us first introduce the notion of pitch-class.
An examination of the two different musical scores in Figure 3-1 indicates that both use 18 notes to represent the scales. However, six of the notes (degrees 2 through 7) are repeated. One of the notes (degree 1, which is the note A in both scales) has two versions that are an octave apart.
In terms of musical sound, the two versions of the note A in the score are distinct. The first (the A below middle C on a piano, sometimes called A3) begins each scale and is associated with a note with a fundamental frequency of 220 Hz. The second (the A above middle C or A4) ends the first line of each scale and is associated with a note with a fundamental frequency of 440 Hz.
A3 and A4 are clearly different notes or different pitches; the former is lower than the latter. Phenomenologically, however, these two different notes seem quite related (Révész, 1954). “The octave note [A4] therefore bears a double relation to the prime tone [A3]. In one respect, of all the notes within the span of the octave, it is the one most dissimilar to the prime tone, since it is the farthest from it in point of distance. In another respect, however, it is the note most similar to the prime tone, since of all the notes through which we passed, it is the one most similar to it in quality” (Révész, 1954, pp. 56–57). This relationship between notes an octave apart is called octave equivalence.
Révész (1954) used the phenomenological notion of octave equivalence to argue that music cognition cannot simply pay attention to physical properties (e.g., sound frequency) but must also be sensitive to a nonphysical dimension: musical structure. Psychophysical experiments inspired by this perspective have studied octave equivalence in a variety of species, including humans, rats, starlings, and chickadees (Allen, 1967; Blackwell & Schlosberg, 1943; Cynx, 1993; Demany & Armand, 1984; Hoeschele, Weisman, Guillette, Hahn, & Sturdy, 2013; Shepard, 1964). These studies provide mixed support for octave equivalence. Some results suggest that the demonstration of octave equivalence depends critically on musical context and listener expectations (Deutsch & Boulanger, 1984). Others claim that octave equivalence may be a universal property (Patel, 2008).
While its psychophysical support is murky, octave equivalence is a central assumption in the formal analysis of music (Forte, 1973; Hanson, 1960; Lewin, 2007; Roig-Francolí, 2008; Straus, 1991, 2005; Tymoczko, 2011). Musical analysis assigns two pitches separated by one or more octaves (e.g., A3 and A4) to the same pitch-class (i.e., A). In so doing, it uses pitch-class set notation, sometimes shortened to pc set, to provide a simpler representation of the notes presented in standard staff notation like Figure 3-1. With pc sets, a musical selection becomes a set of pitch-classes. For instance, while both scales in Figure 3-1 are written in the staves with 18 notes, in reality each only employs seven different pitch-classes, which is why scale degree can be represented using only seven different Roman numerals.
Forte (1973) uses the axiom of octave equivalence to assign two different pitches that are an octave apart to the same pitch-class. In Western music, there are only 12 different pitch-classes available (see Figure 3-2 below). Forte further simplifies pitch-class set notation with the axiom of enharmonic equivalence. In standard staff notation, different symbols can represent the same pitch. For instance, A4 is a particular musical pitch. In a musical score, depending upon context, this one pitch could be represented with the symbol “A,” the symbol “B♭♭,” or the symbol “G♯♯.” The axiom of enharmonic equivalence uses one symbol (A) to stand for any of these different symbols. (In this book, this axiom is exploited to reduce the use of the ♭ symbol.)
Let us illustrate the structure of the major and harmonic minor scales from Figure 3-1 with a geometric application of pitch-class notation. First, we arrange all 12 pitch-classes of Western music around a circle of minor seconds, so that adjacent pitch-classes are a minor second (one semitone) apart. Second, we draw spokes from the centre of the circle to indicate which pitch-classes are included in a scale. Figure 3-2 provides this illustration for both of the scales from Figure 3-1. Note that each diagram has only seven spokes, because each scale is constructed from only seven different pitch-classes.
Western music is typically tonal: compositions are set in a particular musical key, which in turn provides important structure to an audience. For instance, when a musical key is established, some notes (in particular the tonic, subdominant, and dominant notes of the key) are more stable than others (Krumhansl, 1990a). As a result, a composer can manipulate a piece’s effect on a listener by moving from less stable to more stable tonal elements (Schoenberg, 1969). Tonality is created in music by restricting the use of musical notes (Browne, 1981). Consider the pitch-classes of the A major scale illustrated in Figure 3-2. In the circle of minor seconds, there are 12 different pitch-classes available. However, the A major scale only employs seven of them: A, B, C♯, D, E, F♯, and G♯.
The pitch-classes that are not used in the A major scale are missing for a reason. A major scale has a specific pattern of distances between adjacent notes. Again, consider Figure 3-2’s illustration of A major. Start at A, and move clockwise around the circle. The next note with a spoke (B) is two semitones, or a full tone, higher than A. The next note encountered (C♯) is a full tone higher than B. However, the next note with a spoke (D) is only a semitone higher than the preceding note (C♯). Following the circle all the way around until the pitch-class A is encountered again, a specific pattern of between-note distances is apparent: tone-tone-semitone-tone-tone-tone-semitone. This pattern of between-note distances defines a major scale (Laitz, 2008).
If one excludes a different subset of pitch-classes from the circle of minor seconds, then one creates a different type of musical scale. Consider the A harmonic minor scale that is also represented in Figure 3-2. The A harmonic minor scale is defined by the pitch-classes A, B, C, D, E, F, and G♯, a set that produces different between-note distances. Look at the A harmonic minor diagram in Figure 3-2, starting at A and moving clockwise around the circle. Most of the distances are either semitones or full tones, as was the case for A major, but there is also one distance (from F to G♯) that is an augmented second (three semitones). The complete pattern of between-note distances that defines a harmonic minor scale is tone-semitone-tone-tone-semitone-augmented second-semitone.
Importantly, the pattern of between-note distances that defines either a major or a harmonic minor scale is constant. This means that the two spoke patterns presented in Figure 3-2 can define the major or harmonic minor scale whose tonic is any of the 12 pitch-classes. For instance, consider the spokes of the A major circle as a solid unit. If we rotate this unit 30° clockwise, then the between-note distances are unchanged, but the pattern starts on a new tonic note (B♭). This rotated spoke pattern defines the B♭ major scale. Similarly, if one takes the entire set of spokes for the A harmonic minor scale in Figure 3-2 and rotates them 30° counter-clockwise, the pattern now defines the G♯ harmonic minor scale.
Figure 3-2 The circle of minor seconds can be used to represent the pitch-classes found in the A major and the A harmonic minor scales.
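To make the rotation argument concrete, here is a minimal sketch (in Python with numpy; the array layout and names are my own, not part of the original training software). It encodes a spoke pattern as a 12-bit vector ordered clockwise from A and rotates it one position:

```python
import numpy as np

# Pitch-classes ordered clockwise around the circle of minor seconds.
PITCH_CLASSES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

# Spoke pattern for A major: 1 marks a pitch-class present in the scale.
a_major = np.array([1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1])

# Rotating the whole pattern one position (30 degrees clockwise) transposes
# the scale up a semitone, yielding A# (enharmonically, B-flat) major.
b_flat_major = np.roll(a_major, 1)
print([PITCH_CLASSES[i] for i in np.flatnonzero(b_flat_major)])
# ['A', 'A#', 'C', 'D', 'D#', 'F', 'G']
```

The rotated vector contains exactly the pitch-classes of the B♭ major scale, so rotating a spoke pattern implements musical transposition.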
The spoke patterns illustrated in Figure 3-2 clearly represent any major or any harmonic minor musical scale. This suggests, for instance, that if we were presented with a particular spoke pattern, then we could examine it for particular patterns of between-note distances, identifying whether a major or harmonic minor scale was depicted. Furthermore, a closer examination of the relative positions of the various spokes would permit us to determine the tonic note of the depicted scale.
This latter task defines our first case study of a musical network. I will present to a perceptron the set of pitch-classes that defines a major or harmonic minor scale, and train it to identify the tonic note of the presented scale.
3.1.3 Pitch-Class Representations
Pitch-class representations permit mathematics to be used to explore and manipulate musical structure (Forte, 1973; Lewin, 2007). For mathematical manipulation, a pitch-class representation is an ordered set of numbers that indicates which pitch-classes are present in a musical entity. Using Allen Forte's pitch-class notation, each pitch-class is assigned a specific integer, with C represented as 0, C♯ as 1, and so on. Therefore, in this system, the raw pitch-class representation of the A major scale is the ordered set [9, 11, 1, 2, 4, 6, 8].
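As a quick check of this notation, a few lines of Python (the dictionary name is mine, purely for illustration) recover that ordered set:

```python
# Forte-style pitch-class integers: C = 0, C# = 1, ..., B = 11.
FORTE = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
         "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

a_major_notes = ["A", "B", "C#", "D", "E", "F#", "G#"]
print([FORTE[n] for n in a_major_notes])  # [9, 11, 1, 2, 4, 6, 8]
```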
Pitch-class representations are also often used when artificial neural networks learn to classify musical patterns. However, this sort of pitch-class representation is slightly different from the mathematical representation described above. For pitch-class representation in a musical network, the network has 12 different input units. Each input unit represents the presence or absence of a particular pitch-class, as is the case in Figure 3-3 in Section 3.2. For instance, imagine that the first input unit represents the pitch-class A; the second input unit represents the pitch-class B; and so on. If the pitch-class A is present in a stimulus, then we turn its input unit “on” by giving it an activity of one. However, if the pitch-class A is absent from a stimulus, we turn its input unit “off” using an activity of zero. In short, a pitch-class representation for a network is a string of 12 bits representing the presence or absence of the 12 possible pitch-classes. For instance, one would use the following pitch-class representation to present the A major scale to a network: [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1].
Table 3-1 provides this type of pitch-class representation for each of the possible major and harmonic minor scales in Western music. This table defines 24 different stimuli that can be presented to a perceptron that has 12 different input units. In the next section, I describe training a perceptron to accomplish the following pattern recognition task: when presented with one of the stimuli provided in Table 3-1, the perceptron will indicate the tonic note of that stimulus, ignoring whether the stimulus represents a major or a harmonic minor scale.
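As an aside, the 24 rows of Table 3-1 can be generated mechanically from the two interval patterns described in Section 3.1.2. The sketch below does so under my own naming assumptions; it is not the input format of the actual training software:

```python
import numpy as np

# Between-note distances in semitones (Section 3.1.2).
MAJOR = [2, 2, 1, 2, 2, 2, 1]
HARMONIC_MINOR = [2, 1, 2, 2, 1, 3, 1]

def scale_vector(tonic_index, intervals):
    """Return the 12-bit pitch-class representation of a scale."""
    vec, degree = np.zeros(12, dtype=int), tonic_index
    for step in intervals:
        vec[degree % 12] = 1   # mark this scale degree's pitch-class
        degree += step         # move up by the next between-note distance
    return vec

# All 24 stimuli of Table 3-1, each paired with its tonic's index (0 = A).
training_set = [(scale_vector(t, pattern), t)
                for pattern in (MAJOR, HARMONIC_MINOR)
                for t in range(12)]
```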
Before proceeding with an account of the perceptron, we can use Table 3-1 to illustrate why this task is not going to be straightforward for a perceptron to learn. We saw in Figure 3-2 that harmonic minor scales are distinct in that they include an interval of an augmented second. One might think that a perceptron could simply detect that interval and then use its position to determine the scale’s tonic. However, the easy identification of the augmented second requires that the pitch-classes already be arranged in the order in which they are found in a harmonic minor scale. This will not be the case for the perceptron; it will always receive pitch-classes in the same order regardless of scale. That is, the input patterns are the different rows of Table 3-1. One cannot simply look at any of those rows and immediately see where the augmented second lies, or for that matter where the tonic of the scale is positioned. I am making the problem difficult for the perceptron by always presenting pitch-classes in the same order. For this reason, I am interested in discovering how the perceptron deals with this difficulty when it learns how to identify scale tonics.
Table 3-1 Pitch-class representation (for an artificial neural network) of 12 different major and 12 different harmonic minor scales.
| Mode | Scale tonic | A | A# | B | C | C# | D | D# | E | F | F# | G | G# |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Major scale | A | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| | A# | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
| | B | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| | C | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
| | C# | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 |
| | D | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| | D# | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| | E | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| | F | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
| | F# | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 |
| | G | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| | G# | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
| Harmonic minor scale | A | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
| | A# | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 |
| | B | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| | C | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| | C# | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| | D | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
| | D# | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| | E | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 |
| | F | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
| | F# | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |
| | G | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
| | G# | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
Note. Each row provides the pitch-classes that are included in a particular scale whose mode and tonic are provided in the two columns on the left. The number 1 indicates that a pitch-class is included in the scale, and a 0 indicates that it is not included in the scale.
3.2 Identifying the Tonics of Musical Scales
3.2.1 Task
Our goal is to train a perceptron to determine the tonic note of a pattern of pitch-classes presented to its input units; each presented pattern defines either a major scale or a harmonic minor scale.
Figure 3-3 Architecture of a perceptron trained to identify the tonic notes of input patterns of pitch-classes.
3.2.2 The Perceptron
A perceptron is a simple artificial neural network originally invented by Frank Rosenblatt in the 1950s (Rosenblatt, 1958, 1962). This type of network consists of a layer of input units directly connected to a layer of output units (Figure 3-3). In this figure, circles represent processing units, and lines between circles represent weighted connections between processors through which signals are sent. Perceptrons are simple because they do not include any hidden units between input and output units. This means that perceptrons are less powerful than the more modern multilayer networks that we will see in the next chapter (Minsky & Papert, 1969). However, perceptrons are still powerful enough to provide interesting models of some aspects of cognition (Dawson, 2008; Dawson, Dupuis, Spetch, & Kelly, 2009; Dawson, Kelly, Spetch, & Dupuis, 2010b).
How do perceptrons learn? Modern artificial neural networks use continuous, nonlinear activation functions to convert net inputs into output unit responses. As indicated in Chapter 2, many of the networks described in this book use value units, which employ a Gaussian activation function. The use of a continuous activation function permits networks to be trained using gradient descent learning. Gradient descent learning is a form of supervised learning in which output unit error is used to modify existing connection weights in order to reduce future error. That is, after connection weights are changed, the next time the same pattern is presented to the network it will generate less error. In general, learning is defined as w_ij(t + 1) = w_ij(t) + Δw_ij, where w_ij is the weight of the connection from input unit i to output unit j, (t + 1) and (t) index successive moments in time, and Δw_ij is a computed weight change. This equation simply says that each new weight is the old weight plus a computed change.
The computed weight change is based on the error calculated for the output unit. In general, it is equal to a fractional learning rate (ε) multiplied by the error computed for the unit at the output end of the connection (t_j − a_j), multiplied by the activity of the unit at the input end of the connection (a_i): in other words, Δw_ij = ε(t_j − a_j)a_i. In gradient descent learning, output unit error is also scaled by the derivative of the (continuous) activation function (Dawson, 2004, 2008). This attempts to help learning by changing weights in such a way that the network moves down the steepest slope of an error surface. The particular mathematics of gradient descent learning depends on which activation function is used in the network (Dawson & Schopflocher, 1992).
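The sketch below illustrates one such gradient descent step for a single output value unit, assuming the Gaussian activation G(net) = exp(−π(net − µ)²) described in Chapter 2 and a simple squared-error measure; the actual value-unit learning rule of Dawson and Schopflocher (1992) elaborates the error term beyond what is shown here:

```python
import numpy as np

def gaussian(net, mu=0.0):
    """Value-unit activation: maximal (1.0) when the net input equals mu."""
    return np.exp(-np.pi * (net - mu) ** 2)

def gradient_step(w, x, target, mu=0.0, rate=0.005):
    """One weight update for a single output value unit.

    w: weights into the unit; x: input pattern; target: desired activity.
    Moves w down the gradient of the squared error (target - a)^2.
    """
    net = np.dot(w, x)
    a = gaussian(net, mu)
    # The error (target - a) is scaled by the derivative of the Gaussian
    # with respect to net input: dG/dnet = -2 * pi * (net - mu) * G(net).
    delta = rate * (target - a) * (-2.0 * np.pi * (net - mu) * a)
    return w + delta * x
```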
By repeatedly presenting each of the patterns in a training set, and by modifying connection weights using supervised learning, I can reduce error to the point where the perceptron generates the correct response to each stimulus. At this point, one can say that the network has converged: it has discovered a set of connection weights that converts each stimulus into the correct response.
3.2.3 The Scale Tonic Perceptron
The scale tonic perceptron, illustrated in Figure 3-3, has 12 input units, each representing the presence or absence of a particular pitch-class in a stimulus. The perceptron also has 12 output units, each again representing a pitch-class. The input units are used to present a scale, represented as pitch-classes (any of the rows in Table 3-1), to the perceptron. The perceptron learns to turn on only one of its output units: the one that represents the pitch-class of the tonic of the input pattern. The grey shading in Figure 3-3 illustrates an example of a desired outcome of training: the network correctly responds with a tonic of C when presented with the pitch-classes of the C major scale. Each output unit in the perceptron is a value unit that uses a Gaussian activation function to convert incoming signals to internal activity (Dawson & Schopflocher, 1992).
3.2.4 Training Set and Training
The training set consists of 24 different input patterns; each pattern is a row of numbers taken from Table 3-1. For any input pattern, the desired response turns on the output unit representing the pattern's tonic note and turns the other 11 output units off. Training is conducted with the gradient descent rule developed specifically for perceptrons whose output processors are value units (Dawson, 2004). The software for training the perceptron was the Rosenblatt program (Dawson, 2005), which is available as freeware from the author's website (http://www.bcp.psych.ualberta.ca/~mike/AlienMusic/).
All connection weights in the perceptron are set to random values between −0.1 and 0.1 before training begins. The biases of the output units (i.e., the value of µ for their Gaussian activation functions) are initialized to zero and are not modified by training. We employ a learning rate of 0.005. Training proceeds until the network generates a “hit” for each output unit, for each of the 24 patterns in the training set. A “hit” is defined as activity of 0.9 or higher when the desired response is one, or as activity of 0.1 or lower when the desired response is zero.
Under these conditions, perceptrons of the type represented in Figure 3-3 learn to identify the tonic of an input pattern extremely rapidly, generally requiring between 20 and 30 epochs of training before generating 12 “hits” for each training stimulus. An epoch of training involves presenting each of the 24 training patterns once; we randomize the order of pattern presentation every epoch. The particular network described in more detail in the next section learned to solve the problem after only 19 epochs of training.
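Putting the pieces together, a rough end-to-end training sketch under the same assumptions (plain gradient descent on Gaussian output units, µ fixed at zero, learning rate 0.005, and the hit criterion described above; the internals of the Rosenblatt program may differ) looks like this:

```python
import numpy as np

MAJOR, HARMONIC_MINOR = [2, 2, 1, 2, 2, 2, 1], [2, 1, 2, 2, 1, 3, 1]

def scale_vector(tonic, intervals):
    vec, degree = np.zeros(12), tonic
    for step in intervals:
        vec[degree % 12] = 1.0
        degree += step
    return vec

# The 24 input patterns of Table 3-1 and their one-hot tonic targets.
X = np.array([scale_vector(t, p)
              for p in (MAJOR, HARMONIC_MINOR) for t in range(12)])
T = np.array([np.eye(12)[t] for _ in range(2) for t in range(12)])

rng = np.random.default_rng(0)
W = rng.uniform(-0.1, 0.1, size=(12, 12))  # W[i, j]: input i -> output j

for epoch in range(1, 1001):
    for idx in rng.permutation(24):        # randomize presentation order
        x, t = X[idx], T[idx]
        net = x @ W
        a = np.exp(-np.pi * net ** 2)      # Gaussian activation, mu = 0
        # (t - a) scaled by the Gaussian derivative -2*pi*net*a gives the
        # downhill direction for each output unit's squared error.
        W += 0.005 * np.outer(x, (t - a) * (-2.0 * np.pi * net * a))
    A = np.exp(-np.pi * (X @ W) ** 2)
    if np.all(np.where(T == 1.0, A >= 0.9, A <= 0.1)):  # all 24 x 12 hits?
        print(f"converged after {epoch} epochs")
        break
```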
3.3 Interpreting the Scale Tonic Perceptron
3.3.1 Interpreting Connection Weights
A perceptron consists simply of a set of input units connected to a set of output units. Therefore, to interpret the internal structure of this type of network, one is limited to examining the various connection weights in the network after training has succeeded. Table 3-2 provides the entire set of 144 connection weights observed in a typical scale tonic perceptron at the end of training. Each column of the table is associated with one of the output units; each row is associated with one of the input units. Therefore, one column of numbers in Table 3-2 provides the set of connection weights from each of the 12 input units to one of the output units.
Table 3-2 The connection weights from each input unit to each output unit for a perceptron trained to identify the tonic pitch-class of an input major or harmonic minor scale pattern.
| Input units (scale pitch-classes) \ Output units (tonic pitch-classes) | A | A# | B | C | C# | D | D# | E | F | F# | G | G# |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 0.06 | −0.24 | 0.75 | 0.38 | −0.40 | 0.33 | −0.65 | 0.27 | −0.30 | −0.18 | −0.11 | 0.44 |
| A# | 0.37 | 0.10 | −0.34 | 0.77 | −0.49 | −0.49 | 0.29 | −0.60 | 0.19 | −0.16 | −0.24 | 0.02 |
| B | −0.01 | 0.36 | 0.07 | −0.25 | −0.72 | −0.44 | −0.55 | 0.32 | −0.65 | 0.18 | −0.34 | 0.13 |
| C | 0.27 | 0.01 | 0.41 | −0.05 | 0.26 | −0.63 | −0.41 | −0.52 | 0.37 | −0.77 | 0.18 | 0.22 |
| C# | 0.39 | 0.22 | 0.04 | 0.48 | −0.10 | 0.37 | −0.77 | −0.47 | −0.49 | 0.38 | −0.71 | −0.09 |
| D | −0.20 | 0.29 | 0.21 | 0.19 | −0.36 | −0.15 | 0.33 | −0.75 | −0.47 | −0.48 | 0.44 | 0.80 |
| D# | 0.68 | −0.24 | 0.33 | 0.20 | 0.00 | −0.44 | −0.04 | 0.26 | −0.75 | −0.51 | −0.40 | −0.38 |
| E | −0.30 | 0.63 | −0.21 | 0.32 | −0.33 | −0.07 | −0.29 | −0.08 | 0.25 | −0.75 | −0.34 | 0.49 |
| F | 0.54 | −0.36 | 0.65 | −0.17 | −0.26 | −0.23 | −0.04 | −0.33 | −0.09 | 0.25 | −0.75 | 0.45 |
| F# | 0.45 | 0.49 | −0.33 | 0.80 | 0.12 | −0.32 | −0.22 | −0.03 | −0.39 | −0.05 | 0.29 | 0.75 |
| G | 0.74 | 0.46 | 0.52 | −0.37 | −0.74 | 0.23 | −0.41 | −0.23 | −0.05 | −0.46 | −0.15 | −0.33 |
| G# | −0.31 | 0.81 | 0.45 | 0.47 | 0.44 | −0.64 | 0.23 | −0.30 | −0.22 | −0.12 | −0.33 | 0.16 |
Note. Each row provides the connection weight from a particular input unit to each of the 12 output units.
The set of connection weights in Table 3-2 represents one perceptron’s “knowledge” about the relationship between scale patterns and tonic pitch-classes. The problem is to discover the nature of this knowledge by examining this entire pattern of connectivity. Ordinarily, when confronted with a matrix of numbers like Table 3-2, one might explore its structure by using multivariate statistics (such as factor analysis or multidimensional scaling) to make the matrix more understandable. Fortunately, we can understand the workings of a scale tonic perceptron by simply rearranging Table 3-2 in a manner that takes advantage of some general, well-known musical properties.
Consider the first column of Table 3-2. The first entry in this column is the weight from the input pitch-class A to the output tonic pitch-class A. However, this entry can also be described as the weight of the connection to the output pitch-class A from the pitch-class that is zero semitones away from the output pitch-class (moving in a clockwise direction around the circle of minor seconds that was used in Figure 3-2). Similarly, the second entry in the first column is the weight to the same output unit from the pitch-class (A♯) that is one semitone away around the circle of minor seconds.
When we apply this alternative interpretation to other columns of Table 3-2, we find that entries that are in the same row do not represent notes the same distance away from the output pitch-class. For instance, the first entry in the second column is the weight to the output pitch-class A♯ from the input pitch-class (A) that is 11, not zero, semitones away.
One can use this alternative interpretation to rearrange the Table 3-2 weights so that the first connection weight appearing in a column corresponds to a zero semitone distance between the input and output pitch-classes, the next corresponds to a one semitone distance, and so on. Table 3-3 represents the Table 3-2 weights in this alternative manner. Importantly, the connection weights in Table 3-3 are identical to those in Table 3-2; we simply provide them a different row label and reorganize the table to reflect the difference in labelling. In a real sense, the perceptron itself “interprets” the weights as labelled in Table 3-3, and not as labelled in Table 3-2, because an inspection of Table 3-3 reveals an elegant solution to converting an input scale pattern to an output tonic pitch-class, as we will now proceed to discuss.
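The rearrangement itself is mechanical: in the column for output pitch-class j, the row for "k semitones away" holds the weight from input pitch-class (j + k) mod 12. A brief sketch, assuming W is the 12 × 12 matrix of Table 3-2 with rows as input units:

```python
import numpy as np

def rearrange(W):
    """Re-index each column of W by semitone distance from its output unit.

    W[i, j] is the weight from input pitch-class i to output pitch-class j;
    R[k, j] is the weight to output j from the input pitch-class that lies
    k semitones clockwise around the circle of minor seconds (Table 3-3).
    """
    R = np.empty_like(W)
    for j in range(12):
        for k in range(12):
            R[k, j] = W[(j + k) % 12, j]
    return R
```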
If one inspects the values of the connection weights along each row of Table 3-3 (ignoring for the moment whether the weight is positive or negative), then one sees commonality. In general, all of the connection weights in each row are roughly the same size. (If one ignores sign, and computes the standard deviation for each row, the standard deviations range in size from 0.04 to 0.07.) In other words, if one considers input pitch-classes in terms of relative distance from the output pitch-class rather than in terms of absolute pitch-class, then structure emerges without the need for multivariate statistics.
Table 3-3 The rearranged connection weights from Table 3-2.
| Semitones between input and output pitch-class | A | A# | B | C | C# | D | D# | E | F | F# | G | G# |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.06 | 0.10 | 0.07 | −0.05 | −0.10 | −0.15 | −0.04 | −0.08 | −0.09 | −0.05 | −0.15 | 0.16 |
| 1 | 0.37 | 0.36 | 0.41 | 0.48 | −0.36 | −0.44 | −0.29 | −0.33 | −0.39 | −0.46 | −0.33 | 0.44 |
| 2 | −0.01 | 0.01 | 0.04 | 0.19 | 0.00 | −0.07 | −0.04 | −0.03 | −0.05 | −0.12 | −0.11 | 0.02 |
| 3 | 0.27 | 0.22 | 0.21 | 0.20 | −0.33 | −0.23 | −0.22 | −0.23 | −0.22 | −0.18 | −0.24 | 0.13 |
| 4 | 0.39 | 0.29 | 0.33 | 0.32 | −0.26 | −0.32 | −0.41 | −0.30 | −0.30 | −0.16 | −0.34 | 0.22 |
| 5 | −0.20 | −0.24 | −0.21 | −0.17 | 0.12 | 0.23 | 0.23 | 0.27 | 0.19 | 0.18 | 0.18 | −0.09 |
| 6 | 0.68 | 0.63 | 0.65 | 0.80 | −0.74 | −0.64 | −0.65 | −0.60 | −0.65 | −0.77 | −0.71 | 0.80 |
| 7 | −0.30 | −0.36 | −0.33 | −0.37 | 0.44 | 0.33 | 0.29 | 0.32 | 0.37 | 0.38 | 0.44 | −0.38 |
| 8 | 0.54 | 0.49 | 0.52 | 0.47 | −0.40 | −0.49 | −0.55 | −0.52 | −0.49 | −0.48 | −0.40 | 0.49 |
| 9 | 0.45 | 0.46 | 0.45 | 0.38 | −0.49 | −0.44 | −0.41 | −0.47 | −0.47 | −0.51 | −0.34 | 0.45 |
| 10 | 0.74 | 0.81 | 0.75 | 0.77 | −0.72 | −0.63 | −0.77 | −0.75 | −0.75 | −0.75 | −0.75 | 0.75 |
| 11 | −0.31 | −0.24 | −0.34 | −0.25 | 0.26 | 0.37 | 0.33 | 0.26 | 0.25 | 0.25 | 0.29 | −0.33 |
Note. In this table each row indicates the relative distance between an input unit’s pitch-class and an output unit’s pitch-class measured in semitones. Thus the connection weights across a row are not from the same input unit, but are instead from different input units that serve the same role in different scales.
Alternatively, the commonality across rows suggests that every column of weights in Table 3-3 exhibits essentially the same pattern of connectivity. Indeed, this is true for output pitch-classes C♯, D, D♯, E, F, F♯, and G. The remaining output pitch-classes (A, A♯, B, C, and G♯) have essentially the same pattern, but multiplied by −1. Importantly, this inversion has no effect on output unit behaviour, because the Gaussian activation function is symmetric about µ (which is zero here): net inputs of equal magnitude but opposite sign produce identical activity.
Recognizing the irrelevance of “column sign,” we can take the connection weights in the columns for the inverted output pitch-classes (A, A♯, B, C, and G♯) and multiply them by −1 to make them coincide with the pattern found in the other seven columns. Then, we can compute the average of each of the 12 rows in Table 3-3 in order to determine the average pattern of connectivity from the input units to the output units. Figure 3-4 provides this average pattern of connectivity.
Figure 3-4 The connection weights between the 12 input units and any output unit in the scale tonic perceptron.
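A sketch of that averaging step (my own formulation, assuming R holds the rearranged Table 3-3 weights): because the Gaussian is symmetric about µ, a column and its negation drive an output unit identically, so each inverted column can be flipped to match a reference column from the non-inverted family before averaging.

```python
import numpy as np

def average_pattern(R):
    """Average the columns of Table 3-3 after aligning inverted columns."""
    reference = R[:, 4]  # the C# column, one of the seven non-inverted ones
    aligned = np.column_stack(
        [col if np.dot(col, reference) > 0 else -col for col in R.T])
    return aligned.mean(axis=1)  # the 12 bars plotted in Figure 3-4
```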
On first inspection, it is not immediately apparent how the pattern of weights in Figure 3-4 permits the perceptron to identify an input pattern's tonic. It is interesting, from a musical perspective, that the most extreme weights come from the pitch-classes that are either 6 or 10 semitones away from the output pitch-class: both of these pitch-classes are absent from both the major and the harmonic minor scales. What is the relationship between the other weights and the tonic?
We can answer these questions by inspecting the connection weight pattern more closely. Figure 3-5 plots the same weights as Figure 3-4 but removes the bars representing weights that receive a zero signal when a major scale pattern is input. Only seven bars remain, because only seven input units are turned on. One key observation about Figure 3-5 concerns the spacing between adjacent bars: measured in tones and semitones, it is exactly the spacing that, as we saw earlier, defines a major scale.
Figure 3-5 The connection weights between the 12 input units and an output unit in the scale tonic perceptron, showing only those connections that have a signal sent through them when the output unit’s major scale pattern is presented to it.
A second property of the weights plotted in Figure 3-5, which is not quite as evident from the figure, is that if one sums their seven values, the result is −0.04. (The sum of these weights is identical to the net input sent to this output unit when the major scale associated with it is presented.) This is important because when the perceptron was trained, the value of µ in each output unit’s Gaussian activation function was held constant at zero. This means that in order to generate a maximum output response, the net input flowing into the output unit must be near zero.
A value of −0.04 is close enough to zero to generate Gaussian activity that is near maximum (0.9950). In short, if signals are sent through the seven connection weights plotted in Figure 3-5—as happens when the major scale pattern associated with that output unit is presented to the network—then the output unit of the tonic pitch-class for this major scale will activate, producing a correct response.
A similar account holds for the case in which the network’s stimulus is the harmonic minor scale. Figure 3-6 is similar to Figure 3-5, but in this case only plots the connection weights that carry signals from the harmonic minor scale input pattern. Again, the distances between adjacent bars are identical to the harmonic minor scale pattern that was discussed earlier, and the sum of the weights (or the net input to the output unit) is close enough to zero (i.e., −0.01) to generate near maximum output unit activity (0.9997).
Figure 3-6 The connection weights between the 12 input units and an output unit, showing only those connections that have a signal sent through them when the output unit’s harmonic minor scale pattern is presented to it.
In short, the pattern of weights in Figure 3-4 is highly structured. For the particular output unit that receives signals through these weights, the pattern of weights turns the output unit on when we present the major scale or harmonic minor scale built on the output unit’s tonic. Importantly, for some other stimulus—for instance, the major scale that is associated with a different output unit’s tonic pitch-class—this output unit will not turn on. Because the signal coming from the input units will not match this unit’s major or harmonic minor scale, the net input will not be close to zero, and therefore the output unit will generate near-zero activity. For instance, imagine that Figures 3-4, 3-5, and 3-6 all represent the pattern of connectivity from the input pitch-classes to output pitch-class A. For this output unit, the bar labelled 0 comes from the input pitch-class A, the bar labelled 1 comes from the input pitch-class A♯, and so on.
Now imagine that we present the G major scale to the perceptron. This involves turning on a particular set of input units: A, B, C, D, E, F♯, and G. However, if we consider these input pitch-classes in terms of the relative distance not to G but instead to the output unit for A, we get a very different pattern of incoming signals from either Figure 3-5 or 3-6.
Figure 3-7 plots the connections to output unit A that are active for the G major scale stimulus. This pattern does not match either the major scale pattern of Figure 3-5 or the harmonic minor scale pattern of Figure 3-6. Indeed, if one sums the Figure 3-7 bars, the result (the net input to output unit A) is −1.00. This value is so far away from the value of µ (which, again, equals zero) that the output unit for A generates a very weak Gaussian activity (0.0432).
Figure 3-7 The active connections to the output unit for pitch-class A when the network is presented the G major scale.
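All three activity values reported above follow directly from evaluating the Gaussian activation function (with µ = 0) at the corresponding net inputs, which a couple of lines confirm:

```python
import numpy as np

for net in (-0.04, -0.01, -1.00):
    print(f"net = {net:+.2f} -> activity = {np.exp(-np.pi * net ** 2):.4f}")
# net = -0.04 -> activity = 0.9950  (A major pattern at the A output unit)
# net = -0.01 -> activity = 0.9997  (A harmonic minor pattern at the A unit)
# net = -1.00 -> activity = 0.0432  (G major pattern at the A output unit)
```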
3.3.2 Summary of the Perceptron
We can now summarize what we know about the artificial neural network that has been the subject of this chapter.
We trained a perceptron (Figure 3-3) to identify the tonic pitch-class of a major or harmonic minor scale presented to the network. The perceptron learned this task very quickly, in 19 epochs of training. An examination of the 144 connection weights in the perceptron revealed that it had quickly discovered some key musical properties that it exploited to accomplish this task.
First, the perceptron learned not to treat input units as if they represented absolute pitch-classes. Instead, the perceptron “considered” input units in terms of their semitone distance (clockwise around the circle of minor seconds) from the pitch-class represented by each output unit. For each output unit, the perceptron found the input pitch-class zero semitones away, and set the weight between the input and output unit to be roughly equal to the first bar in Figure 3-4. Similarly, it found the input pitch-class one semitone away, and assigned the connection between it and the output unit to be roughly equal to the second bar in Figure 3-4; this process was repeated for all of the remaining input units.
The resulting pattern of connection weights (Figure 3-4) for each output unit pitch-class combined the two scale patterns discussed in Section 3.1.2 into one set of connection weights. If the pattern of pitch-classes presented to the perceptron is the major scale for a particular output unit, then signals sent through the connection weights come from pitch-classes spaced at distances tone-tone-semitone-tone-tone-tone-semitone away from the tonic (Figure 3-5). This produces a net input near zero, which activates the output unit.
Similarly, if the pattern of pitch-classes presented to the perceptron is the harmonic minor scale for a particular output unit, then signals sent through the connection weights come from pitch-classes spaced at distances tone-semitone-tone-tone-semitone-augmented second-semitone (Figure 3-6). Again, this produces a near-zero net input, which turns the correct output unit on.
Importantly, the pattern of connection weights in Figure 3-4 is such that major or harmonic minor scale inputs turn the correct output unit on, but also turn the remaining output units off. This is because these patterns produce net inputs to the other 11 output units that are sufficiently different from µ to generate near-zero activity.
In short, the perceptron quickly solves the scale tonic problem by discovering the between-note spacing of pitch-classes that defines both the major and the harmonic minor scales. What is astonishing is that this basic musical knowledge—a component of basic musical training (Martineau, 2008)—is acquired by a learning rule in which perceptron weights are slightly adjusted after each pattern presentation in order to decrease the measured error in each output unit. It may also be surprising to realize that the confusing array of connection weights presented in Table 3-2 represents this basic musical knowledge.
3.4 Summary and Implications
In this chapter, I described the training of a network that generated the appropriate scale tonic when presented with the pitch-classes that defined a particular major or harmonic minor scale. The resulting connection weights reflected basic music theory, as these weights captured the between-note distances that characterize both types of scales.
It is perhaps surprising that a single set of connection weights can identify the tonic of both a major and a harmonic minor scale. Even more interesting is the fact that the perceptron uses the same pattern of weights—rotated to correspond to the appropriate pitch-classes—to identify the tonics of different scales.
Given that a perceptron quickly learns to identify a scale pattern's tonic, one might hypothesize that a similar network could learn to classify an input scale pattern's mode—that is, to identify whether an input pattern represents a major or minor key. A perceptron for this task would use 12 input units to represent input pitch-classes, and a single output unit to represent mode. This perceptron could be presented with any of the 24 stimuli listed in Table 3-1. We could then try to train the network to turn its output unit on if the presented pattern was a major scale, and off if it was a harmonic minor scale.
At face value, identifying scale mode seems simpler than identifying scale tonic. However, under a wide variety of training conditions (e.g., different learning rates, different initial connection weights), one discovers that a perceptron cannot be taught to accomplish this task: it never generates a correct response to all 24 patterns in the training set.
This illustrates an important property of perceptrons: because of their simple structure (i.e., because they have no intermediate processors between input and output units), they are not powerful enough to solve every pattern recognition problem (Minsky & Papert, 1969). The reason for this is that without additional processors, called hidden units, they cannot detect higher-order regularities that are required to solve complex problems (Dawson, 1998, 2004, 2013).
The fact that a network as simple as a perceptron is capable of solving the scale tonic problem is highly informative, because it reveals that this particular musical problem is computationally simple. That a perceptron cannot detect whether a scale is major or minor indicates that a more powerful type of network, called a multilayer perceptron, is required for other musical problems. The next chapter introduces the multilayer perceptron and provides a case study wherein it is used to explore music by training it to detect a scale’s mode: whether a presented pattern is associated with a major or harmonic minor scale.