The Journal of the Acoustical Society of America

Consonance perception beyond the traditional existence region of pitch

Abstract: Some theories posit that the perception of consonance is based on neural periodicity detection, which is dependent on accurate phase locking of auditory nerve fibers to features of the stimulus waveform. In the current study, 15 listeners were asked to rate the pleasantness of complex tone dyads (two-note chords) forming various harmonic intervals and bandpass filtered in a high-frequency region (all components > 5.8 kHz), where phase locking to the rapid stimulus fine structure is thought to be severely degraded or absent. The two notes were presented to opposite ears. Consonant intervals (minor third and perfect fifth) received higher ratings than dissonant intervals (minor second and tritone). The results could not be explained in terms of phase locking to the slower waveform envelope, because the preference for consonant intervals was higher when the stimuli were harmonic than in a condition in which they were made inharmonic by shifting their component frequencies by a constant offset, so as to preserve their envelope periodicity. Overall, the results indicate that, if phase locking is indeed absent at frequencies greater than ~5 kHz, neural periodicity detection is not necessary for the perception of consonance.

Pythagoras (Bowling and Purves, 2015), who considered as consonant those musical intervals whose frequencies formed "simple" ratios between small integers (e.g. 2:1, 3:2, 4:3). In the last two centuries the debate has focused on the possible physiological mechanisms leading to the sensation of consonance. One of the major psychoacoustical theories of consonance posits that the sensation of dissonance is directly related to the sensation of "roughness" caused by the amplitude fluctuations, also known as "beats", produced when ...

... purpose of this experiment was to rule out the possibility that, if consonance perception were to be found poor or absent in the high frequency region, this was due to an inability of our listeners to perceive melodic pitch above 5 kHz.

It seems reasonable to hypothesize that if melodic pitch perception is present at high frequencies, consonance perception should be present too. However, the results of a study by Gockel and Carlyon (2018) suggest that different aspects of pitch processing may show unexpected dissociations at high frequencies. In a series of experiments they found that while F0 discrimination performance at high frequencies was good, and could not be accounted for by ... for a mistuning of ∼6%, and listeners did not report hearing the mistuned component as perceptually segregated from the complex. Gockel and Carlyon (2018) concluded that either harmonic templates at high frequencies have wider tolerances than those at low frequencies, or, even though they have comparable tolerances, the mechanism that leads to the perceptual segregation of a mistuned component is absent at high frequencies. In either case, these results suggest that it cannot be assumed that consonance perception at high frequencies will be present simply because melodic pitch perception for complex tones is present at these frequencies.
... (Levitt, 1971) ...

In the rating experiment, listeners were asked to rate the pleasantness of dyads consisting of a low ("root") note and a high ("interval") note. Participants rated each dyad on a scale ranging from -3 to +3 in 0.1 steps by moving, through a computer mouse, a slider ... perceived "roughness" (Terhardt, 1984). The dyads were bandpass filtered so that their components would fall either in a "low" frequency region or in a "high" frequency region ... were 100 cents (minor second), 300 cents (minor third), 600 cents (tritone), or 700 cents (perfect fifth) above the root note (100 cents = 1 semitone), so as to form musical intervals of the equal-tempered scale. The F0s of the interval notes are shown in Table I. ... Table S1, and those of the dyads in the inharmonic conditions in Table S2 of the supplementary materials.
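The interval sizes in cents quoted above map onto F0s through the standard equal-tempered relation, F0_interval = F0_root × 2^(cents/1200). The following is a minimal sketch of that conversion, not the authors' stimulus-generation code. The root F0 used here (`ROOT_F0_HZ`) is a hypothetical placeholder; the actual F0s are those listed in Table I.

```python
# Convert an interval size in cents to the F0 of the "interval" note.
# ROOT_F0_HZ is a hypothetical value chosen for illustration only;
# the F0s actually used in the study are given in Table I.

ROOT_F0_HZ = 220.0  # assumption, not taken from the paper

# Interval sizes used in the rating experiment (100 cents = 1 semitone).
INTERVALS_CENTS = {
    "minor second": 100,
    "minor third": 300,
    "tritone": 600,
    "perfect fifth": 700,
}


def interval_f0(root_f0_hz: float, cents: float) -> float:
    """Return the F0 that lies `cents` above `root_f0_hz` (equal temperament)."""
    return root_f0_hz * 2.0 ** (cents / 1200.0)


if __name__ == "__main__":
    for name, cents in INTERVALS_CENTS.items():
        f0 = interval_f0(ROOT_F0_HZ, cents)
        print(f"{name:>13}: {cents:4d} cents -> {f0:7.2f} Hz "
              f"(ratio {f0 / ROOT_F0_HZ:.4f})")
```

Note that the equal-tempered perfect fifth (700 cents) gives a ratio close to, but not exactly, the "simple" 3:2 ratio.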

These noises were gated on and off simultaneously with the dyads with 10-ms raised-cosine onset and offset ramps. On each trial a 2-s, 45-dB SPL/ERB TEN bandpass filtered between 0.02 and 16 kHz was presented before the presentation of the dyad to "weaken" ... within-subject designs in which subjects provide more than one datum per condition.

The hit and false alarm rates obtained by each listener in the melody-discrimination task ...

... generally lower ratings than in the low frequency region, but the pattern with respect to interval type and harmonicity is similar to that observed in the low frequency region.

Figure 3 shows the mean consonance preference scores, which were calculated by sub- ...

The results of the melody discrimination experiment are shown in Fig. 6 ...

It is unclear why the minor second dyad tended to be given lower ratings than the "consonant" dyads in the inharmonic conditions. One possible reason is that the degree ...

The lowest F0 for a harmonic sieve has been generally chosen to be 30 Hz, which corre- ... low F0, however, will generate sieves with progressively larger meshes relative to the harmonic spacing as the center frequency increases, even with a relatively small tolerance. Thus they will increasingly pass more components of a sound at high center frequencies and will eventually pass all components above a certain frequency when the meshes become so large relative to the harmonic spacing that they start overlapping. For example, a template with an F0 of 30 Hz and a tolerance of ±17.49 cents will have overlapping meshes above ∼1500 Hz, which will effectively pass through all components above that frequency. This issue is largely avoided by pitch models that use only templates with low-numbered harmonics ( 10 ...

The fact that in the inharmonic conditions the minor second dyad had the lowest HNR in our harmonic sieve modeling could explain why this dyad was rated lower than the other dyads in the inharmonic conditions of the pleasantness rating test. However, given that there is no standard way to measure HNRs, these results should be interpreted cautiously. We tried to choose reasonable parameters for the harmonic sieves on the basis of known constraints.

However, without more definitive knowledge of the psychophysiological mechanisms used by the auditory system to assess harmonicity, results from harmonic sieve models remain necessarily tentative. In any case, it should be remarked that in the inharmonic conditions the minor second dyad was given lower ratings than the "consonant" dyads both in the low and in the high frequency region. Therefore this result is unlikely to be due to some idiosyncrasy of the high-frequency dyads. Instead, this result supports the view that the pleasantness ratings were determined by the same mechanisms in the low and in the high frequency regions.

Interestingly, in the high-frequency harmonic condition the HNR rankings of the tritone and minor third dyads are reversed compared to the pleasantness ratings. This could be taken as evidence against the idea that pleasantness ratings are determined by harmonicity.

However, it is possible that a learned association between pleasantness and a given dyad with all its lower harmonics as they occur naturally is transferred to a dyad with only a subset of those harmonics, as is the case for the dyads filtered in the high-frequency region of our experiment. It is also possible that given that the dyads were presented in noise, the ...

We found that two consonant intervals were rated higher than two dissonant intervals even when they were presented in a high frequency region where neural phase locking to individual harmonics is thought to be severely degraded or absent. Given that the envelope repetition rates for our stimuli were higher than the highest rates at which the ability to perceive pitch on the basis of purely envelope rate cues has been observed (Burns and ... were completely outside the ∼20–300 Hz range over which roughness can be perceived (Terhardt, 1974a, b); for example, both the minor second and the perfect fifth dyads in the high frequency harmonic condition, which respectively received the lowest and the highest pleasantness ratings, did not contain any difference frequencies in this range. Therefore, the differences in pleasantness ratings given to these dyads cannot be attributed to perceived roughness caused by envelope beats.

Overall, our results indicate that pleasantness ratings in our experiment were determined by pitch relations between the tones forming the dyads rather than by beats. Our results do not shed light on the debate between the "harmonicity" and the "cultural" theories of ...

... a mistuning of ∼6% that was very difficult to detect in the study of Gockel and Carlyon (2018).

The reasoning behind this is that given that the distance between the root and interval notes of a minor second dyad is 100 cents, a harmonic template at the F0 of the root note with a tolerance ≥ 100 cents would pass through all components of a minor second dyad, just as it would pass through all components of a unison dyad. Given that the unison, together with the octave, typically receives the highest pleasantness ratings amongst all musical intervals, the fact that the minor second received the lowest pleasantness ratings in our study clearly shows that it was treated differently than a unison. Therefore our results, combined with those of Gockel and Carlyon (2018), suggest that harmonic templates at high frequencies may not be larger than at low frequencies, but the mechanism that leads to the perceptual segregation of the mistuned harmonic may be absent at high frequencies.
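The template argument above can be illustrated with a small numerical check: every harmonic of the interval note of a minor second dyad lies exactly 100 cents above the corresponding harmonic of the root note, so a root-F0 template with a tolerance of at least 100 cents accepts all of them. The sketch below is a hypothetical illustration under simplifying assumptions (unfiltered harmonic complexes, a root F0 of 200 Hz chosen arbitrarily, tolerance measured in cents from the nearest template harmonic); it is not the sieve model used in the paper.

```python
import math

# Assumptions for illustration: 200-Hz root, 20 harmonics per note,
# no bandpass filtering. None of these values come from the paper.


def deviation_cents(freq_hz: float, f0_hz: float) -> float:
    """Absolute distance in cents from freq_hz to the nearest harmonic of f0_hz."""
    ratio = freq_hz / f0_hz
    # Only the harmonics just below and just above the component can be nearest.
    candidates = {max(1, math.floor(ratio)), max(1, math.ceil(ratio))}
    return min(abs(1200.0 * math.log2(freq_hz / (n * f0_hz))) for n in candidates)


def passes_sieve(freq_hz: float, f0_hz: float, tol_cents: float,
                 eps: float = 1e-9) -> bool:
    """True if the component falls within tol_cents of a template harmonic.

    eps absorbs floating-point rounding when the deviation equals the tolerance.
    """
    return deviation_cents(freq_hz, f0_hz) <= tol_cents + eps


def dyad_components(root_f0_hz: float, interval_cents: float,
                    n_harmonics: int) -> list[float]:
    """Harmonic frequencies of both notes of a dyad (no filtering applied)."""
    upper_f0 = root_f0_hz * 2.0 ** (interval_cents / 1200.0)
    return [n * f0 for f0 in (root_f0_hz, upper_f0)
            for n in range(1, n_harmonics + 1)]


if __name__ == "__main__":
    components = dyad_components(200.0, 100, 20)  # minor second dyad
    for tol in (17.49, 100.0):
        n_pass = sum(passes_sieve(f, 200.0, tol) for f in components)
        print(f"tolerance ±{tol} cents: {n_pass}/{len(components)} components pass")
```

With the wide tolerance the template cannot distinguish the minor second from a unison, whereas a narrow tolerance rejects at least the lower harmonics of the interval note.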

A. Is neural phase locking necessary for the perception of consonance?

Although phase locking is thought to be severely degraded or absent above ∼5 kHz, some computational models suggest that, theoretically, some residual temporal information ... suggesting that a transition from a temporal to a place code may occur at ∼8 kHz rather than ∼5 kHz as once commonly thought. On the basis of this evidence it has been argued that, although phase locking may be too weak to support musical pitch perception for individual pure tones above 5 kHz, the combined temporal information across several > 5 kHz harmonics of a complex tone may be sufficient to support musical pitch perception.

... are available (Johnson, 1980; Palmer and Russell, 1986; Winter, 2005). Recordings of the compound action potential using a technique that separates the auditory nerve neurophonic from the cochlear microphonic indicate that this limit is at best similar to, and probably lower than, the 5 kHz limit recorded in the cat (Verschooten et al., 2018).

Given the results of our study, the question of whether neural phase locking is necessary ... unit similar to the models proposed by Goldstein (1973) ... is unlikely that envelope cues could account for the high performance levels observed in the current study.