Noise edge pitch and models of pitch perception

Monaural noise edge pitch (NEP) is evoked by a broadband noise with a sharp falling edge in the power spectrum. The pitch is heard near the spectral edge frequency but shifted slightly into the frequency region of the noise. Thus, the pitch of a lowpass (LP) noise is matched by a pure tone typically 2%–5% below the edge, whereas the pitch of highpass (HP) noise is matched a comparable amount above the edge. Musically trained listeners can recognize musical intervals between NEPs. The pitches can be understood from a temporal pattern-matching model of pitch perception based on the peaks of a simplified autocorrelation function. The pitch shifts arise from limits on the autocorrelation window duration. An alternative place-theory approach explains the pitch shifts as the result of lateral inhibition. Psychophysical experiments using edge frequencies of 100 Hz and below find that LP-noise pitches exist but HP-noise pitches do not. The result is consistent with a temporal analysis in tonotopic regions outside the noise band. LP and HP experiments with high-frequency edges find that pitch tends to disappear as the edge frequency approaches 5000 Hz, as expected from a timing theory, though exceptional listeners can go an octave higher.

© 2019 Acoustical Society of America. https://doi.org/10.1121/1.5093546 [JJL] Pages: 1993-2008

In 1963, Békésy observed that an octave band of noise (400-800 Hz) produces two pitch sensations, one near each frequency edge of the noise spectral band. Békésy attributed the pitch sensations to lateral inhibition at the edges and compared them to Mach bands in vision. Small and Daniloff (1967) simplified Békésy's experiment by using lowpass (LP) noise bands and highpass (HP) noise bands. Their listeners adjusted the edge frequency of a noise band to produce a pitch an octave higher or lower than the pitch of a noise band with a standard edge frequency.
Fastl (1971) performed pitch matching experiments in which the pitches elicited by LP and HP noise bands with sharp edges were matched by adjusting the frequency of a sine tone. In all of these early studies, the average matching frequencies were reported to be the same as the edge frequencies (e.g., Zwicker and Fastl, 1999), but later studies, as described below, showed pitch shifts where the matching frequencies deviate systematically from the edge frequencies.
In their work on binaural edge pitch, Klein and Hartmann (1981) recorded pitch matches for diotic LP and HP noise bands with sharp edges as a diotic analog to binaural edge pitch. Their pitch matching data consistently revealed pitch shifts away from the edge and into the noise. Thus, the pitch of a LP noise was found to be slightly below the edge frequency, and the pitch of a HP noise was found to be slightly above the edge frequency.
Klein and Hartmann generated their stimuli digitally in the frequency domain, leading to spectral edges with a 30-dB discontinuity at the edge frequency. These edges were much sharper than those available with the analog filters used in previous studies. For example, the filters used by Small and Daniloff (1967) had slopes of 35 dB/octave, and those used by Fastl had slopes of 120 dB/octave. Digital noise generation was able to reveal pitch shifts for two reasons. First, the sharp edges removed the uncertainty intrinsic to analog filtering about how the edge frequency, $f_e$, should be defined. Second, the sharp edges led to relatively stronger pitch sensations that allowed for more precise matching. The observed pitch shifts were 5%-10% of the edge frequency, $f_e$, in the range 200 Hz < $f_e$ < 400 Hz, and the shift percentages became smaller with increasing edge frequency. These observed shifts afford an opportunity for experimental tests of pitch perception models. Pitch shifts as reported in this paper indicate that a temporal theory, implemented here by an analytic model, is required for low edge frequencies while a place model, as envisioned by Békésy, likely applies for high edge frequencies.
B. Plan of the paper
In Sec. II, an autocorrelation-based model of pitch is presented. The model predicts pitches based on an apparent periodicity determined by the pattern of lag times of the peaks of the autocorrelation function (ACF). Because of the unusual structure of its ACF, the noise edge stimulus is an especially powerful test for pitch models based on neural timing. A sinc function approximation to the ACF is used to predict pitches. Model predictions are compared with the data of Klein and Hartmann (1981) for both LP and HP noise. Despite its simplicity, the sinc-autocorrelation function (sinc-ACF) model successfully reproduces major features of the data. Success depends on incorporating multiple peaks of the ACF in the pitch computation. In Sec. III, a competing place-based lateral-inhibition model is presented using physiological and psychophysical parameters. This place model is almost as successful in matching the 1981 data. In Sec. IV, new experimental data for LP and HP noise test the low-edge-frequency limit for edge pitch. The relative weakness of pitch in the HP case is attributed to the restricted tonotopic region for temporal coding of the low-frequency edge. In Sec. V, pitch-interval identification data show that the edge pitch qualifies as a true musical pitch. Section VI presents experimental data for LP and HP noise with high edge frequencies, testing the upper limits of edge-pitch perception. In Sec. VII, edge pitch is related to more general models and experiments. Finally, Sec. VIII is a summary. The mathematical foundation for the sinc-ACF model is given in Appendixes A and B.
a) Electronic mail: hartmann@pa.msu.edu

II. AUTOCORRELATION MODEL
As a consequence of phase-locking in auditory nerve fibers, the temporal pattern of neural spikes is highly correlated with the stimulus waveform, after taking into account cochlear filtering and auditory transduction. As in the models by Licklider (1959), Meddis and Hewitt (1991a,b), and Patterson et al. (2000), the temporal character of our model is represented by an ACF, a representation that highlights the periodic character, or approximate periodic character, of waveforms that lead to pitches. The model estimates pitches using an algorithm based on the lags of the peaks in the ACF. Noise edge pitch (NEP) offers a particularly interesting pattern of peaks.

A. Autocorrelation and pitch
Because the noise stimuli of interest are broadband, a physiologically detailed model might begin by dividing the noise spectrum into auditory filter bands. Cochlear auditory filtering might be followed by half-wave rectification and compression, both known to apply to the auditory periphery. Then, autocorrelation may be calculated within each band. Subsequently, ACFs for the separate auditory bands may be summed to generate a "summary autocorrelogram" (Meddis and Hewitt, 1991a,b) or "population interval distribution" (Cariani and Delgutte, 1996a,b). Although we have investigated models like that in the past (Hartmann et al., 2015), the model in this report is much simpler. Specifically, it makes a linear approximation for the periphery. With that approximation, a summary autocorrelogram, summed over contiguous, rectangular bands, is mathematically the same as the ACF for the broadband stimulus as explored here. The initial modeling in this section also assumes that the noise stimulus has no intrinsic variability. The noise spectrum is approximated by its long-term average.

B. The sinc-autocorrelation model
The long-term average power spectrum for a noise edge stimulus can be represented as a rectangle. For the LP NEP, the rectangle extends from zero to the edge frequency $f_e$. For the HP NEP, the rectangle extends from $f_e$ to infinity. This report first treats the LP stimulus in detail and then shows how a simple modification applies to the HP condition.

LP noise
The ACF is the inverse Fourier transform of the power spectrum. Because the long-term average power spectrum of a LP noise with mean power $P$ is either zero or constant at $P/(2\pi f_e)$, the ACF is a sinc function of lag $\tau$,
$$a_{LP}(\tau) = \frac{\sin(2\pi f_e \tau)}{2\pi f_e \tau}, \qquad (1)$$
for $P = 1$. This function can be thought of as an approximation to the all-order interspike-interval histogram. The function is shown in Fig. 1 for an edge frequency of 200 Hz.
We assume that the pitch of the LP NEP is determined by the values of lag $\tau$ where $a_{LP}(\tau)$ has peaks. The prediction of an edge pitch from the peaks in the sinc-ACF is derived in Appendix A. To a good approximation, the first peak of $a_{LP}(\tau)$ (after the zeroth peak at the origin) occurs at $f_e \tau = 5/4$ cycles. To an even better approximation, all the other positive peaks are separated from the first by integer multiples of the period. Therefore, the nth peak occurs very near the lag value of
$$\tau_n = (n + 1/4)/f_e, \quad n = 1, 2, 3, \ldots, N. \qquad (2)$$
This approximation is tested in Appendix B. As noted by Cariani (2004), each of these peaks is a potential temporal pitch cue, but because of the finite auditory integration time, only the first $N$ of them are important to the pitch sensation. The value of $N$ is a critical matter addressed in this paper. If only the first peak is included ($N = 1$), then [from Eq. (2)] the pitch cue is $\tau_1 = 1.25/f_e$, and the predicted pitch becomes $p = 1/\tau_1 = f_e/1.25$. Thus, the ratio of pitch to the edge frequency is $p/f_e = 1/1.25 = 0.8$, i.e., the pitch is predicted to be 20% below the edge frequency. That prediction gets one thing right: for a LP noise, the perceived pitch is lower than the edge frequency. However, this predicted pitch shift away from the edge is too large by a factor of about 5. The experimental pitch match is usually lower than $f_e$ by much less than 20%, typically near 4%. Apparently the first peak in the ACF is an inadequate cue for the pitch shift for NEP. The problem is solved by incorporating more peaks.
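The peak-lag approximation for the lowpass sinc-ACF can be checked numerically. The sketch below is an illustration only, not the procedure of Appendix B: it locates the true maxima of sin(x)/x by bisection and compares them with the n + 1/4 rule. The function names are hypothetical.

```python
import math

def sinc_acf(f_edge_hz, tau_s):
    """Normalized ACF of ideal lowpass noise: a sinc function of lag."""
    x = 2.0 * math.pi * f_edge_hz * tau_s
    return 1.0 if x == 0.0 else math.sin(x) / x

def exact_peak_cycles(n, iterations=60):
    """Lag of the nth positive peak of sin(x)/x, in cycles (f_e * tau).

    The derivative of sin(x)/x has the sign of g(x) = x*cos(x) - sin(x).
    g is positive at x = 2*pi*n, negative at x = 2*pi*n + pi/2, and
    decreasing in between, so bisection on that bracket finds the maximum.
    """
    lo = 2.0 * math.pi * n
    hi = lo + math.pi / 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid * math.cos(mid) - math.sin(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / (2.0 * math.pi)

def approx_peak_cycles(n):
    """Approximate peak lag in cycles: n + 1/4."""
    return n + 0.25

if __name__ == "__main__":
    for n in range(1, 6):
        print(n, round(exact_peak_cycles(n), 4), approx_peak_cycles(n))
    # Pitch ratio p/f_e if only the first (approximate) peak is used.
    print("first-peak pitch ratio:", 1.0 / approx_peak_cycles(1))
```

Printing the two columns side by side shows how quickly the exact peak lags approach n + 1/4 as n grows.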
The model that relates the multiple peaks of the ACF to a pitch prediction is described in Appendix A. There, it becomes evident that including more peaks (larger $N$) in the computation of pitch maintains the sign of the predicted shift but reduces its magnitude. It is reasonable to suppose that $N$ is limited by a temporal window, defined by a maximum lag time $\tau_{\max}$, over which peaks can be obtained. Because $N$ is approximately equal to $f_e \tau_{\max}$, the number of important peaks decreases as the edge frequency decreases. It will be seen that this effect predicts that the pitch shift percentage should increase with decreasing edge frequency, if it is assumed that $\tau_{\max}$ is relatively insensitive to edge frequency or tonotopic region. Appendix A shows that the ratio of predicted pitch to edge frequency for LP noise is given by Eq. (3), where $\gamma$ is the Euler-Mascheroni constant, $\gamma \approx 0.57722$. This formula should hold in the case that $N$ is not too small. Also in that case, $N \approx f_e \tau_{\max}$, and $p/f_e$ can be computed by making that substitution for $N$ in Eq. (3). However, $N$ is an integer, and it is only reasonable to take the integer part of the continuous function, i.e., $N = \mathrm{INT}(f_e \tau_{\max})$. That was assumed for the computation of the final pitch predictions shown in Fig. 2.
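The way additional peaks shrink the predicted shift can be illustrated with a short sketch. The estimator used here is an assumption, not taken from Appendix A: the period is estimated as the average of tau_n / n over the first N peaks, which produces a harmonic sum and is therefore at least compatible with the appearance of the Euler-Mascheroni constant in the text. Peaks are counted by their approximate lags, (n + 1/4)/f_e for LP and (n - 1/4)/f_e for HP; the function names are hypothetical.

```python
import math

def count_peaks(f_edge_hz, tau_max_s, lowpass=True):
    """Number of approximate ACF peaks with lag <= tau_max.

    Peak n is placed at (n + 1/4)/f_e for LP noise and (n - 1/4)/f_e
    for HP noise, so n <= f_e * tau_max -/+ 1/4.
    """
    offset = 0.25 if lowpass else -0.25
    return max(0, math.floor(f_edge_hz * tau_max_s - offset))

def pitch_ratio(n_peaks, lowpass=True):
    """Predicted p / f_e under the ASSUMED averaging estimator.

    Each peak n suggests a period tau_n / n; the estimate is the mean
    of those suggestions over n = 1..N, and the pitch is its reciprocal.
    """
    if n_peaks < 1:
        raise ValueError("need at least one peak")
    offset = 0.25 if lowpass else -0.25
    mean_period_cycles = sum((n + offset) / n
                             for n in range(1, n_peaks + 1)) / n_peaks
    return 1.0 / mean_period_cycles

if __name__ == "__main__":
    tau_max = 0.030  # 30-ms window, one of the values used in the text
    for f_edge in (200.0, 400.0, 1000.0, 2000.0):
        n_lp = count_peaks(f_edge, tau_max, lowpass=True)
        n_hp = count_peaks(f_edge, tau_max, lowpass=False)
        print(f_edge, n_lp, round(pitch_ratio(n_lp, True), 4),
              n_hp, round(pitch_ratio(n_hp, False), 4))
```

With N = 1 the sketch reduces to the first-peak prediction p/f_e = 0.8, and the shift shrinks toward zero as N grows, which is the qualitative behavior described above.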

HP noise
Appendix A also shows that the corresponding prediction for HP noise is given by Eq. (4). Two predictions follow from the equations above: first, the pitch of HP NEP should be above the edge frequency just as the pitch of LP NEP should be below. Second, the percentage shift for HP NEP [Eq. (4)] should be somewhat larger in magnitude than the percentage shift for LP NEP [Eq. (3)]. The latter follows because
$$\left|\frac{1}{1-x} - 1\right| > \left|\frac{1}{1+x} - 1\right|, \qquad (5)$$
for $x$ small and positive. The temporal model introduced here for LP and HP noise bands and based on a limited number of peaks of the sinc function will be called the "sinc-ACF model."

[FIG. 1 caption, partial: … (4)] for a 30-ms temporal window so that the number of peaks is $N = 5$ (LP) or 6 (HP). The mathematics of Appendix A puts those two predictions between $1/f_e$ and the first peak of the sinc-ACF at $\tau_1 = (1 \pm 1/4)/f_e$. Open diamond symbols show integer multiples of $\hat{\tau}$. The insets show the rectangular model power spectra.]

[FIG. 2 caption, partial: … Klein and Hartmann (1981), filled for HP noise and open for LP noise. Each diamond symbol shows the mean of at least four matches. Error bars are two standard deviations in overall length. Most error bars are smaller than the points, but error bars become larger below 200 Hz, where matching became difficult and data may be less reliable. Lines marked with circles, triangles, and squares indicate the predictions of the sinc-ACF model from Eqs. …]
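As a check on Eq. (5), the inequality can be worked through in a few lines. Here $x$ is simply a small positive number with $0 < x < 1$; the value $x = 0.05$ in the last line is an arbitrary illustration, not a fitted quantity.

```latex
\begin{align*}
\left|\frac{1}{1-x} - 1\right| &= \frac{x}{1-x}, &
\left|\frac{1}{1+x} - 1\right| &= \frac{x}{1+x}, \\[4pt]
\frac{x}{1-x} - \frac{x}{1+x} &= \frac{2x^{2}}{1-x^{2}} > 0
  && \text{for } 0 < x < 1, \\[4pt]
x = 0.05:\quad \frac{x}{1-x} &\approx 5.26\%, &
\frac{x}{1+x} &\approx 4.76\%.
\end{align*}
```

The asymmetry is second order in $x$, so the HP and LP shifts are expected to differ only slightly when the shifts themselves are a few percent.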

C. Edge pitch listening experiment
In Fig. 2, the predictions from the sinc-ACF model are compared to the pitch matches by three listeners in the experiment by Klein and Hartmann (1981). The details of the experiment can be found in the original publication. Briefly, there were three male listeners, G, M, and W (ages 22, 22, and 41 yr), with normal hearing. They used a sine tone with adjustable frequency and level to match the pitch of noise bands with sharp edges (30-dB discontinuity). The 12-bit noise stimuli were made by adding 251 equally spaced sine components with equal amplitudes and random phases. Stimuli were presented through headphones at 60 dBA sound pressure level (SPL) in a sound-treated room (footnote 1). Figure 2 indicates notable qualitative agreement among the three listeners, at least above 200 Hz, where there was no exception to the rule that the mean matching frequencies were shifted into the region of spectral power for both HP and LP. The sinc-ACF model is consistent with that rule.

D. Comparing the temporal model and experiment
Predictions from the sinc-ACF model for LP and HP NEP are shown in Fig. 2 for three values of the maximum lag, $\tau_{\max}$ = 15, 30, and 60 ms. Figure 2 shows that the pitch shifts are larger as a percentage of the edge frequency for lower edge frequencies, though the experimental shifts below 200 Hz are uncertain. The model calculations agree; the predicted pitch shifts increase with decreasing edge frequency. The shifts increase because, for a given autocorrelation time window (e.g., $\tau_{\max}$ = 30 ms), fewer autocorrelation peaks fall within the window when the edge frequency is low. Figure 2 also shows that the pitch shifts are larger in magnitude for HP edges than for LP edges. As noted in Eq. (5), this is also a feature of the sinc-ACF model.
Although the model predictions in Fig. 2 are for fixed lag windows, $\tau_{\max}$, a careful comparison between matches and models indicates that the best fitting lag window increases in duration with decreasing edge frequency for all the listeners. The effect is evident also in Fig. 3(a) where the matching data are averaged over the three listeners. Whereas matches to edge frequencies near 2000 Hz seem to agree with the model for $\tau_{\max}$ = 15 ms, matches to edge frequencies below about 600 Hz agree better if $\tau_{\max}$ = 30 or 60 ms. This observation is consistent with other evidence that auditory integration times become longer as the frequency range decreases (Moore, 1982; Bernstein and Oxenham, 2005; de Cheveigné and Pressnitzer, 2006). The apparent dependence of window duration on edge frequency needs to be understood in context. Window durations depend on the characteristic frequencies of neural channels. The channels that are important for a given edge frequency $f_e$ are those with characteristic frequencies in the neighborhood of $f_e$. According to Eqs. (3) and (4), the predictions in Fig. 3 depend on the maximum lag $\tau_{\max}$ only through the parameter $N \approx f_e \tau_{\max}$. We used a two-parameter model to optimize the dependence of $\tau_{\max}$ on edge frequency to fit the data in Fig. 3(a) from 200 to 2500 Hz.
The best fitting parameters and root-mean-square (RMS) errors are given in Table I(a). It is evident that the best fit requires a longer window $\tau_{200}$ at low edge frequencies than at high ($\tau_{2500}$) by a factor between 2 and 4. The window durations in Table I(a) can be compared with the window durations suggested by Moore (1982) for a pitch model based on interspike intervals (first-order histogram): a minimum duration of $0.5/f_c$ and a maximum of $15/f_c$, where $f_c$ is a characteristic frequency for the tonotopic region. The maximum duration, arguably applicable to our all-order distribution, is 75 ms and 6 ms for 200 Hz and 2500 Hz, respectively. However, unlike the Moore formulas, our optimum window duration does not scale with edge frequency. The factor between 2 and 4 is much less than 12.5.

As evident in Eq. (1), the ACF oscillates with an approximate spacing given by the reciprocal of the edge frequency. Therefore, a pitch model that identifies pitch perception with the spacing of the oscillations of the ACF must fail in view of the observed pitch shifts. Reasoning like that caused Klein and Hartmann (1981) to abandon temporal models for edge pitch. By contrast, the autocorrelation model of Secs. II A and II B above uses the actual lag values of the peaks and not their regular spacing. Similarly, Yost et al. (1996, 1998) and Patterson et al. (2000) accounted for the pitch of iterated rippled noise in terms of the lag value of the first peak of the ACF. However, modeling NEP requires more than just the first peak because the first peak alone predicts a pitch shift that is too large.
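The comparison with the Moore (1982) maximum window, $15/f_c$, amounts to the following arithmetic, taking $f_c$ equal to the edge frequency at the two ends of the fitted range:

```latex
\begin{align*}
\tau_{\max}(200\ \text{Hz})  &= \frac{15}{200\ \text{Hz}}  = 75\ \text{ms}, \\
\tau_{\max}(2500\ \text{Hz}) &= \frac{15}{2500\ \text{Hz}} = 6\ \text{ms}, \\
\frac{\tau_{\max}(200\ \text{Hz})}{\tau_{\max}(2500\ \text{Hz})}
  &= \frac{2500}{200} = 12.5 .
\end{align*}
```

A window that scaled strictly as $1/f_c$ would therefore change by a factor of 12.5 across this range, compared with the fitted factor between 2 and 4.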

III. LATERAL INHIBITION PLACE MODELS
Lateral inhibition is a neural phenomenon known to occur in the visual system (Hartline et al., 1956). Lateral inhibition has been a hypothetical element in model auditory systems (Békésy, 1963), and masking data have been interpreted in terms of it (Carterette et al., 1970a,b). The lateral inhibition concept can account for NEPs in a very natural way because it enhances contrast at edges. Plausible quantitative models can be expected to predict the pitch shifts into the noise, as observed experimentally, because lateral inhibition causes the peaks to be on the large-excitation side of the edge. The purpose of this section is to examine the predictions of plausible lateral inhibition models and compare the predicted pitch shifts with observed shifts for NEP.
Lateral inhibition models are tonotopic, ultimately related back to the displacement pattern on the basilar membrane (footnote 2). Because of the approximately logarithmic nature of the human tonotopic axis, it is natural to choose shifts of the excitation pattern on a logarithmic scale. For example, Shamma (1985) used 1/3 octave. Such a shift would lead to a flat line prediction for $p/f_e$ on a logarithmic plot like Fig. 3(b). It would not capture the tendency for shifts to increase as edge frequency decreases.
The model considered here relates frequency to basilar membrane place through the Greenwood formula (Greenwood, 1961), Eq. (6), where $f$ is the frequency in hertz and $z$ is the tonotopic coordinate in mm measured from the apex. Parameter $a$ is 0.14 mm$^{-1}$ for human cochleas. The Greenwood formula, based on psychoacoustical masking experiments, shows a low-frequency compression of the tonotopic coordinate when plotted as a function of the logarithm of the best frequency, i.e., changing the best frequency by a semitone corresponds to a smaller change in tonotopic coordinate ($\Delta z$) at low frequency than at high. Such low-frequency compression of the tonotopic scale is characteristic of auditory filter models of the periphery, e.g., Zwicker (Bark scale; 1961), Glasberg and Moore (Cam scale; 1990).

Our procedure for computing the peak caused by lateral inhibition first uses the Greenwood formula to convert the edge frequency to a tonotopic place $z$. Then it calculates the place of peak excitation $z'$ by applying a constant shift in millimeters, and finally it reconverts place $z'$ to a frequency to represent the pitch $p$. In order to model edge enhancement, the peak shift must be positive for HP edges and negative for LP edges. The comparison is shown in Fig. 3 where the data to be modeled are the same in Figs. 3(a) and 3(b). Figure 3(b) shows that the shift calculated from the lateral inhibition model has the right sign and approximately the right shape to agree with the data. Because of the low-frequency compression of the tonotopic scale, the magnitude of the predicted shift is larger for lower frequencies than for high, in agreement with pitch matching data. However, the compression is only modest, and the curvature of the predicted shift functions is smaller for the lateral inhibition model [Fig. 3(b)] than for the temporal model [Fig. 3(a)].
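The three-step place computation (frequency to place, constant shift in millimeters, place back to frequency) can be sketched as follows. The exponential form f = A(exp(a z) - k) with A = 165.4 Hz and k = 0.88 is the commonly quoted human Greenwood map and is an assumption here, as is the 0.3-mm shift, which is purely illustrative and not a fitted value from Table I. Only a = 0.14 per mm is taken from the text.

```python
import math

A_HZ = 165.4      # ASSUMED scale constant of the human Greenwood map
K_CONST = 0.88    # ASSUMED integration constant of the Greenwood map
A_PER_MM = 0.14   # slope parameter a, per mm, as stated in the text

def place_from_freq(f_hz):
    """Tonotopic place z (mm from the apex) for frequency f."""
    return math.log(f_hz / A_HZ + K_CONST) / A_PER_MM

def freq_from_place(z_mm):
    """Best frequency (Hz) at tonotopic place z (mm from the apex)."""
    return A_HZ * (math.exp(A_PER_MM * z_mm) - K_CONST)

def edge_pitch_place_model(f_edge_hz, shift_mm, highpass):
    """Pitch predicted by shifting the excitation peak into the noise.

    The shift is toward higher frequency for HP noise and toward
    lower frequency for LP noise.
    """
    z_edge = place_from_freq(f_edge_hz)
    z_peak = z_edge + shift_mm if highpass else z_edge - shift_mm
    return freq_from_place(z_peak)

if __name__ == "__main__":
    shift = 0.3  # mm, hypothetical value for illustration only
    for f_edge in (200.0, 500.0, 1000.0, 2000.0):
        p_lp = edge_pitch_place_model(f_edge, shift, highpass=False)
        p_hp = edge_pitch_place_model(f_edge, shift, highpass=True)
        print(f_edge, round(p_lp / f_edge, 4), round(p_hp / f_edge, 4))
```

Because the map is compressed at low frequencies, a fixed shift in millimeters gives a larger percentage shift at 200 Hz than at 2000 Hz, which is the trend the place model needs.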
The lateral inhibition model was tested quantitatively against the data in Figs. 2 and 3 in the frequency range from 200 to 2500 Hz (actually 197 Hz to 2438 Hz). Tables I(a) and I(b) show the results of parametric best fits to the experimental matches. For the temporal model [...]. Overall, the RMS fitting errors are somewhat smaller for the temporal model, and the cause of this difference is the relatively flat nature of the predictions by the place model.
The rectangular bandwidth scale by Glasberg and Moore (1990) becomes an alternative tonotopic scale if the bands are stacked in order of center frequency. The formula ... [Eq. (7)], but the fit is still slightly inferior to the temporal model. Finally, the frequency-dependent predictions shown in Figs. 3(a) and 3(b) were replaced by the best straight lines, separately for LP and HP conditions. The straight lines correspond to a temporal model in which the duration of the lag window ($\tau_{\max}$) depends on frequency range in such a way that the number of autocorrelation peaks ($N$) within the window is always the same, independent of edge frequency. Therefore, $\tau_{\max} \propto 1/f_e$. The straight lines also correspond to a place model in which the displacement of the excitation peak caused by lateral inhibition is a constant fraction of the edge frequency as in the appendix in Shamma (1985). Table I(d) shows that RMS errors are largest for these straight line fits, indicating that the frequency dependences introduced into the temporal and place models make useful (though small) contributions.
All of the models compared in Table I have two adjustable parameters, which makes the comparisons fair. However, the place models from Tables I(b) and I(c) both require parameters to relate place to frequency, determined originally from other experiments. The temporal model is more economical in that respect. The sign of the pitch shift results from the logic of lateral inhibition and the fitting procedure for the place model, but it emerges automatically from the temporal model.

IV. LOW-EDGE-FREQUENCY EXPERIMENTS
Small and Daniloff (1967) reported that listeners were unable to do octave matching for HP noise with edge frequencies less than 610 Hz, but octave matches could be made for LP noise with an edge as low as 145 Hz. Similarly, Fastl (1971) found that highpassed noise with an edge frequency below 500 Hz did not produce a pitch. Fastl and Stoll (1979) asked listeners to rate the pitch strength of 12 different stimulus types, including LP and HP noises with edges (±192 dB/octave) at 125, 250, and 500 Hz. Their listeners found LP edge pitches to be stronger and HP edge pitches to be inaudible sometimes. Also, the data from Klein and Hartmann tended to show more variability for HP noise than for LP noise as the edge frequency decreased. By contrast, with window durations as long as 60 ms, the sinc-ACF model does not immediately suggest any particular difficulty for the HP condition. Low-edge-frequency experiments were done using our sharp edges to test the low-edge-frequency limit and compare with the previous experiments. The limit for HP noise was of particular interest.
A. HP noise

Procedure
The HP noise bands were computed in the frequency domain with a sharp edge at $f_e$ at the low-frequency end, where the spectral discontinuity, as presented to the listeners, was more than 40 dB. Above $f_e$ the noise was equal-amplitude, random-phase, extending to 20 000 Hz except that the amplitude decreased linearly by 20 dB between 16 000 and 20 000 Hz to avoid a sharp edge at very high frequency. Because noise bands were very wide for all edge frequencies, the noise power was essentially constant. Noise bands were generated with 16-bit precision at a sample rate of 100 000 samples per second. They were presented diotically to listeners in a double-walled sound room through Sennheiser HD600 headphones (Wedemark, Germany). The level was 65 dBA. Noise stimuli were 520 ms in duration with 20-ms raised-cosine onsets and offsets.
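A frequency-domain construction of this kind can be sketched in a few lines. The sketch below is not the authors' code: the component spacing is a free parameter (the text does not state it), the high-frequency taper is assumed to be linear in dB versus frequency, and the slow direct summation is meant for small illustrative cases only. All function names are hypothetical.

```python
import math
import random

def hp_component_amplitudes(f_edge_hz, spacing_hz, f_top_hz=20000.0,
                            f_taper_hz=16000.0, taper_db=20.0):
    """Frequencies and linear amplitudes of HP-noise components.

    Components sit on integer multiples of spacing_hz from the edge up
    to f_top_hz. Amplitudes are equal up to f_taper_hz and then fall by
    taper_db, ASSUMED linear in dB versus frequency.
    """
    freqs, amps = [], []
    k = math.ceil(f_edge_hz / spacing_hz)
    while k * spacing_hz <= f_top_hz:
        f = k * spacing_hz
        if f <= f_taper_hz:
            level_db = 0.0
        else:
            level_db = -taper_db * (f - f_taper_hz) / (f_top_hz - f_taper_hz)
        freqs.append(f)
        amps.append(10.0 ** (level_db / 20.0))
        k += 1
    return freqs, amps

def raised_cosine_gate(t_s, duration_s=0.520, ramp_s=0.020):
    """Envelope with raised-cosine onset and offset ramps."""
    if t_s < 0.0 or t_s > duration_s:
        return 0.0
    if t_s < ramp_s:
        return 0.5 * (1.0 - math.cos(math.pi * t_s / ramp_s))
    if t_s > duration_s - ramp_s:
        return 0.5 * (1.0 - math.cos(math.pi * (duration_s - t_s) / ramp_s))
    return 1.0

def synthesize(freqs, amps, sample_rate_hz, duration_s, rng):
    """Sum random-phase sines and apply the gate (slow, for small demos)."""
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    n_samples = int(round(sample_rate_hz * duration_s))
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        s = sum(a * math.sin(2.0 * math.pi * f * t + ph)
                for f, a, ph in zip(freqs, amps, phases))
        samples.append(s * raised_cosine_gate(t, duration_s))
    return samples

if __name__ == "__main__":
    freqs, amps = hp_component_amplitudes(200.0, 2.0)
    print(len(freqs), freqs[0], freqs[-1])
```

For full-length stimuli at 100 000 samples per second, an inverse FFT would be the practical choice; the direct sum is kept here only so the example needs nothing beyond the standard library.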
A noise interval was followed by a 400-ms silent interval (footnote 3) and then by a 500-ms sine tone with a frequency that could be adjusted by the listener to match the pitch of the noise edge. The entire audible range of frequencies was available for the matching tone through a combination of push-button range switching and a ten-turn potentiometer on the response box. The control voltage from the response box was read by a 12-bit analog-to-digital converter, and the matching tone was then generated digitally so that the matching frequency was known precisely. The relationship between the potentiometer setting and the tone frequency was randomly offset at the start of each trial. The level of the matching tone was also adjustable by the listener, and the matching tone could be muted with the press of a button. The cycle of target noise and matching tone repeated indefinitely until the listener was satisfied with the match. After making a match, the listener received feedback, including the matching frequency, the edge frequency, and the percentage difference between the two. Listeners were told to expect matches to be close to the edge, but not necessarily identical to the edge. (See footnote 8 in Sec. VIII.)
Our experiment looked for pitches far below the limits found by Fastl (1971) and Small and Daniloff (1967). It used eight edge frequencies: 50, 70, 100, 150, 200, 280, 400, and 560 Hz. The different edge frequencies were presented in random order in an experiment run.
As a control experiment, listeners adjusted the frequencies of sine tones to match the pitches of eight low-frequency sine tones, ranging from 40 to 500 Hz. These were presented at a nominal level of 70 dB SPL, considering the elevated audiogram at 40 Hz.

Listeners
There were six male listeners in the matching experiments overall. Listeners A, B, I, S, and Z were between the ages of 20 and 25 yr. They were accepted as listeners based on their ability in a high-frequency sine-sine pitch matching test. Listeners A, I, and S were able to match sine tones with a standard deviation of less than 10 cents (0.6%) at least up to 13 kHz. Listener B's standard deviation was less than 10 cents between 350 and 9000 Hz. Listener Z was tested only up to 8 kHz where his standard deviation was 10 cents. Listener W′ was the same as Listener W (data in Fig. 2) but tested 37 yr later. He only participated in low-frequency experiments. All listeners were amateur musicians except for listener A, who was a professional. The young listeners were all students at Michigan State University; they signed a consent form approved by the University Institutional Review Board (IRB).

Results
Four of the listeners participated in the low-edge-frequency HP experiments. The matching data are shown in Fig. 4. Small numbers near the horizontal axis show the total number of trials. Matching ratios are shown by circles. Matching ratios one octave (occasionally, but rarely, two octaves) higher than plotted are shown by upward pointing, filled triangles. For instance, an upward triangle at $f_e$ = 70 Hz plotted near −300 cents represents a matching tone that was about 900 cents above 70 Hz (900 − 1200 = −300). Downward pointing triangles indicate matches that were an octave lower than plotted. Solid lines show the predictions of the sinc-autocorrelation peak model for different window durations. The model is the same as for Figs. 2 and 3, but the plots look different because the model was evaluated at a fine mesh of points for Fig. 4.
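The cents arithmetic behind this plotting convention can be illustrated with a small helper. This is a sketch of the bookkeeping only, not the authors' plotting code; the function names and the choice of folding into the range from −600 to +600 cents are assumptions.

```python
import math

def cents(f_match_hz, f_ref_hz):
    """Interval from f_ref to f_match in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f_match_hz / f_ref_hz)

def fold_octaves(interval_cents):
    """Fold an interval into [-600, 600) cents.

    Returns (folded_cents, octaves_removed). A positive octave count
    means the actual match was that many octaves higher than plotted.
    """
    octaves = math.floor((interval_cents + 600.0) / 1200.0)
    return interval_cents - 1200.0 * octaves, octaves

def cents_to_percent(interval_cents):
    """Frequency difference in percent for a given interval in cents."""
    return (2.0 ** (interval_cents / 1200.0) - 1.0) * 100.0

if __name__ == "__main__":
    print(fold_octaves(900.0))      # the example from the text
    print(cents_to_percent(10.0))   # 10 cents expressed in percent
```

Folding 900 cents gives −300 cents with one octave removed, matching the upward-triangle example, and 10 cents corresponds to roughly 0.6% in frequency.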
Comparison with the predictions confirms the observation made in Sec. II that the optimum window duration is relatively long for edge frequencies below 600 Hz. Most of the matching frequencies in Fig. 4 agree best with the model with a 60-ms window.
All listeners made consistent matches for edge frequencies above 200 Hz with negligible octave errors and most of the matches within a semitone of the target. Listeners S and W′ matched consistently down to 150 Hz. Consistent matches were almost all above the edge frequency as expected from the sinc-autocorrelation peak model. Listeners found the pitches evoked by those edges to be salient. By contrast, for edge frequencies below 150 Hz, there was no evidence for salient pitches. There were frequent octave errors, and many matches were well away from the target. Matches showed no consistent pattern except possibly for listener S. However, the consistency seen for listener S does not seem to indicate actual edge pitch perception below 150 Hz (footnote 4). We conclude that the low-frequency limit for highpassed edge pitch is between 100 and 150 Hz, though values that low were not achieved by all listeners.

B. LP noise
Low-edge-frequency experiments were also done using LP noise. The same edge frequencies and protocol from the HP experiments of Sec. IV A above were used. The levels were the same, too, except that the listener had the option of requesting a level increase for the lowest edge frequencies, particularly 50 Hz. Again, listeners A, B, S, and W′ participated, and results are shown in Fig. 5. The data show that matches could be made reliably down to 50 Hz, the lowest edge frequency tested. Although the matching variance grew for decreasing edge frequency, the growth may relate more to low-frequency loudness and pitch acuity, in general, than to the strength of edge pitch. As expected for lowpassed noise, most (89%) of the pitch shifts were negative, though the percentage fell to 81% for listener A.

C. Discussion
A comparison of the HP and LP experiments shows that the LP edge pitch could be heard for edge frequencies at least an octave lower than the limit for the HP edge pitch. The difference can be understood within a temporal model by considering the amount of tonotopic axis available to represent the timing for these two noise types. The difference is inconsistent with a place model that incorporates auditory filters with the usual tuning asymmetry.

FIG. 4. (Color online) Probing the low-frequency limit with HP noise. The ratios of matching frequency to edge frequency are expressed in cents and shown by circles. Matches that were an octave (occasionally two octaves) higher (lower) than plotted here are shown by an upward (downward) triangle. Small numbers indicate the number of matches (trials) for each frequency. The shaded region near zero cents is centered on the average match (six trials) for sine tones in the control experiment, and the width of the region is two standard deviations. Solid lines give the prediction of the sinc-autocorrelation model for temporal windows of 15, 30, and 60 ms. At these low frequencies there are few sinc function peaks in the temporal window, and the predicted plots show a succession of plateaus as new peaks enter the window. The plateaus appear because the plots use a fine mesh of points.

Temporal model
For LP noise with a low edge frequency, the entire tonotopic axis with best frequencies (BF) above the edge experiences no on-frequency excitation. Instead, neurons with high BF mainly experience the excitation that is near the edge, especially because their low-frequency slopes are relatively shallow. Calculations with a gammatone filter model show an ACF with peaks at lag values determined by the edge frequency, independent of BF when BF is greater than the edge frequency. A strong ACF over a major part of the tonotopic axis can be expected to lead to a strong pitch, in agreement with experiment. At the same time, a LP noise with a low edge frequency has relatively few components in its spectrum, and that leads to a rough-sounding temporal envelope making the matching experience unpleasant and somewhat difficult.
For HP noise with a low edge frequency, the only neurons free of on-frequency excitation are in the region of the tonotopic axis with even lower BF. Because of their sharp cutoff in the high-frequency tails, these neurons experience only little excitation having a temporal structure determined by the edge frequency. Therefore, HP edge pitch can be expected to disappear for low edge frequency, as observed experimentally.
The above explanation for the differences between LP and HP noise for low edge frequency retains the temporal character of our autocorrelation model but augments it with place considerations to obtain a qualitative understanding of the strength of the temporal information that is available. Because excitation pattern models are well developed, it would be possible to make quantitative predictions for the relative strengths of edge pitches with different edge frequencies. Such calculations are beyond the scope of the present paper.

Place model
For LP noise with low edge frequency, the edge of the excitation pattern is broad because neurons with BF near, and slightly above, the edge are excited by noise components that are below the neuron BF. For HP noise with low edge frequency the edge of the excitation pattern is sharp because neurons with BF just below the edge are inefficiently excited by noise above their BF. Therefore, edge pitch is predicted to be stronger for HP noise, contrary to experiment.

V. INTERVAL IDENTIFICATION
Musical pitch is recognized if a listener can identify melodies without rhythm or adjust or identify musical intervals (Houtsma and Goldstein, 1972; Plack and Oxenham, 2005; Moore and Ernst, 2012; Oxenham et al., 2011; Gockel and Carlyon, 2016). To determine whether NEP qualifies as a musical pitch, interval identification experiments were inserted into the schedule of pitch matching runs for high and low edge frequency. Listeners made open set identifications of melodic intervals to verify that the pitch elicited by noise bands with a sharp edge qualifies as a musical pitch. Because the musical nature of the edge pitch itself was in question, there was no special concern with frequency range, and edge frequencies ranged from 600 to 2400 Hz.

A. Intervals-LP noise
In the LP experiment, the intervals and noise edge frequencies (Hz) were these: octave (600, 1200), fifth (1600, 2400), fourth (1200, 1600), and major third (1600, 2000). These four intervals were always melodic and ascending. For each experimental trial, an interval was randomly chosen from the four and presented to the listener four times; these four presentations formed a standard cycle. After the cycle the listener could either identify the interval or request a repetition of the cycle. Listeners were familiar with musical intervals, but they did not know which intervals were in the test. Stimuli were generated according to the procedure described in Sec. IV A. Stimuli were again 520 ms in duration with 20-ms raised cosine onsets and offsets. A pause of 400 ms separated the two noises of an interval.
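As an illustration of the nominal interval sizes, the following minimal sketch converts the edge-frequency pairs quoted above into cents and finds the nearest equal-tempered interval. The function names and the equal-tempered reference grid are assumptions introduced here for illustration; the listeners' perceived intervals also depend on the pitch shifts, which this sketch ignores.

```python
import math

def cents(f_low, f_high):
    """Size of the interval from f_low to f_high in cents (1200 per octave)."""
    return 1200.0 * math.log2(f_high / f_low)

# Edge-frequency pairs (Hz) quoted in the text for the LP experiment.
EDGE_PAIRS = {
    "octave": (600, 1200),
    "fifth": (1600, 2400),
    "fourth": (1200, 1600),
    "major third": (1600, 2000),
}

# Equal-tempered sizes in semitones, used only as a reference grid (assumption).
SEMITONES = {"major third": 4, "fourth": 5, "fifth": 7, "octave": 12}

def nearest_interval(f_low, f_high):
    """Name of the reference interval closest to the edge-frequency ratio."""
    size = cents(f_low, f_high) / 100.0
    return min(SEMITONES, key=lambda name: abs(SEMITONES[name] - size))

for name, (low, high) in EDGE_PAIRS.items():
    print(f"{name:12s} {low}-{high} Hz: {cents(low, high):7.1f} cents "
          f"-> {nearest_interval(low, high)}")
```

The edge frequencies form simple integer ratios (2:1, 3:2, 4:3, 5:4), so the nominal major third is about 14 cents narrower than its equal-tempered counterpart.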
Results of the experiment were as follows:
• Listener A immediately identified all four intervals without waiting for the four intervals of a cycle to complete.
• Listener I correctly identified the major third and the fourth after one cycle. He required two cycles to correctly identify the fifth, and misidentified the octave as a perfect fifth.
• Listener S correctly identified three intervals but called the major third a minor third. When then presented with a minor third (2000, 2400 Hz), the listener responded, "major third." Upon further testing with major thirds (3200, 4000) and (3600, 4500) and a minor third (3000, 3600), the listener made one error, calling (3200, 4000) a minor third.
• Listener W′ correctly identified all four intervals but required two cycles to identify the major third and the fourth. Listener W′ designed the experiment and knew which four intervals were in the set, but not the order of presentation.
• Listener Z correctly identified all four intervals after hearing one standard cycle for each.

B. Intervals-HP noise
In the HP experiment, the intervals and edge frequencies were the same as those in the LP experiment. Another interval, a minor third, (2000,2400) was added to the standard set.
• Listener A correctly identified four intervals, usually before the completion of a cycle, but he misidentified the octave, insisting that it was a minor 7th! Such a misidentification might have been predicted. The octave interval was made with relatively low edge frequencies, 600 and 1200 Hz, where the pitch shift gradient is large and negative (Fig. 2). The pitch of the 600-Hz edge is expected to be increased more than the pitch of the 1200-Hz edge, leading to a compression of the perceived interval.⁵
• Listener S correctly identified all five intervals, always at the end of a single standard cycle. He made no mistakes in seven random trials.

C. Discussion-Interval identification
The results of the LP and HP interval identification experiments, as summarized in Table II, indicate that noises with a sharp spectral edge elicit a musical pitch. Although some listeners made some mistakes and others required more than a single cycle, the difficulties appear to represent only isolated cases with possible additional confusion from the pitch shifts. The edge-pitch noise stimuli are clearly capable of generating a musical pitch. Isochronous melodies made with edge pitches have been recognized by audiences at conferences (e.g., Hartmann et al., 2015). This positive result is hardly surprising. Akeroyd et al. (2001) found that binaural analogs of monaural edge pitches lead to musical pitch sensations, and binaural edge pitches are more challenging to listeners than the monaural NEPs investigated here.

VI. HIGH-EDGE-FREQUENCY EXPERIMENTS
If the pitch of a noise band with a sharp spectral edge is the result of a temporal process, as conjectured in Sec. II, then the pitch sensation requires the synchrony of neural firing. Neural synchrony, at all levels of the auditory system, is known to decrease dramatically with increasing frequency, though the frequency at which synchrony is no longer operative is a subject of ongoing debate. Whereas frequency difference limen data from Moore and Ernst (2012) suggested a limit from 8 to 10 kHz, scaling arguments (Joris and Verschooten, 2013) and cochlear measurements (Verschooten et al., 2018) suggest a limit no higher than a few kilohertz. At the level of the auditory nerve, it is common to set an upper limit near 5000 Hz based on Johnson's data on cat (Johnson, 1980). To determine whether edge pitch persists at high frequencies, we performed pitch matching tests for LP and HP noise with high edge frequencies. As a control experiment, we performed similar tests with sine tone targets.

A. LP matches
Pitch matches to LP noises are shown in two ways in Figs. 6 and 7: (1) The circles show the ratios of matching frequencies to edge frequencies, on a scale of cents, when the ratios were in the range −400 to 400 cents. Errors identified as octave discrepancies are shown by filled triangles. An upward pointing triangle indicates a match one octave above the plotted symbol. A downward pointing triangle indicates a match one octave (occasionally two octaves) below the plotted symbol. Some matches did not fall on the plot, even allowing for octave discrepancies. These are shown by circles on dashed lines above (match too high) and below (match too low) the plot. (2) The average value and standard deviation of the difference between the matching frequency and the edge frequency are shown as a percentage of the edge frequency. Matches with octave discrepancy assignment were not included in the averaging. Matches in the one-octave range from −600 cents to +600 cents discrepancy were included in the average. The numbers of matches included in the average are shown by small numbers in the upper half plane. The averages themselves, followed by the standard deviation, are in the lower half plane. The shaded region indicates the matches in a control experiment where the listener matched a sine tone target with a sine tone probe. The center of the shaded region indicates the mean, and the overall width of the region is two standard deviations.
Figure 6 shows that for listeners A and Z, matches became highly unstable as the edge frequency approached 5000 Hz. The same was true for listener I, but his data are not shown because his matches at 2000 Hz were not consistent enough to make a strong contrast with matches at higher edge frequencies such as 5000 Hz. For listener A, the standard deviation was less than a semitone for the eight edges below 4.5 kHz and greater than a semitone for the six highest edge frequencies, 4.5 kHz and above. For listener Z, the standard deviation was greater than a semitone for the two edges above 4.5 kHz.
Listener S (Fig. 7) was an exception. Experiments with listener S began in the same range as for the other listeners, 2-8 kHz. With time, it became evident that this listener could make successful matches for edge frequencies well above 5 kHz.⁶ Therefore, listener S was retained for another four months and experiments were restarted using the range from 2 to 16 kHz. Figure 7 shows that the matching standard deviation was less than a semitone for the 14 edges below 9 kHz and was greater than or equal to a semitone for the 6 edges at 9 kHz and above.
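The averaging rule described in this section can be sketched as follows. This is a minimal illustration, not the analysis code used for the figures: the function name, the example matches, and the conversion of one semitone to a percentage (about 5.95%) are assumptions introduced here. Matches more than 600 cents from the edge are dropped, and the mean and standard deviation of the remaining shifts are expressed as a percentage of the edge frequency.

```python
import math
import statistics

SEMITONE_PCT = 100.0 * (2.0 ** (1.0 / 12.0) - 1.0)  # about 5.95 percent

def match_statistics(edge_hz, matches_hz, limit_cents=600.0):
    """Mean and SD of percent shifts for matches within +/- limit_cents."""
    kept = [m for m in matches_hz
            if abs(1200.0 * math.log2(m / edge_hz)) <= limit_cents]
    shifts_pct = [100.0 * (m - edge_hz) / edge_hz for m in kept]
    # statistics.stdev needs at least two retained matches.
    sd_pct = statistics.stdev(shifts_pct)
    return {
        "n": len(kept),
        "mean_pct": statistics.mean(shifts_pct),
        "sd_pct": sd_pct,
        "below_semitone": sd_pct < SEMITONE_PCT,
    }

# HYPOTHETICAL matches (Hz) to a 3000-Hz LP edge; these are not data from the paper.
# The last value is roughly an octave low and is excluded by the 600-cent rule.
example = [2910.0, 2935.0, 2890.0, 2950.0, 2925.0, 1460.0]
stats = match_statistics(3000.0, example)
print(stats["n"], round(stats["mean_pct"], 2), round(stats["sd_pct"], 2),
      stats["below_semitone"])
```

Octave-corrected matches are simply excluded here; plotting them as triangles, as in Figs. 6 and 7, would need a separate step.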

B. HP matches
The pitch matches by listeners A and B for HP noise are shown in Fig. 8. Similar to the matches for LP noise for listeners A and Z (Fig. 6), the scatter among the matches increases near 5000 Hz. For listener A, the standard deviation was less than one semitone for the six edges below 5 kHz and greater than one semitone for the six edges at 5 kHz and above. For listener B, the standard deviation was less than one semitone for the two edges below 3 kHz and greater than one semitone for the ten edges at 3 kHz and above.

FIG. 6. (Color online) The ratios of matching frequency to edge frequency for listeners A and Z with LP noise are shown by circles. Some are slightly displaced horizontally for clarity. Matches that were an octave higher than plotted here are shown by an upward triangle. Matches that were an octave lower than plotted here are shown by a downward triangle. Small numbers in the lower rows show the mean and standard deviation of the percentage shift. Small numbers in the upper row indicate the number of matches (trials) included in the average for each frequency. Matches with discrepancies greater than ±400 cents are plotted above/below the main graph. The dashed line is the prediction of the sinc-autocorrelation model for a lag window of 15 ms. A longer window, such as 30 ms, would fit better for listener A. The shaded region near zero cents is centered on the matches for sine tones, and the width of the region is two standard deviations.
The exceptional listener, listener S, was extensively tested with HP noise with high-frequency edges. His results are shown in Fig. 9. Listener S is indeed extraordinary. First, in the control experiment with sine tone matching, his standard deviation for 16-kHz tones was less than 15 cents. Only at 17 kHz did his standard deviation (86 cents) approach the 105-cent difference between 16 and 17 kHz. For edge pitches, the standard deviation was less than one semitone for the 13 edges below 10 kHz and greater than one semitone for 5 of the 6 edges at or above 10 kHz. It appears that a 12-kHz edge was not distinguishable from a 10-kHz edge.
It is very unlikely that the extraordinary performance of listener S was the result of an experimental artifact. The noise bands with spectral edges and the matching sine tones were generated by different electronic systems. It is hard to see how a stimulus generation artifact common to the noise and tone could serve as a cue. Further, the matching ability for very high frequencies was resistant to the introduction of LP masking noise (100-2000 Hz, 50 dBA). Experiment runs, including very high-frequency edges, always also included medium-frequency edges (2-4 kHz). Therefore, the listener was required to make large leaps in pitch range throughout.
It seemed possible that the matches by listener S were facilitated by the procedure whereby experimental runs either presented all HP noises or presented all LP noises. To test this idea, listener S did 26 runs where each run had 4 HP and 4 LP noises randomly ordered. Edge frequencies ranged from 400 to 12 000 Hz. The results showed that high-frequency edge pitch matching accuracy was not adversely affected by this mixing procedure. Standard deviations for edges at 6, 7, and 9 kHz were less than 1%. The standard deviation at 10 kHz was only 1.2%, but the standard deviation at 12 kHz was again large.
Clearly, the data from listener S do not agree with a model that posits a temporal origin for edge pitch with consequent limitation to 5 kHz. There are several possible explanations. Perhaps listener S has an auditory system that preserves neural timing up to frequencies an octave higher than in other humans, and also higher than in cats (Johnson, 1980). Alternatively, listener S found some other process, presumably based on place mechanisms, for matching pitches. Unlike other listeners, listener S may be responsive to a tonotopic excitation pattern sharpened by an abrupt disappearance of inhibition at a high-frequency edge.

C. LP and HP NEP
Two of the listeners in the high-frequency experiments (listeners A and S) participated in both the LP and the HP versions testing the high-frequency limit for NEP. Identical edge frequencies were used for both versions, enabling a paired perceptual comparison between LP and HP NEP. The relevant data are the standard deviations appearing in Figs. 6(a), 7, 8(a), and 9 for these listeners. For listener A, the standard deviation was smaller for HP NEP than for LP NEP for 8 out of 11 edge frequencies, and the 3 exceptions were all for $f_e$ > 5000 Hz, where matching was difficult for listener A. Similarly, for listener S, the standard deviation was smaller for HP NEP than for LP NEP for 15 out of 18 edge frequencies, and the 3 exceptions were all for $f_e$ > 11 000 Hz. Listener S remarked informally that he found HP NEP easier to hear, even though HP NEP was presented with low-frequency masking noise.

Temporal model
A straightforward explanation for the advantage of HP noise over LP noise for high $f_e$ is the flip-side of the explanation of the advantage of LP noise for low $f_e$. A HP noise with an edge frequency of 2 kHz or above leaves much of the pitch-critical apical region of the cochlea unexcited by low-frequency components. Further, calculations with a gammatone filter bank show that this region is excited by the remote high-frequency components. The size of the ACF oscillations decreases as the best frequency becomes ever smaller than the edge frequency but, because it is normalized, the ACF does not disappear and it retains the periodicity expected from the sinc-autocorrelation model for BF at least several octaves below the edge frequency. For a LP noise with high edge frequency the (basilar) region for spectrally remote excitation is smaller. Therefore, a temporal model predicts that HP noise leads to a stronger pitch sensation, in agreement with experiment.

Place model
For LP noise with high edge frequency, the edge of the excitation pattern is broad because neurons with BF near the edge are excited by noise below their BF. For HP noise with high edge frequency, the edge of the excitation pattern is sharp because neurons with BF near the edge are only inefficiently excited by noise above their BF. Therefore, the usual asymmetry of auditory tuning predicts that edge pitch should be stronger for HP noise, again in agreement with experiment. For very high edge frequency, the place model has an advantage over the temporal model because it does not require neural synchrony.

VII. DISCUSSION
The edge pitch stimuli are of special interest for temporal models of pitch because of the pattern of peaks in their ACFs. The pattern can be viewed with reference to a periodic stimulus for which the ACF exhibits a set of regularly occurring major and minor peaks. The major peaks are found at time delays (lags) corresponding to integer multiples of the fundamental period ($n = 0, 1, 2, \ldots$). Patterns of minor peaks in the ACF depend on the amplitudes of harmonics in the stimulus. By contrast, the peaks of the ACF for aperiodic noise with a sharp spectral edge are displaced from integer multiples by 1/4 unit. Therefore, the edge pitch stimuli represent a temporal analog to the "pitch shift" stimuli (e.g., Schouten et al., 1962) in that the predicted pitch appears as a best fitting parameter in a model that is ideal for a periodic stimulus.
Early temporal models for pitch (Meddis and Hewitt, 1991a,b; Cariani and Delgutte, 1996a,b) estimated pitch by finding the first major peak in the ACF, or summary autocorrelation function (SACF), but more robust estimations are obtained from more recent models (Cariani, 2004; Bidelman and Heinz, 2011) that incorporate multiple peaks. Using patterns of multiple peaks can reduce octave errors and make successful pitch predictions for a wider range of stimuli. The edge pitch matching data also require multiple autocorrelation peaks (Fig. 1). Edge pitch experiments can reveal the number of peaks that contribute to pitch perception as a function of frequency range.
The more recent models also incorporate more realistic physiology. In order to handle pitch multiplicity (hearing out multiple pitches from double vowels, musical dyads, and triads), the autocorrelation process needs to be preceded by cochlear filtering and rectifying neural transduction. Bandpass filtering and half-wave rectification of a temporally detailed waveform avoid cancellations between peaks and valleys within the SACF. The noise band with a sharp spectral edge, as treated in the present paper, is simpler, and useful predictions can be made with an autocorrelation model based only on the average stimulus power spectrum. The only oscillations in the ACF are from the edge itself, and they provide similar information in every off-frequency tonotopic region. For spectral-edge stimuli, the ACF is simple enough that an elementary model, accepting all the peaks without regard for their heights, can be used.
Inspired by Békésy's 1963 report of pitches at both edges of a noise band, Small and Daniloff (1967) studied the pitches of LP and HP noise. They initially considered a sine-tone pitch matching experiment, similar to the experiment reported here, but gave it up as "time consuming and difficult for subjects." Instead, they asked subjects to adjust a filter to produce an edge pitch that was an octave above or below a standard noise having an edge. Despite the difference in methods and the very shallow slopes used by Small and Daniloff (1967; ±35 dB/octave), there are a number of parallels between their results and ours. Their subjects were unable to hear reliable edge pitches for HP noise with edge frequencies ($f_e$) below 610 Hz, but subjects had no trouble with LP noise having much lower edge frequencies. That result resembles our experience with low $f_e$, though our HP limit was about two octaves lower than 610 Hz, probably because of our sharper edge. Small and Daniloff (1967) had ten listeners, and five of them were able to attempt octave matches above a 9620-Hz HP edge. Three of them attempted octave matches above a 9660-Hz LP edge. Evidently they too had some listeners, like our listener S, who were capable of hearing edge pitch well above the 5-kHz limit expected for neural timing.⁷ Apparently they, too, found HP noise easier than LP noise for high edge frequencies.
For both LP and HP noise, Small and Daniloff (1967) found that attempts to match an octave above a standard were too high (matching edge frequency more than a factor of 2 greater than standard edge frequency) and attempts to match an octave below a standard were too low. Given the increasing magnitudes of the relative pitch shifts observed in our experiments as the frequency decreases and the signs of those pitch shifts, we would have predicted their results for HP noise but the opposite to their results for LP noise. Possibly the comparison is frustrated by the well-known octave enlargement (Ward, 1954).
Pitches evoked by sharp spectral edges have been studied through pitch matching experiments on periodic complex tones with many strong harmonics (Martens, 1981; Kohlrausch and Houtsma, 1992). Such complex tones include multiple pitch cues: the low pitches of the complex and the pitches of resolved or partially resolved components. The stimulus is more complicated than noise bands. Kohlrausch and Houtsma (1992) found that pitch matching variance monotonically decreased as the fundamental frequency of the complex decreased, increasing the spectral density. By extrapolation, one might expect the smallest variance for noise bands, the ultimate in spectral density. These authors also found that the pitch of a lowpassed complex tone with a 2000-Hz edge was usually matched by a sine tone above 2000 Hz. This is also the prediction of a periodicity analysis of the waveform (Hartmann, 1998). The stark contradiction between that observation and the unambiguous evidence that edge pitches occur below the edge frequency for LP noise, as well as the complexity of the complex tone stimulus, discourages attempts to unify these two edge pitch effects.

VIII. SUMMARY
Open-set melodic interval identification experiments show that NEPs qualify as musical pitches for both LP and HP noise, although there are pitch shifts away from the edge frequency ($f_e$). The pitch shifts found by Klein and Hartmann (1981) were similarly found in the experiments reported here using higher quality digital stimulus generation. Specifically, the pitch of a LP noise is below $f_e$ and the pitch of a HP noise is above $f_e$. These pitch shifts were helpful data in evaluating two models of edge pitch perception: a temporal model and a place model.

A. Temporal model
The temporal model of NEP hypothesized that pitch is determined by a characteristic autocorrelation lag, which is a mean of the weighted lags of the peaks of the broadband noise ACF. Parsimony was served by approximating the ACF by a sinc function, corresponding to a rectangular power spectrum. This model makes several predictions in agreement with experiments:
(1) The predicted sign of the pitch shift agrees both for LP and for HP noise.
(2) The pitch shift magnitude is larger, as a percent of $f_e$, for lower $f_e$.
(3) The pitch shift magnitude is larger for HP than for LP.
(4) The pitch prediction as a function of $f_e$ has about the right curvature.
(5) The window duration, which is the adjustable parameter in the temporal model, is within a reasonable range.
(6) The optimum window duration increases with decreasing $f_e$ as expected.⁸
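Predictions (1) to (3) can be illustrated with a short numerical sketch. It assumes that the pitch ratio has the form implied by Appendix A, $p/f_e = 1/[1 \pm (1/4N)\sum_{n=1}^{N} 1/n]$, with the plus sign for LP and the minus sign for HP noise. The rule used to count peaks (a peak is included when its lag does not exceed the window duration) and the function names are assumptions made here; the fitted model may weight peaks differently.

```python
import math

def n_peaks(edge_hz, window_s, noise="LP"):
    """Number of ACF peaks whose lag lies inside the window (assumed rule).

    Peak lags are (n + 1/4)/f_e for LP and (n - 1/4)/f_e for HP, n >= 1.
    """
    offset = 0.25 if noise == "LP" else -0.25
    return math.floor(edge_hz * window_s - offset)

def pitch_ratio(edge_hz, window_s, noise="LP"):
    """Predicted pitch divided by edge frequency."""
    n = n_peaks(edge_hz, window_s, noise)
    if n < 1:
        raise ValueError("no ACF peak falls inside the window")
    x = sum(1.0 / k for k in range(1, n + 1)) / (4.0 * n)
    return 1.0 / (1.0 + x) if noise == "LP" else 1.0 / (1.0 - x)

WINDOW_S = 0.030  # 30-ms window, one of the durations considered in the paper
for fe in (300, 600, 1200, 2400):
    lp = 100.0 * (pitch_ratio(fe, WINDOW_S, "LP") - 1.0)
    hp = 100.0 * (pitch_ratio(fe, WINDOW_S, "HP") - 1.0)
    print(f"f_e = {fe:5d} Hz: LP shift {lp:+.2f}%  HP shift {hp:+.2f}%")

# Predicted size of the HP "octave" made from 600- and 1200-Hz edges.
octave = 2.0 * pitch_ratio(1200, WINDOW_S, "HP") / pitch_ratio(600, WINDOW_S, "HP")
print(f"HP 600 -> 1200 Hz interval ratio: {octave:.2f}")
```

Under these assumptions the HP interval ratio comes out close to the factors of 1.96 (30-ms window) and 1.94 (15-ms window) quoted in footnote 5, which suggests that the sketch is at least consistent with the model as described.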

B. Place model
An alternative model for NEP is based on a place theory of pitch perception in which the edge of the excitation pattern is sharpened by lateral inhibition. In this model, tonotopically ordered neurons at some level of the auditory system are inhibited by excitation of neighboring neurons with higher and lower characteristic frequencies. At an edge, the primary excitation of neighbors beyond the edge disappears and so does their ability to inhibit excitation. The result is an enhancement of excitation of neurons near the edge that do receive primary excitation. The enhancement therefore occurs at places having characteristic frequencies below the edge frequency for LP noise and above the edge frequency for HP noise, in agreement with experimental pitch shifts. Predicting the pitch shift requires an estimate of the displacement of the peak of the enhancement away from the edge. If the displacement is a constant fraction of the edge frequency (Shamma, 1985), the predicted pitch shift would be a flat line on a pitch shift plot such as Fig. 3(b). If the displacement is related to neural coordinates as initially established in the cochlea (e.g., Greenwood, 1961), the relative displacement increases for decreasing frequency, a behavior that is consistent with the experimental observations of edge pitch shifts as shown in Fig. 3(b).
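The second alternative can be sketched numerically with a Greenwood-type frequency-position map. In the sketch below, the map constants are commonly quoted human values and the 0.3-mm displacement is purely hypothetical; neither is taken from this paper. The point is only that a fixed displacement in millimeters corresponds to a larger relative frequency shift at lower edge frequencies.

```python
import math

# Greenwood-type map, f = A * (10**(a * x) - k), with x in mm from the apex.
# ASSUMED constants (commonly quoted human values), not given in this paper.
A_HZ, A_PER_MM, K = 165.4, 0.06, 0.88

def place_mm(freq_hz):
    """Cochlear place (mm from apex) for a given frequency."""
    return math.log10(freq_hz / A_HZ + K) / A_PER_MM

def freq_hz(place):
    """Frequency (Hz) for a given cochlear place (mm from apex)."""
    return A_HZ * (10.0 ** (A_PER_MM * place) - K)

def relative_shift(edge_hz, displacement_mm):
    """Fractional frequency change for a fixed displacement along the map.

    A negative displacement moves toward the apex (LP case, lower frequency);
    a positive displacement moves toward the base (HP case).
    """
    return freq_hz(place_mm(edge_hz) + displacement_mm) / edge_hz - 1.0

DX_MM = 0.3  # HYPOTHETICAL displacement of the enhancement peak
for fe in (200, 500, 1000, 2000, 4000):
    print(f"{fe:5d} Hz: LP {100 * relative_shift(fe, -DX_MM):+.2f}%  "
          f"HP {100 * relative_shift(fe, +DX_MM):+.2f}%")
```

With these constants the relative shift equals $(10^{a\,\Delta x} - 1)(1 + kA/f_e)$, so its magnitude grows as $f_e$ decreases and approaches a constant fraction at high frequencies.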

C. Critical experiments
Experiments using low and high edge frequencies were done to try to distinguish between the temporal and place models.

Low edge frequency
The experiments of Sec. IV showed that as the edge frequency decreased below 150 Hz, the pitch persisted for a LP noise but disappeared for HP noise. As argued in Sec. IV C, this result is consistent with the timing model but not with a place model that assumes asymmetrical auditory filters.

High edge frequency
The experiments of Sec. VI showed that for high edge frequencies, the pitch was stronger for HP noise than for LP noise. As argued in Sec. VI C, this result is consistent with both timing and place models.
As the edge frequency approached 5000 Hz, the pitch disappeared for most listeners, but for at least one listener a pitch persisted up to an edge frequency of 10 000 Hz. The latter result argues against a timing model. The temporal model for NEP requires that neural synchrony be maintained in the frequency region of the edge. If usable neural synchrony disappears as the stimulus frequency passes 5 kHz, then the edge pitch ought to disappear as well. Shamma's 1985 lateral inhibition model also requires synchrony. Experiments with both LP and HP noise found that edge-pitch matching deteriorated considerably above 5 kHz for listeners A, B, I, and Z. However, listener S made matches with a standard deviation of less than a semitone for an 8-kHz LP edge and also for an 11-kHz HP edge. Further, although the matches by the other listeners may have been inaccurate for edge frequencies near 5 kHz, their data do not support the conclusion that the edge pitch disappeared entirely. Either temporal synchrony is, at least weakly, maintained at high frequencies in human listeners or listeners are able to exploit place of encoding to hear an edge pitch. Place of excitation as enhanced by lateral inhibition at an edge might provide some high-frequency information for all listeners and lots of information for listener S.
This equation makes intuitive sense if we imagine that $\tau_n \approx n/p$, in which case we obtain the consistent result
$$\hat{\tau} \approx 1/p. \tag{A3}$$
However, the calculation of $\hat{\tau}$ from Eq. (A2) is counterintuitive in that this equation gives increasing weight to peaks of longer lag (i.e., larger $n$). Therefore, we abandon the minimization of $E_1$ in Eq. (A1). Instead we imagine that the lag of each peak beyond the first represents the period of a subharmonic. In the context where the peaks are approximately equally spaced, the $n$th peak points to a frequency that is the reciprocal of $\tau_n/n$. Thinking this way, we choose the best $\hat{\tau}$ to minimize the error $E_2$,
$$E_2 = \sum_{n=1}^{N} \left(\tau_n/n - \hat{\tau}\right)^2. \tag{A4}$$
The best estimate of $\hat{\tau}$ then becomes
$$\hat{\tau} = \frac{1}{N}\sum_{n=1}^{N} \frac{\tau_n}{n}. \tag{A5}$$
For the LP sinc-ACF, the lag values of the peaks are given by the approximation in Eq. (2), $\tau_n = (n + 1/4)/f_e$. This approximation is tested in Appendix B. Therefore,
$$\hat{\tau} = \frac{1}{f_e}\left[1 + \frac{1}{4N}\sum_{n=1}^{N}\frac{1}{n}\right]. \tag{A6}$$
The sum $\sum 1/n$ is called a "harmonic series" by mathematicians (because of the wavelengths of successive harmonics in a periodic complex tone), and it is approximately equal to $\ln(N) + \gamma + 1/(2N)$, where $\gamma$ is the Euler-Mascheroni constant, $\gamma \approx 0.57722$.
The reciprocal of $\hat{\tau}$ is the predicted value of pitch $p$. Therefore the ratio of the pitch to the edge frequency for LP noise is
$$\frac{p}{f_e} = \frac{1}{1 + \frac{1}{4N}\sum_{n=1}^{N}\frac{1}{n}}. \tag{A7}$$
This equation is Eq. (3) in the text.
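The two steps of this derivation can be checked numerically. The sketch below assumes that $E_2$ is the sum over the $N$ peaks of $(\tau_n/n - \hat{\tau})^2$, so that its minimizer is the mean of $\tau_n/n$, and it compares the partial harmonic sum with the approximation $\ln(N) + \gamma + 1/(2N)$. The function names are introduced here for illustration only.

```python
import math

EULER_GAMMA = 0.5772156649015329

def harmonic(n):
    """Partial harmonic sum 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def harmonic_approx(n):
    """Approximation ln(n) + gamma + 1/(2n) quoted in the text."""
    return math.log(n) + EULER_GAMMA + 1.0 / (2.0 * n)

def lp_lags(edge_hz, n_peaks):
    """Approximate LP peak lags tau_n = (n + 1/4)/f_e for n = 1..N."""
    return [(n + 0.25) / edge_hz for n in range(1, n_peaks + 1)]

def e2(tau_hat, lags):
    """Error E_2 as assumed here: sum of (tau_n/n - tau_hat)**2."""
    return sum((t / n - tau_hat) ** 2 for n, t in enumerate(lags, start=1))

def tau_hat_mean(lags):
    """Minimizer of E_2: the mean of tau_n / n."""
    return sum(t / n for n, t in enumerate(lags, start=1)) / len(lags)

def tau_hat_closed(edge_hz, n_peaks):
    """Closed form (1/f_e) * [1 + H_N / (4N)]."""
    return (1.0 + harmonic(n_peaks) / (4.0 * n_peaks)) / edge_hz

lags = lp_lags(500.0, 15)
print(tau_hat_mean(lags), tau_hat_closed(500.0, 15))
print(harmonic(15), harmonic_approx(15))
```

The approximation error of the harmonic sum is roughly $1/(12N^2)$, so it is already below 0.1% of the sum for ten peaks.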

HP
The ACF for a rectangular HP noise band, a HP , can be computed by starting with the ACF for an infinite band, having power at all frequencies from zero out to infinity (a 1 ), and then subtracting the ACF for a LP rectangular band, a HP ðsÞ ¼ a 1 ðsÞ À a LP ðsÞ ¼ a 1 À sinð2pf e sÞ=ð2pf e sÞ: Function a 1 is a delta function, and its area can be chosen to fulfill the usual normalization requirement for ACFs, a HP (0) ¼ 1. Because of the minus sign in Eq. (A8), a HP (s) has peaks where a LP (s) has valleys. The two functions differ by half a period. For the sinc-ACF, the peaks of a HP (s) occur at s n ¼ (n -1/4)/f e , which is just the same as Eq.
(2) except for the minus sign. It follows that the pitch prediction for HP NEP is the same as Eq. (3) [or Eq. (A7)] for LP NEP except for a sign, namely This equation is Eq. (4) in the text. Because it is calculated from the components that are missing from the spectrum instead of the components that are present in the spectrum, the ACF for HP noise, a HP , is less well defined than the ACF for LP noise. The rest of this Appendix is a test to examine a HP , especially the peak locations, with a numerical example. Figure 10 is the normalized autocorrelation for a noise with equal-amplitude, random-phase components from 500 to 20 000 Hz. Like the experimental stimuli, the spectrum was attenuated between 16 and 20 kHz, the duration was 500 ms, and the sample rate was 100 000 sps. Figure 10 shows that the peaks of the envelope agree reasonably with the n -1/4 approximation noted just below Eq. (A8), and that was an essential reason for the test. However, the actual values of autocorrelation are small because the function is normalized by all the components in the band. Restricting the band to the region where neural synchrony occurs or to a single auditory filter would increase the size of the normalized function. FIG. 10. (Color online) Autocorrelation for a HP noise band with equalamplitude, random-phase components from 500 to 20 000 Hz. It is normalized to a value of 1.0 (well off the plot) for a lag of zero. The vertical lines are drawn at lag values given by s n ¼ (n -1/4)/f e , f e ¼ 500 Hz, for 1 n 15 as a test of the n -1/4 approximation for the lag values of the peaks. Close inspection reveals a discrepancy for the leftmost peak, but discrepancies for the other peaks are too small to be seen. Diamonds indicate integer multiples of the period 1/f e . They are one-quarter period away from the vertical lines. 1 The 1981 experiment used only three different digital sound files with edge frequencies in three ranges. 
Finer gradation of edge frequencies was obtained by changing the sample rate. Because the edge frequencies from different ranges overlapped, small shifts attributable to details of the sound files themselves were observed, but the data from different ranges are combined in Figs. 2 and 3 for simplicity. The different ranges can be seen in the 1981 publication. 2 Lateral inhibition, if it exists anywhere in the auditory system, might occur at any level of the auditory neural pathway. An inhibition model based on the Greenwood formula for the cochlea makes the assumption that tonotopy at the site of the putative lateral inhibition follows the tonotopy in the cochlea. Therefore, place shifts at the relevant site can be represented by displacements along the cochlear partition as measured in millimeters. 3 The experiments by Klein and Hartmann (1981) maintained a broadband noise throughout an experimental trial, including the nominally "silent" intervals. It was thought that including a noise background (without an edge) along with the matching sine tone would benefit the listener by making the matching interval sound more like the target interval. Subsequent experience with edge pitch matching indicated no perceived benefit from the noise background and none was used in the current experiments. 4 The consistency in matches for listener S can be understood as a proclivity to match indistinguishable edges by sine tones near 50 or 60 Hz. For an edge at 100 Hz, 13 of 16 matches were closely spaced and too low (average -884 cents) pointing to 60.0 Hz. At 70 Hz, 11 of 16 matches were too low by an average of -333 cents and four others were an octave above that. These matches pointed to 57.8 Hz. At 50 Hz, 10 of 16 matches were too high by an average of 198 cents, and the other 6 matches were an octave above that. These matches pointed to 56.0 Hz. 
Therefore, the preponderance of matches (means 60.0, 57.8, and 56.0 Hz) were made at about the same frequency for the low edge-frequency range, independent of what the edge frequency actually was.

5 The autocorrelation model for a HP noise predicts that the pitch for a 1200-Hz edge should be higher than for a 600-Hz edge by a factor of 1.96 (30-ms window) or 1.94 (15-ms window) instead of 2.00. Both ratios are considerably higher than the equal-tempered minor 7th at 1.78. If the perceptual octave above 600 Hz is enlarged by 1% per McKinney and Delgutte (1999), i.e., to 1212 Hz, a minor 7th could correspond to a slightly larger ratio (1.8), but this ratio is still much smaller than 1.94.

6 Data from the last 12 runs of this initial series showed 11 of 12 matches at $f_e = 6$ kHz within ±50 cents of the edge (12 of 12 with octave correction). At 7 kHz, 11 of 12 matches were within 100 cents of the edge frequency (with one octave correction). At 8 kHz, 11 of 12 matches were within 150 cents of 8 kHz.

7 In connection with unexpected results, it may be worth noting the caveat about octave switching in the filter as suggested by Small and Daniloff (1967).

8 The experiment protocol included trial-by-trial feedback to the listeners, intended to maintain listener interest. The listeners knew that some pitch shift was an expected result, which would discourage them from adjusting their responses to reduce the shift. Also, the listeners were so skilled at the matching task, and the experiments continued for so many trials, that we think it unlikely that the feedback had a significant effect on the data. Nevertheless, some bias may have occurred, especially because most of the experiment runs were blocked (LP noise or HP noise). If it had occurred, bias would likely have reduced the measured pitch shift. Such a reduction would have had the effect of artificially increasing the estimates of the autocorrelation window duration.
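
The cents arithmetic in footnotes 4 and 5 can be reproduced with a short sketch. The function names below are illustrative and do not come from the paper. The stretched minor 7th assumes that the 1% octave enlargement is spread uniformly in log frequency over the octave, which is an assumption made only for this illustration.

```python
import math

def shift_by_cents(freq_hz, cents):
    """Frequency reached by moving `cents` away from freq_hz (1200 cents = 1 octave)."""
    return freq_hz * 2.0 ** (cents / 1200.0)

def ratio_to_cents(ratio):
    """Size of a frequency ratio in cents."""
    return 1200.0 * math.log2(ratio)

# Footnote 4: edge frequency (Hz) and average mismatch (cents) for listener S.
for edge_hz, cents in [(100.0, -884.0), (70.0, -333.0), (50.0, 198.0)]:
    match_hz = shift_by_cents(edge_hz, cents)
    print(f"edge {edge_hz:5.1f} Hz, {cents:+6.0f} cents -> match near {match_hz:.1f} Hz")

# Footnote 5: equal-tempered minor 7th (10 semitones) versus the model ratios.
minor_7th = 2.0 ** (10.0 / 12.0)
# Assumption: a 1% stretched octave (1212/600) divided uniformly in log frequency.
stretched_minor_7th = (1212.0 / 600.0) ** (10.0 / 12.0)
print(f"minor 7th: {minor_7th:.3f}, stretched: {stretched_minor_7th:.3f}")
for model_ratio in (1.96, 1.94):
    print(f"model ratio {model_ratio}: {ratio_to_cents(model_ratio):.0f} cents")
```

The three computed match frequencies agree with the values quoted in footnote 4 to within about 0.1 Hz; small differences are expected because the quoted cents values are rounded averages.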
FIG. 11. (Color online) Sinc function for a LP edge at $f_e = 100$ Hz. Two vertical lines are drawn for each peak, the shorter at the exact peak of the sinc function, the longer at the approximate lag value $(n + 1/4)/f_e$. The long and short vertical lines become indistinguishable as either the lag increases or the frequency increases (or both).
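
The comparison drawn in Fig. 11 can be sketched numerically. The code below is a minimal illustration, not the authors' procedure: it locates the exact maxima of sin(x)/x, with x = 2π f_e τ, by bisection on the numerator of the derivative, and compares them with the quarter-period approximation. The function names and the `sign` parameter are hypothetical; `sign=-1` locates the maxima of −sin(x)/x, used here as a simple stand-in for the HP case with its n − 1/4 approximation.

```python
import math

def _slope_numerator(x):
    # d/dx [sin(x)/x] = (x*cos(x) - sin(x)) / x**2; the numerator sets the sign.
    return x * math.cos(x) - math.sin(x)

def exact_peak_lag(n, f_e, sign=+1):
    """Lag (s) of the n-th positive-lag maximum of sign*sin(x)/x, x = 2*pi*f_e*tau."""
    if sign > 0:
        # Maximum of sin(x)/x lies just below x = (2n + 1/2)*pi.
        lo, hi = 2 * n * math.pi, (2 * n + 0.5) * math.pi
    else:
        # Maximum of -sin(x)/x lies just below x = (2n - 1/2)*pi.
        lo, hi = (2 * n - 1) * math.pi, (2 * n - 0.5) * math.pi
    for _ in range(80):  # bisection on the sign change of the slope numerator
        mid = 0.5 * (lo + hi)
        if _slope_numerator(lo) * _slope_numerator(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi) / (2.0 * math.pi * f_e)

def approx_peak_lag(n, f_e, sign=+1):
    """Quarter-period approximation: (n + 1/4)/f_e for sign=+1, (n - 1/4)/f_e for sign=-1."""
    return (n + sign * 0.25) / f_e

f_e = 100.0  # edge frequency of Fig. 11, in Hz
for n in range(1, 6):
    exact = exact_peak_lag(n, f_e)
    approx = approx_peak_lag(n, f_e)
    print(f"n={n}: exact {1e3 * exact:.4f} ms, approx {1e3 * approx:.4f} ms, "
          f"gap {1e3 * (approx - exact):.4f} ms")
```

In this sketch the gap between the approximate and exact lags shrinks as n grows and scales as 1/f_e, consistent with the caption's remark that the two sets of lines become indistinguishable at larger lags or higher edge frequencies.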