Input-output functions of the nonlinear-distortion component of distortion-product otoacoustic emissions in normal and hearing-impaired human ears.

Distortion-product otoacoustic emissions (DPOAEs) arise in the cochlea in response to two tones with frequencies f1 and f2 and mainly consist of two components, a nonlinear-distortion and a coherent-reflection component. Wave interference between these components limits the accuracy of DPOAEs when evaluating the function of the cochlea with conventional continuous stimulus tones. Here, DPOAE components are separated in the time domain from DPOAE signals elicited with short stimulus pulses. The extracted nonlinear-distortion components are used to derive estimated distortion-product thresholds (EDPTs) from semi-logarithmic input-output (I/O) functions for 20 normal-hearing and 21 hearing-impaired subjects. I/O functions were measured with frequency-specific stimulus levels at eight frequencies f2 = 1,…, 8 kHz (f2/f1 = 1.2). For comparison, DPOAEs were also elicited with continuous primary tones. Both acquisition paradigms yielded EDPTs, which significantly correlated with behavioral thresholds (p < 0.001) and enabled derivation of estimated hearing thresholds (EHTs) from EDPTs using a linear regression relationship. DPOAE-component separation in the time domain significantly reduced the standard deviation of EHTs compared to that derived from continuous DPOAEs (p < 0.01). In conclusion, using frequency-specific stimulus levels and DPOAE-component separation increases the reliability of DPOAE I/O functions for assessing cochlear function and estimating behavioral thresholds.


I. INTRODUCTION
The healthy cochlea amplifies sound by actively (Gold, 1948) enhancing vibrations of the basilar membrane at low to moderate sound pressure levels (Sellick et al., 1982) and, thereby, establishes the high sensitivity, the large dynamic range, and the sharp tuning of the auditory system (for review, Robles and Ruggero, 2001). The system of biomechanical components involved in the amplification process is referred to as the cochlear amplifier, a term introduced in a review paper by Davis (1983). As a by-product of the amplification process, the cochlea emits sound waves measurable in the ear canal using a sensitive microphone, both in the absence of external sound, referred to as spontaneous otoacoustic emissions (SOAEs), and in response to external stimuli, referred to as evoked otoacoustic emissions (OAEs). OAEs are widely used in clinical routine as an objective and noninvasive measure of cochlear function, such as in newborns and young children or in serial monitoring of potentially ototoxic drugs (Probst et al., 1991).
One type of OAE commonly used in clinical applications and research is the distortion-product otoacoustic emission (DPOAE) which, by definition, is produced when stimulating simultaneously with two tones, with frequencies denoted by $f_1$ and $f_2$ where $f_2 > f_1$ (Kemp, 1979; Avan et al., 2013). In humans, the most pronounced DPOAE is found at the cubic difference frequency $f_{DP} = 2f_1 - f_2$ and is assumed to be composed mainly of two components generated by different mechanisms at different sites along the basilar membrane (Brown et al., 1996; Shera and Guinan, 1999). The first component arises directly from nonlinear interaction of the two traveling waves, which overlap maximally close to the tonotopic place of the $f_2$ tone and simultaneously deflect the stereocilia of the outer hair cells (OHCs) with frequencies $f_1$ and $f_2$. Because of its nonlinear dependence on stereocilia deflection, the receptor current exhibits intermodulation products, which are coupled into the cochlear fluid as vibrations by mechanical forces from the electromechanical transducer of the OHC soma (Avan et al., 2013). In the case of the cubic intermodulation product, the vibrations are evident as two traveling waves of frequency $f_{DP}$. One of the waves propagates retrograde toward the stapes and is referred to as the nonlinear-distortion component, in consequence of its direct origin in nonlinearity. The other wave propagates anterograde to the tonotopic place of $f_{DP}$, where coherent reflection, presumably due to irregularities of mechanical properties along the cochlea, gives rise to another DPOAE component (Shera and Guinan, 1999), referred to as the coherent-reflection component. DPOAE amplitudes or levels are known to decrease with increasing hearing thresholds (Probst and Hauser, 1990; Gorga et al., 1993), an observation which is exploited for diagnostic purposes.
However, the high variability of DPOAE amplitudes across subjects (Probst et al., 1991) and insufficient performance at low frequencies (Gorga et al., 1993) limit their accuracy for assessing behavioral thresholds. An alternative approach utilizes the dependence of DPOAE level on the stimulus level, $L_2$, of the second stimulus tone, called the DPOAE input-output (I/O) function, to obtain DPOAEs at low stimulus levels. This approach has been shown to enhance the sensitivity of DPOAEs for detecting cochlear damage and to increase their correlation with auditory thresholds (Gaskill and Brown, 1990; Kummer et al., 1998; Dorn et al., 2001). Boege and Janssen (2002) introduced a refined procedure based on a semi-logarithmic plot of the DPOAE I/O function. Using stimulus levels chosen according to the so-called scissor paradigm, $L_1 = 0.4 L_2 + 39$ dB (Kummer et al., 1998), the DPOAE-pressure amplitude was found to depend linearly on $L_2$. This linearized form of the DPOAE I/O function enabled determination of the so-called estimated distortion product threshold (EDPT) by extrapolating the linear regression line to the abscissa (Boege and Janssen, 2002). The EDPTs were shown to correlate significantly with auditory thresholds (Boege and Janssen, 2002; Gorga et al., 2003; Neely et al., 2009). However, despite this linearization of the DPOAE I/O functions, the standard deviation of the differences between auditory thresholds and EDPTs was higher than 10 dB, with individual threshold estimation errors being as much as 30 dB or more (Schmuziger et al., 2006).
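As a brief illustration of the scissor paradigm quoted above, the following sketch evaluates $L_1 = 0.4 L_2 + 39$ dB for a few $L_2$ values. The function name and the $L_2$ grid are assumptions made for illustration; only the relation itself is taken from the text.

```python
def scissor_l1(l2_db: float) -> float:
    """L1 in dB SPL for a given L2 in dB SPL (scissor paradigm)."""
    return 0.4 * l2_db + 39.0


# Illustrative L2 grid (an assumption, not a prescription of any study).
for l2 in range(25, 80, 10):
    l1 = scissor_l1(l2)
    print(f"L2 = {l2:2d} dB SPL -> L1 = {l1:.1f} dB SPL "
          f"(L1 - L2 = {l1 - l2:+.1f} dB)")
```

With this relation the level difference $L_1 - L_2$ shrinks as $L_2$ increases and vanishes at $L_2 = 65$ dB SPL.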
One reason for the limited test performance when using DPOAEs to detect hearing loss or relating DPOAE thresholds to behavioral thresholds is interference between the nonlinear-distortion component and the coherent-reflection component, each of which has frequency $f_{DP}$. When stimulating the cochlea with conventional continuous primary tones, wave interference occurs because the DPOAE signal measured in the ear canal is the vector sum of these two signal components (Brown et al., 1996), which is then usually quantified by spectral analysis (Shera and Guinan, 1999; Avan et al., 2013). In contrast to the above-mentioned studies, the present work investigates the impact of interference between the DPOAE components when using the semi-logarithmic DPOAE I/O functions to estimate behavioral thresholds.
While the nonlinear-distortion component exhibits relatively constant phase as a function of $f_2$, the phase of the coherent-reflection component changes considerably with $f_2$ (Shera and Guinan, 1999). This frequency-dependent phase difference between the two components leads to quasi-periodic variation of DPOAE amplitude as a function of $f_2$, commonly referred to as DPOAE fine structure (Gaskill and Brown, 1990; Heitmann et al., 1998; Talmadge et al., 1999), which is characterized by amplitude maxima and minima corresponding to constructive and destructive interference, respectively. Depending on the relative differences in amplitude and phase between the two components, the measured DPOAE response might not accurately reflect the functional state of the cochlea at the $f_2$ place. For example, the two DPOAE components might almost completely cancel when the phase difference is close to 180° and their amplitudes are similar. Moreover, the locations of minima and maxima of the DPOAE fine structure can shift in frequency with increasing stimulus levels (He and Schmiedt, 1993). Such frequency shifts become apparent as valleys and peaks in three-dimensional plots of DPOAE amplitude as a function of $L_1$ and $L_2$ (Zelle et al., 2015a). These intensity-dependent interference effects can cause considerable deformations in DPOAE I/O functions, yielding large standard deviations in the estimates of slope and EDPT (Mauermann and Kollmeier, 2004; Dalhoff et al., 2013).
The two DPOAE components become distinguishable as short- and long-latency components when converting a DPgram into its temporal counterpart using an inverse fast Fourier transform (IFFT) (Stover et al., 1996). The IFFT technique can be applied to reduce fine structure by exploiting the shorter latency of the nonlinear-distortion component relative to the coherent-reflection component in the time domain (Kalluri and Shera, 2001; Mauermann and Kollmeier, 2004). Similarly, acquisition paradigms with swept primary tones utilize the different latencies to estimate the nonlinear-distortion component using a least-squares-fit (LSF) algorithm (Long et al., 2008; Abdala et al., 2015) or by means of time-frequency filtering (Moleti et al., 2012). Despite offering reliable extraction of the nonlinear-distortion component, these techniques either rely on time-consuming recordings of DPgrams or employ chirps with high frequency resolution at the expense of acquisition time, which can be disadvantageous if I/O functions at only a few frequencies are of interest, as in a clinical setting. An alternative method to obtain DPOAEs solely expressing the functional state of the cochlea at the $f_2$-tonotopic place is the use of a third tone to suppress the coherent-reflection component (Heitmann et al., 1998). This technique does not require recordings at multiple frequencies, but fails to improve accuracy or reliability when assessing hearing status (Dhar and Shaffer, 2004; Johnson et al., 2006b; Johnson et al., 2007).
The presence of two DPOAE components also becomes evident during the onset and the offset of the DPOAE signal, when using a pulsed $f_2$ stimulus and analyzing the DPOAE signal in the time domain (Whitehead et al., 1996; Talmadge et al., 1999; Konrad-Martin and Keefe, 2005). Because of their different latencies, the nonlinear-distortion component can be separated from the coherent-reflection component by a method called onset decomposition (OD) (Vetešník et al., 2009). This technique samples the envelope of the DPOAE signal at a time instant before the coherent-reflection component starts to interfere. Although a very promising technique, OD as implemented by Vetešník et al. (2009) was unnecessarily time-consuming: the stimulus pulse was longer than required, because the signal information after the sampling instant was discarded.
The present study extends previous research by using the OD technique to extract the nonlinear-distortion component from the DPOAE signal produced by stimuli of short duration and then using the DPOAE I/O function of the nonlinear-distortion component to deliver the EDPT and estimate auditory threshold. These so-called short-pulse DPOAEs utilize brief $f_2$ pulses with a duration similar to the relative delay between the two DPOAE components, in order to facilitate component separation in the time domain (Zelle et al., 2013). In this way, semi-logarithmic I/O functions based on the nonlinear-distortion component allow estimation of auditory thresholds without artifacts due to interference of the two DPOAE components. Moreover, DPOAE recordings were made with optimized, frequency-dependent stimulus levels (Zelle et al., 2015a), which account for the different compression of the primary-tone traveling waves at the generation site of the DPOAE close to the $f_2$-tonotopic place (Robles and Ruggero, 2001). In contrast to previously reported primary-tone levels (Kummer et al., 1998; Johnson et al., 2006a), the optimal stimulus-intensity functions used here were based solely on the nonlinear-distortion component. For comparison, DPOAE I/O functions were also acquired conventionally with continuous primary-tone stimulation. Experiments were conducted with normal-hearing and hearing-impaired subjects in a clinically relevant frequency range from 1 to 8 kHz. Estimates of auditory thresholds based on both short-pulse and continuous DPOAEs are compared to behavioral thresholds measured by Békésy audiometry to evaluate the utility of short-pulse DPOAEs for objectively determining behavioral thresholds. It is shown that the short-pulse stimulus and analysis paradigms allow estimation of auditory threshold with hitherto unprecedented accuracy.

II. MATERIALS AND METHODS
A. Study design and subjects
DPOAE I/O functions were recorded unilaterally from 20 normal-hearing and 21 hearing-impaired subjects with sensorineural hearing loss. Subjects were between 18 and 70 years of age and the normal-hearing group was significantly younger (mean age: 27.6 ± 4.2 years) than the hearing-loss subjects (mean age: 49.7 ± 13.0 years, p < 0.001). In order to identify hearing-impaired ears, behavioral thresholds (BTs) were recorded with clinical pure-tone audiometry (Audiometer AT 900, Auritec, Medizindiagnostische Systeme, Hamburg, Germany). Subjects were classified as normal-hearing if all BTs for frequencies between 1 and 8 kHz were better than 20 dB hearing level (HL). BTs for hearing-impaired ears ranged from 0 to 77 dB HL with an average value of 24 ± 18 dB HL (normal-hearing: 7 ± 5 dB HL). All subjects were free of any conductive hearing impairment as ascertained by standard 226-Hz tympanometry (Madsen-Zodiac 901, GN Otometrics, Münster, Germany) and otoscopy. Measurements of clinical, notched-noise, auditory-brainstem responses (ABR) for 1, 2, and 4 kHz and stimulus levels from 25 to 75 dB nHL in 10-dB steps (Evoselect ERA system, Pilot Blankenfelde Medizinisch-Elektronische Geräte, Blankenfelde, Germany) and acoustic reflex measurements at 0.5, 1, 2, and 4 kHz (Madsen-Zodiac 901) were used to exclude possible severe neural conditions for the hearing-impaired group. Subjects were included only if ABR waves V were detectable for at least one of the investigated frequencies. To avoid false-positive exclusions in cases without identifiable ABR signals, subjects were also included if at least one ipsilateral stapedius reflex could be detected. In 17 hearing-impaired subjects, ABR wave V was detectable for at least one test frequency for stimulus levels equal to or below 65 dB nHL. In the remaining four subjects, ipsilateral stapedius reflexes were detectable at two or more frequencies. The subjects had no history of tinnitus.
The study was approved by the Ethics Committee of the University of Tübingen in accordance with the Declaration of Helsinki for human experiments. All subjects provided written informed consent.

B. Measurement system and calibration
OAE measurements and Békésy audiometry were performed unilaterally using an ER-10C DPOAE probe-microphone system (Etymotic Research, Elk Grove Village, IL) connected to a 16-bit analog output card and a 24-bit signal acquisition card (NI PCI 6733 and NI PCI 4472, National Instruments, Austin, TX) situated in a commercially available PC. The sampling frequency was 102.4 kHz. Stimulus generation and data acquisition were controlled by a custom-built toolbox implemented in LabVIEW (version 12.0, National Instruments, Austin, TX). The sound pressure of the ER-10C speakers was ascertained by in-ear calibration, which was repeated every 120 to 240 s depending on the acquisition progress. Both the output of the speakers and the recorded microphone signal were corrected for the transfer functions of an artificial ear simulator (B&K type 4157, Brüel & Kjær, Nærum, Denmark) and of the ER-10C microphone to yield DPOAEs, which are considered to correspond to recordings close to the tympanic membrane. Further details of the calibration routine are given elsewhere (Zelle et al., 2015a). Signal post-processing and data analysis were done in MATLAB (version 9.0, MathWorks, Natick, MA).

C. Assessment of behavioral thresholds
A modified method of Békésy tracking audiometry was performed using the ER-10C ear probe to assess behavioral thresholds in each subject directly before the OAE data acquisition started. The sound pressure of the continuous tone was controlled by the data acquisition software while the subject was required to indicate perception of the stimulus by pressing or releasing a button. The output level, $L$, started at −20 dB sound pressure level (SPL), well below hearing threshold, and increased in 0.1-dB steps with an alteration rate of 8 dB/s. The acquisition setup gradually decreased the intensity-rate change to avoid clicks in the presentation of tones with high output level (ultimately, 2 dB/s at $L > 60$ dB SPL). The subject was instructed to press and hold down a button if the sound was perceived, thereby establishing an upper pure-tone threshold. While the button was held down, the system decreased the output level until the subject lost perception of the sound. Releasing the button indicated a lower pure-tone threshold. The mean value of the lower and upper threshold provided an estimate of the auditory threshold. On average, the elapsed time between the detection of these two thresholds was 2.25 ± 0.79 s. The maximum output level was set to 85 dB SPL. As in Dalhoff et al. (2013), behavioral thresholds were recorded not only at each $f_2$ frequency, but additionally at five to nine (mostly seven) neighboring frequencies, in order to account for the frequency-dependent bandwidth of the short-pulse $f_2$ stimulus. The frequency range spanned by the lowest and highest neighboring frequencies was 80 Hz at $f_2 = 1$ kHz and increased to 480 Hz at $f_2 = 8$ kHz. These frequency ranges were similar to the bandwidths of the associated $f_2$ pulses.
Three successive Békésy measurements were recorded and averaged to obtain a reliable estimate of the behavioral threshold. To reduce the impact of outliers, a correction algorithm similar to the one introduced in Dalhoff et al. (2013) was implemented. For each frequency group formed by $f_2$ and its neighboring frequencies, the median values and the standard deviations were computed separately for the lower and upper thresholds across the three Békésy recordings. A behavioral threshold which differed from a lower- or upper-threshold median by more than three times the associated standard deviation was classified as an outlier and replaced by the median value of the lower or upper threshold of the frequency group. This procedure enabled frequency-specific outlier correction even for hearing-impaired subjects. Finally, the estimates of the behavioral thresholds at the $f_2$ frequencies, denoted as $L_{BT}$, were computed by averaging across frequencies for each frequency group, in order to mimic the spectral spread of the pulsed DPOAE stimuli.
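The outlier rule described above can be sketched as follows. This is a minimal illustration, assuming that the median and the (sample) standard deviation are pooled over all lower or all upper thresholds of one frequency group across the three recordings; the function name and the example values are hypothetical and not taken from the study.

```python
import statistics


def correct_outliers(thresholds_db, n_sd=3.0):
    """Replace values that deviate from the group median by more than
    n_sd sample standard deviations with the group median."""
    median = statistics.median(thresholds_db)
    sd = statistics.stdev(thresholds_db)
    return [median if abs(x - median) > n_sd * sd else x
            for x in thresholds_db]


# Hypothetical upper thresholds (dB SPL) of one frequency group:
# 7 frequencies x 3 recordings = 21 values, one of them an obvious outlier.
upper = [10.0] * 10 + [11.0] * 10 + [40.0]
corrected = correct_outliers(upper)
print(corrected[-1])  # the outlier is replaced by the group median
```

The same function would be applied separately to the lower thresholds before averaging across the frequency group.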

D. DPOAE acquisition and analysis
DPOAE I/O functions were collected at eight frequencies for $1 \leq f_2 \leq 8$ kHz with a constant frequency ratio of $f_2/f_1 = 1.2$. $L_2$ values ranged from 25 to 75 dB SPL in 5-dB steps with frequency-dependent $L_1$ values representing preliminary results based on a subset of a recently published study (Zelle et al., 2015a). That study proposed optimized stimulus level pairs, which maximize the amplitude of the nonlinear-distortion component. Figure 1 shows the frequency-specific levels of the $f_1$ tone, $L_1$, as a function of $L_2$ according to

$$L_1 = a L_2 + b \qquad (1)$$

from Zelle et al. (2015a) (dashed lines) with the frequency dependence of $a$ and $b$ given by their Eq. (5). The average deviation of the stimulus level pairs used here (symbols) from the optimal stimulus-level path was 0.30 ± 1.87 dB, which is within the standard deviation of the population data in Zelle et al. (2015a).

Pulse stimulation
DPOAEs were evoked using a recently introduced multi-frequency acquisition paradigm, which utilizes a sequence of short stimulus pulses for a given set of primary-tone levels $L_1$ and $L_2$, to enable extraction of the nonlinear-distortion component for multiple stimulus-frequency pairs with frequencies $f_{1,i}$ and $f_{2,i}$ from a single recording (Zelle et al., 2014; Zelle et al., 2015a). Each sequence was composed of four stimulus pairs, $i = 1, \ldots, 4$, each of which comprised an $f_{1,i}$ pulse of 30-ms duration and an $f_{2,i}$ pulse with frequency-dependent half width corresponding to the expected relative delay between the two DPOAE components, estimated from the results of Vetešník et al. (2009). The sequence of the frequency pairs was chosen to provide sufficient distance in both the frequency and the time domain to enable unambiguous extraction of the DPOAE signal by band-pass filtering. For each $L_2$ value, two separate measurements were performed with different frequency sequences of either $f_2 = [1, 3, 1.5, 6]$ or $f_2 = [8, 4, 2, 5]$ kHz, yielding a total duration of a single acquisition block of 120 ms. A detailed description of the acquisition technique can be found elsewhere (Zelle et al., 2015a). Cancellation of the stimulus pulses and related stimulus-frequency OAEs was achieved by suitable phase shifts in four consecutive acquisition blocks together with ensemble averaging (Whitehead et al., 1996). Signal averaging was performed until the DPOAE associated with the lowest signal-to-noise ratio (SNR) in the sequence, typically at $f_2 = 1$ or 8 kHz, yielded an SNR of at least 10 dB, called the 10-dB SNR criterion, or a maximum number of 400 acquisition blocks was reached. Acquisition blocks not enhancing the SNR for a specific DPOAE were excluded from averaging.
For each stimulus pair, the corresponding DPOAE signal, $p_i(t)$, at the frequency $f_{DP,i} = 2f_{1,i} - f_{2,i}$, was extracted from the averaged datasets by zero-phase band-pass filtering using a finite impulse response (FIR) filter with an order of 1200 and filter coefficients computed using a Hamming window. The filter bandwidths were defined as [equation missing in source] with the cutoff frequency $f_c$ defined as the frequency at which the attenuation of the filter was 6 dB, the normalized frequency $\tilde{f}_2 = f_2/f_{2,max}$, and the normalized level $\tilde{L}_2 = L_2/L_{2,max}$ of the corresponding $f_2$ stimulus. The maximum values were $f_{2,max} = 8$ kHz and $L_{2,max} = 75$ dB SPL. If a DPOAE signal did not comply with the 10-dB SNR criterion, the bandwidth was gradually reduced using an iterative algorithm and an initial bandwidth defined as [equation missing in source] with the parameters $c_1 = 0.49$ Hz, $c_2 = 0.71$ Hz, and the dimensionless parameter $c_3 = -0.245$. These parameters were determined by applying a nonlinear least-squares curve fitting method to data from the normal-hearing subset. The iterative algorithm decreased the bandwidth with a scaling parameter $a$ according to [equation missing in source] with $a = 0.9$, until the 10-dB SNR criterion was satisfied or a maximum of ten iterations was reached.

FIG. 1. Frequency-specific stimulus level pairs accounting for compressibility of the basilar-membrane response at the $f_2$-tonotopic place, optimized to facilitate DPOAE acquisition with maximum amplitude of the nonlinear-distortion component. Dashed lines show the optimal stimulus levels of the $f_1$ tones, $L_1$, as a function of the stimulus levels of the $f_2$ tones, $L_2$, provided by the empirical relation in Zelle et al. [2015a; their Eq. (5)] which was derived experimentally from normal-hearing adult subjects. Symbols are stimulus-level pairs used here; they derive from preliminary results of that study. The frequencies $f_2$ were chosen to correspond to frequencies commonly used in clinical routine.
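The iterative bandwidth reduction can be sketched as a simple loop. This is only a sketch under stated assumptions: the bandwidth is assumed to be multiplied by the scaling parameter (0.9) in each iteration, and `snr_db_for` is a hypothetical callback that re-filters the data at a given bandwidth and returns the resulting SNR. The toy SNR model below is invented for illustration and ignores the reduction of DPOAE amplitude that accompanies narrower filters.

```python
import math


def reduce_bandwidth(initial_bw_hz, snr_db_for, scale=0.9,
                     snr_min_db=10.0, max_iter=10):
    """Shrink the filter bandwidth until the SNR criterion is met or the
    iteration limit is reached. Returns (bandwidth, iterations, criterion_met)."""
    bw = initial_bw_hz
    n_iter = 0
    while snr_db_for(bw) < snr_min_db and n_iter < max_iter:
        bw *= scale  # assumed multiplicative update
        n_iter += 1
    return bw, n_iter, snr_db_for(bw) >= snr_min_db


def toy_snr_db(bw_hz):
    """Hypothetical model: noise power grows in proportion to bandwidth."""
    return 14.0 - 10.0 * math.log10(bw_hz / 100.0)


bw, n_iter, ok = reduce_bandwidth(400.0, toy_snr_db)
print(f"final bandwidth {bw:.1f} Hz after {n_iter} iterations, criterion met: {ok}")
```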
The SNR for the short-pulse DPOAE measurements was defined by the ratio of the amplitude of the extracted nonlinear-distortion component, $\hat{P}_{OD}$, and a noise estimate in the time domain computed as the root-mean-square value of remaining signal parts without DPOAE components or other coherent signals. This iterative adaptation of the filter bandwidth increased the detection rate of DPOAEs in subjects with generally low SNR or in hearing-impaired subjects, while reducing the filter effect on DPOAE pulse responses with large amplitudes. Due to the broadband character of the pulsed signals, narrowing the bandwidth reduced the DPOAE amplitude and, therefore, limited the potential improvement of SNR by means of band-pass filtering.

Continuous stimulation
For comparison with conventional acquisition paradigms, DPOAEs were also recorded with continuous primary tones and the DPOAE amplitude was evaluated in the frequency domain by sampling the amplitude of the spectrum at the frequency bin associated with $f_{DP}$. This yielded DPOAEs which represent the vector sum of the nonlinear-distortion component and the coherent-reflection component (Brown et al., 1996). The frequencies of both stimulus tones were adjusted to yield an integer number of periods within the acquisition-block length of 100 ms. This adjustment resulted in a slight deviation (magnitude 0.0048) from the constant frequency ratio of 1.2 for some stimulus pairs. Data acquisition was continued until an SNR of at least 10 dB or a maximum iteration number of 100 was reached. Zero-phase high-pass filtering using an FIR filter with a filter order of 1024 was applied to each acquisition block before ensemble averaging. Filter coefficients were computed using a Hamming window with a 3-dB cutoff frequency of 290 Hz, which yielded sufficient attenuation of unwanted low-frequency signals (at least 50 dB below 80 Hz). Because of high-pass filtering, windowing was not required before computing the amplitude spectrum using the fast Fourier transform. Again, acquisition blocks which did not improve the SNR were not included in the ensemble averaging.
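The effect of the frequency adjustment on the ratio $f_2/f_1$ can be checked with a short calculation. The sketch below assumes that each tone is moved to the nearest frequency with an integer number of periods in a 100-ms block (that is, to the nearest multiple of 10 Hz); the source does not state the rounding rule, and the function names are hypothetical.

```python
def snap_to_block(freq_hz: float, block_ms: int = 100) -> float:
    """Nearest frequency with an integer number of periods in the block."""
    resolution_hz = 1000.0 / block_ms  # 10 Hz for a 100-ms block
    return round(freq_hz / resolution_hz) * resolution_hz


def ratio_deviation(f2_hz: float, nominal_ratio: float = 1.2) -> float:
    """Deviation of f2/f1 from the nominal ratio after snapping both tones."""
    f2 = snap_to_block(f2_hz)
    f1 = snap_to_block(f2_hz / nominal_ratio)
    return f2 / f1 - nominal_ratio


# The eight f2 frequencies used in the study, in Hz.
f2_list_hz = [1000, 1500, 2000, 3000, 4000, 5000, 6000, 8000]
deviations = {f2: ratio_deviation(f2) for f2 in f2_list_hz}
for f2, dev in deviations.items():
    print(f"f2 = {f2} Hz: ratio deviation = {dev:+.4f}")
```

Under these assumptions the largest deviation occurs at $f_2 = 1$ kHz ($f_1 = 830$ Hz) with a magnitude of about 0.0048, which agrees with the value quoted in the text.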

Extraction of nonlinear-distortion product components
For the short-pulse DPOAE data, the nonlinear-distortion component was extracted in the time domain from the averaged and filtered dataset using an adapted version of the onset-decomposition technique introduced by Vetešník et al. (2009). This method samples the envelope of the DPOAE signal to obtain an estimate of the amplitude of the nonlinear-distortion component $\hat{P}_{OD}$ (black dot in Fig. 2) at a time point before interference with the coherent-reflection component begins. The envelope was obtained from the absolute value of the Hilbert transform of the DPOAE signal, $|H\{p_i(t)\}|$.
In order to achieve reliable separation of the two DPOAE components, the OD method requires a priori knowledge of DPOAE latencies for proper selection of the sampling instant. However, latencies of OAEs vary across subjects, depend on stimulus frequency and level (Stover et al., 1996; Zelle et al., 2015b), and are expected to change with hearing status (Engdahl and Kemp, 1996; Konrad-Martin and Keefe, 2005). Therefore, the OD technique was extended with an automated signal-detection algorithm to determine the sampling instant independently of the individual DPOAE latency. This algorithm detects the local maximum $\hat{P}$ of $p_i(t)$ closest to the onset of the $f_{2,i}$ pulse, $T_{2,i}$, and sets a tangent (black line, Fig. 2) at the inflection point that is located nearest to $\hat{P}$ and which exhibits a curvature change from convex to concave. The intersection point of the tangent with the abscissa yields an estimate of the DPOAE onset $T_0$ (black cross, Fig. 2). Then, the sampling instant for OD is computed by [Eq. (5); equation missing in source], where $T_{\hat{P}}$ is the time instant of the local maximum $\hat{P}$ and the factor 2/3 was chosen empirically to avoid estimation errors due to a constructively interfering coherent-reflection component. Figure 3 shows two short-pulse DPOAE responses recorded at 1 and 8 kHz, where both DPOAE components are evident from a notch in the time response (for details, see figure caption). Despite the onset of the $f_2$ primary being identical in both examples, the delays of the DPOAE responses are considerably different. Using this automated signal-detection algorithm, the OD technique was able to estimate the amplitude of the nonlinear-distortion component [black dot in Figs. 3(A) and 3(B)] before wave interference began, regardless of DPOAE latency.

FIG. 2. Stimulus and analysis paradigms. Upper part: Short-pulse DPOAE signal, $p_i(t)$, corresponding to the stimulus-frequency pair $f_{1,i}$ and $f_{2,i}$ after ensemble averaging and band-pass filtering (thin gray line), and its envelope computed as the absolute value of its Hilbert transform, $|H\{p_i(t)\}|$ (thick dark-gray line). Black dotted lines indicate envelopes of the nonlinear-distortion component, $|H\{p_{i,1}(t)\}|$, and the coherent-reflection component, $|H\{p_{i,2}(t)\}|$, extracted from the DPOAE signal using a nonlinear least-squares curve fitting algorithm (Zelle et al., 2013; Zelle et al., 2015b). For visualization purposes, $|H\{p_{i,1}(t)\}|$ and $|H\{p_{i,2}(t)\}|$ are shown in reverse y-direction. The shorter latency of $p_{i,1}(t)$ enables separation of the two DPOAE components in the time domain. An automated detection algorithm computes the onset of the DPOAE signal (black cross) as the intersection of the tangent (black line) with the abscissa. Using this DPOAE onset, the sampling instant for onset decomposition (black dot) is chosen according to Eq. (5) to estimate the amplitude of $p_{i,1}(t)$ before $p_{i,2}(t)$ starts to interfere. Lower part: Schematic of the arrangement of the stimulus pairs (amplitudes not to scale) interlaced in the time domain with $f_1$ pulses of 30-ms duration and $f_2$ pulses of frequency-dependent half widths, $T_{HW}$. Data from subject S054, $f_2 = 1.5$ kHz, $L_2 = 45$ dB SPL.
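The geometric part of the onset detection can be illustrated with a small calculation. The sketch below assumes that the tangent is taken on the signal envelope and that the sampling instant lies at a fraction of 2/3 of the interval between the estimated onset $T_0$ and the time of the local maximum; this reading of the sampling rule, as well as the function names and numbers, are assumptions for illustration and not the authors' published equation.

```python
def dpoae_onset(t_inflection, y_inflection, slope):
    """Time at which the tangent through the inflection point
    (t_inflection, y_inflection) with the given slope crosses zero."""
    return t_inflection - y_inflection / slope


def sampling_instant(t_onset, t_peak, fraction=2.0 / 3.0):
    """Assumed rule: sample at a fixed fraction of the onset-to-peak interval."""
    return t_onset + fraction * (t_peak - t_onset)


# Hypothetical envelope: inflection at 12 ms with 30 µPa and slope 15 µPa/ms,
# local maximum at 16 ms.
t0 = dpoae_onset(12.0, 30.0, 15.0)
t_od = sampling_instant(t0, 16.0)
print(f"onset T0 = {t0:.1f} ms, sampling instant = {t_od:.1f} ms")
```

Sampling before the local maximum, rather than at it, keeps the estimate on the rising flank where the later-arriving component is less likely to contribute.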

E. Determination of estimated distortion-product thresholds
Semi-logarithmic DPOAE I/O functions were derived from the amplitudes of the extracted nonlinear-distortion components for short-pulse stimulation and from the amplitude spectra of the DPOAE signals for continuous stimulation. For each $f_2$, the I/O function was linearly extrapolated to the abscissa to yield the EDPT, by definition, the $L_2$ value at which the DPOAE amplitude is equal to zero (Boege and Janssen, 2002; Gorga et al., 2003). Only DPOAEs complying with the 10-dB SNR criterion (Sec. II D) were included in the regression analysis. At least three data points were required for the regression analysis; otherwise the I/O function was excluded from the data set. EDPTs were accepted for auditory-threshold estimation if they complied with the three objective evaluation criteria introduced in Boege and Janssen (2002): (1) a squared correlation coefficient of $r^2_{I/O} \geq 0.8$, (2) a standard deviation of the EDPT of $\sigma_{EDPT} \leq 10$ dB, and (3) a slope of the regression line of $s_{I/O} \geq 0.2$ µPa/dB SPL. Furthermore, EDPTs smaller than −10 dB SPL were excluded from further analysis because this criterion was shown to improve the performance of auditory-threshold prediction by preventing the inclusion of physiologically unrealistic, low EDPTs (Gorga et al., 2003; Dalhoff et al., 2013).
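The extrapolation can be sketched with an ordinary least-squares fit of pressure amplitude (in µPa) against $L_2$ (in dB SPL). This is a minimal sketch, not the authors' implementation: the function names and the synthetic data are hypothetical, and the $\sigma_{EDPT}$ criterion is left out because the text does not specify how that standard deviation is computed.

```python
P_REF_UPA = 20.0  # reference sound pressure: 20 µPa corresponds to 0 dB SPL


def level_to_pressure_upa(level_db_spl):
    """Convert a level in dB SPL to a pressure amplitude in µPa."""
    return P_REF_UPA * 10.0 ** (level_db_spl / 20.0)


def fit_edpt(l2_db, p_upa):
    """Least-squares line p = s * L2 + c; returns (slope, r_squared, edpt)."""
    n = len(l2_db)
    if n < 3:
        raise ValueError("at least three data points are required")
    mean_x = sum(l2_db) / n
    mean_y = sum(p_upa) / n
    sxx = sum((x - mean_x) ** 2 for x in l2_db)
    syy = sum((y - mean_y) ** 2 for y in p_upa)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(l2_db, p_upa))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    edpt = -intercept / slope  # L2 at which the fitted pressure is zero
    return slope, r_squared, edpt


def accept(slope, r_squared, edpt):
    """Criteria quoted in the text, except the sigma_EDPT limit (omitted here)."""
    return r_squared >= 0.8 and slope >= 0.2 and edpt >= -10.0


# Synthetic, noise-free I/O function (hypothetical, not from the study).
l2 = [25.0, 35.0, 45.0, 55.0, 65.0]
p = [2.0 * (x - 15.0) for x in l2]
slope, r2, edpt = fit_edpt(l2, p)
print(f"slope = {slope:.2f} µPa/dB, r^2 = {r2:.3f}, EDPT = {edpt:.1f} dB SPL")
```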
Approximately 38% of the semi-logarithmic I/O functions acquired with continuous primary tones and 25% of the semi-logarithmic I/O functions recorded with short-pulse stimulation exhibited extensive deviation from the expected straight-line behavior, especially at high stimulus levels where saturation was observed. Some I/O functions also showed "deformations" (e.g., notches), particularly at moderate levels, which were evident for both continuous and short-pulse stimulation. Therefore, a correction algorithm was implemented, similar to the saturation-correction algorithm introduced by Dalhoff et al. (2013), to increase the accuracy of the linear regression analysis at low-to-moderate levels.
Beginning at the highest stimulus level, the correction algorithm used an automated procedure to remove a set of sequential data points if they deviated from the presumed linear relationship normally apparent at low-to-moderate levels. The algorithm of Dalhoff et al. (2013) was extended by using not only r 2 I=O but all three statistical evaluation parameters to find a suitable set of data points for regression analysis. For a given f 2 , let N be the number of stimulus levels for which the 10-dB SNR criterion was satisfied (Sec. II D) and M the Dash-dotted lines represent the onset and offset of the f 2 pulses with frequency-specific full widths at half maximum corresponding to the expected delay between the two DPOAE components. With increasing stimulus frequency, both latency and duration of the DPOAE responses decrease considerably. The automated signal-detection algorithm accounts for individual variations in the delay to enable reliable, objective DPOAEcomponent separation using OD. In both examples, a notch in the DPOAE signal (gray arrow) indicates the presence of the two DPOAE components. For subject K003 (A), the notch is associated with a phase jump of 152 in the instantaneous phase (not shown) suggesting destructive interference. For subject S082 (B), the associated phase jump is 321 , which indicates that the notch stems from a delay between the two DPOAE components exceeding the duration of the nonlinear-distortion component. maximum number of stimulus levels (N M). The levels associated with an I/O function are numbered sequentially from L 2,1 at the lowest level to L 2,M at the highest level. Since an I/O function requires at least three valid data points, the removal of high-level data points allows N -2 possible solutions. Each solution is identified by an integer j representing the number of data points removed from the I/O function; that is, j ranges from 0 to N -3. 
Then, $N - 2$ candidate vectors comprising the three statistical evaluation parameters were defined as $z_j = [\, r^2_{I/O,j} \;\; s_{I/O,j} \;\; \sigma_{EDPT,j} \,]^T$, where the superscript $T$ denotes the transpose. In order to select the highest value of $L_2$ to be included in the regression analysis, a Euclidean norm $n_j$ was computed for each candidate vector with respect to two reference vectors: $z_{nadir} = [\, \min\{r^2_{I/O}\} \;\; \min\{s_{I/O}\} \;\; \max\{\sigma_{EDPT}\} \,]^T$, the vector of the worst-case evaluation parameters, and $z_{utopia} = [\, \max\{r^2_{I/O}\} \;\; \max\{s_{I/O}\} \;\; \min\{\sigma_{EDPT}\} \,]^T$, the vector of the best possible, but generally unachievable, combination of evaluation parameters. Here, for a given evaluation parameter, $\{\cdot\}$ denotes the set of its values for $j = 0, \ldots, N - 3$. Both vectors were determined from the $(N - 2)$-tuple of possible I/O functions. $n_j$ can take values from 0 to $\sqrt{3}$, with a value approaching 0 representing the best possible solution. Finally, the value of $j$ associated with the minimum of $n_j$, denoted by $j_{min}$, was used to determine the index $k = M - j_{min}$ of the largest stimulus level, $L_{2,k}$, to be included in the regression analysis. For example, $j_{min} = 0$ represents the unaltered I/O function, where all available DPOAE amplitudes are included in the computation of the regression line, while $j_{min} = 8$ indicates the exclusion of the DPOAE amplitudes associated with the eight highest $L_2$ values, resulting in $L_{2,k=3} = 35$ dB SPL. This method not only corrects for saturation effects, but also accounts for deviations from a straight-line semi-logarithmic I/O function induced by deviations from the optimal stimulus-level path [Eq. (1)] or by two-component interference. Therefore, the algorithm is referred to as the high-level correction algorithm, abbreviated as the HLC algorithm.
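The selection step can be sketched in a few lines of Python. The displayed equation for $n_j$ is not available in this copy, so the sketch assumes one form that is consistent with the description: the Euclidean norm of the element-wise normalized distance between $z_j$ and $z_{utopia}$, with $z_{nadir} - z_{utopia}$ as the normalization. This assumed form gives values between 0 and $\sqrt{3}$, with 0 for a candidate equal to $z_{utopia}$. The function names and the example values are hypothetical, and the three evaluation parameters are taken as already computed for each $j$.

```python
import math

def select_j_min(r2, slope, sigma_edpt):
    """Return (j_min, norms) for the candidate truncations j = 0 .. N-3.

    Each argument holds one evaluation parameter per candidate j.
    ASSUMPTION: n_j is the Euclidean norm of (z_j - z_utopia), with each
    element divided by the corresponding element of (z_nadir - z_utopia).
    """
    # High r^2 and high slope count as good, low sigma_EDPT counts as good.
    utopia = (max(r2), max(slope), min(sigma_edpt))
    nadir = (min(r2), min(slope), max(sigma_edpt))
    norms = []
    for z in zip(r2, slope, sigma_edpt):
        total = 0.0
        for value, best, worst in zip(z, utopia, nadir):
            span = worst - best
            # ASSUMPTION: a parameter that is identical for all candidates
            # cannot discriminate between them and contributes zero.
            if span != 0:
                total += ((value - best) / span) ** 2
        norms.append(math.sqrt(total))
    j_min = min(range(len(norms)), key=norms.__getitem__)
    return j_min, norms

def highest_level_index(m_levels, j_min):
    """Index k = M - j_min of the largest stimulus level kept in the fit."""
    return m_levels - j_min

# Hypothetical evaluation parameters for three candidates (j = 0, 1, 2).
j_min, norms = select_j_min([0.80, 0.95, 0.99], [0.02, 0.03, 0.035], [8.0, 4.0, 3.0])
k = highest_level_index(11, 8)
```

In the hypothetical example the last candidate is best in all three parameters, so it coincides with $z_{utopia}$ and is selected. The second call reproduces the index arithmetic of the text's example, which implies $M = 11$ for $j_{min} = 8$ and $k = 3$.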

F. Estimation of fine-structure contribution
To estimate the number of I/O functions affected by two-component interference, the nonlinear-distortion component and the coherent-reflection component were extracted from the short-pulse DPOAE signal elicited at $L_2 = 45$ dB SPL. In the case of insufficient SNR at $L_2 = 45$ dB SPL, the DPOAE signal at the first higher $L_2$ complying with the 10-dB SNR criterion was selected for the analysis. Extraction was achieved by decomposing the DPOAE signal into so-called pulse basis functions (PBFs) (Zelle et al., 2013). PBF decomposition assumes that the short-pulse DPOAE signal can be described by a vector sum of windowed sine waves, called the pulse basis functions. The sum is least-mean-square fitted to the recorded signal in the time domain to extract the underlying DPOAE components. The fitted function was accepted for further analysis if the normalized squared error of the fit was less than 10% and the squared correlation coefficient between the DPOAE signal and the fitted function was greater than 0.9. A detailed description of the PBF algorithm can be found elsewhere (Zelle et al., 2013; Zelle et al., 2015b) and six examples are given in the supplementary material.¹ Denoting the amplitudes of the nonlinear-distortion and coherent-reflection components by $\hat{P}_1$ and $\hat{P}_2$, respectively, the I/O functions were grouped into fine-structure (FS) affected and no-FS affected, depending on whether their amplitude ratio, $\hat{P}_2/\hat{P}_1$, was greater than 0.25 at $L_2 = 45$ dB SPL. This lower bound corresponds to a maximal amplitude error due to wave interference of 2.5 dB. Depending on the relative phase difference, $\Delta\varphi = \varphi_2 - \varphi_1$, between the extracted components, FS-affected I/O functions were further classified into constructive interference ($\Delta\varphi = 0^\circ \pm 45^\circ$), destructive interference ($\Delta\varphi = 180^\circ \pm 45^\circ$), and quadrature otherwise.
Despite its expected dependence on stimulus level (He and Schmiedt, 1993; Zelle et al., 2015a), the interference type was evaluated at only one pair of primary-tone levels ($L_2 = 45$ dB SPL) and, consequently, only one type was assigned to an I/O function.
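The grouping rule can be written as a short Python sketch. The function names are hypothetical. The treatment of values exactly on a boundary (a ratio of exactly 0.25, or a phase difference exactly 45° from 0° or 180°) is an assumption, because the text does not state it unambiguously. The last function checks the quoted 2.5-dB bound under the assumption that it refers to fully destructive interference at a ratio of 0.25.

```python
import math

def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def classify_io_function(p1_amp, p2_amp, phi1_deg, phi2_deg, ratio_limit=0.25):
    """Group an I/O function by amplitude ratio and phase difference.

    ASSUMPTION: a ratio equal to ratio_limit counts as "no FS", and phase
    differences exactly on a 45-degree boundary join the nearer
    constructive or destructive class.
    """
    if p2_amp / p1_amp <= ratio_limit:
        return "no FS"
    dphi = abs(wrap_deg(phi2_deg - phi1_deg))
    if dphi <= 45.0:
        return "constructive"
    if dphi >= 135.0:
        return "destructive"
    return "quadrature"

def max_interference_error_db(ratio):
    """Magnitude of the amplitude error (dB) for fully destructive interference."""
    return abs(20.0 * math.log10(1.0 - ratio))

worst_case_db = max_interference_error_db(0.25)
```

Because phase differences are wrapped before classification, a value such as −290° is treated like +70° and falls into the quadrature class.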

III. RESULTS

A. DPOAE I/O functions
The proportion of DPOAE I/O functions with three or more points satisfying the 10-dB SNR criterion (Secs. II D and II E), called here "computable" DPOAE I/O functions, was higher for continuous stimulation than for short-pulse stimulation: 92.1% (302/328) as opposed to 83.5% (274/328) (Table I). Applying the acceptance criteria based on the parameters $r^2_{I/O}$, $s_{I/O}$, and $\sigma_{EDPT}$ (Sec. II E) derived from the linear regression analysis, the number of I/O functions accepted for auditory-threshold estimation, $N_a$, decreased from 274 to 237 (86.5%) in the case of short-pulse DPOAEs and from 302 to 238 (78.8%) for continuous DPOAEs; that is, these acceptance criteria resulted in a greater proportion of the continuous DPOAEs being rejected. However, incorporating the HLC […]

Table I also shows the median values of the evaluation parameters for the accepted I/O functions, denoted as $\tilde{r}^2_{I/O}$, $\tilde{\sigma}_{EDPT}$, and $\tilde{s}_{I/O}$, for both acquisition paradigms with and without the HLC algorithm. A two-sided Wilcoxon rank sum test was applied to the pooled evaluation parameters for all frequencies and subjects to identify differences between stimulus paradigms and variations due to the HLC algorithm. The HLC algorithm yielded small but significant improvements in $r^2_{I/O}$ and $\sigma_{EDPT}$ for both acquisition paradigms, with p < 0.0001 for $\tilde{r}^2_{I/O}$ and p < 0.01 for $\tilde{\sigma}_{EDPT}$. The slope parameter, $s_{I/O}$, was not changed significantly by the HLC algorithm (continuous DPOAE: p = 0.85; short-pulse DPOAE: p = 0.51). None of the evaluation parameters exhibited significant differences between acquisition paradigms for the unmodified I/O functions ($\tilde{r}^2_{I/O}$: p = 0.12; $\tilde{\sigma}_{EDPT}$: p = 0.74; $\tilde{s}_{I/O}$: p = 0.30), nor when corrected for high-level deviations from the expected straight-line behavior ($\tilde{r}^2_{I/O}$: p = 0.14; $\tilde{\sigma}_{EDPT}$: p = 0.74; $\tilde{s}_{I/O}$: p = 0.13).

B. Interference effects
According to PBF decomposition of the short-pulse DPOAE responses at a level $L_2 \geq 45$ dB SPL (Sec. II F), 46.4% (127/274) of the computable I/O functions exhibited a coherent-reflection component, $p_2(t)$, with an amplitude, $\hat{P}_2$, greater than or equal to 25% of the amplitude, $\hat{P}_1$, of the nonlinear-distortion component, $p_1(t)$ [Fig. 4(A)]. The associated I/O functions were rated as FS-affected and further grouped into the three underlying interference types according to the relative phase difference between the two DPOAE components. Referring to Fig. 4(B), 62.2% (79/127) of the I/O functions exhibited quadrature, i.e., a phase difference close to 90°, while destructive and constructive interference were less frequent with 20.5% (26/127) and 17.3% (22/127), respectively. These proportions differ slightly from the values expected for phase differences uniformly distributed across frequency, namely, 50% for quadrature and 25% each for destructive and constructive interference. […] [Fig. 4(C)], yields a value of $F_c(m) = 0.585$ for $m = 0.5$, implying that in 41.5% of the subjects more than half of the computable I/O functions are FS-affected. There was no correlation between the ratio $\hat{P}_2/\hat{P}_1$ and $L_{BT}$ (r = 0.00; p = 0.996). The proportion of FS-affected I/O functions was similar for normal-hearing ($L_{BT} < 20$ dB HL) and hearing-impaired thresholds with 46.4% (109/235) and 46.2% (18/39), respectively, suggesting that an interfering coherent-reflection component may also occur in hearing-impaired subjects.
The impact of a pronounced coherent-reflection component on the growth behavior and shape of the I/O functions was quantified with $r^2_{I/O}$, using all computable DPOAE I/O functions without applying the HLC algorithm. In the case of FS-affected I/O functions, the median value for the short-pulse DPOAEs, $\tilde{r}^2_{I/O} = 0.96$, was significantly larger than that for the continuous DPOAEs, $\tilde{r}^2_{I/O} = 0.94$ (one-sided Wilcoxon rank sum test, p = 0.03), with corresponding interquartile ranges (IQR) of 0.06 and 0.12. For the continuous DPOAEs, 38.6% of the I/O functions exhibited $r^2_{I/O} < 0.9$, whereas only 18.9% did so for short-pulse DPOAEs. For the non-FS-affected I/O functions, $\tilde{r}^2_{I/O}$ for the short-pulse DPOAEs ($\tilde{r}^2_{I/O} = 0.96$, IQR = 0.05) was not significantly different from that for the continuous DPOAEs ($\tilde{r}^2_{I/O} = 0.96$, IQR = 0.08; two-sided Wilcoxon rank sum test, p = 0.396), and the proportion of I/O functions with $r^2_{I/O} < 0.9$ was similar (short-pulse DPOAEs: 19.9%; continuous DPOAEs: 24.0%).

Figure 5 illustrates I/O functions for various types of interference patterns recorded for continuous (blue dots) and short-pulse (red dots) stimulation for six subjects. EDPTs used for auditory-threshold estimation, defined as the intersection of the linear regression lines (blue and red lines) with the abscissa, are indicated as examples in Fig. 5(A) by blue and red arrows. Circles represent DPOAE amplitudes excluded from the computation of the linear regression lines by the HLC algorithm (Sec. II E). Insets show phasor diagrams illustrating the phasors $P_1 = \hat{P}_1 e^{i\varphi_1}$ (red arrow) and $P_2 = \hat{P}_2 e^{i\varphi_2}$ (black arrow) associated with the nonlinear-distortion and coherent-reflection components, respectively. Amplitudes $\hat{P}$ and phases $\varphi$ correspond to the parameters extracted from the short-pulse DPOAE responses at $L_2 = 45$ dB SPL using PBF decomposition (Sec. II F; supplementary material).¹ The blue arrow is the phasor sum of the two extracted components, $P_c = P_1 + P_2$, and represents an estimate of the phasor for DPOAEs measured with continuous stimulation.
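The effect of the phasor sum on the continuous DPOAE amplitude can be illustrated with a short Python sketch. It normalizes the nonlinear-distortion phasor to unit amplitude and zero phase, so that only the amplitude ratio $\hat{P}_2/\hat{P}_1$ and the phase difference $\Delta\varphi$ enter. The function name is hypothetical, and the three input pairs are ratio and phase values of the kind reported for individual panels of Fig. 5.

```python
import cmath
import math

def continuous_level_change_db(ratio, dphi_deg):
    """Level of |P_c| = |P_1 + P_2| relative to |P_1|, in dB.

    P_1 is normalized to 1 (zero phase); P_2 has amplitude `ratio`
    and phase `dphi_deg` relative to P_1.
    """
    p1 = complex(1.0, 0.0)
    p2 = cmath.rect(ratio, math.radians(dphi_deg))
    return 20.0 * math.log10(abs(p1 + p2))

# Ratio / phase-difference pairs as quoted for three example I/O functions.
change_large_quadrature = continuous_level_change_db(0.86, -290.0)
change_destructive = continuous_level_change_db(0.46, 146.0)
change_small_quadrature = continuous_level_change_db(0.27, -123.0)
```

A positive result means that the estimated continuous DPOAE amplitude exceeds that of the nonlinear-distortion component alone, and a negative result means that it falls below it.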
The impact of wave interference on the shape of the I/O functions varies according to the underlying type of interference defined by the phase difference, $\Delta\varphi = \varphi_2 - \varphi_1$, and the relative phasor amplitudes, $\hat{P}_2/\hat{P}_1$. Figures 5(A) and 5(C) show two examples of phase difference close to quadrature. In Fig. 5(A), the contribution of the coherent-reflection component is relatively large with $\hat{P}_2/\hat{P}_1 = 0.86$, and leads to an increase of the amplitude, $\hat{P}_c$, of the phasor sum due to the phase difference of $\Delta\varphi = -290^\circ$; these relative values explain the shift of the I/O function for continuous DPOAEs toward lower $L_2$ values. In contrast, the example in Fig. 5(C) illustrates the case of a relatively small coherent-reflection component ($\hat{P}_2/\hat{P}_1 = 0.27$) with quadrature phase tending to destructive interference ($\Delta\varphi = -123^\circ$), which yielded a (small) decrease in the amplitude, $\hat{P}_c$. In other words, the larger amplitudes observed experimentally in this example for the continuous DPOAEs at low intensities cannot be due to such wave interference; presumably other factors are influencing the response, such as noise or calibration differences between the two measurements. Figure 5(B) shows an example of destructive interference ($\hat{P}_2/\hat{P}_1 = 0.46$; $\Delta\varphi = 146^\circ$) shifting the continuous I/O function toward higher $L_2$ values, while Fig. 5(D) lacks a significant coherent-reflection component ($\hat{P}_2/\hat{P}_1 = 0.10$) and both I/O functions nearly superimpose. Figure 5(E) is an example of DPOAE components with similar amplitude ($\hat{P}_2/\hat{P}_1 = 0.87$), showing a pronounced variation of the interference condition with increasing stimulus level. While quadrature dominates at $L_2 = 45$ dB SPL ($\Delta\varphi = -126^\circ$), constructive interference prevails at low primary-tone levels and destructive interference begins at $L_2 \geq 60$ dB SPL. The I/O functions shown in Figs. 5(C) and 5(F) exhibit distinct deviations from the expected linear relationship.
The HLC algorithm reduces the impact of these deformations by excluding DPOAE amplitudes for $L_2$ values exceeding a threshold level determined by the algorithm. In Fig. 5(C), both short-pulse and continuous DPOAE I/O functions show a notch around 60 dB SPL, whereas only the continuous data differ from the linear relationship in Fig. 5(F). All three "deformed" I/O functions in Figs. 5(C), 5(E), and 5(F) would yield considerably lower EDPT values if these data points were included in the linear regression fit.
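The sensitivity of the EDPT to high-level data points can be illustrated with a minimal Python sketch. The EDPT is taken as the intersection of the regression line with the abscissa, as defined above. The amplitude values are hypothetical (arbitrary linear pressure units): they grow linearly with $L_2$ up to 50 dB SPL and saturate above. The function names are likewise hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def edpt(levels_db, amplitudes):
    """Level at which the regression line crosses zero amplitude."""
    slope, intercept = fit_line(levels_db, amplitudes)
    return -intercept / slope

# Hypothetical I/O function: linear growth up to 50 dB SPL, then saturation.
levels = [25, 30, 35, 40, 45, 50, 55, 60, 65]
amplitudes = [10, 20, 30, 40, 50, 60, 64, 66, 67]

edpt_low_levels = edpt(levels[:6], amplitudes[:6])  # saturated points excluded
edpt_all_levels = edpt(levels, amplitudes)          # saturated points included
```

Including the saturated points flattens the regression line and moves its intercept with the abscissa toward lower levels, which is the direction of the bias described in the text.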

C. Relation between behavioral thresholds and EDPTs
FIG. 5. DPOAE I/O functions based on continuous (blue dots) and short-pulse (red dots) stimulation for six subjects. The intersections of the linear regression lines (blue and red lines) with the abscissa define the EDPTs (indicated as examples by the blue and red arrows in A). Empty triangles correspond to DPOAEs not complying with the 10-dB SNR criterion (Sec. II D), while empty circles depict data points excluded from the regression analysis by high-level correction (HLC; Sec. II E). Insets show diagrams of the phasors $P_1$ and $P_2$ (both rotating at $2\pi f_{DP}$ rad/s) of the nonlinear-distortion (red arrows) and coherent-reflection (black arrows) components extracted with PBF decomposition at $L_2 = 45$ dB SPL (Zelle et al., 2013; Zelle et al., 2015b), as well as the phasor sum $P_c = P_1 + P_2$ (blue arrow), which provides an estimate of the continuous DPOAE phasor and serves as a comparison with the measured continuous DPOAE. The short-pulse DPOAE signals together with the statistical parameters associated with the PBF decomposition are given in the supplementary material.¹ The phasor amplitudes of $P_1$ and $P_2$ in A, B, E, and F indicate pronounced coherent-reflection components capable of altering I/O functions depending on the phase difference between $P_1$ and $P_2$. The coherent-reflection component in A enhances the amplitude of $P_c$, shifting the I/O function toward lower $L_2$ levels. In contrast, in B, destructive interference shifts the continuous I/O function toward higher $L_2$ values. In E and F, interference conditions vary considerably with stimulus level, yielding an unreasonably flat I/O function in E and considerable deformations in F. Data in C and D do not contain references to pronounced coherent-reflection components. However, C depicts deformations in both I/O functions; these data points (empty circles) were detected by the HLC algorithm as being systematic deviations from the straight-line growth evident at low intensities and were, therefore, excluded from the regression analysis.

For both acquisition paradigms, EDPTs were related to behavioral thresholds (BTs) estimated by the adapted version of Békésy tracking audiometry (Sec. II C). Figure 6 shows the level of the Békésy threshold, $L_{BT}$, as a function of the EDPT level, $L_{EDPT}$, for the high-level corrected data comprising all subjects and all frequencies for short-pulse [Fig. 6(A)] and continuous [Fig. 6(B)] stimuli. For both stimulus paradigms, the BTs show a significant correlation with EDPTs, with the short-pulse data presenting a slightly higher squared correlation coefficient ($r^2 = 0.64$; p < 0.001) than EDPTs based on continuous DPOAEs ($r^2 = 0.60$; p < 0.001). Regression analysis between $L_{BT}$ and $L_{EDPT}$ reveals a linear relationship, enabling estimated hearing thresholds (EHTs), $L_{EHT}$, to be derived from EDPTs according to

$$L_{EHT} = a\,L_{EDPT} + b. \qquad (7)$$
The fit parameters a and b are given in Table II for both stimulus paradigms, averaged for each stimulus frequency. All I/O functions were subjected to the HLC algorithm (Sec. II E) before the regression analysis.
The accuracy of the auditory-threshold estimation procedure was assessed using the standard deviation, $\sigma_{\Delta L}$, of the differences $\Delta L$ between $L_{EHT}$ and $L_{BT}$, both for each stimulus frequency and for all frequencies of the pooled data (Table II). Figure 7 shows the histograms of $\Delta L$ for short-pulse [Fig. 7(A)] and continuous [Fig. 7(B)] stimulation. Pooled over all frequencies and subjects, the standard deviation for the short-pulse data, $\sigma_{\Delta L} = 6.52$ dB, was significantly smaller than that for the continuous data, $\sigma_{\Delta L} = 7.60$ dB (one-sided F-test for variances, p < 0.01). To estimate the impact of two-component interference on the accuracy of $L_{EHT}$, the data were partitioned into FS-affected and non-FS-affected I/O functions. Figures 6(C)-6(F) show the scatter plots of $L_{BT}$ as a function of $L_{EDPT}$ for the two groups and both DPOAE paradigms. Comparing the no-FS groups between the two DPOAE paradigms reveals no statistically significant difference in the variance of $\Delta L$ between the short-pulse and the continuous data [two-sided F-test, p = 0.24; Figs. 6(C) and 6(E)]. In contrast, EDPTs from the FS group exhibit a significantly smaller variance if recorded with short-pulse stimuli compared to continuous stimuli (one-sided F-test, p = 0.02), which is also evidenced by the lower standard deviation, $\sigma_{\Delta L} = 6.61$ dB, for short-pulse DPOAEs [Fig. 6(D)] compared to $\sigma_{\Delta L} = 8.04$ dB for continuous DPOAEs [Fig. 6(F)].
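The estimation and its accuracy measure can be sketched in Python as follows. The sketch fits the slope $a$ and constant $b$ of a straight line relating $L_{BT}$ to $L_{EDPT}$ by ordinary least squares, computes $L_{EHT}$ from it, and takes the standard deviation of $\Delta L = L_{EHT} - L_{BT}$. The data values and function names are hypothetical, and the use of the sample standard deviation (divisor $n - 1$) is an assumption.

```python
import statistics

def fit_line(x, y):
    """Ordinary least-squares fit y = a * x + b; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def estimation_error_sd(l_edpt, l_bt, a, b):
    """Standard deviation of the differences L_EHT - L_BT.

    ASSUMPTION: sample standard deviation (divisor n - 1).
    """
    diffs = [(a * x + b) - y for x, y in zip(l_edpt, l_bt)]
    return statistics.stdev(diffs)

# Hypothetical EDPT levels (dB SPL) and behavioral thresholds (dB HL).
l_edpt = [0.0, 10.0, 20.0, 30.0, 40.0]
l_bt = [2.0, 8.0, 17.0, 28.0, 35.0]

a, b = fit_line(l_edpt, l_bt)
sigma_delta_l = estimation_error_sd(l_edpt, l_bt, a, b)
```

Because the line is fitted by least squares to the same data, the differences have zero mean, so their standard deviation alone summarizes the estimation error.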
The smaller number of accepted EDPTs for auditory-threshold estimation in the case of short-pulse stimulation (Table I, row labelled $N_a$ and columns labelled HLC) results mainly from the lower acceptance rates at $f_2 = 1$ and 8 kHz of only 65.9% and 46.4%, respectively (Table II). While continuous DPOAEs enhanced the acceptance rate for these frequencies, they did not yield a more accurate threshold estimate, particularly at $f_2 = 8$ kHz where $\sigma_{\Delta L} = 8.82$ dB. However, EDPTs based on continuous DPOAEs can be more precisely related to subjective thresholds at $f_2 = 2$ kHz ($\sigma_{\Delta L} = 5.60$ dB). Short-pulse EDPTs offered more accurate auditory-threshold estimates for $f_2$ from 1.5 to 3 kHz and at 6 kHz, where all standard deviations were below 6 dB. The best performance was achieved with short-pulse EDPTs at $f_2 = 3$ kHz with $\sigma_{\Delta L} = 4.93$ dB.

[…] 71.7% (33/46), respectively, but declines notably at thresholds above 40 dB HL to 18.9% (7/37) for the continuous data and 5.4% (2/37) for the short-pulse data. Figure 8(B) depicts the histogram of the standard deviations of the BTs computed from the three consecutive recordings for each subject. The median value of the standard deviations was $\tilde{\sigma}_{BT} = 2.37$ dB (IQR = 1.25 dB). This relatively small spread means that the subjective thresholds used as the basis for determining the accuracy of the objectively derived auditory thresholds are accurate and reproducible.

D. Individual threshold estimation
Exploiting the linear relationship between BTs and EDPTs enables estimation of hearing thresholds using Eq. (7), which provides an indication of the integrity of the biomechanical part of the hearing system. Plotting the estimated hearing threshold (EHT) as a function of $f_2$ yields an objectively measured audiogram for each subject. Figure 9 shows examples of objective audiograms based on continuous (blue line) and short-pulse (red line) DPOAEs for three subjects. For comparison, BTs are shown in black. Shaded areas correspond to $\sigma_{EDPT}$ and $\sigma_{BT}$, respectively. The accuracy of the individual auditory-threshold estimates was quantified with the standard deviation $\sigma_{\Delta L,ind}$ of the differences between $L_{EHT}$ and $L_{BT}$ across all frequencies for that subject. In general, both stimulus paradigms yielded objective audiograms matching the subjective threshold closely, with the short-pulse paradigm producing significantly smaller mean individual estimation errors, $\sigma_{\Delta L,ind} = 5.44 \pm 2.16$ dB, than the continuous paradigm, $6.38 \pm 2.57$ dB (one-sided t-test, p = 0.006). Despite using the HLC algorithm, continuous DPOAEs remained prone to large deviations due to two-component interference; they resulted in maximum deviations, $\Delta L_{max}$, between subjective and estimated thresholds of up to 25.0 dB [cf. Fig. 9(B)], whereas I/O functions based on short-pulse DPOAEs yielded maximum errors not larger than 18.4 dB. On average, short-pulse EDPTs yielded $\Delta L_{max}$ in the objective audiograms of $10.39 \pm 3.34$ dB, which is slightly but significantly less than the $12.20 \pm 5.13$ dB obtained using the continuous EDPTs (one-sided t-test, p = 0.003).

TABLE II. […] [Eq. (7)] for both stimulus paradigms. DPOAE I/O functions were subjected to high-level correction (HLC; Sec. II E). $f_2$: Stimulus frequency and, in the case of DPOAEs, the frequency of the second primary tone. Parameters are pooled across frequency, denoted by the row label "1,…,8," and also partitioned and pooled across frequency according to the absence or presence of fine structure (FS) at $L_2 = 45$ dB SPL, denoted by the row labels "no FS" and "FS," respectively. $N_a$: Number of accepted EDPTs (Sec. II E and Table I). $r^2$: Squared correlation coefficient [Eq. (7)]. $\sigma_{\Delta L}$: Standard deviation of the differences between $L_{EHT}$ [Eq. (7)] and $L_{BT}$. $a$ and $b$: Slope and constant parameters of the linear regression line [Eq. (7)]. All correlations were significant (p < 0.001), except for short-pulse EDPTs at $f_2 = 1$ kHz (p = 0.07) and 8 kHz (p = 0.22). For both stimulus paradigms, at $f_2 = 1$ and 8 kHz, the slope values are those from the pooled data (row label "1,…,8") because of limited dynamic range in the DPOAE I/O function at these two frequencies. [Table body not recovered; column groups: Short-pulse DPOAE, Continuous DPOAE.]

FIG. 7. Histograms of the difference $\Delta L$ between $L_{EHT}$ given by Eq. (7) and $L_{BT}$, for short-pulse (A) and continuous (B) acquisition. The data are normally distributed with zero mean and standard deviations of 6.52 dB (one-sample Kolmogorov-Smirnov test, p = 0.82) and 7.60 dB (p = 0.38) for short-pulse and continuous stimulation, respectively. The variance of the differences for the short-pulse data is significantly lower than that of the continuous data (one-sided F-test, p < 0.01).

IV. DISCUSSION
DPOAE I/O functions based on the extracted nonlinear-distortion components enable the estimation of auditory thresholds with high accuracy and, therefore, offer a promising approach for objectively assessing hearing status. Section IV A assesses the efficacy of short-pulse stimuli for separating the two DPOAE components and is followed by a discussion (Sec. IV B) of error sources for the regression analysis resulting from systematic deviation from a straight-line semi-logarithmic DPOAE I/O function. The next two sections compare the acceptance rate of the DPOAE I/O functions for the purpose of EDPT estimation (Sec. IV C) and the accuracy of the auditory-threshold estimate (Sec. IV D) with previously published results. Section IV E discusses the accuracy of EDPTs for assessing hearing status. The concluding section (Sec. IV F) discusses implications of the current findings for employing DPOAE I/O functions as a clinical tool.

A. Separation of DPOAE components
Short-pulse stimulation enabled the separation of the two DPOAE components by means of onset decomposition (OD). The fidelity of the separation can be directly assessed in DPOAE responses with destructive interference, where both components become readily distinguishable in the time signal [Fig. 3(A)] and in the instantaneous phase (supplementary material).¹ However, for other interference conditions, such as quadrature or constructive interference, the DPOAE components are not always easily distinguishable. In such cases, comparison with other methods allows assessment of the quality of the algorithms presented here.
Vetešník et al. (2009) acquired high-resolution DP-grams to compare OD with the time-windowing technique by Kalluri and Shera (2001) and showed that OD successfully reduced DPOAE fine structure in a frequency range of $f_2 = 1.5, \ldots, 2.5$ kHz. That study employed a pre-defined sampling instant between 8 and 10 ms relative to the $f_2$ onset. However, the optimal sampling instant for OD was found to decrease with increasing stimulus level. This finding is in accordance with other studies showing that latencies of DPOAEs vary considerably with stimulus frequency and level (Stover et al., 1996; Zelle et al., 2015b). Recently, the OD technique was extended to frequencies of $f_2 = 1, \ldots, 8$ kHz, to extract the nonlinear-distortion component from short-pulse DPOAEs using pre-defined, frequency-specific sampling instants (Zelle et al., 2015a). That algorithm yielded a considerably smoother dependence of DPOAE amplitude on stimulus levels $L_1$ and $L_2$ compared to data from continuous stimulation. This result indicated successful extraction of the amplitude of the nonlinear-distortion component by OD. Alternatively, the time course of the underlying DPOAE components can be visualized with pulse basis functions (PBFs) in the time domain by fitting the DPOAE short-pulse response to a mathematical model that mimics the superposition of the components (Zelle et al., 2013). This technique has the advantage that both the amplitudes and the phases of each component can be extracted.
The modified OD approach used in the present study, in which the onset of the DPOAE signal was detected objectively by an automated algorithm (Sec. II D), was additionally compared to extraction by PBF decomposition for short-pulse stimuli with $f_2 = 1, \ldots, 4$ kHz in six subjects (data not shown). Both methods provide a generally reliable extraction of the nonlinear-distortion component, as supported by the almost complete removal of fine structure. OD slightly underestimated the amplitude of the nonlinear-distortion component because it samples the DPOAE signal prior to its maximum. PBF decomposition resulted in extracted components that reproduced known properties of the two DPOAE components reported by others (Shera and Guinan, 1999). However, successful decomposition into PBFs requires the absence of additional signals in the recordings which might otherwise hinder separation, e.g., SOAEs or further DPOAE components (Zelle et al., 2015a; their Figs. 5 and 6). In contrast, component extraction using OD does not depend on extensive assumptions to model the DPOAE signal and, currently, proves to be the more robust technique.

FIG. 9. Individual auditory-threshold estimation for three subjects utilizing estimated hearing thresholds (EHTs) computed from EDPTs according to Eq. (7) for short-pulse (red; pulsed) and continuous (blue; cont.) DPOAEs. Black lines: Behavioral thresholds (BTs). Shaded areas: Standard deviations of EDPTs, $\sigma_{EDPT}$, derived from the linear regression analysis of the DPOAE I/O function, and of BTs, $\sigma_{BT}$, derived from the three consecutive Békésy measurements. In general, EHTs match BTs closely, but the continuous data are prone to large differences between EHTs and BTs (e.g., at 5 kHz in B) due to wave interference. Both stimulus paradigms exhibit large deviations from BTs for $f_2 = 1$ kHz (A) or 8 kHz (B and C), which is also reflected in the statistics given in Table II. Evaluation parameters (see Sec. III D for definitions) from subjects K001 (A), S004 (B), and S064 (C) are as follows. For short-pulse EHTs: (A) $\sigma_{\Delta L,ind} = 4.67$ dB, $\Delta L_{max} = 11.00$ dB; (B) $\sigma_{\Delta L,ind} = 5.58$ dB, $\Delta L_{max} = 13.50$ dB; (C) $\sigma_{\Delta L,ind} = 6.95$ dB, $\Delta L_{max} = 11.95$ dB. For continuous EHTs: (A) $\sigma_{\Delta L,ind} = 7.36$ dB, $\Delta L_{max} = 11.94$ dB; (B) $\sigma_{\Delta L,ind} = 9.26$ dB, $\Delta L_{max} = 19.12$ dB; (C) $\sigma_{\Delta L,ind} = 11.26$ dB, $\Delta L_{max} = 25.32$ dB.

B. Irregularities in DPOAE I/O functions and deviation from linearity
The squared correlation coefficient $r^2_{I/O}$ between DPOAE amplitudes and $L_2$ values was used to test I/O functions for the expected straight-line semi-logarithmic relationship. One major cause for deviation from linearity is interference between the DPOAE components (Mauermann and Kollmeier, 2004; Dalhoff et al., 2013) which, in the case of the fine-structure (FS) group, is indicated by the significantly higher $r^2_{I/O}$ when using short-pulse as opposed to continuous stimulation [Figs. 6(D) and 6(F), respectively]. For the no-FS group, there were no significant differences in $r^2_{I/O}$ between stimulus paradigms [Figs. 6(C) and 6(E)], whereas for the FS group, the continuous DPOAE data yielded a higher interquartile range of $r^2_{I/O}$ and a larger number of I/O functions with $r^2_{I/O} < 0.9$ as compared to the short-pulse DPOAE data. This observation adds further support to the notion that an interfering coherent-reflection component leads to deformations in a sizable number of I/O functions when using continuous DPOAEs. However, quantification of the interference with the aid of $r^2_{I/O}$ might underestimate the impact of the coherent-reflection component if its phase remains constant with varying $L_2$. For example, Figs. 5(A) and 5(B) exhibit a distinct coherent-reflection component shifting the I/O functions along the abscissa without significantly altering their linear growth behavior. A variation of the interference condition with $L_2$, as observed in shifts of minima and maxima in the DPOAE fine structure by others (He and Schmiedt, 1993; Kummer et al., 1998), enlarges the deviation from linearity [Figs. 5(E) and 5(F)].
Nevertheless, even using short-pulse DPOAEs, approximately a fifth of I/O functions in the no-FS group exhibited $r^2_{I/O} < 0.9$, indicating other potential sources for deviation from straight-line semi-logarithmic behavior. This observation was most pronounced for $f_2 \leq 1.5$ kHz. At these frequencies, short-pulse DPOAE recordings acquired at high stimulus levels revealed additional short-latency contributions, which became evident as considerably varying instantaneous phases and interference effects during DPOAE onset. These disturbances were similar to waveform complexities described by Martin et al. (2013), putatively indicating distributed DPOAE components generated basally to the $f_2$-tonotopic place. For the cubic distortion product at $f_{DP} = 2f_1 - f_2$ and frequency ratios $f_2/f_1 = 1.2$, the basally distributed contributions to the DPOAE signal were shown to exhibit horizontal phase banding implying a wave-fixed source (Martin et al., 2010) and, hence, indicating a similar generation mechanism as for the nonlinear-distortion component. Some I/O functions presented saturating or decreasing DPOAE amplitudes at high stimulus levels, putatively reflecting compressional behavior of the cochlear amplifier or two-tone suppression between the primary tones in the case of $L_1$ exceeding the optimal level for DPOAE generation (Robles and Ruggero, 2001). Optimized frequency-dependent stimulus levels have been shown to yield DPOAE I/O functions with linear growth over a wider intensity range compared to those based on the (frequency-independent) scissor paradigm, as well as larger slopes and less variation across stimulus frequency (Johnson et al., 2006a; Zelle et al., 2015a). However, for an individual subject, deviation from the optimal stimulus-level path defining $L_1$ values as a function of $L_2$ to evoke maximum DPOAE amplitudes may yield deformations in the linear shape of semi-logarithmic I/O functions [e.g., Fig. 5(C)].
Furthermore, mathematical analysis (Lukashkin and Russell, 1999) has shown that deformations/irregularities may also be inherent to the nonlinear characteristics of the mechanosensitive channels in the OHC stereocilia which, dependent on the transducer operating point, can produce a notch in the DPOAE I/O function, as found for example in Fig. 5(C). Several methods have been proposed to compensate for deviations of DPOAE amplitude from the expected straight-line semi-logarithmic relationship with $L_2$: (1) fitting the data with different slopes depending on the DPOAE growth behavior (Goldman et al., 2006; Neely et al., 2009), (2) using a regression line weighted according to SNR and stimulus level (Oswald and Janssen, 2003), or (3) excluding those DPOAE data points with saturation behavior at high stimulus levels. The present study also employed saturation correction, but extended the algorithm of Dalhoff et al. (2013) by using not only the squared correlation coefficient to establish the quality of the linearization process but also the regression-line slope and the standard deviation of the EDPT. Using all three parameters to maximize quality avoids a preference for I/O functions with only a few DPOAE data points at low stimulus levels. The algorithm, called the high-level correction (HLC) algorithm (Sec. II E), is also effective at moderate stimulus levels, where it can reduce the impact of other sources of deformations and irregularities in the I/O functions, such as notches. The HLC algorithm yielded I/O functions with linear growth over a wider intensity range while minimizing the number of neglected data points [Figs. 5(C) and 5(F)].

C. Acceptance rate of DPOAE I/O functions for threshold estimation
The number of DPOAE I/O functions complying with the objective evaluation criteria (Sec. II E), defined originally by Boege and Janssen (2002), relative to the number of computable I/O functions was similar for the two acquisition paradigms; the acceptance rates were 92.7% and 91.4% for short-pulse and continuous stimulation, respectively. These values were considerably higher than the acceptance rates of 68.5% reported by Boege and Janssen (2002) and 67.1% by Gorga et al. (2003) for a similar study design. One explanation for the lower acceptance rates in their studies may be the larger proportion of I/O functions at $f_2 \leq 1$ kHz and the higher number of hearing-impaired subjects than in the present study. Furthermore, for both acquisition paradigms, the HLC algorithm used in the present study appears to be another beneficial factor for the acceptance rate. While the acceptance rate for the corrected continuous DPOAE data was larger than the 84.7% reported by Dalhoff et al. (2013), there is a notable discrepancy between the short-pulse DPOAE data presented here and their pulsed data: none of their I/O functions had to be excluded from the regression analysis after component separation and saturation correction. For comparison with the results of Dalhoff et al. (2013), the acceptance rate was re-evaluated for a subset of the present data by including only I/O functions at frequencies $1.5 \leq f_2 \leq 3$ kHz and only from the normal-hearing population (i.e., $L_{BT} < 20$ dB HL). For this subset, the acceptance rate for I/O functions recorded with short-pulse stimulation increases to 97.8% (90/92), which is close to the results of Dalhoff et al. (2013). Since SNR represents the major limiting factor for short-pulse data, the acceptance rate cannot be improved extensively by the HLC algorithm, in contrast to the continuous data.

D. Relation between EDPTs and behavioral thresholds
Both DPOAE acquisition paradigms yielded EDPTs, which allowed the prediction of behavioral thresholds in a clinically relevant frequency range from f2 = 1 to 8 kHz with hitherto unreported accuracy of $\sigma_{\mathrm{DL}}$ = 6.52 dB and $\sigma_{\mathrm{DL}}$ = 7.60 dB for short-pulse and continuous stimulation, respectively (Sec. III C; Table II). These values are notably smaller than those reported in previous studies utilizing continuous primary tones and stimulus levels based on the scissor paradigm. Boege and Janssen (2002) reported a value of 10.9 dB for a study population including normal-hearing and hearing-impaired ears, which was reproduced by Gorga et al. (2003) with 10.1 dB and Oswald and Janssen (2003) with 11.2 dB. Several reasons might be responsible for this poorer accuracy compared to the data presented here. First, contrary to the present work, in previous studies the DPOAE amplitude was estimated in the frequency domain from continuous recordings, which yielded amplitudes representing a superposition of the nonlinear-distortion and coherent-reflection components. When relating EDPTs from I/O functions to BTs associated with f2, the coherent-reflection component may induce errors in the threshold estimate. Therefore, extracting the nonlinear-distortion component from short-pulse DPOAE recordings, as was done in this study using OD, can be one reason for the increased accuracy. This suggestion is supported by the significantly smaller value of $\sigma_{\mathrm{DL}}$ = 6.52 dB for short-pulse data compared to the continuous data, both in the overall dataset [...] DPOAE data [Fig. 6(A) and Table II]. In the absence of additional information, it is simply assumed that half of this variance derives from the IHC-neural source, i.e., $\sigma_{\mathrm{IHC+}}$ = 4.37 dB.
Then, an estimate of the diagnostic accuracy of cochlear amplifier function based on short-pulse DPOAE data is $\sigma_{\mathrm{spDP}} = \sqrt{\sigma_{\mathrm{DL}}^2 - \sigma_{\mathrm{IHC+}}^2 - \sigma_{\mathrm{BT}}^2} = 4.64$ dB.
Since the continuous DPOAEs were recorded within the same subjects, applying $\sigma_{\mathrm{IHC+}}$ = 4.37 dB to the continuous DPOAE data [$\sigma_{\mathrm{DL}}$ = 7.60 dB; Fig. 6(B) and Table II] yields $\sigma_{\mathrm{cDP}}$ = 6.07 dB for the diagnostic accuracy of cochlear amplifier function using continuous stimulation. The essential step in this analysis is the (arbitrary) partitioning of the variances $(\sigma_{\mathrm{CAEDPT}})^2$ and $(\sigma_{\mathrm{IHC+}})^2$. However, the estimate of diagnostic accuracy does not critically depend on this step because $(\sigma_{\mathrm{BT}})^2$ and $(\sigma_{\mathrm{EDPT}})^2$ are relatively small compared with $(\sigma_{\mathrm{DL}})^2$. In summary, this analysis leads to two conclusions. First, for short-pulse DPOAEs, the error associated with diagnosing the state of the cochlear amplifier with this method has a standard deviation below 5 dB. Second, there is no evidence for an increase in the variance of the data with increasing hearing loss (cf. Fig. 6). Thus, for the interindividual variations in IHC and neural pathway functions, as given by $\sigma_{\mathrm{IHC+}}$, a value below 5 dB appears to be a reasonable estimate, at least for the range of hearing thresholds investigated here. Nevertheless, a deviation from the reported variance might occur in studies with a larger portion of hearing-impaired subjects.
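The arithmetic of this variance partitioning can be checked with a short Python sketch that uses only the values quoted above. The variance of the behavioral thresholds is not quoted in this passage, so the sketch back-calculates it from the short-pulse figures; that implied value is an inference of this sketch, not a number reported here. The function and variable names are hypothetical.

```python
import math


def residual_sd(sigma_dl, sigma_ihc, var_bt):
    """Return sqrt(sigma_DL^2 - sigma_IHC+^2 - sigma_BT^2), all in dB."""
    return math.sqrt(sigma_dl**2 - sigma_ihc**2 - var_bt)


# Values quoted in the text (dB).
SIGMA_DL_SHORT_PULSE = 6.52
SIGMA_DL_CONTINUOUS = 7.60
SIGMA_IHC = 4.37
SIGMA_SP_DP = 4.64

# sigma_BT^2 is not quoted in this passage. It is back-calculated here from
# the short-pulse values, so it carries their rounding error (assumption).
implied_var_bt = SIGMA_DL_SHORT_PULSE**2 - SIGMA_IHC**2 - SIGMA_SP_DP**2

# Apply the same sigma_IHC+ and sigma_BT to the continuous data.
sigma_c_dp = residual_sd(SIGMA_DL_CONTINUOUS, SIGMA_IHC, implied_var_bt)

print(f"implied sigma_BT^2 = {implied_var_bt:.2f} dB^2")
print(f"sigma_cDP = {sigma_c_dp:.2f} dB")
```

With these rounded inputs, the result for continuous stimulation lies within about 0.01 dB of the 6.07 dB quoted above, and the implied behavioral-threshold variance is small relative to the total variance, in line with the statement that the estimate does not depend critically on the partitioning.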

F. Implications for clinical applications
The present data suggest that for a population with normal hearing or mild-to-moderate hearing loss, short-pulse DPOAE I/O functions enable accurate estimation of behavioral thresholds, not only for the pooled data but also for individual subjects. In the case of continuous stimulation, interference of the DPOAE components leads to significantly larger errors in the individual objective audiogram. The present work provides only a limited statement about the measurement time of short-pulse acquisition necessary in clinical routine, because identical stimulus levels were used for both the normal-hearing and the hearing-impaired group. Furthermore, the 10-dB SNR criterion for the short-pulse multi-frequency acquisition was restricted to the DPOAE with the lowest SNR within each multi-frequency acquisition sequence, causing averaging times for the remaining DPOAEs in the same sequence to be longer than necessary. On average, the measurement time to obtain threshold estimates for all eight frequencies was 16.45 ± 1.65 min and 6.85 ± 2.76 min per subject for short-pulse and continuous stimulation, respectively. These measurement times include the acquisition of eleven DPOAEs per I/O function. In the case of normal-hearing subjects, this procedure leads to oversampling of the I/O function, while for hearing-impaired subjects a large number of the L2 levels cannot evoke a DPOAE with suitable SNR. By extending the acquisition software to enable the selection of stimulus levels adaptively according to the SNR of the acquired DPOAEs, it should be possible to reduce the acquisition time to well below 5 min for short-pulse stimulation, which would be feasible for daily clinical routine.

V. CONCLUSIONS
Both DPOAE acquisition paradigms, incorporating either short-pulse stimuli or continuous primary tones, yield estimates of behavioral thresholds with high accuracy, supporting the use of frequency-specific stimulus levels and the high-level correction of semi-logarithmic I/O functions for deviations from the expected linear shape. Onset decomposition successfully extracts the nonlinear-distortion component from short-pulse DPOAE recordings. Utilizing I/O functions solely based on the extracted nonlinear-distortion components significantly improves auditory-threshold estimation for normal-hearing subjects and patients with mild-to-moderate hearing loss induced by an impaired cochlear amplifier. The high correlation of the EDPTs with behavioral thresholds demonstrates that individual audiograms representing the state of the hearing path up to the IHC stereocilia can be acquired with high reliability.