The impact of reverberation on speech intelligibility in cochlear implant recipients

Listening to speech in an environment with reverberation can be challenging for both the normal and impaired auditory system. However, it has been shown for both normal- and impaired-hearing listeners that it is the late reflections that are responsible for degrading intelligibility, whereas early reflections actually aid intelligibility by increasing the effective signal-to-noise ratio. Contrastingly, studies conducted with cochlear implant (CI) recipients have suggested that CI recipients have almost no tolerance for reverberation and that they are negatively impacted by both early and late reflections. The main objective of the current study is to re-evaluate the influence of reverberation on speech intelligibility in CI recipients using more authentic virtual auditory environments. Unlike previous studies in this area, this study was conducted using a loudspeaker-based auralization system rather than non-individualized binaural room simulations. Speech intelligibility was measured in simulations of a range of actual physical rooms with plausible source-receiver distances, both with and without late reflections. The results show that the effect of reverberation is much smaller than previously suggested, especially with short source-receiver distances. Furthermore, the results suggest that, in contrast to previous literature, early reflections may not actually be detrimental to CI recipients.


I. INTRODUCTION
One of the most challenging situations for understanding speech is in reverberant, multi-talker environments. This phenomenon is often referred to as the "cocktail party effect" (Cherry, 1953; Bronkhorst, 2000). The specific impact of reverberation on speech intelligibility has been studied extensively in normal-hearing and hearing-impaired listeners (e.g., Duquesnoy and Plomp, 1980; Nábělek and Dagenais, 1986; George et al., 2008; George et al., 2010; Neuman et al., 2010; Schepker et al., 2016) and can, at least to some extent, be predicted by applying the concept of the speech transmission index (Duquesnoy and Plomp, 1980; Houtgast et al., 1980; George et al., 2010; IEC 60268-16, 2011). In particular, normal-hearing listeners can tolerate a substantial amount of reverberation before speech intelligibility is disturbed (Duquesnoy and Plomp, 1980; George et al., 2008). Although hearing-impaired listeners are more affected by reverberation, listeners with mild to moderate hearing losses can still tolerate a considerable amount of reverberation (Duquesnoy and Plomp, 1980; Nábělek and Dagenais, 1986; George et al., 2010; Schepker et al., 2016).
In comparison, only a few studies have investigated the impact of reverberation on speech intelligibility in cochlear implant (CI) recipients. These studies have suggested, by and large, that even mild amounts of reverberation lead to significant reductions in intelligibility for CI recipients. One early study reported a decrease in mean word recognition scores from 84% correct in an anechoic condition to only 20% in reverberation with a reverberation time of RT = 1.0 s at a source-receiver distance of 1 m. A follow-up study extended this work by testing a range of reverberation times from RT = 0 s to RT = 1.0 s and similarly found that the mean word recognition scores dropped from 90% in the anechoic condition to 20% in reverberation with RT = 1.0 s. Furthermore, performance decreased exponentially as reverberation times increased, with mean scores dropping to 60% already at RT = 0.3 s. Hazrati and Loizou (2012) subsequently investigated the effect of reverberation with a source-receiver distance of 5.5 m and found that average intelligibility scores decreased from 87% in the anechoic condition to 44% with RT = 0.6 s and 33% with RT = 0.8 s. Additional studies by Hazrati et al. (2013), Desmond et al. (2014), and Hu and Kokkinakis (2014) further suggest that intelligibility strongly decreases in the presence of reverberation. More recently, Hersbach et al. (2015) investigated the effect of the source-receiver distance, in addition to the reverberation time of a room, on speech intelligibility and suggested that this distance plays a significant role for CI listeners. However, none of these studies has systematically tested a comprehensive range of acoustic scenes varying in both reverberation time and source-receiver distance.
Given the challenges of comparing across studies, one of the primary goals of the current study is to measure speech intelligibility for CI listeners in a range of plausible acoustic scenarios in order to better characterize the relative impacts of the amount of reverberation inside a room and source-receiver distance.
Sound in reverberant environments consists of the direct sound component, the early reflections, and the late reflections. The direct sound is the sound that travels directly from the source to the listener along a straight line. Early and late reflections, on the other hand, are the sound waves which arrive indirectly from the source after having been reflected off the surfaces in a room. Typically, early reflections are defined as the sound waves which reach the listener within the first 50-80 ms after the direct sound, whereas the late reflections are the sound waves which arrive thereafter. Bradley et al. (2003) showed that both normal-hearing listeners and listeners with a hearing impairment benefit from the presence of early reflections when listening to speech in noisy environments, which is likely due to the fact that early reflections enhance the effective signal-to-noise ratio (SNR) (Arweiler and Buchholz, 2011;Roman and Woodruff, 2011). In contrast, late reflections have been shown to be detrimental to intelligibility for normal-hearing listeners and listeners with a hearing impairment since these listeners are unable to integrate the late reflections with the direct sound component (Boothroyd, 2004;Roman and Woodruff, 2013). Hu and Kokkinakis (2014) investigated the effects of early and late reflections on speech intelligibility in quiet in CI recipients and concluded that early reflections do not enhance intelligibility for CI recipients, and in some conditions, may even be detrimental. They also concluded that late reflections were severely detrimental to CI recipients.
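As a rough illustration of this early/late split, the sketch below truncates a synthetic RIR at a 50 ms boundary. The boundary value and the assumption that the direct sound sits at sample zero are simplifications for illustration, not the processing used in any of the cited studies.

```python
import numpy as np

def split_rir(rir, fs, boundary_ms=50.0):
    """Split a room impulse response into early and late parts at a
    fixed boundary after its start (the direct sound is assumed to
    sit at sample zero)."""
    n_early = int(round(boundary_ms * 1e-3 * fs))
    early = rir.copy()
    late = rir.copy()
    early[n_early:] = 0.0  # keep only the direct sound and early reflections
    late[:n_early] = 0.0   # keep only what arrives after the boundary
    return early, late

# Toy example: a 100 ms exponentially decaying noise "RIR" at 16 kHz
fs = 16000
t = np.arange(int(0.1 * fs)) / fs
rng = np.random.default_rng(0)
rir = np.exp(-t / 0.02) * rng.standard_normal(t.size)
early, late = split_rir(rir, fs)
```

By construction the two parts sum back to the original response, which makes this kind of split convenient for comparing "early only" against "early plus late" conditions.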
Beyond the categorization of degradation into the effects of early and late reflections, the degradation caused by reverberation can also be categorized into different types of masking effects: degradation due to self-masking and degradation due to overlap-masking (Bolt and MacDonald, 1949). Self-masking is the masking of a phoneme by the reverberant energy of the phoneme itself. Overlap-masking is the masking of a phoneme by the reverberant energy from a preceding phoneme. Generally, self-masking is dominated by early reflections, whereas overlap-masking is dominated by late reflections. One study investigated the relative contributions of overlap- and self-masking in CI listeners. To do this, the authors concatenated spectrograms of non-reverberant vowels and reverberant consonants in one condition, which they identified as their overlap-masking condition, and reverberant vowels and non-reverberant consonants in another condition, which they identified as their self-masking condition. With these artificial stimuli, they concluded that self-masking has a more detrimental effect on CI listeners than overlap-masking, since the CI recipients had poorer intelligibility when the vowels were degraded by reverberation. Several other studies, however, concluded that overlap-masking is more detrimental (Hazrati and Loizou, 2012; Hazrati et al., 2013; Hu and Kokkinakis, 2014). Most recently, Desmond et al. (2014) found that mitigating either self- or overlap-masking improved intelligibility to a similar degree for CI listeners.
In most of the studies that investigate the effect of room acoustics on CI intelligibility outcomes, room reverberation was realized by convolving anechoic speech material with room impulse responses (RIRs) measured from a loudspeaker to the ears of a manikin inside a small, rectangular room, and stimuli were streamed to the CI via the direct audio input. The amount of reverberation was then varied by applying different amounts of absorption to the walls of the room. Even though varying the absorption allows one to change the reverberation time in a systematic, controlled way, it does not accurately reflect the complex changes seen in the reverberation patterns of acoustic scenes in rooms with different shapes and dimensions, as well as with different source and receiver positions. In particular, larger reverberation times are often associated with larger rooms in the real world. Keeping the volume constant while increasing the reverberation time therefore leads to a much lower direct-to-reverberant ratio (DRR) than would typically occur in a real room. Moreover, the directivity of the loudspeaker used in the RIR recordings will be different from the directivity of a human talker, which will modify the amount and character of the applied reverberation. Finally, utilizing the direct audio input of a CI for stimulus playback does not provide the natural localization cues that are provided by individual head-related transfer functions (HRTFs). Even though it is unclear how much influence these factors may have had on the outcomes and conclusions of the current body of literature, the current study aims at obtaining a characterization of outcomes in more realistic scenarios while still maintaining the benefits of laboratory testing, namely control and reproducibility (see, for example, Favrot and Buchholz, 2010; Rychtáriková et al., 2011; Mueller et al., 2012; Oreinos and Buchholz, 2015).
To accomplish this, acoustic scenes were created to simulate the detailed acoustic properties of three actual physical rooms, including the consideration of factors such as the specific dimensions of the rooms and the furniture within, as well as the directivity of a human talker and the exact placement of the listener in the room relative to the talker. The resulting acoustic models were then translated into multichannel RIRs, and these were convolved with anechoic speech and presented to CI recipients via a three-dimensional loudspeaker array inside an anechoic chamber, which notably allowed the participants to utilize their own HRTFs. This approach provides a wide range of reverberant scenes with a high ecological validity, along with control and reproducibility.
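The auralization step described above amounts to convolving the anechoic speech with one IR per loudspeaker. A minimal sketch follows; the channel count of 41 matches the array described later, but the signal lengths and contents here are placeholders:

```python
import numpy as np

def auralize(anechoic, rirs):
    """Convolve a mono anechoic signal with a bank of per-loudspeaker
    impulse responses, yielding one playback signal per loudspeaker."""
    return [np.convolve(anechoic, rir) for rir in rirs]

rng = np.random.default_rng(1)
speech = rng.standard_normal(1000)                    # stand-in for an anechoic sentence
rirs = [rng.standard_normal(200) for _ in range(41)]  # one placeholder IR per loudspeaker
channels = auralize(speech, rirs)
```

Each output channel has length N + M − 1 (signal length plus IR length minus one), so the reverberant tail extends beyond the end of the dry sentence, as expected.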
The main contributions of this study are to characterize the relative impact of both reverberation and source-receiver distance on speech intelligibility in quiet in CI recipients and to examine these effects in more realistic acoustic environments than in the CI-related literature to date. Furthermore, this study re-investigates the effects of early and late reflections on CI intelligibility, and thereby also provides some clarification on a conflicting body of literature regarding the relative impacts of self-and overlap-masking.

II. METHODS
Three different physical rooms were modeled using ODEON software (Rindel, 2000), and subsequently, virtual acoustic scenes were created inside a three-dimensional loudspeaker array using these models together with the methods provided by the loudspeaker-based room auralization (LoRA) toolbox (Favrot and Buchholz, 2010). Speech intelligibility was then measured with CI listeners within these highly authentic virtual rooms, using reconstructions both with and without late reflections. Recipients wore a real-time CI research platform device (Goorevich and Batty, 2005) connected to a standard behind-the-ear (BTE) shell, which emulated the settings of their own speech processor while ensuring consistent processing across subjects. This allowed recipients to utilize their own HRTFs and to move their head during testing.

A. Participants
Seven CI recipients (three female and four male) using Cochlear Limited (Sydney, Australia) devices participated in this study. Audiologists invited CI recipients to participate in the study if they were at least 18 years of age, were postlingually deafened, were native Australian English speakers, had a CI24RE, CI422, or CI512 implant model, had at least one year of experience with their implant, and could obtain at least 50% on a word recognition task in quiet. Participants gave informed consent, and they were paid a small gratuity for their participation. The treatment of participants was approved by the Australian Hearing Human Research Ethics Committee and conformed in all respects to the Australian government's National Statement on Ethical Conduct in Human Research. Table I outlines the biographical data for the participants. The age of the participants ranged from 40 to 84 yr, with an average age of 65 yr. All subjects were tested unilaterally with their preferred ear, which is designated in the table as either their right (R) or left (L) ear. The cause of hearing loss (HL) is given for the test ear, as well as the duration of implantation and which device the participant used in their everyday life. Implant use on the test ear ranged from 1.7 to 8.5 yr, with an average of 4.8 yr. For the non-test ear, the four-frequency average hearing loss (4-FAHL), i.e., the average of thresholds at 500, 1000, 2000, and 4000 Hz, is given for the participants who wore a hearing aid (HA), whereas the non-tested implant is given for the bilateral participants. Treatment of the residual hearing for the bimodal participants is described later in Sec. II C.

B. Stimuli
The speech testing was conducted in a spherical loudspeaker array in an anechoic chamber at the National Acoustic Laboratories. Outside the anechoic chamber, a PC running MATLAB generated the sound. The PC was fitted with an RME HDSPe MADI sound card connected to two RME M-32 D/A converters. The analog output of the converters was amplified by eleven 4-channel Yamaha XM4180 amplifiers. The output of the amplifiers was subsequently fed into the anechoic chamber through an acoustically dampened passage and then connected to each loudspeaker in the array. The loudspeaker array consisted of 41 Tannoy V8 loudspeakers arranged symmetrically in rings on a sphere with a radius of 1.85 m (with 16 loudspeakers in the horizontal plane at 0° elevation with an angular separation of 22.5°, eight loudspeakers at both +30° and −30° elevation with an angular separation of 45°, four loudspeakers at both +60° and −60° elevation with an angular separation of 90°, and one loudspeaker at +90° elevation). Participants were seated on a chair in the center of the loudspeaker array, and the height of the chair was adjusted in order to situate the participants' heads in the center of the loudspeaker array.
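The ring geometry described above can be generated programmatically. The following sketch (the helper name `ring` is illustrative, not from the study) reproduces the stated layout and confirms that it yields 41 positions on a 1.85 m sphere:

```python
import math

def ring(n, elev_deg, radius=1.85):
    """Cartesian coordinates for n loudspeakers evenly spaced in
    azimuth at a fixed elevation on a sphere of the given radius."""
    el = math.radians(elev_deg)
    pts = []
    for k in range(n):
        az = math.radians(k * 360.0 / n)
        pts.append((radius * math.cos(el) * math.cos(az),
                    radius * math.cos(el) * math.sin(az),
                    radius * math.sin(el)))
    return pts

# Layout from the text: 16 @ 0°, 8 @ +30°, 8 @ -30°, 4 @ +60°, 4 @ -60°, 1 @ +90°
layout = (ring(16, 0) + ring(8, 30) + ring(8, -30)
          + ring(4, 60) + ring(4, -60) + ring(1, 90))
```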
The acoustics of three different physical rooms were modeled with ODEON software (Rindel, 2000), such that the models mimicked the walls, windows, tables, chairs, etc., within each of the rooms, and each object was given the same individual positioning, orientation, and absorption coefficients as in the real rooms. The first room was modeled after one of the meeting rooms at the National Acoustic Laboratories [Fig. 1(a)], which has a low amount of reverberation; the second room was modeled after the Dennis Byrne Seminar room at the National Acoustic Laboratories [Fig. 1(b)], which has a moderate amount of reverberation; and the third room was modeled after an auditorium at the Technical University of Denmark, which is included with the ODEON software (room auditorium21) and has a high amount of reverberation. The sources were placed in natural positions within each of the rooms: sitting at the table in the meeting room, standing at the front in the seminar room, and standing at the podium in the auditorium. These source positions remained fixed throughout. In the meeting room, two different receiver positions were then modeled: one at 1 m, as if a person was sitting at the table adjacent to a person talking, and one at 3 m, as if a person was sitting across the table from a person talking. There were also two receiver positions modeled in the seminar room: one at 1 m, as if a person was sitting at a table in the front row of the seminar room, and one at 3 m, as if a person was sitting at a table in the back row of the seminar room. In the auditorium, there were three receiver positions: one at 1 m, as if a person was standing at the chalkboard at the front of the auditorium, one at 3 m, as if a person was sitting in the front row of the auditorium, and one at 6 m, as if a person was sitting a few rows back from the front of the auditorium. For each source-receiver pair, realistic talker directivity was included by applying ODEON's directivity file Tlknorm_natural.so8.
Since the direct sound level was fixed at 59.5 dBA throughout the experiment, this resulted in variable sound levels for each acoustic scene, as indicated in Table II. In total, the three different rooms, together with their variable source-receiver distances, and three anechoic controls (i.e., one for each source-receiver distance) gave 10 different listening scenarios.
The acoustic paths in each room between each source and receiver pair, as defined by the RIR, were calculated with the LoRA toolbox (Favrot and Buchholz, 2010) using the reflectograms and decay curves provided by the ODEON models. Specifically, the direct sound and specular early reflections up to the third order were mapped to the nearest loudspeaker in the array by way of the reflectograms. Any late reflections were subsequently added by applying frequency- and direction-dependent decay envelopes to uncorrelated noise. This technique resulted in 41 impulse responses (IRs), one IR for each loudspeaker in the array, for each room and source-receiver pair. Table II lists the estimated critical distances for each of the reverberant rooms (i.e., the source-receiver distance inside a room at which the reverberant energy is equal to the direct sound energy), as well as the DRR, early-to-late reverberant ratio or clarity (C50), and speech transmission index (STI) for each of the tested scenarios. The DRRs were calculated directly from the RIRs. The critical distances were estimated by applying a linear interpolation on a double logarithmic scale (i.e., DRR in dB versus log-distance) and then finding the distance at which the interpolated DRR crossed 0 dB. All other parameter values were provided by the ODEON software.
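The critical-distance estimate described above, interpolating DRR linearly against log-distance and locating the 0 dB crossing, can be sketched as follows. The DRR values below are hypothetical, chosen only to illustrate the interpolation, and are not the values from Table II:

```python
import numpy as np

def critical_distance(distances_m, drr_db):
    """Estimate the critical distance by interpolating DRR (dB)
    linearly against log10(distance) and finding the 0 dB crossing."""
    log_d = np.log10(np.asarray(distances_m, dtype=float))
    drr = np.asarray(drr_db, dtype=float)
    # np.interp requires increasing x-coordinates; DRR decreases with
    # distance, so reverse both arrays before interpolating.
    log_dc = np.interp(0.0, drr[::-1], log_d[::-1])
    return 10.0 ** log_dc

# Hypothetical DRRs measured at 1, 3, and 6 m
dc = critical_distance([1.0, 3.0, 6.0], [8.0, -2.0, -6.0])
```

With these made-up values the DRR crosses 0 dB between 1 and 3 m, giving a critical distance of roughly 2.4 m.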
In order to evaluate the effect of the early versus late reflections on speech intelligibility, a second set of IRs was generated. This second set contained only the first 50 ms of the IRs from all of the scenarios in the original set which contained reverberation. Taken altogether, this gave a total of 17 conditions: three anechoic, seven reverberant, and seven reverberant that only contained early reflections.
In order to create the stimuli, anechoic speech recordings were convolved with each of the 41-channel IRs. The anechoic speech material was taken from an Australian corpus of sentences, which was designed by the Cooperative Research Centre for Cochlear Implant and Hearing Aid Innovation (CRC HEAR) in a similar manner to the original Bamford-Kowal-Bench (BKB; Bench et al., 1979) sentences (i.e., the "BKB-like" corpus) (Keidser et al., 2013). The corpus is made up of 80 lists of 16 meaningful sentences each. All of the sentences consist of four to six words or six to eight syllables, using vocabulary that is familiar to a five-year-old and that is not specific to any particular region of Australia (e.g., "He locked the car door."). The sentences were recorded by a female Australian English speaker at 44.1 kHz. The root-mean-square (RMS) levels of all individual sentences were equalized. To reduce the impact of the individual variation of each of the 41 loudspeaker sensitivities, equalization filters were designed for each loudspeaker and applied to all stimuli before presentation. The study was conducted using a computer system that emulated a basic version of each participant's sound processor with their own personal fitting. The system consisted of a performance real-time "target" machine from Speedgoat™ (Liebefeld, Switzerland) and a "host" computer. The target machine was responsible for executing the real-time model of a sound processor, whereas the host machine was responsible for programming the target machine using the MathWorks (Natick, MA) Simulink and xPC Target framework. The Simulink model mimicked the behavior of the Nucleus® 5 and 6 systems (Cochlear Limited, New South Wales, Australia) without the directional microphone technology, the automatic scene classifier, the noise and wind reduction technologies, and automatic gain control (Goorevich and Batty, 2005; Mauger et al., 2014).
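The RMS equalization of the individual sentences mentioned above can be sketched as below; the target level is an arbitrary digital value here, since the actual presentation level was set acoustically:

```python
import numpy as np

def equalize_rms(sentences, target_rms=0.05):
    """Scale each sentence waveform so that all share the same RMS level."""
    return [s * (target_rms / np.sqrt(np.mean(np.square(s)))) for s in sentences]

rng = np.random.default_rng(2)
a = rng.standard_normal(1000)          # stand-in for a loud sentence
b = 0.3 * rng.standard_normal(1000)    # stand-in for a quieter sentence
ea, eb = equalize_rms([a, b])
```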
Therefore, the real-time model of the sound processor consisted of a spectral flattening filter, spectral-temporal decomposition, the Advanced Combination Encoder (ACE TM ) stimulation strategy, a loudness growth function, and current level mapping. The hardware required to connect the xPC system to the CI was purpose-built and provided by Cochlear Limited.
In the case that subjects had residual hearing in the other ear (i.e., Participants 4, 6, and 7), the non-test ear was fitted with a deeply inserted foam ear plug. The ear plug provided 30 to 35 dB attenuation at lower frequencies (i.e., below 2 kHz) and up to 45 dB attenuation towards higher frequencies (i.e., above 2 kHz). Taking this attenuation into account together with the degree of hearing loss ensured that the speech signals were inaudible in the non-test ear.
Both the host and target machines were situated inside the anechoic chamber just outside of the loudspeaker array, whereas the stimuli-generating PC remained outside of the anechoic chamber. The researcher was inside the anechoic chamber together with the participant, but sat on a chair outside of the loudspeaker array. The researcher connected remotely via a PC to the stimuli-generating PC outside of the anechoic chamber in order to administer the test. Preceding the main speech test, a short training session was conducted to familiarize the participants with the task. The phrase "The sentence…" was presented before each sentence to cue the participant at the start of each sentence. This phrase was appended to the beginning of each anechoic sentence before the auralization of the stimuli, thereby ensuring that the acoustics of the prompting phrase matched that of the sentence itself. The participants were asked to repeat back what they heard, and the responses were scored by morpheme as they were spoken. For the training, the participants heard sentences from the first BKB-like list. The results obtained from this were discarded. Thereafter, the participants heard sentences from two randomly selected lists per condition. None of the lists were repeated, and each condition was presented once in the first half of the session and once in the second half of the session. The condition order within each half was also randomized. Breaks were given as needed, and the entire session lasted approximately 2 h.

D. Analysis
Intelligibility scores are reported as the percentage of correctly identified morphemes. The scores were computed per list by counting the total number of morphemes correctly identified out of the total number possible. The scores were then averaged across the test-retest lists. Statistical inference was then performed by fitting a linear mixed-effects model to the "rationalized" arcsine transform (Studebaker, 1985a) of the scores. The fixed effects terms of the mixed model were the source-receiver distance, the reverberation time, and the presence of late reflections. Scores for the anechoic conditions were included using a reverberation time of 0 s and a false logical value for the presence of late reflections. The model also included subject-specific and list-specific intercepts (i.e., the participants were treated as a random factor, as is standard in a repeated-measures design, and an additional random factor was included for the test list to account for the fact that, due to a technical error, the first three participants heard the same set of lists, as did the second set of three participants).
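The rationalized arcsine transform applied to the scores follows Studebaker's (1985) formula, which maps a proportion-correct score onto an approximately variance-stabilized, interval-like scale. A minimal sketch:

```python
import math

def rau(correct, total):
    """Rationalized arcsine units (Studebaker, 1985): `correct` items
    right out of `total`. At 50% correct the transform returns 50."""
    theta = (math.asin(math.sqrt(correct / (total + 1)))
             + math.asin(math.sqrt((correct + 1) / (total + 1))))
    return (146.0 / math.pi) * theta - 23.0
```

The transform is nearly linear in the mid-range but stretches the scale near 0% and 100%, which is why it is preferred over raw percentages when fitting linear models to scores that approach ceiling, as many did in this study.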
The model was implemented in the R software environment using the lme4 library (Bates et al., 2015), which handles balanced and unbalanced data in a unified framework and thereby facilitated analysis of the unbalanced design in the current study. Further, model selection was carried out with the lmerTest library (Kuznetsova et al., 2017), which uses step-wise deletion of model terms with high p-values to perform backward elimination of random-effect terms and then backward elimination of fixed-effect terms (Kuznetsova et al., 2015). The p-values for the fixed effects were calculated from F-tests based on Satterthwaite's approximation of denominator degrees of freedom, and the p-values for the random effects were calculated based on likelihood ratio tests (Kuznetsova et al., 2015).
Post hoc analysis was performed through contrasts of estimated marginal means using the emmeans library (Searle et al., 1980; Lenth, 2018) and the lme4 model object. The p-values were calculated using the Kenward-Roger degrees-of-freedom method, and a correction for the multiple comparisons was included using the Tukey method. Significant differences are reported using α = 0.05.

III. RESULTS

Figure 2 shows the mean scores for each subject across all conditions with a source-receiver distance of (a) 1 m, (b) 3 m, and (c) 6 m. The boxplots depict the distribution across participants, whereas individual participant responses are indicated by the transparent, gray lines. The anechoic conditions (i.e., auralizations with only the direct sound) are depicted with the light gray boxes, the conditions without the late reflections (i.e., auralizations with only the direct sound and early reflections) are depicted with the medium gray boxes and labeled with "−LR" along the bottom axis, and the conditions with the late reflections (i.e., auralizations with the direct sound, as well as the early and late reflections) are depicted with the dark gray boxes and labeled with "+LR" along the bottom axis.
Group results were modeled using the aforementioned linear mixed-effects model. The model showed a significant main effect for the source-receiver distance [F(1, 139.57, …)].

Pairwise comparisons were conducted between all three rooms and scenarios, but separately for each source-receiver distance. The comparisons revealed, first and foremost, that at each source-receiver distance, the anechoic condition and all of the scenarios with only the early reflections were not significantly different from each other. Furthermore, the scenarios with the early and late reflections in both the meeting room and the seminar room were not significantly different from the respective anechoic conditions, nor from any of the scenarios with only the early reflections. These two conditions were, however, significantly different from each other.

Focusing first on the comparison between the anechoic and fully reverberant scenarios (i.e., the conditions that included both the early and late reflections), there was no significant change in intelligibility with the addition of the reverberation in either the meeting room or the seminar room at either of the tested source-receiver distances. With a source-receiver distance of 1 m, all of the participants except one scored at or above 90%, even in the auditorium simulation (RT = 1.7 s). These results are largely in contrast to earlier reports of mean scores decreasing to 20% at 1 m with RT = 1.0 s, and of mean scores dropping from 90% to 60% at 1 m with RT = 0.3 s and further to 20% with RT = 1.0 s. Moreover, with a source-receiver distance of 3 m, speech intelligibility for the participants in the current study still remained relatively high in the meeting and seminar rooms. Speech intelligibility only first started to break down at 3 m with the auditorium simulation, but some of the participants still obtained high intelligibility even in this scenario.
FIG. 2. Boxplots of percent correct scores (averaged across test-retest) with source-receiver distances of (a) 1 m, (b) 3 m, and (c) 6 m in the anechoic room and the simulated meeting room, seminar room, and auditorium, both with late reflections (+LR) and without (−LR). Boxplots show the 25th, 50th, and 75th percentiles, outliers are marked with circles, and the whiskers extend to cover all data points not considered outliers.

Only in the most challenging scenario, when the source-receiver distance was 6 m in the auditorium, was speech intelligibility substantially worse for all participants. These results suggest that at least some CI recipients are, contrary to previous literature, able to tolerate even moderate levels of reverberation.
With regard to the comparison between the scenarios with and without late reflections, scores for the scenarios without the late reflections were typically the same as or better than the respective scores in the anechoic scenarios, even in the auditorium with a source-receiver distance of 6 m. Therefore, after taking into account the effect of the source-receiver distance, the effect of the early reflections appeared to be minimal. These results refute the previous suggestion in Hu and Kokkinakis (2014) that early reflections are detrimental to speech intelligibility for CI recipients. In contrast to the early reflections having minimal impact, the late reflections started to create problems for some of the listeners at 3 and 6 m in the auditorium. This difference supports the notion that, like in normal-hearing listeners and listeners with a hearing impairment (Arweiler et al., 2013), late reflections are more detrimental to intelligibility than early reflections in CI recipients. However, the effect of the late reflections in both the meeting and seminar rooms (i.e., in low to moderate amounts of reverberation) was negligible, and therefore, these results suggest that the detrimental effect of late reflections is actually less severe than previously suggested in Hu and Kokkinakis (2014).
Given that the late reflections were more detrimental than the early reflections, the results indicate that overlap-masking would be more detrimental to intelligibility than self-masking. This is in line with the growing consensus among many of the more recent studies, but of course, in opposition to the recent study by Desmond et al. (2014), wherein all four of the CI recipients that were tested obtained significantly better intelligibility with the removal of either self-masking or overlap-masking. Their finding is especially surprising, however, given that Qazi et al. (2013) found that distortions in the stimulation current levels during speech segments (i.e., the kind of distortions that self-masking can cause) have little to no effect on intelligibility for CI listeners.
Overall, the analysis supports the notion that the source-receiver distance plays a large role in speech intelligibility outcomes for CI recipients, which is in line with the findings of Hersbach et al. (2015). Moreover, the analysis also supports the notion that, because the presence of early reflections resulted in non-significant changes to intelligibility after adjusting for the effect of the source-receiver distance, it is only the late part of the reverberant reflections that is significantly detrimental. Last, this detrimental effect of the late reflections is, however, dependent on the talker and listener being relatively far apart in a room with a high amount of reverberation.

IV. DISCUSSION
The effect of reverberation on speech intelligibility was in general much smaller than expected when compared to the previous body of literature. This discrepancy may be explained by a variety of factors, including differences in the recipients' individual abilities, differences in the sound processing, and differences in the acoustic reproduction methods. In the current study, recipients wore a research platform device that accurately mimicked their own processors while also allowing full control over which gain manipulation algorithms were active. In many of the previous studies, recipients either wore their own sound processors (e.g., Hu and Kokkinakis, 2014), which inherently allows for less control over the gain manipulation, or they wore a research processor which may not have been matched as closely to their everyday processor (e.g., Hazrati and Loizou, 2012; Hazrati et al., 2013), which inherently increases the potential influence of experience and training. Furthermore, the sound was delivered in the majority of these studies via direct audio input (e.g., Hazrati and Loizou, 2012; Hazrati et al., 2013; Desmond et al., 2014; Hu and Kokkinakis, 2014) rather than via the microphones in the speech processor. Although direct audio input is not by itself detrimental, it necessitates the measurement of individual HRTFs in order to obtain accurate auralization. In contrast, input through the microphones, as in the present study, permits recipients to utilize their own HRTFs.
In addition to differences in research equipment, another factor that could contribute to the differences between the present study and previous studies is that the majority of the previous studies considered a single laboratory room in which the amount of reverberation was adjusted by varying the absorption of the walls. The reverberation inside such a room can be very different from the reverberation inside rooms that are more commonly encountered in real life, in which the volume of the room typically increases with increasing reverberation time. The potential effect of this manipulation on the resulting reverberation is best illustrated by considering the critical distance. Following a diffuse-field approximation, the critical distance can be predicted by d ≈ 0.057 √(cV/RT60), with V the volume of the room and c the directivity factor of the source (Kuttruff, 2009). Because the previous studies considered a constant room volume, any variation in the reverberation time will have resulted in a large change in critical distance. Moreover, since these studies (e.g., Hu and Kokkinakis, 2014) all considered a rather small room with a volume of about 77 m³, the critical distance will have already been relatively short. As a consequence, these studies may have considered scenarios that contained an unrealistically high amount of reverberation relative to the direct sound component. This could then explain why they found much stronger effects of reverberation on intelligibility than the present study.
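The effect of holding the room volume fixed while varying the reverberation time can be sketched numerically with the diffuse-field formula above. In the example below, the 77 m³ volume is taken from the text, while all RT60 values and the room-volume/RT60 pairings for the "realistic rooms" case are illustrative assumptions, not values from the study:

```python
import math

def critical_distance(volume_m3, rt60_s, directivity=1.0):
    """Diffuse-field estimate of the critical distance (Kuttruff, 2009):
    d_c ~ 0.057 * sqrt(c * V / RT60), with c the source directivity factor."""
    return 0.057 * math.sqrt(directivity * volume_m3 / rt60_s)

# Fixed small room (~77 m^3, as in the earlier studies): increasing RT60
# shrinks the critical distance, so listeners quickly end up in a
# reverberation-dominated field even at short distances.
for rt60 in (0.3, 0.6, 1.0):  # illustrative reverberation times (s)
    d_c = critical_distance(77, rt60)
    print(f"V=77 m^3, RT60={rt60:.1f} s -> d_c = {d_c:.2f} m")

# Realistic rooms: volume tends to grow with RT60, which keeps the
# critical distance comparatively large (illustrative pairs).
for volume, rt60 in ((50, 0.3), (200, 0.6), (1500, 1.5)):
    d_c = critical_distance(volume, rt60)
    print(f"V={volume} m^3, RT60={rt60:.1f} s -> d_c = {d_c:.2f} m")
```

The sketch makes the paragraph's point explicit: at constant volume, the critical distance scales with 1/√RT60, whereas letting the volume grow with RT60 largely offsets that shrinkage.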
In addition to reverberation time and room volume, the critical distance also depends on the directivity of the source (c). In all of the previous studies, the speech source was a single, small loudspeaker rather than a human talker, and because the loudspeaker will likely have provided a lower directivity (Studebaker, 1985b), the critical distance may have been decreased even further, which, again, could have resulted in rather high levels of reverberation relative to the direct sound. Hence, using the microphone input of the real-time CI device together with the realistic room auralization system in the present study made it possible to include more realistic spatial cues, as well as more natural reverberation, including talker directivity, room volume, reverberation time, absorption coefficients, and reflection patterns. The extent of the impact of each of these factors in the current study is unknown; however, it has been shown that these factors are essential for generating auditory environments that provide realistic acoustic scenes to the listener (Favrot and Buchholz, 2010), and therefore, these factors likely account for at least some of the differences in the tolerance to reverberation between the CI recipients in the current study and those in previous studies.
When comparing the intelligibility scores with the room acoustic parameters in Table II, one can see that the intelligibility scores are not directly related to either the DRR or the critical distance. For instance, ceiling intelligibility scores were measured at 3 m distance in the meeting room, which has a DRR of −6.2 dB, whereas a 3 m distance in the auditorium, which has a DRR of −3.8 dB (i.e., less reverberation relative to the direct sound), resulted in a median intelligibility score of approximately 66% correct. In other words, significantly lower intelligibility was measured despite lower relative levels of reverberation, which is counter-intuitive. Furthermore, high intelligibility scores were measured at both 1 and 3 m distances, despite the 3 m distance being outside of the estimated critical distance.
However, the picture changes a bit when considering the C50. At 3 m distance in the meeting room, the early-to-late reverberation ratio was 12.4 dB, whereas at 3 m distance in the auditorium, the ratio was 2.5 dB. Therefore, a larger portion of the reverberation in the auditorium was late reverberation, and even though there was overall less reverberation relative to the direct sound, the fact that a large portion of it was late reverberation meant that it was significantly more detrimental to intelligibility. This pattern holds, at least qualitatively, for all of the conditions. Besides C50, it may additionally be worthwhile to consider whether transmission channel methods such as the STI can further explain CI outcomes. STI may be particularly well-suited for explaining CI outcomes since this metric is based on the modulation transfer function, and it has been shown that the integrity of speech modulations is crucial for intelligibility in CI listeners (Qazi et al., 2013). Transmission systems need to achieve an STI above 0.75 to be labeled as good to excellent for normal-hearing listeners (IEC 60268-16, 2011), and all but two of the listening scenarios in the present study obtained STI values above this cutoff, with the two exceptions being exactly the scenarios in which intelligibility was substantially degraded for many of the listeners (i.e., in the auditorium at 3 and 6 m distance). Interestingly, a non-native, but still experienced, listener requires transmission systems to obtain an STI above 0.86 in order to obtain intelligibility that is equivalent to an STI of 0.75 for their native listener counterparts. Furthermore, transmission systems with an STI between 0.60 and 0.75 fall into the fair to good category for normal-hearing listeners, whereas transmission systems need to achieve a performance of at least 0.68 in order for a non-native (category I) listener to achieve the same level of intelligibility (IEC 60268-16, 2011).
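The two energy ratios compared above, DRR and C50, can both be computed from a room impulse response as before/after energy ratios with different split points. The sketch below is a minimal illustration using a synthetic impulse response (a unit direct-sound impulse plus an exponentially decaying noise tail); the function names, the synthetic decay, and the 2.5 ms split used for the DRR-like ratio are assumptions for illustration, not the measurement procedure of the study:

```python
import numpy as np

def energy_ratio_db(h, fs, split_ms, direct_idx=0):
    """10*log10 of the energy before vs. after a split point (in ms)
    relative to the direct-sound arrival. A small split (e.g., 2.5 ms)
    gives a DRR-like ratio; split_ms=50 gives C50 (early-to-late ratio)."""
    split = direct_idx + int(round(split_ms * 1e-3 * fs))
    early = np.sum(h[:split] ** 2)
    late = np.sum(h[split:] ** 2)
    return 10.0 * np.log10(early / late)

# Synthetic impulse response: direct impulse + decaying noise tail
# whose amplitude falls 60 dB over ~0.5 s (an RT60-like decay).
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(int(0.5 * fs)) / fs
h = 0.1 * rng.standard_normal(t.size) * np.exp(-6.9 * t / 0.5)
h[0] = 1.0  # direct sound

print(f"DRR-like ratio ~ {energy_ratio_db(h, fs, 2.5):.1f} dB")
print(f"C50            ~ {energy_ratio_db(h, fs, 50.0):.1f} dB")
```

Because C50 counts the first 50 ms of reflections as "useful" energy, it is always at least as large as the DRR for the same response, which is exactly why the meeting room can have a worse DRR than the auditorium yet a much better C50.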
Older, hearing-impaired listeners also need adjustments to the STI scale, similarly to the non-native listeners, but notably, an intelligibility equivalent to an STI of 0.75 for the normal-hearing, native listeners cannot be achieved at all by the older, hearing-impaired listeners (IEC 60268-16, 2011). Since all but perhaps one of the CI recipients in the present study obtained intelligibility scores at or near ceiling in the scenarios with high STI values, it seems that an adjustment closer to the adjustment for the non-native (category I) listeners would be appropriate. Nonetheless, it seems that STI, together with CI-specific adjustments, could be a useful predictor for objectively rating speech intelligibility for CI recipients in reverberant environments, at least to a rough approximation.
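To illustrate why the most reverberant scenarios fall below the 0.75 cutoff, the sketch below estimates a simplified, single-band, noise-free STI from the reverberation time alone via Schroeder's modulation transfer function, m(F) = [1 + (2πF·RT60/13.8)²]^(−1/2). This is a deliberately reduced version of the metric: a full STI computation per IEC 60268-16 additionally weights across octave bands and accounts for noise and auditory effects, and the RT60 values below are illustrative, not those of the study's rooms:

```python
import math

# Standard STI modulation frequencies (Hz), 0.63-12.5 Hz in third-octave steps.
MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.15,
             4.0, 5.0, 6.3, 8.0, 10.0, 12.5]

def sti_from_rt60(rt60_s):
    """Simplified single-band STI from reverberation alone:
    Schroeder MTF -> apparent SNR -> clipped transmission index -> mean."""
    tis = []
    for f in MOD_FREQS:
        m = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f * rt60_s / 13.8) ** 2)
        snr = 10.0 * math.log10(m / (1.0 - m)) if m < 1.0 else 15.0
        snr = max(-15.0, min(15.0, snr))   # clip apparent SNR to +/-15 dB
        tis.append((snr + 15.0) / 30.0)    # map to transmission index in [0, 1]
    return sum(tis) / len(tis)

for rt60 in (0.3, 0.8, 2.0):  # illustrative reverberation times (s)
    print(f"RT60={rt60:.1f} s -> simplified STI ~ {sti_from_rt60(rt60):.2f}")
```

Even this reduced model captures the qualitative behavior discussed above: reverberation attenuates the slow speech-envelope modulations more strongly as RT60 grows, pulling the index below the good-to-excellent region for long reverberation times.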
It is also worth noting that for some of the subjects, intelligibility was reduced as the distance increased, even in the anechoic conditions. This reduction is likely due to reduced audibility of the signal, which was presented at levels as low as 43.9 dBA in the auditorium at 6 m. Considering the conclusions from Bradley et al. (2003) that early reflections can help normal- and impaired-hearing listeners in such scenarios by increasing the effective speech level, one might expect that including early reflections would increase audibility, and thereby intelligibility scores, for CI listeners as well. There is, in fact, a slight tendency towards higher intelligibility scores at 3 m distance when only the early reflections were included, but since most results were already at ceiling in the anechoic condition, it is difficult to conclude whether the early reflections actually helped. In the future, a more systematic study of the impact of early reflections on CI listening would be useful, especially while also considering the effect of room acoustics in the presence of interfering noise.

V. CONCLUSION
The current study investigated the relative effects of reverberation and source-receiver distance on speech intelligibility in CI listeners in a range of plausible listening scenarios. The results show that CI listeners may be more tolerant to reverberation than has been suggested in the literature to date. Furthermore, the results confirm that CI listeners are largely impacted by the source-receiver distance: the listeners maintained good intelligibility even in rooms with very high reverberation times (e.g., as in an auditorium) when the talker was simulated as being 1 m away, but when the talker was simulated as being 3 m away, listeners maintained good intelligibility only in rooms with at most moderate reverberation times (e.g., as in a meeting or seminar room). Last, the results suggest that, as in normal-hearing listeners and listeners with a hearing impairment, early reflections are not, in fact, detrimental for CI recipients. Moreover, the detrimental effect of reverberation in the most challenging reverberant scenarios was attributed solely to the presence of the late reflections, since intelligibility was restored to anechoic levels once the late reflections were removed. Given this observation, dereverberation algorithms focusing on the removal of late reflections appear to be a promising avenue for improving speech intelligibility for CI recipients in highly reverberant environments.