Head orientation benefit to speech intelligibility in noise for cochlear implant users and in realistic listening conditions

Cochlear implant (CI) users suffer from elevated speech-reception thresholds and may rely on lip-reading. Traditional measures of spatial release from masking quantify speech-reception-threshold improvement with azimuthal separation of the target speaker and interferers, with the listener facing the target speaker. Substantial benefits of orienting the head away from the target speaker were predicted by a model of spatial release from masking. Audio-only and audio-visual speech-reception thresholds in normal-hearing (NH) listeners and in bilateral and unilateral CI users confirmed model predictions of this head-orientation benefit. The benefit ranged from 2 to 5 dB for a modest 30° orientation that did not affect the lip-reading benefit. NH listeners' and CI users' lip-reading benefit measured 3 and 5 dB, respectively. A head-orientation benefit of ≈2 dB was also both predicted and observed in NH listeners in realistic simulations of a restaurant listening environment. Exploiting the benefit of head orientation is thus a robust hearing tactic that would benefit both NH listeners and CI users in noisy listening conditions.

Grange, Jacques A. ORCID: https://orcid.org/0000-0001-5197-249X and Culling, John F. This version is being made available in accordance with publisher policies. See http://orca.cf.ac.uk/policies.html for usage policies. Copyright and moral rights for publications made available in ORCA are retained by the copyright holders.

I. INTRODUCTION
Difficulty understanding speech in background noise affects everyone from time to time, but it is a particular problem for hearing-impaired listeners. Speech intelligibility is powerfully affected by the speech-to-noise ratio (SNR); just a few decibels can separate perfect comprehension from complete incomprehension. Speech intelligibility in noise can consequently be measured with some precision using a speech reception threshold (SRT), defined as the SNR at which 50% intelligibility is achieved. Hearing-impaired listeners often have SRTs only 4-6 dB higher (worse) than normal-hearing (NH) listeners (Plomp, 1986), but this difference is enough to make speech intelligibility in noise their most significant disability (Kramer et al., 1998). Amplification from hearing aids improves speech intelligibility in quiet, but it does not improve SNR and so makes no difference in noise unless the noise is inaudible (Plomp, 1986). Noise-reduction algorithms improve SNR. Although they may reduce listening effort (Desjardins and Doherty, 2014), they provide little improvement in intelligibility for listeners with hearing aids, because the processing distorts the speech signal (Loizou and Kim, 2011). Cochlear implant (CI) users have even worse problems, with SRTs 10-20 dB higher than those of NH listeners. Some noise-reduction algorithms and directional microphones have been shown to benefit CI users in limited conditions (Mauger et al., 2012). Any other method of improving SRTs in noise by just a few decibels would therefore benefit all listeners, and particularly users of auditory prostheses.
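The SRT definition can be made concrete with a psychometric function. The sketch below assumes that intelligibility follows a logistic function of SNR whose 50% point is the SRT; both the logistic form and the 15%-per-dB midpoint slope are illustrative assumptions, not values from this study. It shows why a shift of only a few decibels matters: near threshold, each decibel changes intelligibility by a large fraction.

```python
import math


def intelligibility(snr_db: float, srt_db: float, slope_per_db: float = 0.15) -> float:
    """Proportion of speech understood at a given SNR (hypothetical logistic model).

    Assumption: intelligibility is a logistic function of SNR whose midpoint
    (50% correct) is the SRT. slope_per_db is the slope at that midpoint,
    in proportion correct per dB; the default 0.15 is illustrative only.
    """
    # For a logistic 1 / (1 + exp(-k x)), the slope at x = 0 equals k / 4.
    k = 4.0 * slope_per_db
    return 1.0 / (1.0 + math.exp(-k * (snr_db - srt_db)))


if __name__ == "__main__":
    # Hypothetical listener with an SRT of -6 dB and one whose SRT is 5 dB worse.
    for snr in (-12, -9, -6, -3, 0):
        better = intelligibility(snr, srt_db=-6.0)
        worse = intelligibility(snr, srt_db=-1.0)
        print(f"SNR {snr:+3d} dB: SRT -6 dB -> {better:.2f}, SRT -1 dB -> {worse:.2f}")
```

At an SNR of −6 dB the first listener is at 50% by construction, while the listener with a 5-dB-higher SRT understands far less; the exact figures depend entirely on the assumed slope.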
When speech and noise are spatially separated, there is an improvement in SRT called spatial release from masking (SRM). This effect results from a combination of acoustic differences between the stimulus at each ear and processing of these interaural differences by the brain. It is generally assumed that listeners directly face their conversation partner, and it is thought by both researchers and clinicians that this behavior is most natural (Bronkhorst and Plomp, 1990), most frequently encountered (Koehnke and Besing, 1996), or necessary for lip-reading (Plomp, 1986). However, it would clearly be useful to increase the SRM when possible.
We first noted the potential benefits of head orientation using a computer model of SRM in noise and reverberation (Jelfs et al., 2011; Lavandier and Culling, 2010). The Jelfs et al. version of the model is the one used here. The model computes an effective target-to-interferer ratio that is the sum of contributions from two mechanisms. The better-ear path computes the better-ear SNR resulting from the head-shadow effect. The binaural-unmasking path computes binaural masking level differences in each frequency channel from the interaural phase differences of the target and masker and from the interaural coherence of the masker. Both contributions are weighted according to an importance function for speech, integrated across frequency bands, and then summed. Head orientation affects both contributions by changing the target-to-interferer ratio at each ear as well as the interaural time delays. The model uses binaural room impulse responses in order to reflect the impact of reverberation, when present. The Jelfs et al. model has been validated against a wide variety of SRT data (Jelfs et al., 2011; Lavandier et al., 2012), predicting the level of SRM in different spatial configurations, with different numbers of masking noises, and at different levels of reverberation. Increased SRM was predicted when listeners faced a location between the speech source and a single interfering noise source. This prediction is intuitive: the head acts as an acoustic barrier, so the ear on the side of the speech is shielded from the interfering noise by the acoustic shadow of the head. In addition to this head-shadow effect, the ear on the side of the speech is more sensitive to sound coming from 30° to 60°, because the head acts as a baffle and the pinna increases sensitivity toward the front. Orienting the head so as to place the speech source in this region of personal space may thus improve speech intelligibility.
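The final combination step of the model can be sketched as follows. This is not the Jelfs et al. implementation: it assumes that the per-band SNRs at each ear and the per-band binaural masking level differences (BMLDs) have already been derived from the impulse responses, that the better ear is chosen independently in each band, and it uses hypothetical importance weights. The `mode` switch mirrors the CI variants used in Experiment 1 (binaural unmasking dropped for bilateral users; only the implanted ear, assumed here to be the left, for unilateral users).

```python
from typing import Sequence


def effective_snr_db(
    left_snr_db: Sequence[float],
    right_snr_db: Sequence[float],
    bmld_db: Sequence[float],
    importance: Sequence[float],
    mode: str = "NH",
) -> float:
    """Importance-weighted effective target-to-interferer ratio in dB (sketch).

    Every sequence holds one value per frequency band. mode is "NH"
    (better ear + binaural unmasking), "BCI" (better ear only) or
    "UCI" (left ear only; the implanted ear is assumed to be on the left).
    """
    if mode not in ("NH", "BCI", "UCI"):
        raise ValueError(f"unknown mode: {mode}")
    total_weight = sum(importance)
    if mode == "UCI":
        per_band = list(left_snr_db)
    else:
        # Better-ear path: take the ear with the higher SNR in each band.
        per_band = [max(l, r) for l, r in zip(left_snr_db, right_snr_db)]
    better_ear = sum(w * s for w, s in zip(importance, per_band)) / total_weight
    if mode != "NH":
        return better_ear
    # Binaural-unmasking path, weighted by the same importance function.
    unmasking = sum(w * b for w, b in zip(importance, bmld_db)) / total_weight
    return better_ear + unmasking
```

Under these assumptions, predicted SRM for a configuration is the effective SNR of the separated configuration minus that of the collocated one, so any constant offset in the inputs cancels.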
Existing quantitative studies of head-orientation behavior in naturalistic settings have not been analyzed in a way that would identify a tendency to orient 30° away from the target speaker (Ching et al., 2009; Ricketts and Galster, 2008). Most research on SRM assumes that the target speaker will be directly in front of the listener (Beutelmann and Brand, 2006; Bronkhorst and Plomp, 1992; Peissig and Kollmeier, 1997; Plomp, 1986). SRTs are rarely measured with the target speaker in any other location.
The selection of target speech and noise positions can have a substantial impact on the magnitude of SRM. For CI users, SRM is almost always tested speech-facing (i.e., with the listener facing the target speaker head-on) and with a masker at 90° [see reviews in Van Hoesel (2011) and Culling et al. (2012)]. In this configuration and in a sound-treated room, SRM reaches only 3 to 5 dB (e.g., Litovsky et al., 2009). However, three studies have tested CI users in the symmetrical situation where speech and noise sources are placed at equal and opposite azimuths (±45° or ±60°; Laske et al., 2009; Laszig et al., 2004). These studies demonstrated that, with speech and noise sources separated by 90° or 120°, a head oriented midway between the sound sources could lead to a significant head-shadow benefit of bilateral over unilateral implantation (10 to 18 dB). This benefit was defined as the SRT improvement from the spatial configuration that acoustically penalized the better ear (or CI) to the mirror-image configuration that favored it. The maximum head-shadow benefit predicted by the Jelfs et al. model, and experimentally confirmed by Culling et al. (2012), is 18 dB for this case.
In a first study focused on the benefit of head orientation to speech intelligibility, Grange and Culling (2016) established a baseline for young NH listeners. In a sound-treated room, we demonstrated that a maximum head-orientation benefit (HOB) of 8 dB was predicted, and confirmed to occur, at a 60° head orientation when speech and noise were placed at 0° and 180° azimuth, respectively. With the noise placed between 150° and 90°, HOB peaked at 4 to 6 dB at head orientations in the 30° to 45° range. In all these configurations, with noise placed in the rear hemifield, most of the available HOB could be obtained at a 30° head orientation.
The first experiment of the present report aims to show that, in situations similar to those described in Grange and Culling (2016), CI users, too, can obtain a significant HOB. We also aim to demonstrate that HOB can be obtained at a modest 30° head orientation that does not detrimentally affect lip-reading, such that head orientation and lip-reading provide cumulative benefits. The second experiment addresses the potential criticism that such effects are limited to artificial laboratory situations. By creating a highly realistic simulation of a restaurant, with a target talker seated at the same table as the listener and many other voices distributed around the room, the effect is shown to be robust in real-life situations, although more limited in reverberation.

Loizou et al. (2009). For a bilateral CI user, the model output the better-ear target-to-interferer ratio, assuming equal effectiveness of the two CIs for speech intelligibility in noise. For a unilateral CI user, the model output the target-to-interferer ratio at their only CI (assuming negligible hearing in the contralateral ear). Here, the Jelfs et al. model was used as per Culling et al. (2012), except that the model input consisted of binaural room impulse responses acquired with a head-and-torso simulator in the test environment. Culling et al. (2012) argued that the position of the microphone on a processor has a very modest impact on SRM. Incorporating unequal CI effectiveness into the model was also found to be unnecessary, since it only marginally changed the high correlation between CI data from previous reports and the corresponding model predictions. Given the above, no modification of the model was deemed necessary.

Selection of spatial configurations
Four spatial configurations were selected: target and masker collocated in front (T0M0) served as the reference for computing SRM; target in front and masker at the rear (T0M180) was predicted to provide the maximum attainable HOB; target in front and masker on the side contralateral to the better ear (T0M90) or ipsilateral to it (T0M270) were selected because these two configurations were used in most prior studies, as discussed in Culling et al. (2012).
The three spatially separated configurations are illustrated in the three panels of Fig. 1, which show Jelfs et al. model predictions of SRM as a function of head orientation away from the target speaker, derived from binaural room impulse responses acquired in the test environment. These predictions illustrate the benefit of head orientation in each separated spatial configuration for NH listeners and for bilateral (BCI) and unilateral (UCI) CI users, when the left ear (or CI) is the better ear. Arrows highlight SRM at the favorable 30° head orientation, at which, according to the model, a large proportion of the available SRM can be obtained. Where shown, the difference between BCI and NH predictions corresponds to the binaural-unmasking contribution to SRM, assumed to be available only to NH listeners; the difference between UCI and BCI predictions corresponds to the predicted benefit of bilateral over unilateral implantation (see Culling et al., 2012, for in-depth discussion).
In this experiment, the listener either faced the target speaker or faced 30° away from it (typically favoring the better ear when sources were separated). A modest 30° head orientation was expected to provide a substantial HOB without detrimental impact on lip-reading. All plots in the results section are transformed to present the left ear as the better ear for speech intelligibility in noise: when the better ear was the right ear, the data were mirrored about the median plane. NH listeners were tested assuming an arbitrary better ear (balanced across participants). Each BCI user's better-performing CI in noise was established in initial practice runs by comparing SRTs obtained with speech in front and noise either to the right or to the left. All CI users were tested in conditions favoring their better or only ear/CI. For UCI users, SRM was additionally measured with the masker on the side ipsilateral to their CI (T0M270), because, even in this worst-case scenario, they were predicted to obtain a large HOB from a modest 30° head turn away from the speech direction.

Participants
Ten young NH (NHy) participants, self-reported as normal hearing and aged 18-22 years (mean age 20 years), were recruited from the Cardiff University undergraduate population (through the School of Psychology's Experimental Management System).
Eight BCI- and nine UCI-user volunteers were recruited from England and Wales through the National CI User Association (NCIUA) and the Cochlear Implant User Group 2004 (Yahoo! CIUG-2004). Table I details the characteristics of our CI participants. All but one BCI user (B1) had had their last implant fitted at least a year before testing and had been implanted sequentially, with the second implant fitted between 2 and 12 years after the first. Participant B1 was implanted simultaneously and had the implants switched on 3 months before testing. All UCI participants had had their implant fitted at least 3 years before testing. All CI users but one (U9) had hardware and software settings such that no microphone directionality was used during testing. Participant U9 used the ESPrit 3G processor from Cochlear. This participant's data are treated separately as an illustration of the effect of microphone directionality on HOB.
An additional ten NH listeners were recruited from the local Cardiff population, age-matched to the CI users within ±5 years. All had normal hearing for their age, as confirmed by pure-tone audiometric screening (<20 dB hearing level from 500 Hz to 4 kHz). From these ten age-matched NH (NHam) listeners, a subset was age-matched to each CI user group to within 0.5 years on average.
All participants were briefed verbally and in writing prior to signing a consent form. All testing and forms were approved by the Ethics Committee of the Cardiff University School of Psychology.

Laboratory setup
Two sound-treated rooms were used, one at Cardiff University (3.2 m × 4.3 m, 2.6 m ceiling height) and one at University College London (UCL; 2.7 m × 4.3 m, 2.2 m ceiling height). Four Minx-10 loudspeakers (Cambridge Audio, London, United Kingdom), mounted 1.3 m above the floor, were arranged at the cardinal points, at a distance of 1.5 m (Cardiff) and 1.3 m (UCL) from the center of the listener's head. The cross they formed was aligned with the walls and offset toward one end of the room, such that the rear and side speakers were equidistant from the nearest walls and the cross was as remote from the access door as practicable. Each channel of the audio chain was judged sufficiently consistent in level and spectral response for our purposes, by acquiring impulse responses and comparing the corresponding excitation patterns (Moore and Glasberg, 1983). The reverberation time (T60) of both rooms was measured from the impulse responses to be approximately 100 ms, using the reverse-integration technique (Schroeder, 1965). The two rooms were acoustically matched as far as practicable using twelve 30 cm × 30 cm foam panels placed where side reflections were most likely to occur. The acoustic matching was judged sufficient for our purpose because the Jelfs et al. model predictions in Fig. 1 differed between rooms by no more than 1.2 dB at any point, and typically by less than 0.5 dB; HOB predictions all differed by less than 0.5 dB. Since all NH listeners and most CI users were tested in the Cardiff room, predictions from binaural room impulse responses obtained in that room are used throughout this report. An adjustable swivel chair was positioned in each room such that, regardless of chair rotation, the listener's head was at the center of the loudspeaker array. The experimenter remained in the room at all times, outside the loudspeaker array and as far from it as practicable. This arrangement was essential to aid interaction with CI users and obtain prompt feedback from them.

FIG. 1. Jelfs et al. (2011) model predictions, computed from binaural room impulse responses acquired in the sound-treated Cardiff room, of spatial release from masking as a function of head orientation away from the target, for normal-hearing listeners (NH, solid black line) and bilateral (BCI, solid grey line) and unilateral (UCI, dashed black line) CI users, at the three separated spatial configurations: target in front and masker at the rear (T0M180, center panel), target in front and masker on the side favoring the better ear (T0M90, right panel), and target in front and masker on the side ipsilateral to a UCI user's CI (T0M270, left panel). All graphs assume the better ear to be the left ear, and the arrows point to the prediction for a favorable 30° head orientation.
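Reverse (backward) integration of an impulse response (Schroeder, 1965) yields a smooth energy-decay curve whose slope can be fitted and extrapolated to a 60-dB decay. The NumPy sketch below illustrates that procedure; the −5 to −25 dB fitting range is a common choice that we assume here, because the text does not state which range was used.

```python
import numpy as np


def energy_decay_curve_db(ir: np.ndarray) -> np.ndarray:
    """Schroeder backward-integrated energy decay, normalized to 0 dB at t = 0."""
    energy = np.asarray(ir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]  # energy remaining from each sample onward
    edc = np.maximum(edc, np.finfo(float).tiny)  # avoid log10(0) for trailing zeros
    return 10.0 * np.log10(edc / edc[0])


def reverberation_time(ir: np.ndarray, fs: float,
                       fit_start_db: float = -5.0, fit_end_db: float = -25.0) -> float:
    """Estimate T60 (s) by a linear fit to the decay curve, extrapolated to -60 dB."""
    decay_db = energy_decay_curve_db(ir)
    t = np.arange(len(decay_db)) / fs
    in_range = (decay_db <= fit_start_db) & (decay_db >= fit_end_db)
    slope_db_per_s, _ = np.polyfit(t[in_range], decay_db[in_range], 1)
    return -60.0 / slope_db_per_s


if __name__ == "__main__":
    fs = 16000
    t = np.arange(int(0.5 * fs)) / fs
    # Synthetic impulse response whose energy falls by 60 dB in 100 ms.
    ir = 10.0 ** (-3.0 * t / 0.1)
    print(f"T60 ~ {reverberation_time(ir, fs) * 1000:.1f} ms")
```

Real room impulse responses would normally be band-filtered (e.g., octave bands) before this step; the synthetic, noiseless decay here is only a check that the fit recovers a known value.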
The speakers were powered by an Auna six-channel solid-state amplifier (Chal-Tec, Berlin, Germany) driven by a MAYA44 USB+ digital-to-analogue converter (ESI AudioTechnik, Leonberg, Germany) connected to a laptop computer. All stimuli were controlled by custom MATLAB programs (The MathWorks, Natick, MA) using the Playrec toolbox (Humphrey, 2008-2014). For audio-visual presentations, the speech audio and video streams were synchronized by the VLC program (VideoLAN, Paris, France) and presented on a 17-in. video monitor placed immediately below the 0°-azimuth loudspeaker.

Stimuli
Two SRT protocols were used, each requiring its own set of stimuli. The first used Speech Perception in Noise (SPIN) sentences (Kalikow et al., 1977) recorded audio-visually, so that audio-only and audio-visual SRTs could be measured and compared. The second used Institute of Electrical and Electronics Engineers (IEEE) sentences from the Harvard corpus (speakers DA and CW), as in previous work (Grange and Culling, 2016), in order to measure audio-only SRTs more accurately. For the first protocol, a set of 320 high-predictability SPIN sentences was recorded audio-visually by a male English speaker (from southeast England). To complete the required set, 120 new sentences were generated in addition to the 200 original SPIN sentences, following the rules established by Kalikow et al. (1977). In high-predictability SPIN sentences, the target word is the last word, which is made easier to identify by the contextual information provided by the preceding words. The redundancy of these SPIN sentences was expected to assist CI users and to help reduce the standard deviation of the SNRs used in the SRT computation. In the audio-visual recordings, the speaker's face covered two-thirds of the video monitor height, giving a near life-size face. For lip-reading purposes, the speaker faced the camera at all times, with his face well lit. The audio-visual files were batch-processed with FFmpeg (Bellard, 2013) to separate the audio and video streams and enable adaptive alteration of sound levels. For the second SRT protocol, a set of 360 IEEE sentences was used.
All audio files were equalized in root-mean-square power, computed over the 3-4 s recordings. For each test, a masking noise was synthesized whose long-term frequency spectrum matched that of the target voice. This speech-shaped noise was created using a 512-point finite-impulse-response filter based on the calculated excitation pattern of the speech material (Moore and Glasberg, 1983).
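The noise-synthesis step can be illustrated with a frequency-sampling FIR design. This NumPy sketch is a simplification and not the procedure used here: it derives the 512-tap filter from the long-term power spectrum of the speech rather than from its excitation pattern (Moore and Glasberg, 1983), filters Gaussian white noise with it, and then equalizes the noise to the RMS of the speech, in line with the RMS equalization applied to all audio files.

```python
from typing import Optional

import numpy as np


def rms(x: np.ndarray) -> float:
    """Root-mean-square value of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def equalize_rms(x: np.ndarray, target_rms: float) -> np.ndarray:
    """Scale x so that its root-mean-square value equals target_rms."""
    return x * (target_rms / rms(x))


def speech_shaped_noise(speech: np.ndarray, n_samples: int, n_taps: int = 512,
                        seed: Optional[int] = None) -> np.ndarray:
    """White noise filtered to the long-term spectrum of `speech`, RMS-matched to it."""
    if len(speech) < n_taps:
        raise ValueError("speech must contain at least n_taps samples")
    rng = np.random.default_rng(seed)
    # Long-term power spectrum: average over Hann-windowed frames of n_taps samples.
    n_frames = len(speech) // n_taps
    frames = speech[: n_frames * n_taps].reshape(n_frames, n_taps) * np.hanning(n_taps)
    magnitude = np.sqrt(np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0))
    # Frequency sampling: zero-phase impulse response, shifted to be causal, then windowed.
    fir = np.fft.irfft(magnitude, n=n_taps)
    fir = np.roll(fir, n_taps // 2) * np.hanning(n_taps)
    white = rng.standard_normal(n_samples + n_taps - 1)
    noise = np.convolve(white, fir, mode="valid")  # exactly n_samples long
    return equalize_rms(noise, rms(speech))
```

With a real recording passed as `speech`, the result is a noise whose long-term spectrum follows that voice; the excitation-pattern design described in the text would weight the spectrum according to auditory filter bandwidths instead.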

Audio and audio-visual SRT protocol
Changes were made to our "standard" adaptive threshold method, described in Culling et al. (2012), in an effort to better adapt the test to CI users. High-predictability SPIN sentences (Kalikow et al., 1977) were used instead of IEEE sentences. Initial SNRs were set to −18 dB and −4 dB for NH listeners and CI users, respectively. In the pre-adaptive phase, the SNR was increased by 4 dB at each repetition. If the listener failed to recognize the target word after 4 presentations, a new sentence was presented at the SNR of the previous presentation. The new sentence could be repeated a maximum of 3 times (with 4-dB increments) before being replaced with another sentence (again, with no SNR increment). In practice, none of the listeners required more than two sentences (i.e., more than seven presentations) before recognizing a target word, which was the trigger required to start the adaptive phase. Once the staircase commenced, SNR was adaptively changed in ±2-dB steps, as per the standard protocol. However, each sentence was presented up to three times at increasing SNRs, rather than being renewed at each SNR, until the target word was identified. Repeating sentences after unsuccessful trials was intended to make more economical use of the relatively small number of audio-visually recorded SPIN sentences. Following Culling et al. (2012), the overall sound level throughout an experiment was maintained at 65 dB(A) (as measured with a digital sound-level meter): an increase in SNR was achieved by simultaneously increasing the target level and decreasing the masker level, so that the overall stimulus level was fixed and could not become uncomfortable. This new protocol is hereafter referred to as the "SPIN AV protocol." The measurement precision of the SPIN AV protocol was compared with that of the standard protocol (which used ten sentences) as a function of the number of sentences used, in an audio-only, collocated-source paradigm.
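Two parts of this protocol lend themselves to a short sketch. First, holding the overall level at 65 dB(A) fixes the target and masker levels once the SNR is chosen; the sketch assumes the two signals add in power (incoherent summation). Second, the adaptive track is shown as a simplified 1-up/1-down staircase using the 4-dB pre-adaptive and 2-dB adaptive steps given above; it omits the sentence-repetition and substitution rules, and taking the SRT as the mean SNR over the adaptive phase is our assumption. The `recognizes` callback is a hypothetical stand-in for the listener's response.

```python
import math
from typing import Callable, List, Tuple


def split_levels(overall_db: float, snr_db: float) -> Tuple[float, float]:
    """Target and masker levels (dB) whose powers sum to overall_db at the given SNR."""
    target_db = overall_db - 10.0 * math.log10(1.0 + 10.0 ** (-snr_db / 10.0))
    masker_db = overall_db - 10.0 * math.log10(1.0 + 10.0 ** (snr_db / 10.0))
    return target_db, masker_db


def run_track(recognizes: Callable[[float], bool], start_snr_db: float,
              n_sentences: int = 9, pre_step_db: float = 4.0, step_db: float = 2.0,
              max_pre: int = 12) -> Tuple[float, List[float]]:
    """Simplified adaptive track; returns (SRT estimate in dB, adaptive-phase SNRs)."""
    snr = start_snr_db
    # Pre-adaptive phase: raise the SNR in 4-dB steps until the first target word is recognized.
    for _ in range(max_pre):
        if recognizes(snr):
            break
        snr += pre_step_db
    else:
        raise RuntimeError("target word never recognized in the pre-adaptive phase")
    # Adaptive phase: one trial per remaining sentence, 2 dB down after a hit, 2 dB up after a miss.
    snr -= step_db
    adaptive_snrs: List[float] = []
    for _ in range(n_sentences - 1):
        adaptive_snrs.append(snr)
        snr += -step_db if recognizes(snr) else step_db
    return sum(adaptive_snrs) / len(adaptive_snrs), adaptive_snrs


if __name__ == "__main__":
    print(split_levels(65.0, 0.0))  # equal target and masker levels, about 3 dB below 65
    # Hypothetical deterministic listener who recognizes the word at SNRs of -6 dB or more.
    print(run_track(lambda snr: snr >= -6.0, start_snr_db=-18.0))
```

Because target and masker move in opposite directions, a 2-dB SNR step changes each level by less than 2 dB, and the summed power stays at the fixed overall level.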
With four NHy listeners, the standard deviation of 40 T0M0 SRT measurements per protocol asymptoted for the SPIN AV protocol at the same level (1.9 dB) as for the standard protocol when nine SPIN sentences per run were used. Nine sentences were therefore used for each SRT measurement in the experiment. An SRT offset of −1 dB for the SPIN AV protocol relative to the standard protocol was judged inconsequential, given our interest in SRM (i.e., relative) measures. Because of the large number of conditions, and to avoid excessively long testing sessions, only two adaptive tracks were performed per condition.

Audio-only SRT protocol
Given that only two adaptive tracks per condition in the SPIN AV protocol might give rise to substantial data variability, an additional audio-only protocol was developed that allowed five or six SRT measurements per condition, thereby leading to more accurate SRM measures. The audio-only protocol used IEEE sentences, following Grange and Culling (2016), but with the same sentence-substitution regime as the SPIN AV protocol. The requirement for triggering the adaptive phase was also relaxed from the recognition of at least two to the recognition of at least one of the five key words. The remaining sentences in the list of ten were presented only once, following the adaptive phase of the standard protocol. Here too, the overall sound level was maintained at 65 dB(A). This audio-only protocol is hereafter referred to as the "IEEE A protocol."

Testing sessions and condition rotation
A first session of SRT measurements used the SPIN AV protocol. The five selected configurations were H0M0, H0M180, H30M180, H0M90, and H30M90, where the numbers following H and M denote the head and masker azimuths, respectively, relative to the target speech. Audio-only and audio-visual SRTs were measured in separate blocks, each comprising the five spatial configurations. Half of the participants began with an audio-only block, the other half with an audio-visual block, and the sequence of spatial configurations was rotated. The order of the sentence lists remained constant for all participants. Two adaptive tracks were performed per condition, and the resulting SRTs were averaged across the two runs.
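The counterbalancing described above can be sketched as a simple session plan. The exact scheme is not specified, so this sketch assumes that the configuration sequence is rotated cyclically by one position per participant, that the same rotated sequence is used in both blocks, and that even-numbered participants (0-based) start with the audio-only block; `session_plan` is a hypothetical helper.

```python
from typing import List, Tuple

CONFIGURATIONS = ["H0M0", "H0M180", "H30M180", "H0M90", "H30M90"]


def session_plan(participant: int) -> List[Tuple[str, List[str]]]:
    """Block order for one participant (0-based index) in the SPIN AV session.

    Assumptions: the configuration sequence is rotated cyclically by one
    position per participant, and the starting modality alternates.
    """
    k = participant % len(CONFIGURATIONS)
    rotated = CONFIGURATIONS[k:] + CONFIGURATIONS[:k]
    modalities = ["audio", "audio-visual"]
    if participant % 2:
        modalities.reverse()
    return [(modality, list(rotated)) for modality in modalities]


if __name__ == "__main__":
    for p in range(3):
        print(p, session_plan(p))
```

Over any five consecutive participants, each configuration appears once in each serial position, which spreads practice and fatigue effects evenly across configurations.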
A second session of SRT measurements in the same five spatial configurations used the IEEE A protocol. UCI users were also tested in the H0M270 and H30M270 configurations, so that we could explore the potential benefit of head orientation in the spatial configuration that is most detrimental to unilaterally deaf patients: placing the masker on the same side as their CI was predicted to lead to negative SRM if they remained facing the speech. BCI users were also tested in the H0M0, H0M90, and H30M180 configurations with each of their implants disabled in turn, enabling later computation of summation and squelch in these configurations. For NH listeners and UCI users, the configurations were rotated within blocks of five and seven configurations, respectively, and the blocks were repeated six times. For the BCI users, the monaural conditions were run between binaural blocks and rotated within two dedicated blocks (right, then left CI disabled); all BCI conditions were repeated five times.

C. Results
In each separated spatial configuration, for each participant, and using SRTs measured with the IEEE A protocol, (1) speech-facing SRM was computed as the speech-facing SRT (condition H0M≠0) subtracted from the collocated SRT (condition H0M0), and (2) HOB was computed as the 30° head-orientation SRT (condition H30M≠0) subtracted from the speech-facing SRT (condition H0M≠0). Consequently, the sum of speech-facing SRM and HOB is the SRM resulting from concurrent spatial separation of the sound sources and 30° head orientation, so speech-facing SRM and HOB can be displayed as cumulative measures. Figure 2 displays speech-facing SRM (lower panels), HOB (middle panels), and their cumulative effect (upper panels), averaged within each listener group, for all three separated spatial configurations. The standard error of the group means did not exceed 1 dB and averaged 0.65, 0.38, 0.55, and 0.63 dB for NHy and NHam listeners and BCI and UCI users, respectively. The isolated directional-microphone case (UCId) had a mean standard error of 1 dB (across five repeat runs). SRM and HOB outcomes are compared below with Jelfs et al. (2011) model predictions computed from binaural room impulse responses acquired in the Cardiff test room. Any concern that the young NH listeners had not been specifically screened for hearing loss was alleviated by the low standard deviation of their audio-only SRTs averaged across spatial configurations (0.6 dB; 1.7 dB range).
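The two differences defined above, and the reason they add up, can be written in a few lines. The sketch assumes a dictionary of mean SRTs in dB keyed by condition labels such as "H0M90"; `srm_and_hob` is a hypothetical helper, and the SRT values in the usage example are invented for illustration and are not data from this study.

```python
from typing import Dict


def srm_and_hob(srt_db: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Speech-facing SRM, 30-degree HOB and their sum for each separated masker azimuth."""
    collocated = srt_db["H0M0"]
    results: Dict[str, Dict[str, float]] = {}
    for masker_az in (90, 180, 270):
        facing, turned = f"H0M{masker_az}", f"H30M{masker_az}"
        if facing not in srt_db or turned not in srt_db:
            continue  # e.g., M270 was tested only with UCI users
        # Lower SRT means better performance, so positive values are benefits.
        srm_facing = collocated - srt_db[facing]
        hob = srt_db[facing] - srt_db[turned]
        results[f"T0M{masker_az}"] = {
            "speech_facing_srm": srm_facing,
            "hob": hob,
            # The speech-facing SRT cancels: collocated SRT minus head-turned SRT.
            "cumulative_srm": srm_facing + hob,
        }
    return results


if __name__ == "__main__":
    invented_srts = {"H0M0": -2.0, "H0M90": -6.0, "H30M90": -9.5}  # hypothetical, in dB
    print(srm_and_hob(invented_srts))
```

With these invented values, speech-facing SRM is 4.0 dB and HOB is 3.5 dB, so the cumulative SRM of 7.5 dB equals the collocated SRT minus the head-turned SRT.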

Speech-facing SRM
At T0M180, speech-facing SRM was larger for all groups (1.6-2.6 dB) than the 0.5-0.7 dB predicted by the model. At T0M90, speech-facing SRM measured 3.1-5.1 dB and compared favorably with predictions for all groups (within 0.4-1.4 dB). With a directional microphone, speech-facing SRM was increased by 1.5-10 dB, depending on masker location. At T0M270, UCI users' speech-facing SRM measured −2.1 dB, comparable to the prediction (−3.2 dB). Analyses of variance (ANOVAs) performed within each listener group on speech-facing SRTs confirmed a significant effect of masker separation [NHy F(2,18)

Head-orientation benefit
At T0M180, HOB measured 1.9 to 5.0 dB across groups and was notably smaller than predicted by the model (5.0 to 7.6 dB). At T0M90, HOB measured 1.5 to 3.9 dB and was comparable to the prediction (4.1 dB), except for BCI users. Overall, BCI users obtained notably less HOB than predicted. At T0M270, UCI users' HOB measured 3.6 dB and was comparable to the prediction (4.3 dB). Across listener groups and configurations, an ANOVA comparing SRM between head orientations confirmed that the 30° HOB was significant [F(1,32) = 338.2, p < 0.001]. HOB was also confirmed significant within each listener group by separate ANOVAs [NHy F(1,9) = 146.4; NHam F(1,9) = 141.0; BCI F(1,7) = 18.9; UCI F(1,7) = 129.2; p ≤ 0.005 for all groups].

Cumulative effect of masker separation and 30° head orientation on SRM
For NH listeners, adding speech-facing SRM and HOB led to SRM in reasonably good agreement with model predictions at T0M180 (6.4 and 7.6 dB for NHy and NHam listeners, respectively, versus 8.3 dB predicted) and at T0M90 (7.6 and 8.4 dB for NHy and NHam listeners, respectively, versus 10 dB predicted), but older NH adults obtained less SRM than their younger counterparts in both conditions. For UCI users, cumulative SRM was again in good agreement with predictions (1.5, 5.6, and 6.1 dB versus 1.1, 5.5, and 7.6 dB predicted at T0M270, T0M180, and T0M90, respectively). For BCI users, cumulative SRM was lower than predicted (4.8 and 4.6 dB versus 5.5 and 7.6 dB at T0M180 and T0M90, respectively), primarily because their HOB was lower than that of other listeners.

FIG. 2. Speech-facing SRM (bottom panels), head-orientation benefit (middle panels) from a beneficial 30° head orientation away from the speech, and SRM resulting from the combination of source separation with a 30° head orientation away from the speech (top panels), as measured in each of the three separated spatial configurations [T0M270 (left panels), T0M180 (center panels), and T0M90 (right panels)] and for each listener group [young NH adults (NHy); bilateral and unilateral CI users (BCI and UCI); a single unilateral CI user with directional microphone enabled (UCId); NH adults age-matched to the CI users (NHam)]. Speech-facing SRM is the benefit of spatial separation of target and masker when the listener faces the target speaker. HOB is the additional benefit of a 30° head orientation with the same spatial separation. Consequently, the sum of speech-facing SRM and HOB is the SRM resulting from concurrent spatial separation and head orientation. Error bars denote the standard error of cross-participant means, except for the unilateral CI user with a directional microphone, for whom error bars denote the standard error of within-participant means.

The directional microphone case
As can be seen in Fig. 2, speech-facing SRM in our directional-microphone UCI case increased by 10 dB at T0M180 compared with the omnidirectional-microphone UCI group mean. At T0M90, speech-facing SRM also increased by nearly 1.5 dB. A significant HOB was found in all configurations, although it was slightly smaller than that of omnidirectional UCI users.

BCI users' summation and squelch
Summation is defined here as the H0M0 SRT improvement obtained by activating the worse-performing CI in addition to the better-performing CI. Squelch is defined as the same benefit, but for spatially separated sound sources. Squelch is traditionally measured in the H0M90 configuration, where only the masker signal is subject to interaural level differences. We also measured it in the H30M180 configuration, where both the speech and noise signals differ between the ears. Summation and squelch outcomes, extracted from SRTs acquired with the IEEE A protocol, are plotted in Fig. 3. An average summation of 2.9 dB (1 dB standard error) was measured, while squelch was 2.0 and 2.6 dB (0.5 and 1 dB standard error) at H0M90 and H30M180, respectively. A within-subject two-tailed t-test comparing H0M0 SRTs with both CIs enabled to SRTs with only the better CI enabled showed the summation effect to be significant [t(7)
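Summation, squelch, and the accompanying within-subject t-test all reduce to paired differences between SRTs measured with one CI and with both CIs enabled. The sketch below computes the per-participant benefit and the paired t statistic with the Python standard library; `binaural_benefit` and `paired_t` are hypothetical helpers, and the eight SRT pairs in the test are invented for illustration. If SciPy is available, `scipy.stats.ttest_rel` returns the same statistic together with a two-tailed p-value.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def binaural_benefit(better_ci_only: Sequence[float], both_cis: Sequence[float]) -> List[float]:
    """Per-participant SRT improvement (dB) from adding the poorer CI.

    This is summation when both SRT lists come from H0M0, and squelch when
    they come from a separated configuration such as H0M90 or H30M180.
    """
    return [one - two for one, two in zip(better_ci_only, both_cis)]


def paired_t(better_ci_only: Sequence[float], both_cis: Sequence[float]) -> Tuple[float, int]:
    """Paired-samples t statistic and its degrees of freedom."""
    diffs = binaural_benefit(better_ci_only, both_cis)
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

A positive t means that SRTs were lower (better) with both CIs enabled; with eight participants the statistic has 7 degrees of freedom, matching the t(7) reported above.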

Lip-reading benefit
In each spatial configuration, for each participant, and using SRTs measured with the SPIN AV protocol, the lip-reading benefit was computed as the audio-visual SRT subtracted from the audio-only SRT. Figure 4 displays the lip-reading benefit averaged within each listener group for the five configurations common to all groups (H0M0, H0M180, H30M180, H0M90, and H30M90). The benefit of lip-reading was typically 3 dB for NH listeners and 5 dB for CI users. Across listener groups and spatial configurations, an ANOVA on SRTs in the two presentation modalities confirmed a significant benefit of visual cues [F(1,32) = 368.9, p < 0.001]. An interaction between modality (audio or audio-visual) and listener type indicated that CI users are better lip-readers and/or more dependent on visual cues [F(3,32) = 7.45, p < 0.001]. The lack of interaction between modality and spatial configuration [F(4,128) = 0.56, p = 0.69] indicated that configuration had no impact on lip-reading. Most relevant to our study, a 30° head turn had no detrimental effect on lip-reading within each group [NHy F(1,9) Thus, a sidelong regard, i.e., orienting the gaze to compensate for a modest head orientation away from the target speaker, facilitates a significant benefit of head orientation, additive to that of lip-reading.

III. EXPERIMENT 2
Experiment 1 demonstrated the effectiveness of head orientation in a sound-treated room with a single interfering sound source. It also showed that the benefit of lip-reading is robust to head rotations of up to 30°. In a real listening environment, such as a bar or restaurant, there are likely to be multiple interfering sound sources, and there will certainly be reverberation. The second experiment addresses whether the head-orientation benefit still occurs in such an environment. The approach taken was to simulate a restaurant listening situation as realistically as possible, using a methodology similar to that of Culling (2016). A virtual simulation of a real restaurant was created, and the effect of head orientation in this virtual environment was measured.

FIG. 3. Measures of summation in the collocated configuration (H0M0_SUM label) and squelch in separated configurations (H0M90_SQ and H30M180_SQ labels), averaged across bilateral CI users. Both are defined as the benefit of activating the poorer CI in addition to the better CI (the CI that provides the better speech-in-noise intelligibility). Error bars are standard errors of the means.

Participants
Sixteen young, self-reported NH adults, aged 18-21 years (mean age 20.2 years), were recruited in the same manner as the NHy participants of experiment 1 and participated in a 90-min session.

Stimuli and methods
The virtual restaurant was created by convolving dry speech (i.e., speech without reverberation) with binaural room impulse responses. The 475-ms impulse responses were recorded in a Cardiff restaurant (Fig. 5) while it was closed, using the tone-sweep method (Farina, 2007; Müller and Massarani, 2001). Ten-second exponential tone sweeps were presented from a Minx-10 loudspeaker (Cambridge Audio, London, United Kingdom) to a B&K-4100 head and torso simulator (Brüel & Kjær, Nærum, Denmark). Source and receiver locations were chosen directly opposite each other at each of the 18 tables in the restaurant, and impulse responses were recorded for every combination of source and receiver location. The head of the B&K simulator was also oriented to each of three positions (−30°, 0°, +30°). Thus, a total of 18 source positions × 18 receiver positions × 3 head orientations = 972 impulse responses were recorded, of which a subset of 180 was needed in this experiment.
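The tone-sweep (exponential sine sweep) measurement cited above, and the convolution of dry speech with the resulting two-channel impulse responses, can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: it assumes a 1-s sweep at 16 kHz (the study used 10-s sweeps) and replaces the real loudspeaker-to-manikin path with a simulated single-reflection system, and all function names are hypothetical.

```python
import numpy as np


def exponential_sweep(f1, f2, duration, fs):
    """Exponential sine sweep from f1 to f2 Hz and its inverse filter."""
    t = np.arange(int(round(duration * fs))) / fs
    rate = np.log(f2 / f1)
    sweep = np.sin(2 * np.pi * f1 * duration / rate
                   * (np.exp(t * rate / duration) - 1.0))
    # Inverse filter: the sweep reversed in time, weighted so that its amplitude
    # rises by 6 dB per octave of instantaneous frequency. This offsets the extra
    # energy the sweep delivers at low frequencies, where it lingers longer.
    inverse = sweep[::-1] * np.exp(-t * rate / duration)
    return sweep, inverse


def fft_convolve(a, b):
    """Linear (not circular) convolution via zero-padded real FFTs."""
    n = len(a) + len(b) - 1
    return np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)


def impulse_response(recording, sweep, inverse, length):
    """Deconvolve one recorded sweep into an estimate of the linear impulse response."""
    reference = fft_convolve(sweep, inverse)      # approximately a band-limited impulse
    onset = int(np.argmax(np.abs(reference)))     # sample of the zero-delay tap
    full = fft_convolve(recording, inverse) / reference[onset]
    # Harmonic-distortion products fall before `onset`; slicing from `onset`
    # keeps only the linear part of the response.
    return full[onset:onset + length]


def binauralize(dry, brir):
    """Convolve a dry (anechoic) signal with a (taps, 2) left/right impulse-response pair."""
    return np.stack([fft_convolve(dry, brir[:, ear]) for ear in (0, 1)], axis=1)


# Toy check with a hypothetical single-reflection "room" (not a measured response).
fs = 16_000
sweep, inverse = exponential_sweep(50.0, 7_000.0, duration=1.0, fs=fs)
true_ir = np.zeros(256)
true_ir[100] = 0.5
recording = fft_convolve(sweep, true_ir)          # what the manikin microphone would capture
estimate = impulse_response(recording, sweep, inverse, length=256)
```

In the toy check, the recovered response peaks at sample 100 with an amplitude of about 0.5, i.e., it reproduces the simulated path. A longer sweep, as used in the study, mainly raises the signal-to-noise ratio of the measurement.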
In the simulations, the listener was seated at one of six tables and adopted each of the three head orientations at each table. Target speech was presented from the seat opposite. Nine interfering voices (five female and four male) with British accents, or nine interfering speech-shaped noises, were distributed in a randomly selected but fixed configuration across other tables (see Fig. 5). SRTs were measured with stimuli presented over headphones, using Harvard IEEE sentences and standard methods (Culling and Mansell, 2013; Plomp and Mimpen, 1979), except that the interfering sources produced continuous speech or noise. Ten sentences were used to obtain each SRT. The interfering speech was taken from book readings posted on librivox.org. The interfering noises were filtered to match the excitation patterns of the interfering voices.
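The adaptive procedure behind each ten-sentence SRT (Plomp and Mimpen, 1979) can be sketched as a one-up/one-down track that converges on the SNR giving 50% intelligibility. A minimal Python sketch; the starting SNR, the 4-dB and 2-dB step sizes, averaging the SNRs of sentences 3 to 10, and the `respond` callback are assumptions for illustration rather than a specification of the protocol used here.

```python
import statistics


def measure_srt(respond, n_sentences=10, start_snr=-20.0,
                initial_step=4.0, step=2.0, max_snr=30.0):
    """One-up/one-down adaptive SRT track (generic sketch).

    respond(snr) -> bool: True if the sentence presented at `snr` dB was
    judged correctly recognised (the scoring rule is left to the caller).
    """
    snr = start_snr
    # Sentence 1: raise the SNR in large steps until it is recognised.
    while not respond(snr):
        snr += initial_step
        if snr > max_snr:
            raise RuntimeError("listener never responded correctly")
    track, correct = [snr], True
    # Remaining sentences: step down after a correct response, up after an error.
    for _ in range(n_sentences - 1):
        snr = snr - step if correct else snr + step
        correct = respond(snr)
        track.append(snr)
    # Discard the first two (approach) sentences and average the rest.
    return statistics.fmean(track[2:])


# A hypothetical deterministic listener whose true threshold is -6 dB.
srt = measure_srt(lambda snr: snr >= -6.0)
print(srt)  # -7.0 with these settings
```

With a deterministic listener the track alternates between the two SNRs that bracket the threshold, so the estimate lands within one step of it; real listeners respond probabilistically, which is why several sentences are averaged.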
SRTs were measured for 6 listener positions × 3 head orientations × 2 interferer types = 36 conditions, using 36 lists of ten sentences. Listeners were familiarized with the procedure through two practice runs with a single interfering noise, using spatial configurations different from those used in the experiment. Because of the large number of conditions, each participant received a random sequence of conditions, while the sentences were presented in a fixed order. Figure 6 shows the mean SRTs for each table, head orientation, and interferer type (symbols), together with predictions (lines) based on the Jelfs et al. (2011) model of speech reception in noise and reverberation. In the majority of cases, SRTs are highest when the listener directly faces the speech source. An analysis of variance for SRT, with factors listener table number, head orientation, and interferer type, confirmed a significant benefit of head orientation [F(2,30) = 23.3, p < 0.001]. As Fig. 6 shows, orienting the head 30° away from the target source improved speech reception in speech-shaped noise (open symbols) at every listening position, in line with the predictions of the Jelfs et al. model. With interfering speech (filled symbols), the picture was a little more mixed but showed the same average pattern, and the interaction between head orientation and interferer type was not significant. SRTs in speech and in noise did not differ significantly. A main effect of table number [F(5,75) = 53.7, p < 0.001] revealed systematic differences between listening positions, with some seats in the restaurant allowing lower SRTs than others. Averaging the mean SRTs for speech and noise, a strong correlation between data and predictions [r(1,17) = 0.88, p < 0.001] confirmed that the model also accurately predicts the variations across tables and head orientations.

FIG. 5. Plan view of the Mezzaluna restaurant (Cardiff) where impulse responses were acquired from 18 different listener seats and with 18 talker or interferer (opposite) seats. Black-filled circles mark the listener positions tested, light-grey-filled circles the noise or female-voice interferers, dark-grey-filled circles the additional noise or male-voice interferers, and open circles the target male talkers facing the listener positions.

FIG. 6. SRTs obtained with left (−30°), front (0°), and right (+30°) head orientations (L/F/R labels on the lower horizontal axis) for each of the listener/talker pairs (at Tables 3, 6, 9, 12, 14, and 18; labels on the upper horizontal axis) and with speech (black-filled circles) or noise (open circles) interferers. Error bars are standard errors of the means. Black lines represent model predictions with their mean equalized to that of the noise-masker conditions.
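The model lines in Fig. 6 are shifted to the mean of the noise-masker data, and data-model agreement is summarized by a Pearson correlation. A minimal sketch in plain Python; because adding a constant offset does not change r, the mean equalization only affects the display. The SRT and prediction values are hypothetical.

```python
import math
from statistics import fmean


def equalize_mean(predictions, reference):
    """Shift model predictions so that their mean equals the mean of `reference`."""
    offset = fmean(reference) - fmean(predictions)
    return [p + offset for p in predictions]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for six conditions (dB); NOT the study's data.
measured = [-4.1, -6.0, -5.2, -3.0, -4.8, -4.4]
predicted = [-9.0, -11.2, -10.1, -8.4, -10.0, -9.3]
shifted = equalize_mean(predicted, measured)
print(f"r = {pearson_r(measured, shifted):.2f}")
```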

IV. DISCUSSION
SRTs measured in a sound-treated environment confirmed the predicted benefit to speech intelligibility in noise of a modest (30°) head orientation away from a talker when a single steady-noise interferer is azimuthally separated from the speech by 180° or 90°. This head-orientation benefit (HOB) was significant for NH listeners (3-5 dB) as well as for UCI users (2.5-5 dB) and BCI users (1.5-2.5 dB). The lip-reading benefit, extracted by comparing audio-visual with audio-only outcomes, was significant and somewhat larger in CI users (5 dB) than in NH listeners (3 dB). Crucially, lip-reading was not detrimentally affected by a 30° head orientation. The SRT data therefore showed that CI users can exploit a significant HOB in addition to the lip-reading on which sighted hearing-impaired listeners rely. Data from a UCI user who used a directional microphone suggest that a directional microphone does not remove this HOB.

A. Speech-facing SRM and HOB
The speech-facing SRMs for NHy listeners (2.6 dB at T0M180 and 4.4 dB at T0M90) were in reasonable agreement with those obtained by Plomp (1976), 3.0 and 5.4 dB, respectively. The SRM obtained with our CI participants in the typical H0M90 configuration (3-4 dB) falls within the range covered by previous reports, although the BCI users' SRM is at the low end of that range. The head-shadow effect measured in our UCI users (6 dB) also falls within the range covered by previous reports reviewed by Van Hoesel (2011) and is a very good match to that measured by Culling et al. (2012). Summation and squelch results are compared with those of Litovsky et al. (2006) in the section on bilateral CI users below.

Addressing the main discrepancy with model predictions
The T0M180 speech-facing SRM was higher than predicted by the model in all listener groups. Since the prediction was based on acoustic measurements of the sound-treated room itself, this result cannot be explained by the modest reverberation in that room. When the listener faces the speech, the model predicts a sharp improvement in SRT for any deviation from the intended head orientation. As a result, any misalignment of the head should reduce (improve) the measured SRTs. In contrast, for other head orientations, the predicted SRT changes in opposite directions for misalignments to either side, so random misalignments do not bias the SRT measurements. Misalignment of the head during the SRT runs thus seems the most likely explanation for the high speech-facing SRM at T0M180 (see also Grange and Culling, 2016). The fact that UCI users (the only listeners predicted not to gain HOB by turning either way; see Fig. 1) obtained by far the lowest T0M180 speech-facing SRM (see Fig. 2) reinforces this interpretation.
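The misalignment argument can be made concrete with a toy calculation. Assuming, purely for illustration, a predicted SRT that peaks sharply at the speech-facing orientation and falls linearly with rotation to either side, zero-mean head misalignments lower the average SRT at the peak but leave it unbiased on a linear slope. The curve, its slope and the jitter values below are hypothetical, not model output.

```python
from statistics import fmean


def mean_srt_with_jitter(srt_of_angle, intended_deg, jitter_deg):
    """Average SRT when the head lands at intended_deg + each jitter offset."""
    return fmean(srt_of_angle(intended_deg + j) for j in jitter_deg)


def srt_curve(angle_deg):
    """Hypothetical predicted SRT (dB): sharp peak at 0 deg, 0.15 dB/deg either side."""
    return -2.0 - 0.15 * abs(angle_deg)


jitter = [-6.0, -3.0, 0.0, 3.0, 6.0]  # symmetric, zero-mean misalignments
at_peak = mean_srt_with_jitter(srt_curve, 0.0, jitter)    # below srt_curve(0): biased
on_slope = mean_srt_with_jitter(srt_curve, 30.0, jitter)  # equals srt_curve(30): unbiased
print(at_peak, srt_curve(0.0), on_slope, srt_curve(30.0))
```

Because misalignment can only lower the SRT near the peak, the speech-facing SRT is underestimated, which inflates the speech-facing SRM, whereas the turned orientations average out.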

Group differences
The measures of SRM in configurations that facilitate binaural unmasking were lower for CI users than for NH listeners, which is consistent with the assumption that CI users do not benefit from binaural unmasking. Both CI users and NHy listeners also had lower HOB than predicted. If, as argued above, the T0M180 speech-facing SRM was inflated by head misalignment, 1-2 dB of the measured T0M180 speech-facing SRM may in fact have been HOB. This misattribution would account for a deflated measure of T0M180 HOB. However, it does not fully account for the reduced HOB in NHam listeners. These older NH adults may have suffered from a loss of binaural unmasking that reduced their HOB and their overall SRM, consistent with recent reports of an age-related decline in the binaural processing of temporal envelope and fine structure (King et al., 2014; Moore et al., 2012; Hopkins and Moore, 2011).
The case of the UCI user who used a directional-microphone setting showed that suppressing sound arriving from the rear increased the T0M180 speech-facing SRM by over 10 dB. However, the T0M90 and T0M270 speech-facing SRMs were increased by only 1.5 dB. Thus, if the masker were placed in the frontal hemifield, SRM was hardly affected by the sensitivity pattern of a directional microphone. Just as importantly, a significant 30° HOB remained in all three configurations, so microphone directionality does not remove HOB. This result is also predicted by the model, because the diffracting effects of the head alter the sensitivity pattern of the directional microphone to favor sounds 30° to 40° away from the front. Figure 7 illustrates the effect of the head on the speech-weighted directional response of in situ directional microphones. These predictions were based on measurements of head-related impulse responses from the microphones of Oticon behind-the-ear hearing aids placed on an acoustic manikin. The directional patterns in Fig. 7 are only an illustrative example, rather than the particular fixed directional pattern that would be produced by the Esprit 3G processor or by the Oticon hearing aid on which it is based. Nonetheless, they capture an asymmetry in the left- and right-ear responses that would be common to any two-port in situ directional microphone, which produces a stronger response to sounds from ±30° to ±50°. It should be noted that this "distortion" of the directional pattern is probably a desirable feature for bilaterally implanted patients, because it means that interaural level differences are preserved.
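The delay-and-subtract operation named in the Fig. 7 caption can be illustrated in the free field. A minimal NumPy sketch, not a model of the Esprit 3G or Oticon processing: the rear-port signal is delayed by the inter-port travel time and subtracted from the front-port signal, which gives a cardioid-like pattern with maximum sensitivity straight ahead and a null at the rear. The 12-mm port spacing and the 343-m/s speed of sound are assumed values, and the head diffraction that shifts the maxima toward ±30° to ±50° is deliberately not modelled.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature


def delay_and_subtract_gain(theta_deg, freq_hz, spacing_m=0.012):
    """Free-field magnitude response of a two-port delay-and-subtract microphone.

    theta_deg = 0 is straight ahead. The rear port receives a frontal source
    spacing*cos(theta)/c later than the front port; its signal is delayed by a
    further internal delay of spacing/c and then subtracted.
    """
    theta = np.radians(theta_deg)
    internal_delay = spacing_m / SPEED_OF_SOUND
    acoustic_delay = spacing_m * np.cos(theta) / SPEED_OF_SOUND
    # y(t) = x(t) - x(t - internal_delay - acoustic_delay)
    return np.abs(1.0 - np.exp(-2j * np.pi * freq_hz * (internal_delay + acoustic_delay)))


# Low-frequency polar pattern in 30-degree steps, normalized to the frontal response.
angles = np.arange(0, 360, 30)
pattern = delay_and_subtract_gain(angles, 500.0) / delay_and_subtract_gain(0.0, 500.0)
```

At 500 Hz the normalized response is about 0.5 at 90° and zero at 180°, the familiar cardioid. Comparing such a free-field pattern with patterns computed from head-mounted impulse responses is what reveals the head-induced shift of the maxima.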

Bilateral CI users
BCI users stood out in that their measured HOB was less than half of the model predictions. At T0M180, this outcome may again be explained by inaccuracies in head orientation during testing. However, at T0M90, the HOB shortfall clearly requires another explanation, because the overall SRM sits 3 dB lower than predicted. Our additional measures of summation (2.9 dB at H0M0) and squelch (2.0 dB at H0M90 and 2.6 dB at H30M180) in BCI users can be compared with previous reports in the literature. They correspond to the "diotic" and "binaural" benefits reviewed by Van Hoesel (2011). Our mean summation seems larger than the 1.5 dB reported in the Litovsky et al. (2006) multi-center study (the effect they call binaural redundancy), but their range, −6 to +9 dB, was comparable to ours, −3.5 to +6.5 dB. Given their much larger sample, and the large standard errors (1 dB) in both studies, the difference is probably not significant. Their measure of squelch matched ours, at 2 dB. Consistent with Litovsky et al. (2006), the binaural summation and squelch effects in our BCI users were much smaller than their T0M90 SRM or the T0M90 head-shadow effect of our UCI users.
Assuming that BCI users do not benefit from binaural unmasking, both summation and squelch are believed here to arise because the two CIs provide information that differs in spectral content in a complementary manner, such that spectral summation occurs. Our middle-aged or older BCI users are unlikely to have equal nerve survival along their two spiral ganglia, and some CI electrodes may be disabled, for instance to prevent unintended facial-nerve stimulation. It is therefore plausible that their two CIs deliver information from complementary spectral regions. The model ignores the SNR at the poorer ear, but the poorer ear could still contribute to speech intelligibility if it carries such complementary spectral information.
HOB may have been lower in BCI than in UCI users because BCI users already benefit from spectral summation when facing the speech, and turning away from the speech might reduce the summation effect. Indeed, spectral summation should be maximal when the SNRs at the two ears are similar. Orienting the head so as to bring the better ear closer to the target speech not only improves the SNR at the better ear, as the model predicts, but also reduces the SNR at the poorer ear, thereby reducing the benefit of conveying the speech information from that ear to the brain. Even if summation occurred only through a reduction of internal noise at a central auditory level, the same principle would apply. The fact that, despite their additional CI, BCI users' SRM with a 30° head turn is lower than UCI users' in both spatial configurations (by up to 1.5 dB at H30M90) further supports this interpretation. It therefore seems that BCI users' HOB can be reduced by a loss of summation in some spatial configurations.

B. Reliance on lip-reading
A sidelong regard with a head orientation of 30° maintained the benefit of lip-reading at the same level as when directly facing the speaker. A linear regression analysis of lip-reading benefit versus H0M0 audio-only SRTs showed that listeners who were less proficient at recognizing speech in noise (higher SRTs) gained more from visual cues (r = 0.66, t = 4.31, p < 0.001). This correlation is not surprising, since an elevated audio-only SRT will increase a listener's reliance on lip-reading and can also motivate individuals to improve their lip-reading skills (e.g., Strelnikov et al., 2009). Every 6 dB of SRT elevation was partially compensated for by a 1-dB improvement in lip-reading benefit. Since talkers differ in how easily they can be lip-read, the regression slope for data acquired with a different talker could differ significantly from the slope we found. One might expect that the easier the talker is to lip-read, the steeper the slope. Thus, for more familiar talkers, lip-reading might go much further toward compensating for the threshold elevation from which CI users suffer. Previous studies also showed that the lip-reading benefit is highly dependent on how easily the sentence material can be lip-read (Macleod and Summerfield, 1987). To date, it has not been established whether the stimulus-material and talker contributions to the ease of lip-reading are independent or interact.
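The regression slope quoted above (about 1 dB of extra lip-reading benefit per 6 dB of SRT elevation) can be estimated by ordinary least squares. A minimal Python sketch; the data points are hypothetical and were chosen to lie exactly on a line of slope 1/6.

```python
from statistics import fmean


def least_squares_line(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical (audio-only SRT, lip-reading benefit) pairs in dB, on a 1/6 slope.
srt_ao = [0.0, 6.0, 12.0, 18.0]
benefit = [3.0, 4.0, 5.0, 6.0]
slope, intercept = least_squares_line(srt_ao, benefit)
elevation = 12.0                                # an SRT 12 dB above a reference listener
uncompensated = elevation - slope * elevation   # about 10 dB still to be made up
```

With a slope of 1/6, a 12-dB elevation of the audio-only SRT earns only about 2 dB of extra lip-reading benefit, which is why the compensation is described as partial.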

C. Realistic listening conditions
Experiment 2 examined HOB in realistic listening conditions and showed that consistent benefits exist in the presence of multiple interferers and reverberation. One might imagine that the effect of such distributed interference would be to suppress any effects based on head shadow and better-ear listening, because both ears would receive roughly the same level of noise. Indeed, Hawley et al. (2004), among others, showed that if just two or three nearby interfering sources are located in different hemifields, effects attributable to better-ear listening become negligible. However, SNR depends on the levels of both the speech and the noise. While many of the interfering sound sources in a noisy room are in the reverberant field and consequently reach both ears at a similar sound level, the target speech is usually close by, in the direct field, and reaches the nearer ear at a higher sound level. Here, the benefit of "head shadow" is not a shadowing effect at all, but the amplification of a target wave of near-normal incidence reflecting back on itself after bouncing off the surface of the head. By turning the head, one can place one ear into this amplified part of the target's sound field. This benefit should occur in practically any listening situation and for practically any listener, provided that the target source is close.

FIG. 7. Sensitivity patterns of in situ directional microphones, generated by a simple broadband delay-and-subtract operation on impulse responses acquired from the two microphones of an Oticon behind-the-ear hearing aid fitted on either side of an acoustic manikin. The figure illustrates that head shadow modifies the directional pattern such that the sensitivity maxima lie in the ±30° to ±50° regions.
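The direct-field versus reverberant-field argument above can be quantified with the textbook diffuse-field (Sabine) approximation, which is not part of the study: direct energy falls with the square of distance while reverberant energy is roughly constant, and the two are equal at the critical distance. A minimal Python sketch; the room volume and reverberation time are hypothetical, and real rooms, especially near the walls, deviate from the diffuse-field assumption.

```python
import math


def critical_distance(volume_m3, rt60_s, directivity=1.0):
    """Distance (m) at which direct and reverberant energy are equal.

    Classical diffuse-field (Sabine) approximation: r_c ~ 0.057 * sqrt(Q * V / T60).
    """
    return 0.057 * math.sqrt(directivity * volume_m3 / rt60_s)


def direct_to_reverberant_db(distance_m, r_c):
    """Direct-to-reverberant ratio: direct energy falls as 1/r**2, reverberant stays flat."""
    return 20.0 * math.log10(r_c / distance_m)


# Hypothetical dining room: 300 m^3 with a 0.6-s reverberation time.
r_c = critical_distance(300.0, 0.6)
talker_drr = direct_to_reverberant_db(1.0, r_c)      # conversation partner across the table
interferer_drr = direct_to_reverberant_db(5.0, r_c)  # diner at another table
```

Under these assumptions the talker across the table is dominated by direct sound (positive ratio), while a diner a few tables away reaches the listener mainly as reverberation (negative ratio), which is the situation in which turning one ear toward the nearby talker still pays off.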
The reader might consider the sidelong-regard posture unnatural or more effortful. Informal feedback from all CI users who participated in the study was that they did not perceive this strategy to be an issue for them or for familiar conversation partners; they actually welcomed it. In addition, it is not uncommon for listeners to adopt a sidelong regard instinctively in noisy situations. This strategy is commonplace in loud industrial settings, for instance. The human oculomotor range is limited to a ±55° eye-in-head lateral angle (Guitton and Volle, 1987). Although maintaining a lateral angle of up to 30° may be more effortful than viewing the speaker's face head-on, we expect that HOB will outweigh the potential extra effort. This expectation remains to be confirmed.

D. Importance of our findings to the hearing impaired
CI users are known to struggle to understand speech in noisy social settings. Despite recent efforts to restore access to interaural time delays at low frequencies, BCI users exhibit negligible binaural unmasking, and pitch cues are limited by the relatively sparse encoding of sound by CIs. As a result, CI users benefit only from head-shadow and lip-reading effects, binaural unmasking being inaccessible (Churchill et al., 2014; Van Hoesel et al., 2008) and discrimination of voice fundamental frequencies very limited (Carroll and Zeng, 2007; Geurts and Wouters, 2004). Dip-listening is also much harder for CI users (Nelson et al., 2003). Given the limited cues available to CI users, any guidance about how to optimally combine head-orientation and lip-reading benefits could be highly valuable to them. Such guidance could make the difference between social isolation and active enjoyment of social interactions. While such guidance may benefit interactions with a familiar, easier-to-lip-read conversation partner, it is even more important for unfamiliar, harder-to-lip-read conversation partners.
While the research presented here focuses on CI users, it could equally well help other hearing-impaired listeners, whether partially and/or unilaterally deaf. Since binaural unmasking represents only a small part of an NH listener's SRM, and hearing-impaired listeners often exhibit reduced binaural unmasking, the conclusions drawn from the present studies may transfer to hearing-aid users as well as to unaided hearing-impaired listeners.

V. CONCLUSION
The present study has shown that a substantial head-orientation benefit is available to CI users' speech understanding in noise. In sound-treated rooms, NH listeners obtained a large benefit, which was somewhat reduced by a loss of binaural unmasking in the older NH adults who were age-matched to our CI participants. Despite the absence of binaural unmasking in unilateral CI users, their head-orientation benefit matched that of young NH listeners (5 dB) with the masker initially at the rear. The benefit was reduced, but still significant, with the masker initially on the side contralateral to their CI (2.5 dB). Bilateral CI users exhibited the lowest benefit of head orientation, presumably because they already benefitted from substantial spectral summation. A modest 30° head orientation did not affect the lip-reading benefit measured in NH listeners (3 dB) and CI users (5 dB). Head orientation of up to 30° and lip-reading therefore provide cumulative benefits. In NH listeners, a head-orientation benefit of >1 dB was found to be robust in a realistic listening environment with multiple interfering sound sources (speech-shaped noises or voices) and reverberation. These findings with CI users and NH listeners may extend to other hearing-impaired listeners, so that all listeners can enjoy the benefits of the sidelong regard in noisy environments.