Personalized signal-independent beamforming for binaural hearing aids

The effect of personalized microphone array calibration on the performance of hearing aid beamformers under noisy reverberant conditions is studied. The study makes use of a new, publicly available, database containing acoustic transfer function measurements from 29 loudspeakers arranged on a sphere to a pair of behind-the-ear hearing aids in a listening room when worn by 27 males, 14 females and 4 mannequins. Bilateral and binaural beamformers are designed using each participant’s hearing aid head-related impulse responses (HAHRIRs). The performance of these personalized beamformers is compared to that of mismatched beamformers, where the HAHRIR used for the design does not belong to the individual for whom performance is measured. The case where the mismatched HAHRIR is that of a mannequin is of particular interest since it represents current practice in commercially available hearing aids. The benefit of personalized beamforming is assessed using an intrusive binaural speech intelligibility metric and in a matrix speech intelligibility test. For binaural beamforming, both measures demonstrate a statistically significant (p < 0.05) benefit of personalization. The benefit varies substantially between individuals with some predicted to benefit by as much as 1.5 dB. ©2019 Acoustical Society of America. [http://dx.doi.org/(DOI number)]


I. INTRODUCTION
Multichannel speech enhancement is important in many applications, including telecommunications, robot audition and hearing aids. Signal-dependent beamformers adapt their filter weights according to the observed signals and so have the potential to be always optimal according to some specified design criterion. However, errors in estimating the signal and noise statistics, for example due to inaccurate voice activity detection 1, may lead to degraded performance. A common approach to implementing a signal-dependent beamformer is to use a generalized sidelobe canceller (GSC) comprising a signal-independent beamformer, a blocking matrix and an adaptive noise canceller 2. Signal-dependent beamformers are sensitive to signal cancellation due to steering errors and multipath propagation 3,4.
Signal-independent beamformers, also known as fixed beamformers, use a priori knowledge or assumptions about the source direction and noise characteristics to determine the filter weights. They are computationally efficient and robust at low signal-to-noise ratios (SNRs) but suboptimal if the sound scene or array characteristics differ from those used in the filter design 5. For example, using a free-field propagation model to describe the acoustic transfer function (ATF) between a source and a head-mounted array degrades performance 6. The effect of steering errors can be inferred from the shape of the beampattern's main lobe, which in general gets narrower with the number of microphones and their spacing 5. Differences in array element sensitivity, due for example to manufacturing tolerances and component ageing, have been mitigated through robust offline beamformer design 7 and adaptive approaches 8,9. A number of studies have demonstrated a benefit in performance when microphone arrays are individually calibrated rather than using an idealized model of the geometry [10][11][12]. This calibration accounts for any sensitivity or frequency response variations between microphones and also for acoustic differences in the scattering effect of the array enclosure.

a) alastair.h.moore@imperial.ac.uk
In the context of binaural hearing aids, it is common to design beamformers assuming an average head, as typified by an acoustic mannequin. Investigating the potential benefit of individual calibration, or personalization, in the context of binaural hearing aids requires an understanding of the variability of the ATF when a particular pair of hearing aids is worn by different people. We refer to this ATF as the hearing aid head-related transfer function (HAHRTF), or equivalently its time domain representation, the hearing aid head-related impulse response (HAHRIR). Most available databases [13][14][15][16][17] of HAHRIR measurements have used either a single mannequin or a single human 17 and so are unsuitable for investigating such differences; a notable exception is a recent database 18 that includes measurements of 16 subjects and 3 dummy heads. Comparing measurements between these databases is also not helpful since each uses a different hearing aid device. Measurements of conventional, two-channel head-related impulse responses (HRIRs) are typically made in the ear canal or at the blocked entrance to the ear canal. Databases of such measurements, for example CIPIC 19, LISTEN 20 and ARI 21, suggest that there are large differences between individuals. These differences arise from variations in head geometry and direction-dependent resonances of the pinnae, and primarily affect localization accuracy in the horizontal and vertical dimensions, respectively. Quantifying the extent to which the use of personalized or generic HAHRTFs in hearing aid beamforming affects intelligibility for a human listener is important in determining whether commercially available hearing aids could benefit from personalized processing.
In this article we describe a newly collected and publicly available database of 46 sets of HAHRIR measurements which will, for the first time, allow the benefits of individual calibration of hearing aid arrays to be investigated. There are many ways in which inter-individual variability in the measured ATFs could be analyzed. There are also many algorithms and signal processing tasks which may benefit from individual array calibration. In this article we focus on signal-independent beamforming because it is widely used in practice 22.
The remainder of this article is organized as follows. Section II briefly describes the acoustic measurements and post-processing. Section III presents the beamforming problem. Section IV presents the method by which the newly acquired database is used to simulate noisy, reverberant speech signals and to design matched and mismatched beamformers for its enhancement. The performance of the beamformers is evaluated and analyzed using signal-based metrics in Section V. In Section VI the validity of this analysis is confirmed using a headphone-based matrix speech intelligibility test in which 11 participants heard a virtual representation of unprocessed and enhanced sound scenes, where all stimuli were created using the participant's own HAHRIRs. Finally, the article is concluded in Section VII.

II. DATABASE OF ACOUSTIC MEASUREMENTS
Acoustic measurements were performed in an acoustically treated listening room of dimensions 7.9 × 6.0 × 3.5 m with a reverberation time of 250 ms. Loudspeakers (Genelec 8050A) were arranged on the surface of a sphere of radius 1.9 m in three equally spaced horizontal rings. Following the AES standard 23 spherical coordinate system, loudspeakers were located at elevation θ = 0°, azimuth φ ∈ {0°, 22.5°, . . . , 337.5°} and at θ = ±45°, φ ∈ {30°, 90°, . . . , 330°}. An additional loudspeaker at θ = 90° was at a radius of 1.6 m, giving a total of 1 + 6 + 16 + 6 = 29 loudspeakers. For brevity of notation, directions are expressed as unit vectors, u, in Cartesian coordinates according to

    u = [cos θ cos φ, cos θ sin φ, sin θ]^T.

Microphones were embedded in hearing aid shells (Oticon Epoq), with two behind-the-ear (BTE) microphones spaced 10 mm apart and an additional microphone in the ear canal secured by a generic vented silicone dome, as shown for the left ear in Figure 1. As far as possible, a typical positioning of the hearing aid devices was obtained by allowing participants to insert the in-ear microphones in the ear canal themselves. However, to avoid snagging of the cables, the BTE devices were placed over the ear by the experimenter and, since the same transducers were used for all participants, the lengths of the wires joining the in-ear microphones to the shells were not customized for each individual. Participants were seated on a chair positioned at the center of the loudspeaker array and the height of the chair was adjusted such that the BTE microphones were in the horizontal plane of the loudspeaker array, thereby ensuring the middle ring of loudspeakers corresponds to an elevation of 0°.
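The coordinate convention above can be sketched in code. The following is an illustrative NumPy snippet, not part of the published database tooling (the accompanying scripts are in Matlab); it assumes the usual AES axis orientation with x pointing forwards, y to the left and z up.

```python
import numpy as np

def aes_to_unit_vector(elevation_deg, azimuth_deg):
    """Convert AES spherical coordinates (elevation theta, azimuth phi,
    both in degrees) to a Cartesian unit vector u = [x, y, z]."""
    theta = np.deg2rad(elevation_deg)
    phi = np.deg2rad(azimuth_deg)
    return np.array([np.cos(theta) * np.cos(phi),
                     np.cos(theta) * np.sin(phi),
                     np.sin(theta)])

# The frontal direction (theta = 0, phi = 0) maps to the +x axis.
u_front = aes_to_unit_vector(0.0, 0.0)
```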
Microphone signals were amplified using custom preamplifiers. Measurement signal output and acqui-sition was performed with 24-bit precision at 44.1 kHz sample rate using a Ferrofish DANTE A32 interface. A direct loopback connection between an output and input was used to measure the internal delay of the measurement system which was removed in post-processing.
Impulse responses, g_{m,i}(t), between loudspeaker i and microphone m were measured using the exponential sine sweep method 24 with a 370 ms sweep between 50 Hz and 16 kHz. The sweep duration was chosen to give an acceptable compromise between the time taken to make the measurements, and hence the risk of head movements, and the SNR of the impulse responses. For the subset of microphones and directions used in this study (see Section II A), the average SNR of the direct path component was 55 dB. All 29 directions were measured in succession with 370 ms of silence between each sweep, giving a total measurement time of about 20 s. The measurements were repeated with the participant rotated by −15° and −30° so that direct path impulse responses are available with 7.5° resolution on the horizontal plane and 30° resolution at elevations ±45°. For each rotation, the measurements were repeated 3 times to allow for consistency checks. Only one measurement of each direction/rotation combination is retained in the final database. In all, the acoustic measurements took about 10 minutes.
A total of 46 sets of HAHRIRs were made. This consists of 27 males, 14 females and 4 mannequins, of which one, a HATS, was measured twice. The other 3 mannequins were 3D-printed head models of real people mounted on an artificial torso.
While it is well known that head-related transfer functions (HRTFs) have a complicated dependency on the fine structural details of the outer ear, head and torso of an individual 19,25 a partial fit of generic HRTFs can be obtained using gross measurements of head size to control interaural cues 26,27 . Hypothesizing that a similar approach may also be possible for partially-personalized hearing aid beamforming without the need for individual acoustic measurements, the height, depth and width of each participant's head was measured using calipers and the circumference was measured using a tape measure, since these measurements could rapidly be made in an audiology clinic. Table I shows the distribution of head circumferences for the human (i.e. non-mannequin) participants. For comparison, the head circumference of the HATS is 55.9 cm.
Prior to each measurement session, calibration measurements were made, without the chair present, from each loudspeaker to a G.R.A.S. 46AE reference microphone and amplifier set positioned at the center of the array. Similar calibration measurements were also made for the left and right hearing aids separately. Compared to the mean sensitivity, microphones varied by ≤0.4 dB. With the exception of the overhead loudspeaker, loudspeaker sensitivities varied by ≤0.8 dB and propagation delay from each loudspeaker to the center of the array varied by ≤93 µs.
Since the focus of this study is the acoustic differences between individuals and the signals emitted from each loudspeaker are uncorrelated, the experiments reported in Sections V and VI were performed without any further calibration.

A. Study database
A subset of the raw impulse responses was selected, comprising those between the 16 horizontal-plane loudspeakers and the 4 BTE microphones, with the chair in its initial, front-facing position. The complete measured hearing aid room impulse response (HARIR), g_{m,i}(t), between the i-th loudspeaker and the m-th microphone may be decomposed into the sum of two components,

    g_{m,i}(t) = h_m(u_i, t) + h̃_{m,i}(t),

where h_m(u, t), the HAHRIR, is the impulse response at time t due to a plane wave with direction of arrival (DOA) u and h̃_{m,i}(t) includes all the later-arriving room reflections.
For the purposes of this study, h_m(u_i, t) is obtained from g_{m,i}(t) by retaining only the first 10.7 ms. A raised cosine fade-out is applied to the last 1 ms of the cropped response. Cropping a reverberant measured impulse response to obtain a quasi-anechoic measurement is possible because the response to the direct path wavefront decays rapidly and is already approximately 30 dB below the peak response before the first reflection from the room arrives.
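The cropping and fade-out described above can be sketched as follows. This is an illustrative NumPy implementation, not the database post-processing code itself; the raised cosine fade is realized here as a half-Hann window and the measured HARIR is replaced by random noise for demonstration.

```python
import numpy as np

def crop_to_quasi_anechoic(g, fs, crop_ms=10.7, fade_ms=1.0):
    """Extract a quasi-anechoic HAHRIR from a measured HARIR by keeping
    the first crop_ms and applying a raised-cosine fade-out over the
    final fade_ms."""
    n_keep = int(round(crop_ms * 1e-3 * fs))
    h = g[:n_keep].astype(float).copy()
    n_fade = int(round(fade_ms * 1e-3 * fs))
    # Raised-cosine (half-Hann) fade from 1 down to 0.
    fade = 0.5 * (1.0 + np.cos(np.linspace(0.0, np.pi, n_fade)))
    h[-n_fade:] *= fade
    return h

fs = 44100
g = np.random.default_rng(0).standard_normal(fs)  # stand-in for a measured HARIR
h = crop_to_quasi_anechoic(g, fs)
```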

B. Complete database
Whilst it was not necessary for the current study, for some applications, such as simulating the experience of listening in a virtual sound environment 28 , very precise matching of the loudspeaker array is desirable. Therefore, for the convenience of future users, a calibrated database which compensates for transducer sensitivity differences and time of arrival offsets has also been produced.
The complete database of as-measured HARIRs, calibration impulse responses, study database and calibrated database are publicly available 29. Furthermore, Matlab scripts to produce the study and calibrated databases from the as-measured database are also publicly available 30.

III. PROBLEM FORMULATION
Expressed in the frequency domain, the observed signal, Y_m(ω), at the m-th microphone in an array is

    Y_m(ω) = X_m(ω) + V_m(ω),    (1)

where X_m(ω) is the signal due to the desired source and V_m(ω) is the unwanted signal due to reverberation, acoustic noise and sensor noise. The signals for an array of M microphones are expressed in vector notation as

    y(ω) = [Y_1(ω), Y_2(ω), . . . , Y_M(ω)]^T,    (2)

where (·)^T denotes the transpose; the vectors x(ω) and v(ω) are similarly defined. For a target source signal, S(ω), incident from direction u_j, the M-channel observation at the array is

    x(ω) = h(u_j, ω) S(ω),    (3)

where h(u, ω) = [H_1(u, ω), . . . , H_M(u, ω)]^T is the vector of HAHRTFs, so that

    y(ω) = x(ω) + v(ω).    (4)

Defining m = α to be the reference channel, and the relative transfer function (RTF), d_α(u, ω), with respect to the reference channel as

    d_α(u, ω) = h(u, ω) / H_α(u, ω),    (5)

gives x(ω) = d_α(u_j, ω) X_α(ω). Substituting into (4) gives

    y(ω) = d_α(u_j, ω) X_α(ω) + v(ω),    (6)

in which the clean signal at the microphones is described in terms of the RTF, d_α(u_j, ω), and the clean signal, X_α(ω), observed at the reference microphone. The aim of beamforming is to obtain an estimate, Z_α(ω), of X_α(ω), that is, the observation of the target source at the reference microphone which is free from reverberation and noise, by applying a spatial filter, w_α(ω), according to

    Z_α(ω) = w_α^H(ω) y(ω),    (7)

where (·)^H denotes the conjugate transpose. The minimum variance distortionless response (MVDR) beamformer solution 31,32 to this estimation problem is

    w_α(ω) = R_v^{-1}(ω) d_α(u_j, ω) / (d_α^H(u_j, ω) R_v^{-1}(ω) d_α(u_j, ω)),    (8)

where R_v(ω) = E{v(ω) v^H(ω)} is the noise covariance matrix and E{·} denotes the expected value. Regularization of (8) can improve beamformer robustness 33,34 but is not considered in this study. Assuming d_α(u_j, ω) and R_v(ω) are known precisely, substituting (8) into (7) gives

    Z_α(ω) = X_α(ω) + w_α^H(ω) v(ω),    (9)

which indicates the desired signal is passed undistorted while the beamformer filters (i.e. reduces) the unwanted noise. The extent of noise reduction that can be achieved depends on the number of microphones, the interchannel coherence of the noise and reverberation, and the DOA of the target.
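As an illustration of the MVDR solution in (8), the weights for a single frequency bin can be computed as below. This is a generic NumPy sketch, not the authors' implementation; the optional diagonal loading argument only marks where regularization would enter and is disabled by default, as in the study.

```python
import numpy as np

def mvdr_weights(d_alpha, R_v, diag_load=0.0):
    """MVDR weights w = R_v^{-1} d / (d^H R_v^{-1} d) for one frequency bin.
    d_alpha:   (M,) RTF steering vector relative to the reference channel.
    R_v:       (M, M) Hermitian noise covariance matrix.
    diag_load: optional diagonal loading (not used in the paper)."""
    M = len(d_alpha)
    R = R_v + diag_load * np.eye(M)
    Rinv_d = np.linalg.solve(R, d_alpha)       # R_v^{-1} d without explicit inverse
    return Rinv_d / (d_alpha.conj() @ Rinv_d)  # normalize for distortionless response

# Toy 4-microphone example: the distortionless constraint gives w^H d = 1.
rng = np.random.default_rng(0)
d = rng.standard_normal(4) + 1j * rng.standard_normal(4)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
R_v = A @ A.conj().T + np.eye(4)               # Hermitian positive definite
w = mvdr_weights(d, R_v)
```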
In bilateral beamforming the left and right hearing aids are considered as two independent arrays, each obtaining an estimate of the desired signal at its own reference channel using only the local M = 2 microphones. In binaural beamforming the spatial filter associated with each hearing aid again obtains an estimate of the desired signal at its own reference channel, but the two hearing aids are treated as a single array with M = 4 microphones. Therefore, in both bilateral beamforming (denoted '2:2') and binaural beamforming (denoted '4:4'), the MVDR solution ensures that, provided d α (u j , ω) is known, the signals will retain the correct binaural cues for the target source. In contrast, the binaural cues associated with the noise will not be preserved 35 . In the case of the bilateral beamformers the residual noise at each ear depends only on the microphone signals at that ear and so the coherence between the noise at each ear is no higher than the original microphone signals. For the binaural beamformer the enhanced signals at each ear are two different weighted combinations of the same M = 4 microphone signals and so the noise coherence is increased to unity 36 .
In practice, estimates of d α (u j , ω) and R v (ω) are obtained using calibration measurements and assumptions about the spatial distribution of the noise, or online using the received signals [37][38][39] .
The remainder of this paper is focused on investigating the impact of using mismatched ATFs for MVDR beamforming, where the common assumption of isotropic noise is used to calculate R v (ω) and the DOA of the desired source is assumed to be known a priori.

A. Simulated acoustic scene
Microphone signals are generated which simulate those encountered by a particular individual in the listening room described in Section II. Specifically, the time domain microphone signals, y_m^{(l)}(t), are obtained as

    y_m^{(l)}(t) = β g_{m,j}^{(l)}(t) * s(t) + Σ_{i=1, i≠j}^{I} g_{m,i}^{(l)}(t) * n_b(t − ∆_i),    (10)

where g_{m,i}^{(l)}(t) is the full HARIR for participant index l from direction index i ∈ {1, . . . , I} measured at microphone index m, j denotes the direction index of the target source, s(t), n_b(t) is a babble signal with the same long-term average speech spectrum (LTASS) and power as s(t), ∆_i is a time offset associated with direction i, β is a scalar gain parameter and * denotes convolution. Sensor noise is not included in the simulated signals.
The babble signal in (10) is composed of concatenated utterances from each of 4 male and 4 female talkers from the IEEE sentences corpus, all overlaid to form 8-talker babble, and ∆_i is an arbitrary offset, randomly selected for each direction in each simulation, to select a different section of noise. All I = 16 measured source directions, spaced 22.5° apart on the horizontal plane, are used to create the noise field. The resulting background noise is therefore approximately isotropic around a circle in the horizontal plane but, since g_{m,i}^{(l)}(t) contains the natural reverberation of the room, there is reflected sound energy arriving from all directions including above and below the horizontal plane. Note that the target source, s(t), is also filtered by the full reverberant response for the room and so the simulated sound scene is representative of a real listening environment.
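The scene model in (10) can be sketched as follows for one microphone. This is an illustrative NumPy/SciPy implementation with synthetic stand-in signals; the function name and array layout are choices made for this sketch, not part of the study's code.

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_mic_signal(g, s, n_b, j, beta, offsets):
    """Simulate one microphone signal following the scene model:
    y(t) = beta * g[j] * s(t)  +  sum_{i != j} g[i] * n_b(t - Delta_i).
    g:       (I, L) HARIRs for this microphone, one row per direction.
    s:       target source signal.
    n_b:     long babble signal (sections selected by offsets).
    offsets: (I,) integer sample offsets Delta_i into the babble."""
    I, L = g.shape
    N = len(s)
    y = beta * fftconvolve(s, g[j])[:N]        # reverberant target
    for i in range(I):
        if i == j:
            continue
        noise_i = n_b[offsets[i]:offsets[i] + N]
        y += fftconvolve(noise_i, g[i])[:N]    # babble from direction i
    return y

rng = np.random.default_rng(1)
I, L, N = 16, 128, 2000
g = rng.standard_normal((I, L))
s = rng.standard_normal(N)
n_b = rng.standard_normal(N + 10000)
offsets = rng.integers(0, 10000, size=I)
y = simulate_mic_signal(g, s, n_b, j=0, beta=0.5, offsets=offsets)
```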
The current study considers scenarios in which the target source is either to the front of the listener (φ = 0°), denoted frontal target, or else towards the listener's left side (φ = 67.5°), denoted lateral target. The reported SNR, 20 log_10(β) dB, represents the ratio of desired source energy to noise energy input into the room. As such, any direction-dependent change in level observed at any of the microphones due to the direction-dependent filtering of the HARIR is retained in the simulated stimuli. Further details of the target source material specific to the numerical and human evaluations are described in Sections V and VI, respectively.

B. Beamformers
All beamformers are designed with a priori knowledge of the DOA of the target source and under the assumption that the noise field is cylindrically isotropic, i.e. where uncorrelated noise sources are uniformly distributed around a circle in the horizontal plane. The a priori knowledge of the target DOA represents a realistic use case where the listener can choose to turn to directly face the target (frontal target) or independently steer the beamformer's look direction to be non-frontal (lateral target). The assumption of noise field isotropy is common 5 in signal-independent beamformers and cylindrical isotropy is appropriate since active sound sources tend to be on, or near, the horizontal plane of the listener and floors and ceilings tend to be more absorbent than walls 40. As in most real rooms, the cylindrical isotropy assumption is slightly incorrect since there are reflections from the floor and ceiling and the contributions of these, along with reflections from the walls, are neither necessarily equally distributed in azimuth, nor uncorrelated with each other. This intentional mismatch between the assumed and simulated noise fields ensures that the results of the current study are representative of a real-world use case.
Beamformers are implemented as linear time invariant finite impulse response (FIR) filters whose coefficients are designed in advance based on measured HAHRIRs and a simulated cylindrically isotropic noise field. The filter weights are designed in the frequency domain such that they depend only on the narrowband interchannel covariance of the simulated noise, as expressed in the noise covariance matrix (NCM), rather than its power spectral density (PSD).
For each participant, indexed l′, the required HAHRTFs are obtained as the discrete-time Fourier transform (DTFT) of the measured HAHRIRs, h_m(u, t), from which the RTF steering vectors are formed as in (5), where ν is the frequency index.
The NCM for a cylindrically isotropic noise field is obtained by simulating the microphone signals which would be observed in such a field according to

    v_m^{(l′)}(t) = Σ_{i=1}^{I} h_m^{(l′)}(u_i, t) * n_{w,i}(t),    (11)

where, similar to (10), noise sources are arranged at equally spaced angles around the horizontal plane. In (11), only the direct path sound propagation is included and the individual noise source signals, n_{w,i}(t), are independent, identically distributed realizations of white Gaussian noise of 1 s duration. The frequency domain NCM, R_v^{(l′)}(ν), is obtained as

    R_v^{(l′)}(ν) = (1/F) Σ_{f=1}^{F} v_f^{(l′)}(ν) v_f^{(l′)H}(ν),    (12)

where v_f^{(l′)}(ν) is the discrete Fourier transform of the f-th frame of the simulated noise signals. The frames are windowed and non-overlapping, such that each frame is an independent sample of the noise process.
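Equation (11) and the subsequent frame-based NCM estimate can be sketched as follows. This is an illustrative NumPy/SciPy implementation: the frame length, window choice (Hann) and sampling parameters are placeholders for this sketch rather than the study's actual settings.

```python
import numpy as np
from scipy.signal import fftconvolve

def isotropic_ncm(hahrirs, fs=44100, nfft=512, duration=1.0, seed=0):
    """Estimate the NCM of a cylindrically isotropic field: i.i.d. white
    noise is emitted from each horizontal-plane direction, filtered by the
    direct-path HAHRIRs, and the per-bin covariance is averaged over
    non-overlapping windowed frames.
    hahrirs: (I, M, L) direct-path impulse responses (direction, mic, tap).
    Returns R_v with shape (nfft // 2 + 1, M, M)."""
    I, M, L = hahrirs.shape
    rng = np.random.default_rng(seed)
    n = int(duration * fs)
    v = np.zeros((M, n + L - 1))
    for i in range(I):
        n_w = rng.standard_normal(n)            # independent source per direction
        for m in range(M):
            v[m] += fftconvolve(n_w, hahrirs[i, m])
    # Non-overlapping Hann-windowed frames -> per-bin sample covariance.
    win = np.hanning(nfft)
    n_frames = v.shape[1] // nfft
    R = np.zeros((nfft // 2 + 1, M, M), dtype=complex)
    for f in range(n_frames):
        seg = v[:, f * nfft:(f + 1) * nfft] * win
        V = np.fft.rfft(seg, axis=1)            # (M, nfft // 2 + 1)
        R += np.einsum('mk,nk->kmn', V, V.conj())
    return R / n_frames

R_v = isotropic_ncm(np.random.default_rng(2).standard_normal((8, 2, 64)),
                    fs=8000, nfft=256, duration=0.5)
```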
The frequency domain filter weights, w_α^{(l′)}(ν), are obtained as in (8). The inverse DTFT transforms the conjugated beamformer weights back into the time domain, a circular shift is applied to ensure causality and a Hamming window is applied to ensure there are no discontinuities in the final FIR filters. This time domain post-processing of the beamformer weights avoids the possible introduction of artefacts due to sharp spectral features. Note that direct application of the frequency domain filter weights is avoided since the resulting filters are inexact 41,42, time-variant and, in general, lead to STFT coefficients for which there is no realizable real-valued time domain signal 43,44.
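The conversion of frequency domain weights to causal FIR filters can be sketched as below. This is an illustrative NumPy implementation; the filter length and the half-length circular shift are assumptions made for the sketch, not values reported in the text.

```python
import numpy as np

def weights_to_fir(W, n_taps):
    """Convert per-bin conjugated beamformer weights to causal FIR filters.
    W: (K, M) complex weights for the positive-frequency bins of an
       n_taps-point DFT (K = n_taps // 2 + 1).
    Returns (M, n_taps) real filters: inverse transform of conj(W),
    circularly shifted by half the length for causality, then
    Hamming-windowed to suppress edge discontinuities."""
    w_t = np.fft.irfft(W.conj(), n=n_taps, axis=0)   # (n_taps, M), real-valued
    w_t = np.roll(w_t, n_taps // 2, axis=0)          # circular shift -> causal
    w_t *= np.hamming(n_taps)[:, None]               # taper the filter ends
    return w_t.T

# Toy two-channel example: a pure-delay response per bin.
K = 257
W = np.exp(-1j * 2 * np.pi * np.arange(K)[:, None] * 0.1) * np.ones((K, 2))
firs = weights_to_fir(W, n_taps=512)
```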
A beamformer is personalized when l′ = l, that is, the same individual's measurements are used in (10) to simulate the microphone signals and to design the beamformer weights which process them. It should be emphasised that the personalized signal-independent beamformers investigated in this study are perfectly fitted to the head and torso acoustics of a particular individual but not to the acoustics of the encountered sound scene.
A beamformer is generic when l′ ≠ l. Of particular interest is the generic beamformer obtained from a HATS designed in accordance with ANSI standard S3.36 45, since such mannequins are widely used.
In addition, this study investigates the effect of personalization, the effect of binaural (M = 4) vs bilateral (M = 2) beamforming, and the effect of target direction.

V. EVALUATION USING SIGNAL-BASED METRICS
In this section the inter-individual differences between HAHRIRs are investigated in terms of signal-based metrics of the resulting MVDR beamformer performance. An illustrative example of directivity patterns of the different beamformers is presented to give some initial insight. Subsequently, a systematic study of the effect of beamformer personalization is conducted using the modified binaural short-time objective intelligibility (MBSTOI) measure 46,47 to predict the expected intelligibility improvements offered by alternative beamformers.

A. Directivity patterns
The normalized A-weighted 48 directivity pattern, B_α(u), is the power output from a beamformer in response to an A-weighted source plane wave with DOA u, relative to the power at the reference microphone when the same A-weighted source plane wave is incident from the front, u_0, i.e.

    B_α(u) = ∫_ω E{|w_α^{(l′)H}(ω) h^{(l)}(u, ω) S(ω)|²} dω / ∫_ω E{|e_α^T h^{(l)}(u_0, ω) S(ω)|²} dω,    (13)

where S(ω) is an A-weighted source signal and e_α is an M × 1 microphone channel selection vector with a 1 in the α-th element and zeros elsewhere. When l′ = l the beamformer is said to be personalized, denoted 'Per', whereas when l′ ≠ l it is said to be generic, denoted 'Gen'. The baseline in both cases, denoted 'Ref', is the power of the input signal at the reference channel (front-left microphone), which is given by ∫_ω E{|e_α^T h^{(l)}(u, ω) S(ω)|²} dω. In the specific case l′ = 28, the generic beamformer is referred to as 'HATS'.
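A discrete approximation of (13) can be sketched as follows, with the integral over frequency replaced by a sum over bins. This is an illustrative NumPy snippet with toy data; the A-weighting vector is left flat here, whereas in practice it would be the A-weighting magnitude response sampled at the bin frequencies.

```python
import numpy as np

def directivity_db(w, h_dirs, h_ref_front, a_weight):
    """Normalized directivity pattern (dB): beamformer output power for a
    weighted plane wave from each direction, relative to the reference
    microphone's response to the same wave from the front.
    w:           (K, M) beamformer weights per frequency bin.
    h_dirs:      (D, K, M) HAHRTFs for D candidate directions.
    h_ref_front: (K,) reference-mic HAHRTF for the frontal direction.
    a_weight:    (K,) weighting magnitude per bin (A-weighting in (13))."""
    num = np.abs(np.einsum('km,dkm->dk', w.conj(), h_dirs) * a_weight) ** 2
    den = np.abs(h_ref_front * a_weight) ** 2
    return 10.0 * np.log10(num.sum(axis=1) / den.sum())

# Toy data: 16 directions, 65 bins, 4 mics; direction index 0 is frontal.
K, M, D = 65, 4, 16
rng = np.random.default_rng(3)
h_dirs = rng.standard_normal((D, K, M)) + 1j * rng.standard_normal((D, K, M))
h_ref_front = h_dirs[0, :, 0]          # reference mic, frontal direction
a_weight = np.ones(K)                  # flat weighting for this toy example
w = np.zeros((K, M), dtype=complex)
w[:, 0] = 1.0                          # pass-through of the reference mic
B = directivity_db(w, h_dirs, h_ref_front, a_weight)
```

With a pass-through beamformer the frontal response is 0 dB by construction, mirroring the 'Ref' normalization described in the text.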
Each plot in Fig. 2 shows A-weighted directivity patterns at the left ear for 5 different beamformer configurations when steered towards a frontal target (top row) or a lateral target (bottom row). Each column shows the directivity pattern for a different individual. The 'Ref' directivity patterns indicate the natural directivity due to the acoustics of the head and, for any given individual, are independent of the target direction. Comparison of the 'Ref' directivity patterns in each column reveals substantial variation between individuals, which is consistent with the literature 18. In the direction of the target, the response in the 'Per' conditions is always identical to that of the 'Ref' condition because of the distortionless constraint. According to the normalization in (13), this corresponds to 0 dB for the frontal target (top row) and an individual-dependent level for the lateral target (bottom row). For all individuals, the directivity pattern for the bilateral beamformers between 300° and 60° is very similar to the 'Ref' condition, with increased sensitivity on the ipsilateral side. Conversely, the binaural beamformers are more symmetric between the left and right sides. In general, for a frontal target position, the personalized binaural beamformer directivity patterns are much sharper than the personalized bilateral directivity patterns. This is not the case for the lateral target.
Individual s42 (left column) is the same mannequin as s28 but measured on a different day. Any difference in the directivity pattern between the 'Per' and 'HATS' conditions for s42 is therefore representative of the variation which might be expected due to, for example, reseating the hearing aids. It also gives an indication of the intrinsic variability due to the measurement setup. For the frontal target, the directivity patterns in the 'Per' and 'HATS' conditions have a very similar shape. The largest differences, 1 dB and 3 dB for bilateral and binaural respectively, occur between 120° and 150°. Whilst the bilateral response at 0° is 0 dB, the binaural response is −1 dB. It is possible that this difference is due to a small difference in head alignment during the two sets of measurements. Since bilateral beamformers have a broader main lobe they are also more robust to steering errors 5. For the lateral target, performance for s42 is essentially identical in the 'Per' and 'HATS' conditions, suggesting that the beamformers are robust to small differences in the array manifold. The directivity patterns are also the same for bilateral and binaural beamforming, suggesting that in this configuration the contralateral microphones do not contribute substantially.
Participant s34 has a head circumference most similar to s28 (56.0 cm cf. 55.9 cm). Nevertheless, comparison of the 'Ref' directivity between s34 and s42 shows differences of up to 6 dB at 180°. The result is that the 'HATS' directivity patterns are distorted compared to the 'Per' directivity patterns. Performance for the binaural beamformer steered to the front is particularly poor, with 1.5 dB of attenuation in the target direction and 3 dB less suppression than the 'Per' beamformer over most other angles. Participant s32 is the least similar to the HATS in terms of head circumference, with the largest head (62.7 cm) of all tested individuals. Again, the '4:4 HATS' directivity pattern is <0 dB at 0° for the frontal target, but in this case more suppression is achieved than with the '4:4 Per' beamformer between 22.5° and 67.5°. However, this is offset by substantially less attenuation between 112.5° and 337.5°. With the lateral target, for s32 and s34 there is negligible difference between bilateral and binaural beamforming but a clear effect of personalization. For some directions towards the rear the benefit of personalization is 4 dB to 5 dB.
In general, the analysis of directivity patterns suggests that, for the frontal target, binaural beamforming always gives a benefit over bilateral beamforming, regardless of personalization, but in both cases personalization leads to more compact directivity patterns and avoids signal attenuation in the target direction. For the lateral target binaural beamforming offers little, if any, advantage over bilateral, but personalization offers a substantial improvement, particularly in suppression of sound arriving from the rear.

B. Effect on predicted intelligibility
The directivity patterns presented in Section V A confirm that the spatial response of the beamformers is consistent with expectations. However, they do not account for the reverberation properties of the room and they do not consider the effect of interaural coherence on binaural hearing. Therefore, directivity patterns do not allow one to easily assess the effective speech intelligibility improvement experienced by a listener in practice. To address this, we (i) simulate noisy reverberant listening conditions and (ii) assess the effect of hearing aid beamformers using an intrusive measure of predicted binaural speech intelligibility.

Method
Reverberant microphone signals for each participant, l, in the database are simulated as described in Section IV A. Bilateral and binaural beamformers are designed for each participant, l′, in the database, as described in Section IV B. Consistent with the steering vectors used, the desired signal at each ear is β h_α(u_j, t) * s(t) where, as in (5), α is the index of the reference channel at that ear; that is, the desired signal is the direct path component of the target source as observed at the reference microphone. For each combination of signals and beamformers, the predicted intelligibility of the enhanced binaural signals is computed using the MBSTOI metric 46,47 for input SNRs of −17, −14, −11 and −8 dB. It has been shown 46,47 that MBSTOI predicts well the intelligibility of speech signals in combined additive noise and reverberation similar to those in this study. For each combination tested, a logistic model was fitted to the monotonically increasing MBSTOI vs SNR relationship. The difference between the SNR at which the model achieved an MBSTOI value of 0.25 and the SNR at which the unprocessed noisy signal achieved the same value was taken as a measure of the equivalent improvement in input SNR, in dB. The MBSTOI value of 0.25 was chosen as the midpoint between the best-case value at −17 dB (0.21) and the worst-case value at −8 dB (0.29). The choice of reference is not critical since the slopes of the logistic models are very similar. This measure of equivalent SNR improvement allows an intuitive comparison of the predicted binaural intelligibility benefit of personalized (l′ = l) beamformers and all possible non-personalized (l′ ≠ l) beamformers.
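The equivalent-SNR computation described above can be sketched as follows. This is an illustrative NumPy implementation: the logistic model is fitted here by a simple least-squares fit in the logit domain, which is one plausible realization of the fitting procedure rather than the authors' exact method, and the MBSTOI values are invented for illustration.

```python
import numpy as np

def snr_at_score(snrs, scores, target=0.25):
    """Fit a logistic model score = 1 / (1 + exp(-(a + b * snr))) by
    least squares in the logit domain, then return the SNR at which the
    fitted model reaches the target score."""
    scores = np.asarray(scores, dtype=float)
    logits = np.log(scores / (1.0 - scores))
    b, a = np.polyfit(snrs, logits, 1)          # logit = a + b * snr
    return (np.log(target / (1.0 - target)) - a) / b

snrs = np.array([-17.0, -14.0, -11.0, -8.0])
mbstoi_unproc = np.array([0.21, 0.24, 0.27, 0.29])  # illustrative values only
mbstoi_beamf = np.array([0.24, 0.27, 0.30, 0.33])   # illustrative values only
# Equivalent SNR improvement: how much lower an input SNR reaches the same
# MBSTOI = 0.25 with the beamformer than without it.
gain_db = snr_at_score(snrs, mbstoi_unproc) - snr_at_score(snrs, mbstoi_beamf)
```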

Results and discussion
The equivalent SNR improvement for bilateral and binaural beamformers is shown in the left and right columns of Figure 3, respectively, in response to a frontal (top row) and lateral (bottom row) target. The shift in ordinate axes between plots reflects the overall trends that binaural beamformers offer greater benefit than bilateral beamformers and that greater benefit is obtained for a frontal target than a lateral target. The particular focus of this study is the relative benefit of personalized beamforming. In general, the benefit obtained using the personalized beamformers is greater than for non-personalized beamformers, as summarized by the interquartile range (IQR). This is particularly apparent for the binaural beamformers. However, there are some combinations of individual, beamformer and target for which the best non-personalized beamformer, denoted 'Best non-per', performs better than the personalized one. This is more frequent for bilateral beamformers, with 14 occurrences for the frontal target and 22 occurrences for the lateral target, than for binaural beamformers, with 4 occurrences each for the frontal and lateral targets. This non-optimal performance of the personalized beamformer in some cases is possibly due to the small mismatch between the simulated conditions and the model assumptions. An alternative explanation, given the small differences involved, is that it reflects a limitation of the MBSTOI metric. In any case, it should be noted that the best-performing non-personalized beamformer is different in each case.
Of particular interest is performance using the HATS-derived beamformer, since a HATS is intended to represent an average person. In all but two cases (s1 and s40 with the bilateral beamformer and frontal target) the improvement obtained using the personalized beamformer is greater than the benefit of the HATS beamformer. It is therefore expected that personalized beamforming will improve speech intelligibility compared to generic beamforming using filters derived from the HATS. In Figure 4 the predicted relative benefit of personalized beamformers compared to the HATS beamformers is shown. The boxplots show the overall distribution and the crosses to the right of each box are the predicted benefits only for those individuals who also participated in the matrix speech intelligibility test reported in Section VI. The median predicted benefit of personalization for bilateral beamformers is 0.3 dB to 0.4 dB and the median equivalent predicted SNR benefit of personalization for binaural beamformers is 0.6 dB to 0.7 dB. In all cases these medians are significantly (p < 0.05) different from 0 dB. In addition to this overall result, it is interesting that some individuals are predicted to benefit substantially more, by up to 2.0 dB and 1.8 dB, for personalized bilateral and binaural beamforming, respectively.
The results presented in this section suggest that HATS-derived beamformers may be close to optimal for some individuals but far from it for others. One might expect the extent of the benefit to be related to, for example, the similarity of an individual's head size to that of the HATS. However, dividing the population into two equally-sized groups based on head size did not lead to a significantly greater improvement for the smaller-headed group, whose head size is closer to that of the HATS. Therefore, the psychophysical validation of the effect of beamformer personalization on speech intelligibility, reported in Section VI, treats all participants as belonging to a single group.
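A minimal sketch of such a group comparison is given below. The paper does not state which statistical test was used for the head-size split, so a Mann-Whitney rank-sum test on the per-individual benefits is assumed here purely for illustration:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def head_size_split_test(head_sizes, benefits):
    """Median-split the participants by head size and compare the
    personalization benefit (in dB) between the two groups with a
    two-sided rank-sum test. Returns the p-value."""
    head_sizes = np.asarray(head_sizes, dtype=float)
    benefits = np.asarray(benefits, dtype=float)
    smaller = benefits[head_sizes <= np.median(head_sizes)]
    larger = benefits[head_sizes > np.median(head_sizes)]
    _, p = mannwhitneyu(smaller, larger, alternative="two-sided")
    return p
```

A non-significant p-value (as found in the study) justifies treating all participants as a single group in the subsequent analysis.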

VI. MATRIX SPEECH INTELLIGIBILITY TEST

A. Experiment design
Eleven native Danish speakers (6 male, 5 female) whose HAHRIRs were measured as described in Section II participated. All had normal hearing according to pure tone audiograms measured prior to the experiment. All stimuli were generated specifically for each individual, as detailed in Section IV A, such that each participant heard the noisy reverberant hearing aid signals as would be experienced by that individual in the measurement room. Five processing conditions were tested, as listed in Table II. Stimuli were presented over Sennheiser HD650 headphones without equalization. Following the procedure of Experiment 3 in 49, in each trial the participant listened to a single presentation of a sentence from the Dantale II corpus 50 and used a graphical user interface to select the heard words. For each of the five words in a sentence the subjects were offered a choice between ten possible words and the option to pass (if the word had not been heard at all). Stimuli were generated according to (10), with the speech level scaling parameter, β, set to give SNRs prior to enhancement of −17, −14, −11 and −8 dB. These SNRs were determined in pilot experiments to elicit word accuracy rates which approximately span the informative range of word accuracy scores over all test conditions. The playback level was set such that a noise-only stimulus, i.e. β = 0, was presented at approximately 65 dB SPL.
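The level scaling can be sketched as follows, under the assumption that the stimuli in (10) take the form βx(t) + v(t), with the SNR defined as the broadband power ratio at a reference channel; function names are illustrative:

```python
import numpy as np

def speech_scale_for_snr(speech, noise, snr_db):
    """Return the scale factor beta such that beta*speech mixed with noise
    has the requested broadband SNR (power ratio at the reference channel)."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    return np.sqrt(p_noise / p_speech * 10.0 ** (snr_db / 10.0))

def mix_at_snr(speech, noise, snr_db):
    """Form the noisy stimulus at the requested pre-enhancement SNR."""
    beta = speech_scale_for_snr(speech, noise, snr_db)
    return beta * speech + noise
```

Setting β = 0 yields the noise-only stimulus used to calibrate the playback level.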
A block of trials consisted of a single presentation of all experiment conditions in random order. Each experiment condition was repeated 10 times, giving a total of 11 participants × 5 processing conditions × 2 target DOAs × 5 words per sentence × 4 SNRs × 10 repetitions = 22000 words. Responses were collected over two self-paced sessions lasting less than 1 hour each. Participants were encouraged to take at least one break per session. The first session began with a short training phase in which participants familiarized themselves with the response interface and the types of stimuli and were invited to adjust the playback level if desired. No feedback was given on the accuracy of responses either during training or during the main experiment.
B. Results
Figure 5 shows the distribution of average word accuracy across participants for each of the 5 test conditions, separately for frontal and lateral targets. It is immediately apparent that all the beamformers offer a substantial improvement over the reference condition. This is particularly true with the frontal target, where performance in the reference condition is worse than with the lateral target. This is consistent with both the literature 51,52 and the numerical results presented in Section V.
Since the focus of this study is the effect of personalized beamformer design versus non-personalized beamformer design, the reference condition is excluded from further analysis. Statistical significance of word accuracy between conditions is addressed using a mixed-effects logistic regression as implemented in the 'lme4' package 53 for R 54 . Fixed factors of target direction (frontal, lateral), configuration (bilateral, binaural) and personalization (personal, HATS) were coded with treatment contrasts while SNR was continuous. The random effect of participant identity was modelled as an independent offset. Starting from the null model, the likelihood ratio test was used to sequentially add significant (p < 0.05) main effects and interactions. At each stage, simplification of the model by pruning terms was also tested. The final model includes significant effects of SNR and target direction and significant interactions between SNR and configuration and between configuration and personalization. Table III details the final model coefficients and standard errors in logit units. The positive coefficients indicate that increasing SNR increases intelligibility, as does binaural beamforming with personalized filters. The negative coefficient for the lateral target condition indicates that, after beamforming, lateral targets are less intelligible than frontal targets. The negative coefficient for the interaction of binaural beamforming and SNR indicates that the relative benefit reduces as SNR increases. This is consistent with the fact that intelligibility reaches a ceiling at higher SNRs even without enhancement, so the relative benefit reduces. Note that there is no interaction between target direction and either the configuration or personalization of the beamformer.
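The sequential model-building procedure rests on the likelihood-ratio test for nested models. A minimal self-contained sketch is shown below, using a plain fixed-effects logistic fit in place of the full lme4 mixed model (the random participant offset is omitted for brevity):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def fit_logistic(X, y):
    """Maximum-likelihood logistic regression; X includes an intercept
    column, y is 0/1. Returns (coefficients, log-likelihood)."""
    def nll(beta):
        z = X @ beta
        # negative log-likelihood; logaddexp(0, z) = log(1 + exp(z)), stably
        return np.sum(np.logaddexp(0.0, z)) - y @ z
    res = minimize(nll, np.zeros(X.shape[1]), method="BFGS")
    return res.x, -res.fun

def lr_test(ll_null, ll_full, df_diff):
    """Likelihood-ratio test for nested models: the statistic is
    asymptotically chi-squared with df_diff degrees of freedom."""
    stat = 2.0 * (ll_full - ll_null)
    return stat, chi2.sf(stat, df_diff)
```

A term is retained when adding it to the current model yields p < 0.05 in this test, mirroring the forward-selection procedure described above.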
To give an intuitive sense of the mixed-effects logistic regression model behaviour, Figure 6 shows the model response as a function of input SNR in terms of proportion correct for a frontal target (top) and lateral target (bottom). Improvements due to the significant fixed effects can be interpreted in terms of a left-shift of the 50 % speech reception threshold (SRT). Here binaural beamforming reduces (improves) the SRT by 0.90 dB compared to bilateral beamforming and personalization adds a further benefit of 0.40 dB over the HATS derived beamformer. It should be stressed that the 0.40 dB benefit of personalization is the average predicted benefit over the population of test participants and is statistically significant at the 5 % level; some individuals stand to benefit more than this relatively modest amount. It can be seen that the effect of the interaction between SNR and beamformer configuration is to reduce the relative benefit of binaural beamforming as SNR increases, since the probability of a correct response approaches 1 in all conditions.
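The conversion from logit-scale coefficients to SRT shifts can be made explicit. With a linear predictor b0 + b_snr·SNR + c, where c collects the fixed effects active in a condition, the 50 % point is the SNR at which the predictor is zero, so adding an effect c shifts the SRT left by c/b_snr. The sketch below uses hypothetical coefficient values and, for simplicity, ignores the configuration × SNR interaction, which in the full model makes the slope condition-dependent:

```python
def srt_from_logit(intercept, snr_slope, effect_sum=0.0):
    """SNR (dB) at which the logistic model predicts 50% correct:
    solve intercept + snr_slope * SRT + effect_sum = 0."""
    return -(intercept + effect_sum) / snr_slope

def srt_shift(effect, snr_slope):
    """Left-shift of the 50% point (dB of SRT improvement) produced by
    adding a fixed effect, in logit units, to the linear predictor."""
    return effect / snr_slope
```

For example, with a hypothetical slope of 0.5 logit/dB, a fixed effect of 0.45 logits corresponds to a 0.9 dB SRT improvement.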
Comparing the psychophysical experiment results reported in this section to the numerical results obtained in Section V B, the predicted intelligibility improvements were reasonably close. Considering only those participants who took part in the speech intelligibility test, as represented by the crosses in Figure 4, the median benefit of personalization for binaural beamformers was predicted to be 0.52 dB and 0.68 dB for frontal and lateral targets, respectively, and 0.4 dB was achieved in practice. In the case of bilateral beamforming, no benefit of personalization was observed in the psychophysical experiment whereas the numerical results predicted a median benefit of 0.22 dB and 0.52 dB for frontal and lateral targets, respectively.

VII. CONCLUSIONS
Using a newly collected and publicly available database of HAHRIRs, the effect of beamformer personalization on model-based beamforming has been studied.
An analysis of directivity patterns suggests that, for the frontal target, binaural beamforming always gives a benefit over bilateral beamforming, regardless of personalization, and that for both bilateral and binaural beamforming personalization leads to more compact directivity patterns and avoids signal attenuation in the target direction. For the lateral target, binaural beamforming offers little, if any, advantage over bilateral beamforming but personalization offers a substantial improvement, particularly in suppressing sound arriving from the rear.
Predicted speech intelligibility using the MBSTOI measure suggests that the benefit of personalized bilateral beamforming compared to using HATS-derived beamformers is equivalent to a 0.3 dB to 0.4 dB increase in SNR. For binaural beamformers the equivalent benefit of personalized beamforming is 0.6 dB to 0.7 dB. For some individuals the benefit is predicted to be as much as 2.0 dB.
In a matrix speech intelligibility test, binaural beamforming gave an average benefit of 0.9 dB over bilateral beamforming. In the binaural case, personalization of the beamformers provided an additional 0.4 dB benefit over the HATS beamformer.