Influence of voxelization on finite difference time domain simulations of head-related transfer functions

The scattering around the human pinna that is captured by the Head-Related Transfer Functions (HRTFs) is a complex problem that creates uncertainties in both acoustical measurements and simulations. Within the simulation framework of Finite Difference Time Domain (FDTD) with axis-aligned staircase boundaries resulting from a voxelization process, the voxelization-based uncertainty propagating in the HRTF-captured sound ﬁeld is quantiﬁed for one solid and two surface voxelization algorithms. Simulated results utilizing a laser-scanned mesh of Knowles Electronics Manikin for Acoustic Research (KEMAR) show that in the context of complex geometries with local topology comparable to grid spacing such as the human pinna, the voxelization-related uncertainties in simulations emerge at lower frequencies than the generally used accuracy bandwidths. Numerical simulations show that the voxelization process induces both random error and algorithm-dependent bias in the simulated HRTF spectral features. Frequencies f r below which the random error is bounded by various dB thresholds are estimated and predicted. Particular shortcomings of the used voxelization algorithms are identiﬁed and the inﬂuence of the surface impedance on the induced errors is studied. Simulations are also validated against measurements.

The scattering around the human pinna that is captured by the Head-Related Transfer Functions (HRTFs) is a complex problem that creates uncertainties in both acoustical measurements and simulations. Within the simulation framework of Finite Difference Time Domain (FDTD) with axisaligned staircase boundaries resulting from a voxelization process, the voxelization-based uncertainty propagating in the HRTF-captured sound field is quantified for one solid and two surface voxelization algorithms. Simulated results utilizing a laser-scanned mesh of Knowles Electronics Manikin for Acoustic Research (KEMAR) show that in the context of complex geometries with local topology comparable to grid spacing such as the human pinna, the voxelization-related uncertainties in simulations emerge at lower frequencies than the generally used accuracy bandwidths. Numerical simulations show that the voxelization process induces both random error and algorithm-dependent bias in the simulated HRTF spectral features. Frequencies f r below which the random error is bounded by various dB thresholds are estimated and predicted. Particular shortcomings of the used voxelization algorithms are identified and the influence of the surface impedance on the induced errors is studied. Simulations are also validated against measurements. One important acoustic wave scattering problem is the influence of the human body on a sound field in the vicinity of the ear canal since it contains all of the necessary cues for sound localization. Such influence can be modeled as linear time-invariant filters and are generally referred to as headrelated transfer functions (HRTFs) or head-related impulse responses. These direction-dependent filters are usually measured in anechoic conditions such that the transmission of an incident plane wave generated by a point source from the farfield to a point close to the ear canal is captured for each ear. 1 This process is not without limitations and may result in inaccurate 2-4 or costly and generally time-consuming HRTF measurements and protocols. [5][6][7][8] Computer simulated HRTFs represent an attractive alternative since they do not suffer from the drawbacks inherent in acoustic measurements. A variety of simulation methods have been employed to simulate HRTFs: the boundary element method, [9][10][11] the finite element method, 12 the adaptive rectangular decomposition method 13 and the finite-difference timedomain method. [14][15][16][17] However, simulations also have specific limitations, constraints and parameters that affect the resulting HRTFs such that the consistency with measured HRTFs is affected 15,18 by inaccuracies in both frameworks.
In the case of Finite Difference Time Domain (FDTD) simulations, there are many factors influencing the computed sound field independent of the actual numerical simulation. One of them is the voxelization process, in which a given continuous geometry undergoes one or several stages of approximations until it results in a discrete, voxel-based geometry. For a staircase approximation of the boundary, different approaches exist: the continuous geometry is directly acquired into voxels and further processed (such as an MRI scan 15 ) or first scanned at fixed, discrete locations (such as a mesh resulting from a laser scan 11 ) and then the resulting polyhedron mesh voxelized to be used in an FDTD simulation. The voxelization process of a three-dimensional (3D) geometry is neither a straightforward nor trivial task: depending on the constraints and the required properties of the resulting geometry, it is possible that it will not preserve the mesh topology or that it will not be unique. 19 Different algorithms are designed for the voxelization process that have their own parameters, which in turn may generate additional errors in the resulting sound field, independent of the FDTD update scheme and boundary implementation.
It is unclear how important these voxelization-induced errors are within an FDTD simulation of the head-scattering problem. Some limitations of the staircase-like boundaries were shown 20 only for simple geometries and cases. The current work studies the influence of the voxelization process in the context of FDTD HRTF simulations for cases where the geometry is acquired through a laser scanning process. The propagated errors are quantified by measuring the induced variance in the sound field inside the concha volumes caused by minor translations of the acquired geometry. Nonetheless, the reported variance issue is expected to become insignificant as the voxel is greatly reduced in size. It is noted that similar issues may arise when the voxel-based geometry results from an MRI scan; although the geometry-grid misalignment of the geometry is less of an issue due to the reduced grid sizes generally used, 15,17 either scanner parameters or voxel processing algorithms may influence the FDTD-simulated acoustic field significantly.
Various factors are analyzed: the voxel size is first examined alongside three different voxelization algorithms. In addition, the translation distance causing the sound field variance is also studied as a factor. The frequency is regarded as a parameter in the analysis. As the final factor, the influence of impedance is investigated. Both ears are considered in the investigation. Results are first analyzed based on all directions and frequencies and then a frequency-dependent variance analysis is performed for the same data. In addition, the current simulations are also validated against measurements. 5

II. METHODS
This section describes the mesh acquisition process, the FDTD simulation parameters, and the used voxelization algorithms. A number of simulation parameters are set such that they do not introduce additional variance in the simulated HRTFs and the possible factors affecting the captured variance are identified. The coordinate system used in this work is the vertical-polar coordinate system 5 whereby directions are given as ðazimuth; elevationÞ ¼ ðh; /Þ pairs in degrees.

A. Mesh acquisition and voxelization
A laser-scanned version of the KEMAR head and torso simulator with large pinna 21 was used since it minimizes variation in mesh acquisition and in HRTF measurements compared to a human subject. The head mesh was obtained using a high-resolution scanner (3D Scanner Ultra HD, NextEngine, Inc., Santa Monica, CA) at maximum resolution (dimensional accuracy 22 of 60.127 mm). The acquired mesh scan had artifacts at the inner front walls of the cavum concha that had various parts occluded especially for the left ear. Since the voxelization is expected to represent a more coarse approximation of the geometry, the mesh acquisition process can be relaxed: the head mesh is initially decimated such that the resulting two-sided 23 Hausdorff distance was 0.31 mm (evaluated with an open-source tool 24 ). Subsequently, the frontal part of the cavum concha is manually corrected for both ears such that the measured distance from the surface of the microphone (at a blockedear location) to the tragus is the same on the mesh as it is on the physical KEMAR ($9 mm). Further, the anterior part of the concha is expanded and flattened in such a way that a 1/2 in. microphone positioned to match lateral pictures of KEMAR could fit inside the cavity. In addition, the final shape is checked such that manually fitted, independent ear-scanned meshes have a similar shape in the problematic areas. However, some inaccuracies for the mesh in the concha are likely to persist.
Since a torso affects the spectrum of the resulting HRTFs especially at low frequencies, 25 a less-accurate torso mesh is also acquired using Kinect and ReconstructMe software. 26 The resulting torso mesh is scaled, aligned, and welded to the head mesh such that the resulting mesh matches the available anthropomorphic data of KEMAR 5,21,27 (chest breadth, shoulder breadth, tragion to shoulder), the neck height corresponds to two neck rings, 27 and the horizontal plane and middle of the head match the reported data about KEMAR. 28,29 A view of the resulting mesh that was used for the voxelization algorithms can be seen in Fig. 1.
As voxelization algorithms, the two surface voxelization algorithms and one solid voxelization described by Schwarz and Seidel 30 are implemented: conservative ["CONS"-an example that can be seen in Fig. 2(a)] voxelization that creates a unique supercover 19 of the geometry, 6-separating ["6SEP" example in Fig. 2(b)] that creates a 6-separating version of the voxelized mesh that is also suitable for a second order spatial stencil and a solid ["SOLID" example in Fig. 2(c)] voxelization that also voxelizes the interior of the volume. Since the solid voxelization is based on the Jordan curve theorem (starting from the center of the voxel), the polyhedron mesh needs to be watertight and free of other artifacts: orphan triangles, duplicate triangles, and overlapping triangles. Since this was not the case for the laserscanned mesh, manual corrections were done to make the mesh itself suitable for the voxelization algorithm.

B. FDTD simulation
The update scheme used is the standard rectilinear explicit update with second order temporal and spatial FIG p . This entails a regular cubic grid with a fixed voxel size of DX. To speed up simulation, the implementation is done on the Graphics Processing Unit (GPU) using CUDA kernels. 31 The locally reacting staircase boundaries can be configured to have a frequency independent impedance using the onesided difference operator for the spatial gradient at the boundary. 32 Although the temperature during measurements is not reported, 5,27 the temperature does not seem to drastically affect the resulting HRTFs (Ref. 18) and it is fixed to 20 C, resulting 33 in a sound speed of 343.39 m/s. While all voxelization computations are done in double precision, all FDTD updates are done in single precision, which all have been confirmed to remain stable without any numerical accuracy problems.
The principle of reciprocity is used such that one simulation generates signals for all receivers. The source signal is a pressure injection (also referred to as "soft source" 34 ) discrete delta. The source and receiver positions in the continuous 3D space are transformed to FDTD integer lattice coordinates by setting the corresponding voxel index in which each position resides.

C. HRTF computation
Since a 3 ms long signal is enough to capture all the important attributes of HRTFs, 35,36 the simulation domain is chosen so as no domain reflection affects a 3 ms long signal recorded 1 m from the center of the domain (excluding the propagation delay). The receiver locations are chosen 1 m away from the center of the head and the same positions as in the measurements 5 are used with the addition of the lateral directions: ð690 ; 0 Þ. Consequently, the simulation domain is chosen as a cube of 3 m side with the center of the dummy head placed in the middle. The frontal direction matches the x axis and the interaural axis is parallel to the y axis. The mesh domain and FDTD lattice are axis-aligned and positioned such that the origin of the mesh domain corresponds to the corner of the minimum world space coordinates of the voxel with index {0, 0, 0}. The boundary for the voxelized geometry of KEMAR is set as fully rigid n KEMAR ¼ 0 and the exact admittance value is later considered as a factor in the analysis.
It is also reasonable to assume that as the voxel size decreases, the voxelization issues become less important. Therefore, two different voxel sizes are presented (and chosen such that the frequency resolution of the simulated HRTFs matched the measured ones À220.5 Hz): DX ¼ 2.97 mm and DX ¼ 2.35 mm, the latter limited by the available GPU memory. The corresponding sampling frequencies are f s ¼ 200.214 kHz and f s ¼ 253.134 kHz, respectively. Although the two chosen voxel sizes were found to be sufficient to show the influence of the voxelization process on the FDTDsimulated HRTFs, other voxel sizes are similarly analyzed to confirm the findings and to predict the behavior of the variance at even lower DX voxel sizes.
The computational nodes utilized each had either two Tesla M2090 GPU cards or two Tesla K40 GPU cards each having 6 GB of GPU memory or 12 GB of GPU memory, respectively. Thus, the maximum GPU memory available for a single FDTD computation was about 24 GB.
The HRTFs are obtained through the usual free-field equalization in the frequency domain 1 with a rectangular window applied to the time signals. The window size is chosen such that it matches the one in the measurements, i.e., $4.53 ms. The time signals are low-pass filtered before the frequency division with a 150-tap linear-phase filter (designed with the window method) having a cutoff at 15 kHz.

III. ANALYSIS OF SIMULATIONS
The main objective is to quantify how the staircase boundaries resulting from the voxelization process in an FDTD simulation affect the sound field inside the concha. Here, this is achieved by computation of sound field variance.

A. Quantifying voxelization influence
The voxelization process entails different steps that can generate errors in the simulated sound field. First, the mesh must be super-imposed on an FDTD grid and second converted to voxels with a voxelization algorithm. For small translations of the mesh relative to the grid, the voxel-based geometry resulting from a voxelization algorithm changes due to discretization effects. This possible algorithmdependent error affecting the sound field in the concha can be quantified by assessing the changes in the sound field caused by small translations of the mesh. One way to quantify the sound field changes is to look at the variances in the pressure field caused by the voxelization process under similar conditions. The analysis is limited to magnitude changes in the spectrum caused by small translations of the polyhedron mesh and quantified through the use of HRTFs. No rotations are included in the study since additional variance would be introduced (either changes in HRTFs with direction if the source-receiver distances are kept constant or changes in source-receiver distances). Albeit the voxelized geometry is composed of discrete cubic volumes, the relative position of the mesh to the FDTD grid can take any value. The variance of the magnitude of the HRTFs is quantified in the frequency domain based on the relative position of the mesh to the grid X ¼ ½x; y; z T for each frequency bin and each direction. For this, Monte-Carlo estimators are used, which are a simple and reasonable approach to a variance analysis problem. 37 The magnitude at each frequency bin can be written as a continuous scalar random variable y based on the continuous (up to machine epsilon) random input X, jHRTFðx; h; /Þj ¢y ¼ f ðXÞ ½dB; (1) where f ðÁÞ is deterministic. Since the position of a point relative to the grid is invariant to axial translations by DX, the input variable was initially chosen as uniformly distributed on the ½ÀDX=2; DX=2Þ 3 cube. Ideally, the variance should be quantified at the same location as measured HRTF, for example, at the blocked entrance of the ear canal. However, due to different voxelization algorithms, this is not possible since the source voxel may result in a solid (i.e., a boundary). In addition, for consistent comparisons, the source location voxel needs to be fixed, independent of the translation of the mesh and independent of voxelization type. This forces the location of the sound source at a certain minimal distance from any surface triangle so that even a diagonal displacement for a conservative voxelization (worst-case scenario) would not yield a solid voxel where the source has to be placed.
Therefore, for each voxel size, a sphere of radius r ¼ DX ffiffi ffi 3 p ð1 þ 0:5Þ mm was manually fitted inside the cavum concha such that no mesh triangles intersected its volume, especially among the Cartesian diagonals (for example, the condition can be relaxed in axial directions to r ¼ DX1.5 mm). The center of the sphere was chosen as the source location and the innermost locations satisfying such constraints are close to the exterior edge of the concha volumes, 11.87 mm (left ear) and 10.2 mm (right ear) away in the continuous space from the center of the 1/4 in. KEMAR blocked-ear microphone for DX ¼ 2.97 mm and 7.99 mm (left ear: see an example in Fig. 3) and 8.31 mm (right ear) away for DX ¼ 2.35 mm.
At such distances from the ear canal, although the scattered field inside the concha should be captured, all the localization cues are likely not. 38 However, it can be assumed that the variance in the sound field caused by each voxelization algorithm is similar inside the concha volume. This assumption could be confirmed by the similar variance estimates for the left and right ear since the source location relative to the ear is not equal. Nonetheless, the scanned pinnae are not identical and the left/right variance differences should not be considered an accurate indicator of a factor such as the exact source location.
In addition, the calculated free-field equalized filters in the variance analysis are referred to as HRTFs despite the fact that they are not calculated at a standard location (e.g., blocked ear). Note that although the HRTFs are calculated away from the concha walls, the ear canals of the mesh are still blocked.
Although implemented on the GPU, the simulations are rather time-consuming ($7.5 min for DX ¼ 2.35 mm, excluding voxelization for each simulation) and a reduced number of Monte-Carlo samples is desirable. The equiprobable stratified sampling method is chosen that provides estimates with lower variance 39 compared to uniform random sampling, independent of f ðÁÞ: each dimension is equally divided into a number of u subintervals of length Du creating a number of u 3 sub-volumes inside the input cube X. Then, for each sub-volume of integer index fu x ; u y ; u z g a sample is uniformly drawn for each dimension from the corresponding subinterval (e.g., ½u x Du; ðu x þ 1ÞDuÞ). A total of u ¼ 4 strata per dimension with one sample per stratum are used such that the possible negative bias of the variance estimator is bounded to À0:016 Á r 2 y ðE½r 2 À r 2 y Þ 0 (according to McKay et al. 39 ) for the total of 64 strata, where E½Á represents the expected value operator, N S the number of samples, r 2 y the true variance of y, and y the unbiased 39 The 64 Monte-Carlo runs (one for each stratum in the input X) are referred to as a whole and named a "batch" run. Similarly, all the 64 samples in X for each batch run are addressed as a "batch sample." The magnitude of the HRTFs varies with direction and frequency. 40 If the data are pooled over many directions or frequencies, the variance of HRTFs with direction or frequency will end up in the estimators. To exclude this variance in the conducted analysis, for each direction, frequency bin and Monte-Carlo batch run, the average magnitude is subtracted, where r represents the Monte-Carlo run index inside a Monte-Carlo batch run. Finally, the estimator in Eq.
(2) can be calculated based on pooled data from all directions and/or frequencies. For example, a frequency-dependent variance estimator r 2 ðxÞ is calculated over N S ¼ 64 Á N dir data points obtained from Eq. (4), where N dir represents the total number of directions.
Although unclear if the estimated variances r 2 are normally distributed or if the central limit theorem can be applied to a small number of samples of the used estimator in Eq. (2), the uncertainty is quantified using a student distribution with N R À 1 degrees of freedom, 37 where N R represents the number of independent stratified sample batches used for the N R -independent Monte-Carlo batch runs. The starting random seed is the same for each voxelization algorithm and each batch run. A total of N R ¼ 5 independent Monte-Carlo batch runs are used together with significance level a ¼ 0:05 such that the 95% confidence intervals (CIs) are reported for the final estimator The translation distance or equivalently, the volume of the input cube X, is expected to become a major factor in the variance estimates, especially with an increasing distance of translation. In addition, non-axial translations larger than a voxel side DX can be possible inside the used input cube X ½ÀDX=2; þDX=2Þ 3 . Consequently, the side of X is also changed to lower values than DX.
Although the induced variance through mesh translation is linked with both voxelization aspects (algorithm and relative mesh-grid position), for axial translations of DX, the changes in the sound field solely due to translation can be crudely estimated for a fixed voxel-based geometry. Therefore, for each voxelization algorithm, six axial translations from X 0 ¼ ½0; 0; 0 T can be used to gain some insight into the variance caused by X 2 ½ÀDX; DX 3 .
For ease of interpretation, the standard deviations are reported. The standard deviation for each Monte-Carlo batch run is obtained from the variance estimator as r ¼ ffiffiffiffiffi r 2 p and then the CIs are calculated in the same manner as for the variance.
Following previous FDTD HRTF studies, 16 the analysis is limited to frequencies for which the wavelength is spatially sampled at least 12 times. Therefore, the numerical dispersion is assumed to be negligible for frequencies below f max % 9.63 kHz for DX ¼ 2.97 mm and f max % 12.18 kHz for DX ¼ 2.35 mm. It is still noted that dispersion may affect the simulations even at the aforementioned f max frequencies if the interference of planar wave packets is considered within a 1.62 ms time window when the scattering is analyzed (excluding the 1 m propagation delay from the 4.53 ms window length). The percentage of the group delay s g caused by numerical dispersion in the worst direction relative to the period of the frequency can be estimated as E s g ¼ s g Á f max Á 100. The worst interferences appear at multipliers of E s g ¼ 50%. The group delay can be calculated numerically using a forward difference from the unwrapped phase of the grid transfer function. 41 Thus, for a voxel of DX ¼ 2.97 mm, a wave packet containing a frequency of f max ¼ 9.63 kHz traveling in the axial direction will be delayed through numerical dispersion by E s g ¼ 38% compared to the diagonal direction (unaffected by dispersion). For DX ¼ 2.35 mm, axial and diagonal wave packets with f max ¼ 12.18 kHz will almost be in antiphase (E s g ¼ 48%). While such scenarios are unlikely to substantially affect the not-fully-understood mechanism 17 that generates spectral cues, uncertainty still persists. Therefore, the analyzed bandwidth is considered a parameter in the analysis. The dispersion caused during propagation should affect the magnitude of the simulated HRTFs to a very low degree 42 below f max .

B. Comparison to measurements
The simulations are validated against the CIPIC measurements 5 of KEMAR with large pinna. The CIPIC database was chosen for the following reasons: it is a public database often used in HRTF-related studies, it contains a reasonable amount of directions (i.e., 1250) fairly distributed in space, a bare torso is used in the measurements that better matches the simulated hard surface impedance, and the measuring distance is 1 m, which requires a smaller simulation domain compared to larger distances.
For the purpose of comparison with measurements, the simulated HRTFs (see Sec. II C) are calculated at the blocked entrance of the ear canal. The voxel corresponding to the center of the KEMAR 1/4 in. microphone membrane at the blocked meatus is chosen as the sound source location. In case the voxel is solid due to the voxelization process, the first air voxel in the vicinity of a boundary is searched, with priority given to the outward direction (i.e., the normal to the median plane toward the ear) and then the backward direction (the sub-zero coordinates of the x axis). The position of the mesh relative to the grid is set to X 0 ¼ [0, 0, 0] T with the mesh aligned as described in Sec. II B. The FDTD HRTFs are computed for the same directions as in measurements. 5 It is unclear below which frequency the low-frequency inaccuracies in measurements 27  As in the variance analysis, the comparison with measurements is limited to the magnitude of the HRTFs. A consistent positive gain mismatch between the simulated and the measured HRTFs is found throughout the entire analyzed frequency range. The reasons for this are unclear, but an unexpected negative magnitude is present in the measurements at low frequencies. To exclude such a difference from the error measurements, the average value of the magnitude of the diffuse-field (d.f.) HRTF is calculated for each ear and each HRTF set separately between 400 Hz and 3 kHz, and the corresponding HRTF magnitudes were shifted down in decibels by such an average. The magnitude of the d.f. HRTF is calculated as an average of all directions from the log-magnitude spectra of the HRTFs. The frequency interval over which the average magnitude is calculated is based on the inaccuracy of the measurements at low frequencies 27 and the frequency where the pinna starts influencing 25 the HRTFs at high frequencies.
The magnitude of the resulting HRTFs are finally compared for each direction and each frequency bin. For error reporting, the mean absolute error of all directions is calculated for each frequency in decibels, where N dir represents the total number of directions, HRTF S the magnitude-shifted FDTD-simulated HRTF, and HRTF M the magnitude-shifted CIPIC HRTF. For each frequency bin, the standard deviation r e (x) is also estimated using the same estimator as in Eq.
(2) calculated for all directions and each frequency bin.
For consistency with previous studies, the global unsigned error [also referred to as spectral distortion 15 (SD)] is calculated for all of the directions and frequencies of interest (400 Hz f f max ) based on the magnitude-shifted HRTFs.

A. Artifacts caused by SOLID voxelization
The SOLID voxelization may miss thin layers of the geometry that pass between the test points. This can cause voxelized external ears with holes that may result in unreliable HRTF simulations. Therefore, the probability of a hole p hole rounded to two decimal places is calculated after viewing and counting the resulting voxel-based geometries with holes. The same batch samples as in the variance analysis are used to determine the probability of holes. Since most solid voxelizations contained holes, the same voxelization sampling was done without running FDTD simulations for smaller voxels until the probability reached 0: only one hole out of the 320 total voxelizations was identified for the right ear and DX ¼ 1.0 mm. Results are shown in Fig. 4 and show that SOLID voxelization is generally ill-suited for HRTF simulations unless a very small voxel is used. Note that p hole depends on input cube X: for DX ¼ 2.35 mm, p hole values drop to 0.82 (left) and 0.80 (right) when the sampling cube is restricted to 1 mm 3 . For the two analyzed voxel sizes, when the mesh is in initial position X 0 , all SOLID voxelizations have holes except DX ¼ 2.35 mm, right ear.
Holes are highly undesirable in the context of HRTF simulation and such high p hole values may not justify the study of the solid voxelization algorithm for the two chosen voxel sizes. However, holes were found to greatly vary in topological complexity and location. This is expected to affect the high-frequency HRTF spectra in various degrees depending on mesh-grid misalignment. In addition, due to a different geometrical approach to the voxelization process, the analysis of the solid voxelization algorithm was found valuable to support the subsequent results. Consequently, the solid voxelization is included in the analysis and holes are considered an outcome of the algorithm. Since the translation distances in X are also expected to influence the variance increase, it is likely that it will interact with the voxelization algorithm when analyzing the variance. For example, Fig. 5(d) shows for the larger voxel how the global standard deviation changes with voxelization algorithm for the same translation used in the DX ¼ 2.35 mm analysis. It can be seen that when analyzed up to f max , the translation distances do significantly affect the global standard deviations. However, the influence is much weaker compared to a factor such as the maximal frequency of interest f max [ Fig. 5(c)]. Note that the exact source location is still inconsistent between the two voxel sizes.
Restricting the translation volume even further to a 1 mm cube (i.e., X 2 ½À0:5 mm ; 0:5 mm Þ 3 ), the global standard deviation for such small translations [Figs. 6(a) and 6(b)] drops compared to the maximal possible translation [Figs. 5(a) and 5(b)] even when analyzed up to f max . Figures 5(a) and 6(a) and Figs. 5(b) and 6(b) also suggest an interaction between the voxelization algorithm and the translation cube X: for example, reducing the translation cube to 1 mm 3 for DX ¼ 2.35 mm, the variance for the 6-separating voxelization and right ear becomes similar to other voxelization algorithms; by contrast, for the left ear, a significant decrease in variance can be seen for the same voxelization algorithm.
While assessing how the translation distance affects the global variance, it is still unclear how other sources of variance like the translation variance contribute to the values calculated by the estimators. Using only six axial translations of 6DX, a rough estimate for the translation variance is obtained (see Sec. III A) and the global standard deviations are shown in Figs. 6(c) and 6(d) for the two voxel sizes used. No uncertainty measures can be estimated in such a case. Although the translation distance is higher in Figs. 6(c) and 6(d), it can be seen that the global variance drops below the CIs of Figs. 5(a) and 5(b) in half of the analyzed cases, while in three cases it rises marginally with values up to 0.2 dB. This suggests that the variance induced solely by translation generally affects the estimators to a smaller degree compared to voxelization-induced variance.

C. Frequency-dependent variance
Since the global variance drops when the analyzed frequency bandwidth is further restricted [Fig. 5(c)], it may be useful to analyze the variance based on frequency. Figure 7 shows the estimated standard deviation for each frequency and all directions for the maximal translation cube X. A linear frequency scale is used since low frequencies are not highly affected.
It can be seen that the variation caused by slight translations of the mesh starts influencing the sound field at frequencies lower than f max . The variance is more or less independent of the voxelization algorithm utilized and voxel size at low frequencies ( 5 kHz). Depending on voxel size, the variance starts increasing substantially around 6 kHz for DX ¼ 2.97 mm and around 8 kHz for DX ¼ 2.35 mm. At higher frequencies, the variance becomes highly dependent on the voxelization algorithm and exact location in the concha (note the exact location inside the concha volume differs between ears and voxel sizes). However, the sudden increase in variance at high frequencies is consistent across the collected data.  Fig. 7 the standard deviation curves start to rise at a higher frequency at the finer spatial resolution, but high values of the standard deviation continue up to the start of the shaded area, which is also at a higher frequency at the finer spatial resolution. Similar conclusions can be drawn when the translation cube X for the large voxel DX ¼ 2.97 mm is restricted as for the smaller voxel [same case study as in Fig. 5(d)]. The frequency-dependent standard deviation is shown in Fig. 8 and the sudden increase in variance slope around 6 kHz can still be observed.
So far, the variance analysis based on frequency appears to indicate that although the size of X influences the global variance, the sudden increase in the slope of the variance is not affected to a high degree. On the other hand, when X 2 ½À0:5 mm ; 0:5 mm Þ 3 , the frequency at which the variance increase is found can shift higher in frequency, especially for the conservative voxelization (see Fig. 9). In addition, the increasing rate of the variance appears to be affected by the size of the input cube X, making the sudden increase in the slope of the variance less obvious.
Considering the standard deviation estimates for the six axial translations (i.e., for translation of fixed voxelized mesh done at X 0 ), the frequency-dependent standard deviations are plotted in Fig. 10. The rate of change in variance lacks a clear sudden increase, showing a smoother transition compared to when the voxel-based geometry also changes (see Figs. 7 and 10). On the other hand, the frequency-dependent trend of the variance is similar, which shows that translation does induce variance per se that also increases with frequency and is captured in the estimators. A general increase in variance at lower frequencies compared to Figs. 7 and 8 and 9 is also seen.
Nevertheless, the axial-only translations yield smaller variances compared to Fig. 7, mostly at higher frequencies. The frequency-dependent variance reported in Fig. 7 is higher since it is a sum of translation variance, voxelization variance, and the covariance between the two. The lower variances at higher frequencies support the result that the translation-only variance does not dominate the variance estimators: despite larger translation distances, the variance is generally smaller than a combined variance from smaller translation and voxelization.
The local maxima in the frequency-dependent r estimators shown in Figs. 7-9 after the sudden increase are expected to appear at HRTF features that are direction-dominant. Although the HRTF features are rather complex and vary strongly with direction, direction-dominant features should be captured in the spectrum of the d.f. HRTF (calculated as described in Sec. III B). While most of the modal frequencies were investigated close to the ear canal, 43 lower frequency concha modes are expected to be captured throughout the concha volume. The d.f. HRTFs for DX ¼ 2.35 mm and DX ¼ 2.97 mm and each of the 320 Monte-Carlo voxelization of the 5 batch runs are depicted in Fig. 11. First, for DX ¼ 2.35 mm, the peaks in the variance estimators in Fig. 7 generally correspond to features in the average d.f. HRTF. For DX ¼ 2.97 mm, the sudden increase in variance does generally match the first notch in the averaged d.f. HRTF after which a higher random error makes the match less clear. This is also consistent with the increased oscillatory shape of the estimators in Figs. 8 and 9 where the sampling space is reduced: more consistent HRTF spectra are expected for the Monte-Carlo samples in a batch run.
Observing the spectral features between the 4.2 and 9.5 kHz concha modes, Fig. 11 also shows an algorithmdependent bias. For instance, the supercover generated by a conservative voxelization generates smaller distances in the pinna volumes (or equivalently, creates less air voxels inside the cavities of the external ear) compared to other voxelization algorithms, especially the solid voxelization. This causes the resulting modes 44 or interference patterns 45 to shift up in frequency. By contrast, the solid voxelization shifts such features to lower frequencies due to probable voxel-based geometries with larger distances and the general convex geometry of the pinna. Considering the d.f. HRTF from the CIPIC measurements 5 in Fig. 11, the two voxelization algorithms seem to induce opposite bias in the high-frequency spectral features of the HRTFs. The 6-separating voxelization induced volume bias is somewhere in between since the intersection volume is inscribed in each voxel (see work by Laine 46 ) leaving parts of the polyhedron mesh close to the vertices of the Cartesian grid outside the voxelized mesh. The amount of bias depends on the voxel size: for example, the shifts in frequency for the first peak and first notch of the d.f. HRTF are higher for the bigger voxel in Fig. 11.

D. Surface impedance
The impedance value used in simulations is also expected to be a factor in the variance analysis. Since softer surfaces are expected to dampen the concha modes and interference patterns (effectively smoothing the resulting spectra in the process), lower variance estimates are expected. The frequency-dependent standard deviation for a specific acoustic admittance value of n KEMAR ¼ 0.015 corresponding to the skin reflection coefficient 47 is plotted in Fig. 12. The reduction in the average frequency-dependent standard deviation from Fig. 7 is also shown. Results show a general reduction in the standard deviation with a smoother increase due to less pronounced HRTF spectral features. Although diminished, the peaks and dips in the standard deviation in Fig. 12 match the ones in Fig. 7 while the algorithm-dependent magnitude of the estimates remains in the same order. This shows that frequency-independent absorptive boundaries do only affect the random uncertainty induced by the voxelization algorithm and not the algorithm-induced bias shown in Fig. 11. Figure 12 also shows a higher decrease in standard deviation for the smaller voxel which indicates an increase in the total absorption.

E. Changes with voxel size
The first frequency f r , at which the random error captured in the estimators, reaches a certain dB threshold value is defined. For each batch run, voxelization algorithm and voxel size, f r is estimated through linear interpolation between the closest data points. By pooling the f r data for both ears and all voxelization algorithms, the frequencies where the voxelization-induced random error reaches different threshold values can be estimated using an ordinary least squares prediction as shown in Fig. 13. Although not shown, the frequency-dependent standard deviation plots show a similar sudden increase in variance as can be seen in Fig. 7.
In addition, the f r values are strongly influenced by the bias induced by the voxelization algorithms (similar to Fig. 7), which makes the comparison of the captured variances for each voxel difficult. The f r frequencies increase with lower voxels and higher used thresholds and are lower than f max frequencies. Note that the curves in Fig. 13 corresponding to higher threshold values are less reliable: for example, dispersion is expected to increase. Figure 14 and Table I depict the error between the FDTD-simulated HRTFs (at blocked meatus location) and the CIPIC measurements for the two reported voxel sizes: 2.35 and 2.97 mm. The frequency-dependent error is shown in Fig. 14 where a threshold of 1 dB is considered for f r . The error is small ( 2 dB) and roughly constant up to 4 kHz, after which it starts increasing gradually for most cases and remains at a higher value up to f r . The variance of the reported error is highly similar among all analyzed voxelizations and voxel sizes: it is approximately constant up to 5.3 kHz, after which it starts increasing. The increase in error and the variance of the error above 4 kHz is likely due to the volume bias induced by the voxelization algorithms. The error may also have been inflated by an impedance mismatch. No reliable analysis can be conducted at higher frequencies (above the reported f r frequencies) since the voxelization uncertainty dominates: slightly translating the mesh can easily cause modal shifts and affect the reported error.

F. Comparison with measured HRTFs
The SD calculated up to f max are shown in Table I. Compared to previous studies, 15,16 the reported SD are of similar magnitude. Since the most substantial contribution in error can be seen at high frequencies (Fig. 14), any difference is likely due to voxelization uncertainty. The solid voxelization shows the smallest error for DX ¼ 2.35 mm, however such algorithm generates unrealistic holes in the pinna and most of the difference comes from frequencies above 6 kHz where it was shown that the voxelization process highly influences the result depending on meshgrid alignment X. Moreover, for DX ¼ 2.35 mm, since the frequency bandwidth for SD computation is larger, average results reported in Table I are even more influenced by the mesh-grid alignment X. Figure 14 also provides some evidence that the topological changes caused by voxelization also play a role in the simulated HRTFs: the conservative voxelization has an additional error of approximately 1.5 dB compared to the 6-separating voxelization for DX ¼ 2.97 mm, a value higher than the estimated voxelization standard deviation below f r (see Fig. 7). The differences become insignificant for the smaller voxel (lower plots in Fig. 14). Note also the increase in error above 4 kHz especially for the 6separating voxelization when the voxel size is decreased.

A. Errors introduced by voxelization
There are two types of error introduced by the voxelization process in the simulated HRTFs: the relative position of the mesh causes precision issues in the HRTF spectra (in the present paragraph, accuracy and precision are referred to in a general sense in the context of resulting HRTF spectra, such as the taxonomy by Heffner and Heffner 48 ) that can be treated as a random error, while each voxelization algorithm causes biases in the simulated HRTF spectra due to corresponding topological biases of the resulting voxel-based geometry. The induced random error in the present study is generated by (i) the voxelization process, (ii) the translation, and (iii) the interaction of the two. The last two were shown not to dominate. The random error manifests itself in two ways in the HRTF spectra: first, the peaks and dips in the HRTFs get shifted in frequency (as previously reported 44 ) and second, the spectrum is shifted locally in amplitude (for instance, in Fig. 11). The frequency shifts cause the most severe errors and are the reason for the sudden increase in the captured estimators with frequency due to the onset of local frequency shifts (as seen, e.g., in Fig. 11). The algorithm-dependent bias is caused by a volumetric error of the voxel-based geometry and manifests itself as a general shift in frequency of HRTF spectral features: while the supercover of the conservative voxelization shifts the HRTF features up in frequency, the solid voxelization shifts such features down in frequency. Consequently, the bias depends on the voxel size, as shown in Fig. 11. It is noted that the voxelization bias is solely a volume bias for rigid boundaries. In the case of absorbing boundaries, the voxelization process also induces a surface mismatch 49 that will bias the simulated HRTF magnitudes compared to measurements with similar impedance values. Nevertheless, absorbing boundaries do reduce the amount of random error induced by the voxelization process, as shown in Fig. 12.
A volume bias in simulations compared to measurements is expected even when DX ! 0 due to mismatches between the polyhedron mesh and the real geometry (such as in the present study due to laser scanning artifacts or due to Translation cube X 2 ½ÀDX=2; DX=2Þ 3 . Note that the source location is outside a tragus-helix-lobule plane from DX ¼ 3.58 mm. The f max frequency that corresponds to 12 samples per wavelength is also displayed. In addition, impedance mismatches may also add to the complexity of the comparison. This is why it is hard to quantify the magnitude of the algorithm-dependent volume bias in the present work (as compared to, for example, measurements). Nevertheless, as the voxel size approaches zero, the volume bias should converge to the one induced by the polyhedron mesh utilized.

B. Frequency dependence
Overall, results show that the global standard deviation analyzed up to f max differs slightly with ear and voxelization algorithm for the two analyzed voxel sizes. Differences are accentuated when the bandwidth of analysis is reduced. Possible causes for differences between ears can be asymmetries in the scanned geometry, inconsistent mesh corrections at the left/right concha volume, small number of samples used in estimating the CI, or different source location relative to the ear (especially for DX ¼ 2.97 mm). In addition, analyzing the introduced variance based on frequency shows similar behavior, with a sudden increase in variance at a frequency that generally depends on voxel size, voxelization algorithm, and amount of mesh-grid mismatch.
When quantifying the amount of introduced random error in the simulated HRTFs in Fig. 13, it was found that the first frequency f r , where the random error is limited to various threshold values, varies with voxel size. It is hard to quantify which voxelization algorithm generates less random error since the induced volumetric bias affects the estimated f r frequencies and the comparison should be done for specific HRTF features. Nevertheless, the curves in Fig. 13 could be used as a general indicator of the random error introduced by the voxelization process.
Results generally show that the axis-aligned staircase boundaries induce three frequency regions in the simulated HRTF spectra: at very low frequencies, the simulated HRTFs are insignificantly influenced by the staircase approximation and the simulated HRTF spectra is in the asymptotic range; as the frequency increases the volume and surface bias created by the voxelization algorithm start dominating the simulated HRTF spectra with limited, algorithmdependent random error. Then finally, the random voxelization-induced error dominates the HRTF spectra in the highest frequency region. These regions depend on voxel size, as can be seen from Figs. 11 and 13.
Solid voxelization tends to cause more variance at lower frequencies (i.e., f r ) and less variance at higher frequencies.
It is unclear why the solid voxelization generates less variance at high frequencies-it seems that the d.f. HRTFs are more consistent for this voxelization algorithm compared to the other two (see Fig. 11). Nevertheless, the high probability of holes in the resulting voxel-based geometries may be responsible: some direction-dependent HRTF features at high frequencies may be missing. At low frequencies, the volume bias pushes the random error toward lower frequencies and the estimated translation variance for the solid voxelization in Figs. 10 and 6(c) and 6(d) mostly appears to be higher. The results shown in Fig. 4 also suggest that unless a very small voxel is used, the solid voxelization may generate unrealistic HRTF spectra at high frequencies. In addition, any voxelization algorithm based on the Jordan curve theorem is the most sensitive algorithm to mesh artifacts and additional work is required to clean such artifacts of mesh acquisition.

C. Further analysis
Even in the continuous space, translating the geometry would create small changes in the sound field captured at a fixed location. This also generates additional variance of the sound field that is expected to inflate the calculated estimates. Unfortunately, this variance cannot be quantified and excluded from the current experimental method. It is also unclear how much this variance depends on the magnitude of the translation. Previous studies investigated the changes in the sound field at the entrance of the ear canal due to general head movements, 2 which is not necessarily related to the sound field variance in the concha caused by translations with a fixed receiver. Nevertheless, translation variance is expected especially at high frequencies: the translation problem is similar (yet not identical) to changing source location in HRTF simulations, which has already been reported to cause changes in the magnitude of the simulated field 15 especially at high frequencies. 18 In the context of FDTD simulations, the simulation parameters influencing such translation variance, together with possible interactions and covariances, are unknown. Still, results tend to show that the translation-induced variance and the covariance between translation-induced and voxelization-induced variance does not dominate the reported estimates.
The current framework was designed for a second-order spatial stencil operating on a Cartesian lattice. Other schemes could be studied as long as the voxelization algorithm matches the spatial update of the scheme at the boundary to avoid using pressure values on both sides of the auricle. For example, schemes employed on a non-Cartesian lattice (see for example, the work by Hamilton and Bilbao 50 ) would need a non-cubic voxelization. Nevertheless, any update could be used in the interior of the domain. Since higher-order spatial stencils are generally introduced to reduce dispersion 51 and since voxelization errors are mainly caused by induced geometrical inaccuracies that dominate the HRTF spectra at frequencies with limited dispersion errors, similar results are expected for all schemes that use Cartesian grids as long as no additional error is introduced at the boundary.
Although a decrease in error relative to the measurement is generally expected as the voxel size is decreased, this is not always true as shown for the 6-separating voxelization above 5 kHz in Fig. 14. The bias created by the voxelization algorithm is partly responsible for such an increase. An impedance mismatch is also expected to influence the comparison since (within a finite volume interpretation 20 of the FDTD update) the amount of absorption given by the surfaces of the voxelized geometry can increase with smaller voxel size. 49 The error increase due to a higher impedance mismatch for the lower voxel is supported by the increase in absorption (equivalently, in surface bias) for the smaller voxel in Fig. 12. Although most studies tend to indicate that impedance is not a major factor in the spectra of the HRTFs, 10,18,21,52 some works support, to some extent, the importance of impedance in HRTF spectra both quantitatively 53,54 and perceptually. 55 Since the volume bias is expected to decrease with voxel size, a grid convergence study (such as Roache's Grid Convergence Index 56 ) relevant to the HRTF problem should be developed and conducted to reveal any voxelization bias. Although convergence is expected for the volume bias, the error created by a surface bias cannot be eliminated 49 for absorptive boundaries. A superior modeling technique for the boundaries such as the fitted boundary cells 20 may provide a general solution to both topological biases. Further validation of the simulation with such boundaries over an unstructured grid is needed for a complex geometry such as the human pinna. Furthermore, it is hypothesized that the voxelization process is not such an important factor in simulations with simpler geometries and rigid boundaries. However, more complex geometries are expected to be affected to a greater extent by the voxelization process, as long as the voxel size is comparable to their topological characteristics.

VI. CONCLUSION
In the context of an FDTD HRTF simulation and staircase-like boundaries, results show that for complex geometries, errors are introduced in the FDTD-simulated sound fields solely due to the voxel-based, axis-aligned boundaries at frequencies lower than assumed accuracy limits.
Results show that a staircase approximation of the pinna boundaries can be highly problematic for the simulated highfrequency HRTF spectra. Both volume and surface errors are introduced by the voxelization process, which in turn bias the resulting spectra. In addition, the exact mesh-grid alignment adds random error to such bias in the HRTF spectra, dominating the simulated sound field at higher frequencies.
Considering the analyzed voxelization algorithms, the solid voxelization is the most problematic since it has a high probability of creating holes for voxel sizes greater than 1 mm. Conservative and 6-separating algorithms provide voxelizations that are more suitable for HRTF simulation but none of them is ideal as they all introduce some bias and shift the high HRTF spectra in frequency.