Spheroidal Ambisonics: a Spatial Audio Framework Using Spheroidal Bases

Ambisonics is an established framework to capture, process, and reproduce spatial sound fields based on its spherical harmonics representation. We propose a generalization of conventional spherical ambisonics to the spheroidal coordinate system and spheroidal microphone arrays, which represent sound fields by means of spheroidal wave functions. This framework is referred to as spheroidal ambisonics and a formulation for the case of prolate spheroidal coordinates is presented. Spheroidal ambisonics allows analytical encoding of sound fields using spheroidal microphone arrays. In addition, an analytical conversion formula from spheroidal ambisonics to spherical ambisonics is derived in order to ensure compatibility with the existing ecosystem of spherical ambisonics. Numerical experiments are performed to verify spheroidal ambisonic encoding and transcoding when used for spatial sound field recording. It is found that the sound field reconstructed from the transcoded coefficients has a zone of accurate reconstruction which is prolonged towards the long axis of a prolate spheroidal microphone array.


Introduction
Immersive multimedia technologies such as augmented reality (AR) and virtual reality (VR) have received much attention recently. Audio is an indispensable factor in such modes of multimedia, and it is essential to be able to capture, process, and render spatial sound fields with high precision for the presentation of plausible AR/VR and the creation of immersive experiences. The spatial audio framework of ambisonics [1], as well as higher-order ambisonics (HOA) [2], is receiving much attention due to the popularization of AR/VR devices, the ability to stream this representation using standard platforms [3,4], and its high compatibility with first-person view AR/VR. Ambisonic spatial audio capturing and processing consists of a microphone array and designated signal processing algorithms that encode the raw microphone array signal into a spherical harmonics-domain spatial description format, which is referred to as the ambisonic signal. This ambisonic signal is decoded into the signal that is fed to loudspeaker arrays to render the spatial sound field. Such loudspeaker arrays can also be virtualized by means of binaural technologies [5,6,7] and played back using headphones, which explains the high compatibility of ambisonics with AR/VR applications that usually resort to binaural transducers for audio playback.
Due to its formulation in the spherical harmonics-domain, the most natural implementation of ambisonic recording devices employs spherical microphone arrays [1,2,8]. In this work, we generalize the framework of ambisonics to spheroidal coordinates and define spheroidal ambisonics, which uses spheroidal wave functions for the representation of spatial sound fields. A formulation for the case of prolate spheroidal coordinates is presented, allowing the use of prolate spheroidal microphone arrays in an analytical manner, in contrast to a recently proposed approach which allows arbitrarily shaped microphone arrays but relies on numerical simulation to encode the captured field [9]. In addition, an analytical conversion formula from spheroidal ambisonics to spherical ambisonics is derived. This conversion ability is important for utilizing the existing ecosystem around spherical ambisonics after recording the spatial audio with a spheroidal microphone array. An overview of the proposed schemes of spheroidal ambisonic encoding and transcoding is shown in Fig. 1. Numerical experiments are performed to validate and demonstrate spheroidal ambisonic encoding and transcoding when used for spatial sound field recording.

Background: spherical ambisonics
The conventional framework of ambisonics, which is referred to as spherical ambisonics or simply ambisonics, is briefly reviewed here. Ambisonic encoding and decoding can be performed either by solving a linear system using least squares [2] or by spherical harmonic transformation using numerical integration [10]. Since the first approach allows more flexibility in the microphone array configuration, it is adopted in this paper. Throughout this paper, only microphone arrays mounted on surfaces of rigid scattering bodies are considered. This is a commonly used approach to avoid the instability arising in encoding filters for hollow microphone arrays due to singularities originating from the roots of the spherical Bessel function [2]. In this paper, all formulations are presented in the frequency domain, and can be converted into a time-domain representation by inverse Fourier transform, if necessary.
The spherical harmonics used in this paper are defined as follows:
with $\theta$ and $\phi$ the polar and azimuthal angle, respectively, and $P_n^m(x)$ the associated Legendre polynomials: with the Legendre polynomials: The above definition of spherical harmonics provides an orthonormal basis: with $\delta_{ij}$ the Kronecker delta.
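For orientation, one widely used orthonormal convention that matches the description above can be written as follows. The normalization, the use of $|m|$, and the absence of an additional sign factor are assumptions made for illustration; the definition intended by the text may differ from this one by a phase convention.

```latex
\begin{align}
Y_n^m(\theta,\phi) &= \sqrt{\frac{2n+1}{4\pi}\,\frac{(n-|m|)!}{(n+|m|)!}}\;
                      P_n^{|m|}(\cos\theta)\, e^{im\phi}, \\
P_n^m(x) &= (1-x^2)^{m/2}\,\frac{d^m}{dx^m} P_n(x), \qquad m \ge 0, \\
P_n(x) &= \frac{1}{2^n n!}\,\frac{d^n}{dx^n}\,(x^2-1)^n, \\
\int_0^{2\pi}\!\!\int_0^{\pi} Y_n^m(\theta,\phi)\, Y_{n'}^{m'}(\theta,\phi)^{*}
  \sin\theta \, d\theta\, d\phi &= \delta_{nn'}\,\delta_{mm'} .
\end{align}
```

With this choice the orthonormality follows from $\int_{-1}^{1} [P_n^m(x)]^2 dx = \frac{2}{2n+1}\frac{(n+m)!}{(n-m)!}$ together with the orthogonality of $e^{im\phi}$ over $[0, 2\pi)$.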

Encoding in spherical ambisonics
The process of obtaining the ambisonic signal $A_n^m$, the weights of the spherical basis functions of the three-dimensional sound field representing an arbitrary incident field to the microphone array, from the signal captured by the microphone array is referred to as ambisonic encoding.
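An arbitrary incident field to the spherical microphone array mounted on a rigid sphere with radius $R$ and located at $O$, the origin of the spherical coordinate system $(r, \theta, \phi)$, can be expanded in terms of the regular spherical basis functions $j_n(kr) Y_n^m(\theta, \phi)$ of the three-dimensional Helmholtz equation:
\[ p_{\mathrm{in}}(r,\theta,\phi,k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} A_n^m(k)\, j_n(kr)\, Y_n^m(\theta,\phi), \tag{5} \]
with $j_n(x)$ the spherical Bessel function of degree $n$ and $k$ the wavenumber. The total field $p_{\mathrm{tot}}$, which is the sum of the incident field and the scattered field, is given by:
\[ p_{\mathrm{tot}}(r,\theta,\phi,k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} A_n^m(k) \left[ j_n(kr) - \frac{j_n'(kR)}{h_n'(kR)}\, h_n(kr) \right] Y_n^m(\theta,\phi), \tag{6} \]
with $h_n(x)$ the spherical Hankel function of the first kind with degree $n$ and the prime denoting the derivative with respect to the argument. On the surface of the rigid sphere, i.e. $r = R$, this total field is evaluated as:
\[ p_{\mathrm{tot}}(R,\theta,\phi,k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} A_n^m(k)\, \frac{i}{(kR)^2\, h_n'(kR)}\, Y_n^m(\theta,\phi). \tag{7} \]
The total field captured by the $q$-th microphone located at $(R, \theta_q, \phi_q)$ is therefore given by:
\[ p_{\mathrm{tot}}^{(q)} = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} A_n^m(k)\, \frac{i}{(kR)^2\, h_n'(kR)}\, Y_n^m(\theta_q,\phi_q). \tag{8} \]
By truncating the infinite series with $n_{\max} \equiv N$, this result can be represented in the following vector form:
\[ \mathbf{p}_{\mathrm{tot}} = \boldsymbol{\Lambda}^{\bullet} \mathbf{A}, \tag{9} \]
where $\mathbf{p}_{\mathrm{tot}}$ is a vector holding $p_{\mathrm{tot}}^{(q)}$ in its $q$-th entry (in this paper, indices are 0-based), $\mathbf{A}$ is a vector holding $A_n^m(k)$ in its $(n^2 + n + m)$-th entry, and $\boldsymbol{\Lambda}^{\bullet}$ is the "inverse" encoding matrix for rigid sphere microphone arrays, which is a matrix holding $\frac{i}{(kR)^2 h_n'(kR)} Y_n^m(\theta_q, \phi_q)$ in its $(q, n^2 + n + m)$ entry. The goal of ambisonic encoding is to obtain $A_n^m(k)$ from the observation $\mathbf{p}_{\mathrm{tot}}$. Typically, this problem is solved by regularized least squares with a minimization objective:
\[ \min_{\mathbf{A}} \; \left\| \mathbf{p}_{\mathrm{tot}} - \boldsymbol{\Lambda}^{\bullet} \mathbf{A} \right\|_2^2 + \sigma \left\| \mathbf{A} \right\|_2^2, \tag{10} \]
with $\sigma$ a regularization parameter, and the solution given by:
\[ \hat{\mathbf{A}} = \mathbf{E}^{\bullet}\, \mathbf{p}_{\mathrm{tot}}, \tag{11} \]
where
\[ \mathbf{E}^{\bullet} \equiv \left( \boldsymbol{\Lambda}^{\bullet H} \boldsymbol{\Lambda}^{\bullet} + \sigma \mathbf{I} \right)^{-1} \boldsymbol{\Lambda}^{\bullet H}. \tag{12} \]
It is useful to explicitly write down the ambisonic coefficients representing some canonical fields, e.g. plane waves. A plane wave with a wave vector $\mathbf{k}$ given in spherical coordinates by $(k, \theta_i, \phi_i)$ is given by:
\[ p_{\mathrm{in}}^{\mathrm{pw}} = e^{i \mathbf{k} \cdot \mathbf{r}}. \tag{13} \]
Hence, the ambisonic coefficients $A_n^m$ for this plane wave are given by:
\[ A_n^m = 4\pi\, i^n\, Y_n^m(\theta_i, \phi_i)^{*}. \tag{14} \]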
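The simplification of the rigid-sphere radial term on the surface to $i / ((kR)^2 h_n'(kR))$ rests on the Wronskian of the spherical Bessel functions, $j_n(x) y_n'(x) - j_n'(x) y_n(x) = 1/x^2$. The following minimal sketch checks this numerically for a few low degrees. The function names and the value $kR = 2$ are hypothetical choices for illustration, and the upward recurrence is only adequate for low degrees.

```python
import math


def spherical_bessel_sequences(n_max, x):
    """Return (j, y): spherical Bessel functions of the first and second
    kind for degrees 0..max(n_max, 1), computed by upward recurrence.
    Upward recurrence for j_n loses accuracy when n greatly exceeds x,
    so this is only meant for low degrees."""
    j = [math.sin(x) / x, math.sin(x) / x**2 - math.cos(x) / x]
    y = [-math.cos(x) / x, -math.cos(x) / x**2 - math.sin(x) / x]
    for n in range(1, n_max):
        j.append((2 * n + 1) / x * j[n] - j[n - 1])
        y.append((2 * n + 1) / x * y[n] - y[n - 1])
    return j, y


def derivative(f, n, x):
    """f_n'(x) from f_0' = -f_1 and f_n' = f_{n-1} - (n + 1) / x * f_n."""
    if n == 0:
        return -f[1]
    return f[n - 1] - (n + 1) / x * f[n]


def rigid_sphere_radial_terms(n, kR):
    """Return the surface radial factor computed two ways: directly from
    the scattering solution, and from the closed form i / ((kR)^2 h_n'(kR))."""
    j, y = spherical_bessel_sequences(n, kR)
    h = [complex(a, b) for a, b in zip(j, y)]  # h_n = j_n + i y_n
    dj, dh = derivative(j, n, kR), derivative(h, n, kR)
    direct = j[n] - dj / dh * h[n]
    closed = 1j / (kR**2 * dh)
    return direct, closed


# Hypothetical example value kR = 2.0; the two forms should agree.
for n in range(5):
    direct, closed = rigid_sphere_radial_terms(n, 2.0)
    print(n, abs(direct - closed))
```

The printed differences are expected to be at the level of floating-point round-off. The closed form avoids subtracting two nearly equal terms, which is one reason it is convenient when building the "inverse" encoding matrix.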

Formulation of spheroidal ambisonics
The fact that the three-dimensional Helmholtz equation is separable in the spheroidal coordinate system allows us to formulate spheroidal ambisonics. In this section, the definition of prolate spheroidal coordinates, the definition of spheroidal ambisonic coefficients, the solution of the scattering problem for an arbitrary incoming wave with a rigid prolate spheroid, and the encoding procedure are presented.

Spheroidal coordinates
While there are two types of spheroidal coordinates, namely the prolate and oblate spheroidal coordinates, only the formulation for the case of prolate spheroidal coordinates and prolate spheroidal ambisonics is presented in this paper. The case of oblate spheroidal coordinates can be derived in a similar fashion and could be addressed elsewhere.
The definition of prolate spheroidal coordinates itself has some variations [11]. In this paper, the definition also used in [12] is employed. The prolate spheroidal coordinate system has three coordinates $\xi$, $\eta$, and $\phi$, and is characterized by the parameter $a$, where $2a$ is the distance between the two foci of the prolate spheroid. The domains of $\xi$ and $\eta$ are $\xi \geq 1$ and $|\eta| \leq 1$, respectively. The conversion to the Cartesian coordinates $(x, y, z)$ is given by: The long radius $r_{\mathrm{long}}$ and short radius $r_{\mathrm{short}}$ of a prolate spheroid with surface $\xi = \xi_1$ are related to $a$ and $\xi_1$ by:
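As a minimal sketch of the geometry, the following code converts prolate spheroidal coordinates to Cartesian coordinates and recovers $a$ and $\xi_1$ from the two radii. It assumes the common convention in which the $z$-axis is the axis of revolution (the long axis), so that $r_{\mathrm{long}} = a\xi_1$ and $r_{\mathrm{short}} = a\sqrt{\xi_1^2 - 1}$; the function names are hypothetical, and an array whose long axis lies along another axis would additionally need a rotation.

```python
import math


def spheroid_parameters(r_long, r_short):
    """Return (a, xi1) for a prolate spheroid with the given radii,
    using r_long = a * xi1 and r_short = a * sqrt(xi1**2 - 1)."""
    a = math.sqrt(r_long**2 - r_short**2)  # half the interfocal distance
    xi1 = r_long / a
    return a, xi1


def prolate_to_cartesian(a, xi, eta, phi):
    """Assumed convention: z is the axis of revolution (long axis)."""
    rho = a * math.sqrt((xi**2 - 1.0) * (1.0 - eta**2))  # distance from z-axis
    return rho * math.cos(phi), rho * math.sin(phi), a * xi * eta


# Radii taken from the experimental section: r_long = 1 m, r_short = 0.05 m.
a, xi1 = spheroid_parameters(1.0, 0.05)
print("a =", a, "xi1 =", xi1)
print("tip of the long axis:", prolate_to_cartesian(a, xi1, 1.0, 0.0))
print("point on the equator:", prolate_to_cartesian(a, xi1, 0.0, 0.0))
```

For such an elongated body $\xi_1$ is only slightly larger than 1, which reflects how close the surface lies to the segment joining the two foci.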

Scattering of an arbitrary incident wave by a sound-hard prolate spheroid
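An arbitrary incident wave can be expanded using the radial spheroidal wave functions $R_{mn}$ and the angular spheroidal wave functions $S_{mn}$ [11]:
\[ p_{\mathrm{in}} = \sum_{n=0}^{\infty} \sum_{m=0}^{n} R^{(1)}_{mn}(c, \xi)\, S_{mn}(c, \eta)\, (A_{mn} \cos m\phi + B_{mn} \sin m\phi), \]
with $R^{(1)}_{mn}$ the radial function of the first kind and $c = ka$. The spheroidal ambisonic coefficients are defined as the collection of the $\{A_{mn}, B_{mn}\}$ coefficients. A canonical example of an incident wave is a plane wave $p^{\mathrm{pw}}_{\mathrm{in}} = e^{i \mathbf{k} \cdot \mathbf{r}}$ with a wave vector represented in the Cartesian coordinates: with $k$ the wave number. The incident plane wave can be expanded as: The total field after scattering an arbitrary incident field characterized by $\{A_{mn}, B_{mn}\}$ is then given by:
\[ p_{\mathrm{tot}} = \sum_{n=0}^{\infty} \sum_{m=0}^{n} \left[ R^{(1)}_{mn}(c, \xi) - \frac{R^{(1)\prime}_{mn}(c, \xi_1)}{R^{(3)\prime}_{mn}(c, \xi_1)}\, R^{(3)}_{mn}(c, \xi) \right] \times S_{mn}(c, \eta)\, (A_{mn} \cos m\phi + B_{mn} \sin m\phi), \]
with $R^{(3)}_{mn}$ the radial function of the third kind and the prime denoting the derivative with respect to $\xi$.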
On the surface of the spheroid, i.e. $\xi = \xi_1$, by using the Wronskian relation of the radial functions of the first and second kind, the total field can be written as:
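The Wronskian step can be sketched as follows. The notation is an assumption following the common convention of [11]: $R^{(1)}_{mn}$ and $R^{(2)}_{mn}$ are the radial functions of the first and second kind, $R^{(3)}_{mn} = R^{(1)}_{mn} + i R^{(2)}_{mn}$ is the radial function of the third kind, the prime denotes the derivative with respect to $\xi$, and $c = ka$.

```latex
\begin{align}
R^{(1)}_{mn}(c,\xi)\,R^{(2)\prime}_{mn}(c,\xi)
  - R^{(1)\prime}_{mn}(c,\xi)\,R^{(2)}_{mn}(c,\xi)
  &= \frac{1}{c\,(\xi^2-1)}, \\
R^{(1)}_{mn}(c,\xi_1)
  - \frac{R^{(1)\prime}_{mn}(c,\xi_1)}{R^{(3)\prime}_{mn}(c,\xi_1)}\,
    R^{(3)}_{mn}(c,\xi_1)
  &= \left.\frac{R^{(1)}_{mn} R^{(3)\prime}_{mn}
       - R^{(1)\prime}_{mn} R^{(3)}_{mn}}{R^{(3)\prime}_{mn}}\right|_{\xi=\xi_1}
   = \frac{i}{c\,(\xi_1^2-1)\,R^{(3)\prime}_{mn}(c,\xi_1)} .
\end{align}
```

The second line uses $R^{(1)} R^{(3)\prime} - R^{(1)\prime} R^{(3)} = i\,(R^{(1)} R^{(2)\prime} - R^{(1)\prime} R^{(2)})$. Under these assumptions, each term of the surface field carries the radial factor $i / (c\,(\xi_1^2 - 1)\, R^{(3)\prime}_{mn}(c, \xi_1))$, which plays the same role as $i / ((kR)^2 h_n'(kR))$ in the spherical case.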

Spheroidal ambisonics encoding
The goal of spheroidal ambisonic encoding is to estimate the spheroidal ambisonic coefficients from observations by a limited number of microphones mounted on the surface of a spheroid-shaped baffle. As mentioned earlier, it is assumed here that the baffle is a sound-hard prolate spheroid.
By truncating the expansion at order $N > 0$, (23) can be rewritten in vector form: where $q$ is the sensor index and $\mathbf{p}_{\mathrm{tot}}$ is a vector holding $p^{(q)}_{\mathrm{tot}}$ in its $q$-th entry. The total number of unknowns is $L = (N + 1)^2$, which is the same as the total number of spherical ambisonics coefficients $\{A_n^m\}$ with maximum order $N$. $\boldsymbol{\Lambda}^{(P)} \equiv \mathbf{S}\mathbf{R}$ is referred to as the "inverse" encoding matrix for sound-hard prolate spheroidal ambisonics.
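The count $L = (N + 1)^2$ can be checked by enumerating the coefficients. The sketch below assumes the index ranges $0 \le m \le n \le N$ for $A_{mn}$ and $1 \le m \le n \le N$ for $B_{mn}$ (the $m = 0$ sine terms vanish because $\sin 0 = 0$); these ranges and the function names are illustrative assumptions. It also shows the 0-based index $n^2 + n + m$ used for the spherical coefficient vector.

```python
def spheroidal_coefficient_labels(N):
    """Enumerate spheroidal coefficients up to order N.
    Assumed ranges: 0 <= m <= n <= N for A_mn, 1 <= m <= n <= N for B_mn
    (the m = 0 sine terms multiply sin(0) = 0 and are dropped)."""
    labels = []
    for n in range(N + 1):
        for m in range(n + 1):
            labels.append(("A", m, n))
            if m > 0:
                labels.append(("B", m, n))
    return labels


def spherical_index(n, m):
    """0-based position of A_n^m in the spherical coefficient vector."""
    return n * n + n + m


N = 12  # truncation order used in the experiments
spheroidal_count = len(spheroidal_coefficient_labels(N))
spherical_indices = sorted(
    spherical_index(n, m) for n in range(N + 1) for m in range(-n, n + 1)
)
print(spheroidal_count, len(spherical_indices), (N + 1) ** 2)
```

Both representations therefore carry the same number of coefficients at a given truncation order, and the spherical index covers $0, \dots, (N+1)^2 - 1$ without gaps.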
The unknowns $A_{mn}$ and $B_{mn}$ can be estimated from observations of the sound field with multiple sensors mounted on the spheroidal baffle, by solving (24) with least squares. This process is referred to as spheroidal ambisonics encoding. The regularized least squares solution is given by: with $\sigma$ a regularization constant and $\mathbf{E}^{(P)} \equiv (\boldsymbol{\Lambda}^{(P)H} \boldsymbol{\Lambda}^{(P)} + \sigma \mathbf{I})^{-1} \boldsymbol{\Lambda}^{(P)H}$ the encoding matrix for sound-hard prolate spheroidal ambisonics.
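The regularized least squares step can be sketched numerically as follows. This is a minimal illustration, not the implementation used in the experiments: a random complex matrix stands in for the "inverse" encoding matrix, and the sizes, the random seed, and the function name `encoding_matrix` are hypothetical.

```python
import numpy as np


def encoding_matrix(Lam, sigma=0.0):
    """Return E = (Lam^H Lam + sigma I)^(-1) Lam^H using a linear solve
    instead of forming the inverse explicitly."""
    LamH = Lam.conj().T
    gram = LamH @ Lam + sigma * np.eye(Lam.shape[1])
    return np.linalg.solve(gram, LamH)


rng = np.random.default_rng(0)
Q, L = 32, 9  # hypothetical sizes: 32 sensors, 9 unknown coefficients
# Random stand-in for the "inverse" encoding matrix (not a real array model).
Lam = rng.standard_normal((Q, L)) + 1j * rng.standard_normal((Q, L))
coeffs = rng.standard_normal(L) + 1j * rng.standard_normal(L)
p_tot = Lam @ coeffs  # noiseless synthetic observation

estimate = encoding_matrix(Lam) @ p_tot  # sigma = 0: plain least squares
print(np.max(np.abs(estimate - coeffs)))
```

With noiseless data and more sensors than unknowns, setting $\sigma = 0$ recovers the coefficients up to round-off. A positive $\sigma$ shrinks the estimate and trades bias for robustness when the matrix is poorly conditioned.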

Transcoding from spheroidal to spherical ambisonics
The sound field encoded as a spheroidal ambisonics signal can be converted into a conventional spherical ambisonics representation. This process is referred to as transcoding. The following relation connecting spheroidal wave functions with spherical Bessel functions and associated Legendre polynomials [11]: can be utilized for the derivation of the transcoding formula, where $d_r^{mn}(c)$ are the expansion coefficients: It can be shown that the analytical transcoding formula from spheroidal ambisonics coefficients $\{A_{mn}, B_{mn}\}$ to spherical ambisonics coefficients $A_n^m$ is given as follows: Here, % is the modulo operator and $d_r^{mn}$ are the expansion coefficients as defined in [12].

Experimental evaluation
Prolate spheroidal ambisonic encoding and its transcoding into spherical ambisonics were validated by numerical experiments. Encoding and transcoding of a plane wave with three different incident angles were performed with a sound-hard spherical microphone array as well as a sound-hard prolate spheroidal microphone array. The spherical array had a radius of 0.198 m. The prolate spheroidal microphone array had $r_{\mathrm{short}} = 0.05$ m and $r_{\mathrm{long}} = 1$ m. The arrays were designed to have the same surface area, and both had 512 microphone capsules located on a grid of Gauss-Legendre quadrature nodes for $\theta$ and $\eta$ and equispaced for $\phi$. The long axis of the prolate spheroidal array was set parallel to the x-axis. Fig. 1 shows the experimental procedure and the two microphone arrays used for the experiments. Spherical and spheroidal ambisonic encoding were performed using (11) and (26), respectively. Computation of the coefficient tables of spheroidal wave functions was performed using the software library Spheroidal [12]. The truncation order was set to $N = 12$ for both spherical and spheroidal ambisonics. The regularization parameter $\sigma$ was set to zero for both spherical and spheroidal encoding, i.e. no regularization was applied. Transcoding from spheroidal ambisonics to spherical ambisonics was performed using (29), truncated for $n \leq N$. The estimated incident field for the encoded spherical ambisonic coefficients was reconstructed and compared to the ground truth incident field. The reconstruction of the estimated incident fields was performed using (5) truncated for $n \leq N$. The signal-to-distortion ratio (SDR) of the reconstructed fields was computed for evaluation points in the x-y plane. The region with SDR higher than 30 dB was considered the sweet-spot of accurate reconstruction. Figs. 2, 3, and 4 show the results for incident waves with normalized wave vectors, expressed in Cartesian coordinates, of $(1, 0, 0)$, $(0, 1, 0)$, and $(\sqrt{2}/2, \sqrt{2}/2, 0)$, respectively. The frequency of the incident wave was 541.8 Hz. It can be observed that the sweet-spot of precise reconstruction in spheroidal ambisonics is narrower along the short axis of the spheroid, but wider along the long axis, compared to the baseline spherical ambisonics case. This asymmetry of the sweet-spot shape could be useful in applications in which a non-spherical sweet-spot is desired. An example application is sound field reproduction for multi-person home-theater systems, in which the sweet-spot should cover multiple listeners sitting next to each other.
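Two quantities from this setup can be checked with a short sketch: the matching surface areas of the two arrays, and the meaning of the 30 dB threshold. The closed-form area of a prolate spheroid is standard. The SDR definition used here (reference power over error power, in dB) is an assumption, since the text does not spell it out, and the function names are hypothetical.

```python
import math


def sphere_area(radius):
    """Surface area of a sphere."""
    return 4.0 * math.pi * radius**2


def prolate_spheroid_area(r_long, r_short):
    """Closed-form surface area of a prolate spheroid (r_long > r_short)."""
    e = math.sqrt(1.0 - (r_short / r_long) ** 2)  # eccentricity
    return 2.0 * math.pi * r_short**2 * (
        1.0 + r_long / (r_short * e) * math.asin(e)
    )


def sdr_db(p_true, p_rec):
    """Assumed SDR definition: reference power over error power, in dB."""
    return 10.0 * math.log10(abs(p_true) ** 2 / abs(p_true - p_rec) ** 2)


area_spheroid = prolate_spheroid_area(1.0, 0.05)
# Radius of the sphere that has the same surface area as the spheroid.
equivalent_radius = math.sqrt(area_spheroid / (4.0 * math.pi))
print("spheroid area [m^2]:", area_spheroid)
print("sphere area   [m^2]:", sphere_area(0.198))
print("equal-area sphere radius [m]:", equivalent_radius)
print("SDR for a 1 % amplitude error [dB]:", sdr_db(1 + 0j, 1.01 + 0j))
```

The equal-area radius agrees with the stated 0.198 m to within rounding. Under the assumed definition, an SDR of 30 dB corresponds to a relative error amplitude of $10^{-30/20} \approx 3.2\,\%$.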

Conclusion
The framework of spheroidal ambisonics, a natural extension of ambisonics into spheroidal coordinates, was proposed. Spheroidal ambisonics enables analytical encoding of the spatial sound field into spheroidal ambisonic coefficients using spheroidal microphone arrays. An analytical transcoding formula from spheroidal ambisonics into conventional spherical ambisonics was derived, in order to ensure compatibility with the existing software ecosystem around spherical ambisonics. The numerical experiments demonstrated that the sweet-spot of reconstruction in spheroidal ambisonics has an asymmetric shape which is prolonged towards the longer axis of the prolate spheroidal microphone array, realizing non-spherical sweet-spots in ambisonic reconstruction, which could be useful in some applications. The case of oblate spheroidal microphone arrays can be derived in a similar fashion and will be published elsewhere. A recently proposed microphone array for three-dimensional ambisonics recording, which uses a sound-hard circular disc as the scattering body [13], can be seen as a special case of an oblate spheroidal ambisonic microphone array. Another future research topic is the optimization of the microphone capsule configuration on the spheroid. In a practical setup, care must be taken for spatial aliasing [14] and a careful design of the microphone array configuration is important. While the subject of optimizing the microphone array configuration for spherical arrays has been studied extensively in the past [15,16], optimization of the array configuration in the case of spheroidal microphone arrays requires further research.

Figure 2: Results for an incident plane wave travelling along the long axis of the spheroidal array, which is set parallel to the x-axis. The first row from left to right: the real part of the sound pressure of the ground truth incident field, the field reconstructed from spherical ambisonic (HOA) coefficients, and the field reconstructed from the prolate spheroidal ambisonic (ps-HOA) coefficients transcoded to spherical ambisonic coefficients. The second row presents the SDR of the reconstructed fields for HOA (left) and ps-HOA (right). The region with SDR higher than 30 dB was considered as the sweet-spot and is colored in red.