Displaying bioacoustic directional information from sonobuoys using “azigrams”

using “azigrams” Aaron M. Thode, Taiki Sakai, Jeffrey Michalec, Shannon Rankin, Melissa S. Soldevilla, Bruce Martin, and Katherine H. Kim Marine Physical Laboratory, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093-0238, USA Lynker Technologies, LLC, under contract to the Southwest Fisheries Science Center, NMFS/NOAA, La Jolla, California 92037, USA Southwest Fisheries Science Center, NMFS/NOAA, La Jolla, California 92037, USA Southeast Fisheries Science Center, NMFS/NOAA, 75 Virginia Beach Drive, Miami, Florida 33149, USA JASCO Applied Sciences, 32 Troop Avenue, Suite 202, Dartmouth, Nova Scotia, B3B 1Z1, Canada Greeneridge Sciences, Inc., 90 Arnold Place, Suite D, Santa Barbara, California 93117, USA

The AN/SSQ-53 Directional Frequency Analysis and Recording (DIFAR) sonobuoy is an expendable device that can derive acoustic particle velocity along two orthogonal horizontal axes, along with acoustic pressure. This information enables computation of azimuths of low-frequency acoustic sources from a single compact sensor. The standard approach for estimating azimuth from these sensors is by conventional beamforming (i.e., adding weighted time series), but the resulting "cardioid" beampattern is imprecise, computationally expensive, and vulnerable to directional noise contamination for weak signals. Demonstrated here is an alternative multiplicative processing scheme that computes the "active intensity" of an acoustic signal to obtain the dominant directionality of a noise field as a function of time and frequency. This information is conveniently displayed as an "azigram," which is analogous to a spectrogram, but uses color to indicate azimuth instead of intensity. Data from several locations demonstrate this approach, which can be computed without demultiplexing the raw signal. Azigrams have been used to help diagnose sonobuoy issues, improve detectability, and estimate bearings of low signal-to-noise ratio signals. Azigrams may also enhance the detection and potential classification of signals embedded in directional noise fields. V C 2019 Acoustical Society of America. https://doi.org/10.1121/1.5114810 [KGS] Pages: 95-102

I. INTRODUCTION
A sonobuoy is an expendable device that can transmit acoustic data from a hydrophone to a nearby platform, typically an aircraft. Although the concept was first developed during World War I, the first wide-spread use of the technology occurred during World War II, and by 1945, the US Navy had ordered 150 000 sonobuoys and 7500 receivers (Holler, 2014). The first buoys simply used an omnidirectional hydrophone, but as early as 1943, engineers were designing mechanically rotating directional hydrophones that used a gravity motor to spin 3-5 times a minute down a fishing line in order to measure the direction from which acoustic signals arrived. The spiritual descendants of these prototypes, the AN/SSQ-1 and AN/SSQ-20 (derived from a British design), were deployed in the early 1950s. In 1954, Bell Telephone Labs built and unsuccessfully tested the first sonobuoys with orthogonal pressuregradient hydrophones in an attempt to eliminate the need to mechanically rotate the hydrophone to obtain directional information. Between 1965 and 1969, the first AN/SSQ-53 Directional Frequency Analysis and Recording (DIFAR) sonobuoy was developed, which obtains directionality by deriving acoustic particle acceleration along two orthogonal horizontal axes (v x and v y ), along with a pressure component (p) from an omnidirectional sensor. The DIFAR sonobuoy is a workhorse of the current anti-submarine warfare (ASW) fleet and the subject of this manuscript.
While the Navy has used DIFAR sonobuoys extensively since the late 1960s, their first published use for oceanographic research in the open literature occurred two decades later, measuring the directionality of acoustic noise from coastal surf (Wilson et al., 1985). Beginning in the 1990s, their use by civilian researchers expanded further as surplus sonobuoys from the U.S. Navy began to be used by marine bioacousticians to detect and track baleen whales (e.g., D'Spain et al., 1991;Thode et al., 2000;Greene et al., 2004;McDonald, 2004;Miller et al., 2015). At present, surplus sonobuoys are being used to study baleen whales in the Arctic and Antarctic Ocean, the Pacific Ocean, and the Gulf of Mexico.
DIFAR sonobuoys combine the three data streams (p, v x , v y ) into a single broadband heterodyned signal before transmitting the signal to ship, shore, or plane, where the signal is then converted back into p, v x , and v y . The signal processing methods many bioacousticians currently use to process DIFAR data have changed little over 50 years. A spectrogram is made of the omnidirectional channel data (p), which has a bandwidth of around 3-4 kHz (depending on signal intensity). A bioacoustic user manually selects a "bounding box" around a transient signal of interest, and the three time series are then bandpass filtered and trimmed into short signal segments. The segments are added together in a weighted sum analogous to conventional beamforming (McDonald, 2004;D'Spain et al., 2006): where u, or the "steering angle," is a hypothesized azimuth of an arriving plane wave signal, typically defined as increasing clockwise relative to the internal x axis of the sensor, consistent with a geographic azimuth (i.e., 0 points to true or geodetic north, and 90 is geodetic east.) The free-space impedance Z 0 is a conversion factor that ensures that all three time series share the same units and scaling. Since the relationship between acoustic pressure and particle velocity for an acoustic plane wave is v ¼ p/qc, where q and c are the respective density and sound speed of the fluid medium, a common value of Z 0 is typically qc. Equation (1) can also be computed in the frequency domain. Different beam patterns can be generated using different values for Z 0 , allowing tradeoffs between beampattern directivity and sidelobe ambiguity. When evaluated as a function of azimuth, Eq.
(1) generates a cardioid beampattern whose output is maximized whenever the steering angle matches the true arrival azimuth of the signal. This maximization requires evaluating Eq. (1) over numerous angles, a computationally cumbersome process. Furthermore, if a weak transient signal is embedded in a directional ambient noise field, Eq. (1) can yield incorrect bearings when computed in the time domain, particularly if the bioacoustic signal of interest is a frequency-modulated (FM) sweep, which is a common form of baleen whale call. Drawing a bounding box around an FM up-or down-sweep incorporates a lot of background noise into the signal, even if bandpass filtering is used.
Much of this approach is a legacy from an era when signal processing was performed in military hardware due to limitations in computer processing speed. Here we present an alternative approach to computing and displaying DIFAR data that takes place in near-real time and processes all timefrequency cells of a spectrogram, obviating the need to select bounding boxes. This approach is called an "azigram" and is analogous to a conventional spectrogram. Azigrams use an alternative multiplicative approach to computing bearing, first formulated by (Mann et al., 1987;Fahy and Salmon, 1990) and applied to ocean acoustic data by (D'Spain et al., 1991;Greene et al., 2004;D'Spain et al., 2006). While difficult to implement directly in hardware, azigrams can be easily implemented in modern software, yielding a capability to quickly estimate the dominant arrival azimuth of every timefrequency bin in a spectrogram. The additive approach of Eq. (1) can also be used to generate azigrams, but the multiplicative approach detailed here yields direct bearing estimates with much less computational burden. The benefits of azigrams are similar, whether they are generated by additive beamforming or a multiplicative approach.
Azigrams have been extensively used by underwater acousticians studying acoustic vector sensors, particularly for anti-submarine warfare (ASW) applications, and commercial software exists for such applied applications (e.g., the "TruView" software suite by GeoSpectrum, Inc.) However, azigram use by bioacousticians is not widespread, with several exceptions (Miksis-Olds et al., 2018). Therefore, the goal of this paper is to illustrate the numerous advantages of this alternative representation of directional bioacoustic data for those not familiar with the technique.
Section II discusses how to demultiplex a DIFAR signal in software, compute azimuths from the active intensity, and generate an azigram. Section III provides illustrative examples of azigrams and highlights useful applications of this kind of plotting, including diagnosing equipment issues, improving azimuth estimation for low signal-to-noise (SNR) frequency-modulated (FM) sweeps, and potentially enhancing the performance of simple automated detectors.

A. Demultiplexing in frequency domain
Let s(t) be the multiplexed time series received from a DIFAR sensor. The omnidirectional component is defined as p(t), and the two orthogonal particle velocity components are v x (t) and v y (t), respectively. Modern DIFAR sensors have built-in compasses that allow the latter two channels to be mapped relative to magnetic north, with x indicating a magnetic north-south axis and y indicating an east-west axis. The frequency regime below 7.5 kHz represents p(t), which is extracted through simple low-pass filtering. At frequencies above 7.5 kHz the two velocity time series are multiplexed together using Quadrature Amplitude Modulation (QAM) (Grado, 1988;Delagrange, 1992;Holler, 2014), where f 0 ¼ 15 kHz is the analog carrier frequency used to generate the QAM output. The spectrum of s(t) is then where S 0 (f,T) is the Fast-Fourier Transform (FFT) of s(t) computed over a particular time window T. The values of the FFT are complex numbers, so every frequency bin in the FFT spectrum contains a real and an imaginary number, which can be used to encode two separate signal spectra.
[Only positive values of f are required for de-multiplexing; the negative spectrum-negative values of f-is simply the complex conjugate of the positive spectrum, since Eq. (3) is the FFT of a real-valued signal]. However, QAM requires a pilot frequency to encode a reference phase in order to recover the two spectra. In a DIFAR sonobuoys, this "phase pilot" is embedded at 15 kHz, with an additional "frequency pilot" at 7.5 kHz, which exists to help locate the phase pilot (e.g., Doppler shifts caused by moving aircraft receivers). Sonobuoy receivers typically employ a phase-locked loop circuit in order to reconstruct a high-quality version of the phase pilot (Anonymous, 1983;Grado, 1988;Delagrange, 1992), but for strong clean signals, the multiplexing can be conducted in the frequency domain. Let f 15 be the discrete frequency that yields the highest spectral power between 14.5 and 15.5 kHz (in principle, f 15 ¼ f 0 , but when sampling data with a short FFT length, the bin value of f 15 will not be exactly 15 kHz). The phase can then be used to adjust the phase of the other bins of the multiplexed signal, yielding a phase-corrected S, It is convenient to define frequency indices f þ ¼ fÀf 15kHz and f À ¼ f 15kHz Àf when defining the expressions that reconstruct v x and v y from the QAM signal where Re stands for the real component of the spectrum in that frequency bin, and Im stands for the imaginary component.
The values used for f þ will be drawn from the 15 to 22.5 kHz portion of the multiplexed spectrum, while values used for f À will span from 7.5 to 15 kHz. Thus, in order to recover a f ¼ 1 kHz component of the DIFAR signal, one uses the frequency bin associated with f À ¼ 14 kHz and f þ ¼ 16 kHz.

B. Computing dominant directionality of active intensity
An alternative to Eq. (1) for estimating azimuth is to estimate the "active intensity" I of an acoustic field (Mann et al., 1987;Fahy and Salmon, 1990;D'Spain et al., 1991;D'Spain et al., 2002;Dall'Osto et al., 2012), which involves multiplying the DIFAR time series together, instead of the addition proscribed by Eq. (1). The active intensity of the acoustic field at sample time T and frequency f is a vector quantity that is conveniently computed by measuring the inphase product of p and v k , where the index k represents either the x or y direction, The symbol "*" indicates a complex conjugate. The azimuthal estimate of the signal at time T and frequency f relative to the y coordinate (North) becomes simply a measurement of the angle of the resulting vector, The use of the form tan À1 (x/y), as opposed to tan À1 (y/x), produces an azimuth that follows the geographic convention (u increasing clockwise from the y axis) instead of the mathematical convention (u increasing counterclockwise from the x axis).
Results can be improved by averaging samples of the active intensity over a short time window T 0 , in order to reduce the variance of each active intensity component, just as a periodogram averages Fourier Transform samples to reduce the variance of a spectral estimate (Oppenheim and Schafer, 1989).
The function uðf ; TÞ can be plotted as an image with respect to time and frequency, with the dominant bearing represented as a color, typically on hue, saturation, value (HSV) color scale, so that no color discontinuity exists between 0 and 360 . This particular representation of displaying a signal is defined as an azigram for the rest of this paper, as its coordinates are the same as a typical spectrogram. The advantages of using this type of display will be the focus of the rest of this paper. We note here that other researchers have merged the concepts of spectrogram and azigram by making the azigram pixel transparency (e.g., the alpha channel of an HSV display) proportional to the signal intensity (Miksis-Olds et al., 2018); however, in the following examples, we simply use the azigram without transparency adjustment.   Fig. 1(a). The seismic survey generates airgun impulses visible between 10 and 175 Hz in roughly 10 s intervals, beginning at 5 s. Electrical interference from what may be a ship automatic identification system (AIS) is also visible at ten-second intervals beginning at 8 s. Figure 1(b) shows the corresponding azigram of the same time window and bandwidth, after averaging 0.5 s (three FFT samples) of data. The probable whale call shows a bearing between 105 and 115 azimuth, while the seismic airgun impulses show azimuths between 220 and 240 . The image also shows how the seismic reverberation between the individual airgun pulses shares the same dominant directionality with the pulses, and that the influence of this reverberation on the ambient noise extends beyond 100 Hz, even though the spectrogram only indicates a substantial reverberation presence between 10 and 50 Hz. The ambient noise field above 200 Hz displays no consistent dominant directionality, except for the 360 Hz tone arriving from the Gunter. Figure 1(b) thus provides an example of how bearings from multiple simultaneous transient and continuous acoustic data can be viewed quickly using an azigram. Figure 2 demonstrates how azigrams can rapidly diagnose issues with sonobuoy deployments. Figure 2(a) shows 30 s of spectrogram data collected by a sonobuoy deployed within 1 km of the NOAA R/V Reuben Lasker near 18.7 N, 158.27 W. The goal of this deployment was to determine whether vessel-generated noise from this relatively quiet ship could be detected and used as a bearing calibration reference for sonobuoys. While the spectrogram in Fig. 2(a) shows no obvious noise contamination, attempts to use conventional beamforming to locate the Lasker did not give consistent results. The azigram generated from the same data in Fig. 2(b) reveals the problem: the dominant noise directionality across all frequency bands randomly fluctuates over time scales of a few seconds. In comparison, the random noise directionality in the working sonobuoy in Fig. 1(b) is steady over time but varies with frequency. One possible explanation for this observation is that the DIFAR sensor is spinning; however, a spinning sensor would show a regular and repeated progression of directions over relatively short time scales, while the fluctuation pattern here shows more random shifts in direction. We interpret the result as showing a sensor entangled in its own deployment cable, so that the sensor is tilted at a 90 angle relative to the vertical. The vertical movement of the sensor from ocean waves on the elastic deployment line gently rocks the sensor over timescales of a few seconds, resulting in the shifting bearing directionality and uniform bearing regardless of the specific frequency.

B. Diagnosing deployment problems
Figure 2(c) shows data analyzed from another, presumably untangled, sonobuoy, which demonstrates that the Lasker, despite being a relatively quiet NOAA vessel, produces sufficient noise over the entire 3 kHz bandwidth to be identified at $125 azimuth. The figure thus illustrates how in principle this vessel can be used to calibrate the sonobuoy orientation and even location (using the bearing of the vessel and received level estimates).

C. Enhancing accuracy of bearing measurements of weak FM signals
This section examines a more quantitative advantage of using azigrams over conventional cardioid beamforming. Since an azigram displays the dominant directionality for each time/frequency cell in a spectrogram, one need only to select the time/frequency cells that are dominated by signal, while rejecting cells that contain only interfering noise. By contrast, standard beamforming requires processing to be conducted on a "bounding box" drawn around the signal in the spectrogram, and thus can incorporate substantial amounts of interfering noise, particularly if the signal of interest is an FM sweep. We hypothesized that azigrams would provide a more accurate measurement of the true azimuth of broadband FM sweeps with low SNRs, a type of sound often generated by baleen whales.
We tested this hypothesis using data from controlled broadcasts of signals off the San Diego coast on June 7, 2016. A series of 12 playbacks were conducted from a small boat around a set of four sonobuoys deployed in a square, with $ 2 km between each sonobuoy. Playback signals consisted of synthesized sounds with different spectral FM slopes and signal durations. Each signal was repeated three times, with the second and third signals broadcast with source levels 3 and 10 dB greater than the first broadcast. Figure 3(a) shows a spectrogram of a sequence of playbacks with the source depth an estimated 3 m, and the sonobuoy hydrophone deployment depth set to 30 m. The true position of each sonobuoy was measured by attaching SPOT GPS trackers. The azigram [ Fig. 3(b)] reveals a highly directional background noise field, due to broadband engine noise, and shows that the playback signals display consistent bearings, even from the weak signals generated at 6, 21, and 24 s. Also noticeable are tones from the deploying motorboat (visible between 1.5 and 1.7 kHz from 0 to 15 s), and what may be a low-frequency dolphin sound between 2.5 and 2.7 kHz at 25 s. In many cases, azigram time-frequency bins associated  with the signals can be determined without referencing the original spectrogram, as the collection of azigram points associated with the signal reproduces the original time-frequency modulation of the signal. For weak signals, one has to use a priori information from the spectrogram to locate the appropriate time-frequency bin in the spectrogram.
For each playback reception, the SNR of each sound was computed by first measuring the root-mean-square (rms) pressure of the signal and noise encompassed by a bounding box on a spectrogram. The rms noise pressure was then measured from a time window preceding the playback sequence, using a window duration and bandwidth that matched that of the selected signal. The signal azimuth was estimated two ways: first, by applying bandwidth-filtered data (with the bandwidth defined by the bounding box selected from a spectrogram) to Eq. (1); and second, by creating an azigram and then selecting the azimuth associated with the time/frequency cell associated with the greatest received level within the spectrogram bounding box. Thus the same bounding box was used for all three measurements, but for the case of the azigram only a single time-frequency bin within the bounding box was used to estimate the azimuth, and that bin was selected by finding the bin with the maximum acoustic signal received level in the associated spectrogram. Figure 4 plots the resulting azimuthal error from 1223 playback measurements across the four sonobuoys, generated by computing the mean and median error of sample subsets that lie within 2 dB SNR of each other. The azimuthal error is defined here as the difference between the measured and true azimuth of the playback relative to the receiving buoy; the error is negative whenever the estimated azimuth is greater than the true azimuth. The top subplot shows the standard deviation of the error around the mean for each 2 dB SNR bin (with each bin containing roughly 100 samples), while the bottom subplot shows the median, 25th and 75th percentiles of the error for each bin. Both plots show that the error statistics for both azigrams and the standard DIFAR approach [Eq. (1)] are similar for high SNR signals, with biases near zero, standard deviations on the order of 615 , and a 25th to 75th percentile spread of 10 .
As the SNR falls below 7 dB, the error distributions of both methods diverge. The azigram errors maintain a bias and spread similar to those of the high SNR signals. By contrast, conventional bearing methods using Eq. (1) yield errors with negative biases that approach À10 to À20 , depending on the metric used. The negative bias arises from the increased influence of the directional noise background visible in Fig. 3(b), which is generated by other vessel traffic and arrives from 215 azimuth, larger than the true azimuth of roughly 170 . The spread of the errors from conventional methods also rises prominently at low SNR values. These results are consistent even when medians and error percentiles are used, which removes the effects of outliers on the statistics. sounds, provided they are recorded in the presence of a directional noise field whose directionality differs from that of the signal(s) of interest. Many detectors currently applied to spectrograms in the bioacoustic literature can be adapted to work directly on azigrams. The simplest practical detector is the classic "energy" detector applied to an "equalized" spectrogram, wherein a time-averaged estimate of the background noise spectrum over a predefined bandwidth is subtracted from an incoming spectrum over the same bandwidth (Van Trees, 1970). The sum of the resulting square-magnitude spectral outputs then defines a detection function for new transient signals. An analogous detector can be developed for an azigram, which generates a moving average of the dominant background directionality vs frequency, where a is a smoothing constant that determines how rapidly the background noise estimate evolves, and an overhead line designates an averaged quantity. The detection function is then generated by the following expression: For a given frequency, the function D will be near zero if incoming azigram values are similar to the background directionality, but will spike if a transient signal arrives from a different direction, with the magnitude of the spike dependent on the azimuthal difference between the signal and noise. Figure 5(c) shows a comparison between a simple spectrogram energy detector output and the analogous azigram function [Eq. (10)] from the same dataset shown in Fig. 3, when integrated between 700 and 1300 Hz. The ratio between the peak detector output and the background detector "noise" is $1.8 for the energy detector, and 3.75 for the azigram detector. If a threshold of 0.5 is used on the scaled detector, then the energy detector has five false detections, while the azigram detector has none. While this example may be an extreme case of a directional noise field, the result hints that a weighted or other adaptive combination of spectrogram and azigram detectors might enhance detection and potentially classification performance in certain ambient noise environments.

IV. CONCLUSION
DIFAR sonobuoy data can be demodulated and processed in the frequency domain, and then used to compute the active intensity of all frequency components of an acoustic field, which in turn allows azigrams to be computed efficiently. Although not implemented here, real-time computations of azigrams are likely possible, just as scrolling spectrograms have become standard in popular passive acoustic monitoring packages today (Gillespie et al., 2009). Azigrams have already proved useful in interpreting complex soundscapes, diagnosing sonobuoy issues, and enhancing azimuthal estimates for weak signals. They also show some promise in aiding detection and possibly classification efforts of transient signals in directional noise environments. As the use of DIFAR sonobuoys continues to spread among the civilian oceanographic community, no doubt other applications and other processing approaches from the ASW community will continue to be discovered and further developed for bioacoustics applications. For example, one co-author (Martin) has identified a way to combine spectrograms and azigrams into one display by making the transparency of an azigram pixel proportional to signal intensity, so that a viewer can get some rough idea of signal intensity simultaneously with bearing.