A sparsification and reconstruction strategy for compressed sensing photoacoustic tomography.

Compressed sensing (CS) is a promising approach to reduce the number of measurements in photoacoustic tomography (PAT) while preserving high spatial resolution. This allows to increase the measurement speed and reduce system costs. Instead of collecting point-wise measurements, in CS one uses various combinations of pressure values at different sensor locations. Sparsity is the main condition allowing to recover the photoacoustic (PA) source from compressive measurements. In this paper, a different concept enabling sparse recovery in CS PAT is introduced. This approach is based on the fact that the second time derivative applied to the measured pressure data corresponds to the application of the Laplacian to the original PA source. As typical PA sources consist of smooth parts and singularities along interfaces, the Laplacian of the source is sparse (or at least compressible). To efficiently exploit the induced sparsity, a reconstruction framework is developed to jointly recover the initial and modified sparse sources. Reconstruction results with simulated as well as experimental data are given.


Introduction
Photoacoustic tomography (PAT) is a non-invasive hybrid imaging technology, that beneficially combines the high contrast of pure optical imaging and the high spatial resolution of pure ultrasound imaging (see [4,31,33]). The basic principle of PAT is as follows (see Fig. 1). A semitransparent sample (such as a part of a human patient) is illuminated with short pulses of optical radiation. A fraction of the optical energy is absorbed inside the sample which causes thermal heating, expansion, and a subsequent acoustic pressure wave depending on the interior absorbing structure of the sample. The acoustic pressure is measured outside of the sample and used to reconstruct an image of the interior. (c) Figure 1: (a) An object is illuminated with a short optical pulse; (b) the absorbed light distribution causes an acoustic pressure; (c) the acoustic pressure is measured outside the object and used to reconstruct an image of the interior.
In this paper we consider PAT in heterogeneous acoustic media, where the acoustic pressure satisfies the wave equation Here r ∈ R d is the spatial location, t ∈ R the time variable, ∆ r the spatial Laplacian, c(r) the speed of sound, and f (r) the photoacoustic (PA) source that has to be recovered. The wave equation (1) is augmented with initial condition p(r, t) = 0 on {t < 0}. The acoustic pressure is then uniquely defined and referred to as the causal solution of (1). Both cases d = 2, 3 for the spatial dimension are relevant in PAT: The case d = 3 arises in PAT using classical point-wise measurements; the case d = 2 is relevant for PAT with integrating line detectors [3,6,29,28].
To recover the PA source, the pressure is measured with sensors distributed on a surface or curve outside of the sample; see Fig. 1. Using standard sensing, the spatial sampling step size limits the spatial resolution of the measured data and therefore the spatial resolution of the final reconstruction. Consequently, high spatial resolution requires a large number of detector locations. Ideally, for high frame rate, the pressure data are measured in parallel with a large array made of small detector elements. However, producing a detector array with a large number of parallel readouts is costly and technically demanding. In this work we use techniques of compressed sensing (CS) to reduce the number of required measurements and thereby accelerating PAT while keeping high spatial resolution [30,1,5,20].
CS is a new sensing paradigm introduced in [8,9,12] that allows to capture high resolution signals using much less measurements than advised by Shannon's sampling theory. The basic idea is to replace point samples by linear measurements, where each measurement consists of a linear combination of sensor values. It offers the ability to reduce the number of measurements while keeping high spatial resolution. One crucial ingredient enabling CS PAT is sparsity, which refers to the requirement that the unknown signal is sparse, in the sense that it has only a small number of entries that are significantly different from zero (possibly after a change of basis).

Main contributions
In this work we develop a new framework for CS PAT that allows to bring sparsity into play. Our approach is rooted in the concept of sparsifying temporal transforms developed for PAT in [30,20] for two and three spatial dimensions. However, the approach in this present paper extends and simplifies this transform approach considerably. First, it equally applies to any detection surface and arbitrary spatial dimension. Second, the new method can even be applied to heterogenous media. In order to achieve this, we use the second time derivative applied to the pressure data as a sparsifying transform. Opposed to [30,20], where the transform was used to sparsify the measured signals, in the present work we exploit this for obtaining sparsity in the original imaging domain.
Our new approach is based on the following. Consider the second time derivative ∂ 2 t p(r, t) of the PA pressure. We will show, that this transformed pressure again satisfies the wave equation, however with the modified PA source c 2 ∆ r f in place of the original PA source f . If the original PA source consists of smooth parts and jumps, the modified source consists of smooth parts and sparse structures; see Fig. 2 for an example. This enables the use of efficient CS reconstruction algorithms based on sparse recovery. One possible approach is based on the following two-step procedure. First, recover an approximation h(r) ∆ r f (r) via 1 -minimization. In a second step, recover an approximation to f by solving the Poisson equation While the above two-stage approach turned out to very well recover the singularities of the PA source, at the same time it shows disturbing low-frequency artifacts. Therefore, in this paper we develop a completely novel strategy for jointly recovering f as well as its Laplacian. It is based on solving the constrained optimization problem min (f,h) Here M is the forward operator (including wave propagation and compressed measurements), · 1 denotes the 1 -norm guaranteeing sparsity of h and I C the indicator function of the positive cone C {f | f (r) ≥ 0} guaranteeing non-negativity. In the case of noisy data we consider a penalized version that can be efficiently solved via various modern optimization techniques, such as forward backward splitting.

Outline
In Section 2 we describe PAT and existing CS approaches. We thereby focus on the role of sparsity in PAT. The proposed framework for CS PAT will be presented in Section 3. This includes the sparsification of the PA source and the joint reconstruction algorithm. Numerical and experimental results are presented in Section 4. The paper concludes with a summary and outlook on future research presented in Section 5.

Compressed photoacoustic tomography 2.1 Photoacoustic tomography
Suppose that V ⊆ R d is a bounded region and let r i for i = 1, . . . , n denote admissible detector locations distributed on the boundary ∂V . Then, for a given source f supported inside V , we define where p(r, t) is the solution of (1). The inverse source problem of PAT is to recover the PA source f from the Wf in (2). The measured data Wf is considered to be fully/completely sampled if the transducers are densely located on the whole boundary ∂V , such that the function f can be stably reconstructed from the data. Finding necessary and sufficient sampling conditions for PAT is still on-going research [19].
Let us mention that most of the theoretical works on PAT consider the continuous setting where the transducer locations are all points of a surface or curve Γ ⊆ ∂V ; see [14,32,24,13,24,26,17,18,25,27,21]. On the other hand, most works on discrete settings consider both discrete spatial and time variables [22,19]. The above setting (2) has been considered in a few works [30,20,10]. It well reflects the high sampling rate of the time variable in many practical PAT systems.

Compressive measurements in PAT
The number n of detector positions in (2) is directly related to the resolution of the final reconstruction. Namely, consider the case V being the disc of radius R and f being essentially wavenumber limited with maximal wavenumber give by λ 0 . Then, N ϕ ≥ 2Rλ 0 equally spaced transducers are sufficient to recover f with small error; see [19]. In many biomedical applications, this results in a high sampling rate. For example, the PA source may contain narrow features such as blood vessels and have sharp interfaces. This results in large wavenumbers and necessary high sampling rate. For this reason, full sampling in PAT is costly and time consuming.
To reduce the number of measurements while preserving resolution, we use CS measurements in PAT. Instead of collecting n individually sampled signals as in (2), we take CS measurements with m n. In [7,30] we proposed to take the measurement matrix A in (3) as the adjacency matrix of a lossless expander graph. Hadamard matrices have been proposed in [5,23]. In this work, we take A as a random Bernoulli matrix with entries ±1/ √ m with equal probability or a Gaussian random matrix consisting of i.i.d. N (0, 1/m)-Gaussian random variables in each entry. These choices are validated by the fact that Gaussian and Bernoulli random matrices satisfy the restricted isometry property (RIP) with high probability (see Section 2.3 below).

The role of sparsity
A central aspect in the theory of CS is sparsity of the given data in some basis or frame [8,12,15].
where |˙| is used to denote the number of elements of some set. If the data is known to have sparse structure, then reconstruction procedures using 1 -minimization or greedy-type methods can often be guaranteed to yield high quality results even if the problem is severely ill-posed [8,16]. If we are given measurements Mx = y, where x ∈ R N and y ∈ R m with m N , the success of the aforementioned reconstruction procedures can for example be guaranteed if the matrix A satisfies the restricted isometry property (RIP), i.e. for all s-sparse vectors z we have for an RIP constant δ < 1/ √ 2; see [15]. Gaussian and Bernoulli random matrices satisfy the RIP with high probability, provided m ≥ Cs log(en/s) for some reasonable constant C and with e denoting Euler's number [2].
In PAT, the possibility to sparsify the data has recently been examined [30,20]. In these works it was observed that the measured pressure data could be sparsified and the sparse reconstruction methods were applied directly to the pressure data. As a second step, still a classical reconstruction via filtered backprojection had to be performed. The sparsification of the data was achieved with a transformation in the time direction of the pressure data. In two dimensions, the transform is a first order pseudo-differential operator [30], while in three dimensions the transform is of second order [20].

Sparsifying transform
The following theorem is the foundation of our CS approach. It shows that the required sparsity in space can be obtained by applying the second time derivative to the measured data.
In particular, where M denotes the CS PAT forward operator defined by (3).
Proof. We first recall that the solution of (1) for t > 0 is equivalent to the initial value problem To see this equivalence note first that the solution of (6)-(8) extends to a smooth function p : R d × R → R that is even in the time variable. Denoting by χ = χ{t > 0} the characteristic function function of (0, ∞) yields ( Using the initial conditions (7), (8) shows that χp (which coincides with p for positive times) solves (1).
According to the above considerations, for t > 0, the function p coincides with the solution of (6)- (8). As a consequence, for t > 0, the function q = ∂ 2 t p satisfies In fact, (9) results by applying the second derivative to (6) and (11) follows from the symmetry of p. Evaluating (6) at t = 0 and applying the Laplacian to (7) yields initial condition (10). Finally as for the original pressure one concludes that (9)-(11) implies (5).
Remark 3.1. In order to focus on the main ideas, throughout the following we assume the spatial variable r ∈ {0, . . . , N r } d to be already discretized. The discrete Laplacian ∆ r then may be defined via symmetric finite differences; alternatively ∆ r may be defined via the Fourier transform in the spectral domain. Theorem 3.1 and the equivalence of (1) and (6)-(8) then holds for discrete sources f : {0, . . . , N r } d → R as well.
Typical phantoms consist of smoothly varying parts and rapid changes at interfaces. For such PA sources, the modified source c 2 ∆ r f is sparse or at least compressible. The theory of CS therefore predicts that the modified source can be recovered by solving In the case the unknown is only approximately sparse or the data are noisy, one instead minimizes the penalized functional problem 1 2 Mh − ∂ 2 t y 2 2 + β h 1 , where β > 0 is a regularization parameter which gives trade-off between the data-fitting term Mh − ∂ 2 t y 2 2 and the regularization term h 1 . Having obtained an approximation of h by either solving (12) or the relaxed version, one can recover the original PA source f by subsequently solving the Poisson equation ∆ r f = h/c 2 with zero boundary conditions. While the above two-stage procedure recovers boundaries well, we observed disturbing low frequency artifacts in the reconstruction of f . Therefore, below we introduce a new reconstruction approach based on Theorem 3.1 that jointly recovers f and h.

Joint reconstruction approach
As argued above, the second derivative p is well suited (via c 2 (r)∆ r f ) to recover the singularities of f , but hardly contained low-frequency components of f . On the other hand, the low frequency information is contained in the original data, which is still available to us. Therefore we propose the following joint constrained optimization problem min (f,h) We habe the following result. Theorem 3.2. Assume that f : {0, . . . , N r } d → R is non-negative and that ∆ r f is s-sparse. Moreover, suppose that the measurement matrix M satisfies the RIP (see (4)) and denote y = Mf . Then, the pair [f, ∆ r f ] can be recovered as the unique solution of (13).
Proof. According to Theorem 3.1 (in a discrete form; compare Remark 3.1), p (r, t) is the unique causal solution of the wave equation (5) with modified source term h = c 2 (r)∆ r f . As a consequence c 2 (r)∆ r h satisfies Mh = y , which implies that the pair [f, h] is a feasible for (13). It remains to verify that [f, h] is the only solution of (13). To show this, note that for any solution [f * , h * ] of (13) its second component h * is a solution of (12). Because M satisfies the s-RIP, and h = c 2 (r)∆ r f is s-sparse, CS theory implies that (12) is uniquely solvable [8,12,15] and therefore h = h * . The last constraint then implies that f * = f .
In the case the data only approximately sparse or noisy, we propose, instead of (13), to solve the 2 -relaxed version Here α > 0 is a tuning and β > 0 a regularization parameter. There are several modern methods to efficiently solve (14). In this work we use the forward-backward splitting with quadratic term as smooth part used in the explicit (forward) step and β h 1 + I C (f ) as non-smooth part used for the implicit (backward) step.

Numerical minimization
We will solve (14) using a proximal gradient algorithm [11], which is an algorithm well suited for minimizing the sum of a smooth and a non-smooth but convex part. In the case of (14) we take the smooth part as and the non-smooth part as Ψ(f, h) β h 1 + I C (f ).
The proximal gradient algorithm then alternately performs an explicit gradient step for Φ and an implicit proximal step for Ψ. The gradient [∇ f Φ, ∇ h Φ] of the smooth part can easily be computed to be The proximal map of the non-smooth part is given by With this, the proximal gradient algorithm is given by where (f k , h k ) is the k-th iterate and µ k the step size in the k-th iteration. We initialize the proximal gradient algorithm with f 0 = h 0 = 0.
Implementation of (18) avoids taking the second time derivative of the data y. Because the proximal map of f → c∆ r f 1 in not available explicitly, (18) and its relaxed versions cannot be straightforwardly addressed with the proximal gradient algorithm. Therefore, in the present paper we only use the model (14) and the algorithm (16), (17) for its minimization. Different models and algorithms will be investigated in future research.

Numerical results
For the presented numerical results, the two dimensional PA source term f : {0, . . . , N r } 2 → R depicted in Figure 2 is used which is assumed to be supported in a disc of radius R. Additional results are presented using an MRI image. The synthetic data is recorded on the boundary circle of radius R, where the the time was discretized with 301 equidistant sampling points in the interval [0, 2R]. The full data was recorded at n = 200 detector locations. The reconstruction of both phantoms via the filtered backprojection algorithm of [14] from the full measurements is shown in Figure 3. CS measurements Mf (see (3)) have been generated in two random ways and one deterministic way. The random matrices A have be taken either as random Bernoulli matrix with entries ±1/ √ m with equal probability or a Gaussian random matrix consisting of i.i.d. N (0, 1/m)-Gaussian random variables in each entry. The deterministic subsampling was performed by choosing m equispaced detectors. In the joint reconstruction with (16), (17), the step-size and regularization parameters are chosen to be µ k = 0.1, α = 0.1 and β = 0.005; 5000 iterations are performed. For the random subsampling matrices the recovery guarantees from the theory of CS can be employed, but they do not provably hold for the deterministic subsampling -although the results are equally convincing even for subsampling on the order of 10 % of the full data, cf. If one increases the number of measurements to about 25 % of the data, the reconstruction results become almost indistinguishable from the results obtained from FBP on the full data, cf. Figure 5.
In Figure 6 the application of the method developed in this article to an MRI image is presented.
As the sparsity of the Laplacian is not as pronounced as in the synthetic example, the MRI image requires more measurements to achieve high qualitative outcome.
For noisy data, the algorithm still produces good results, although more samples need to be taken to achieve good results. For the synthetic phantom, Gaussian noise amounting to an SNR of approximately 15 % was added. The reconstruction results using m = 20 and m = 50 measurements are depicted in Figure 7 and Figure 8, respectively.

Experimental results
Experimental data have been acquired by an all-optical photoacoustic projection imaging (O-PAPI) system as described in [3]. The system featured 64 integrating line detector (ILD) elements distributed along a circular arc of radius 4 cm, covering an angle of 289 degree. For an imaging depth of 20 mm, the imaging resolution of the O-PAPI system was estimated to be between 100 µm and 260 µm; see [3]. PA signals were excited by illuminating the sample from two Reconstruction results for the leaf phantom from 60 sparsely sampled sensor locations after 500 iterations with the proposed joint minimization algorithm are shown in Figure 9. For this, the regularization and step-size parameters were chosen as in the previous section. From the experimental data we also generated m = 30 random Bernoulli measurements. The reconstruction results using this data are shown in Figure 10. For all results, the PA source is displayed on a 1.6 cm × 1.33 cm rectangle with step size 26 µm inside the detection arc.

Conclusion
In order to achieve high spatial resolution in PAT, standard measurement and reconstruction schemes require a large number of spatial measurements with high bandwidth detectors. In order to speed up the measurement process, systems allowing a large number of parallel measurements are desirable. However such systems are technically demanding and costly to fabricate. For Bernoulli Sparse Figure 6: Reconstructions from 60 noisefree measurements (MRI phantom): Reconstruction of an MRI image from m = 60 (i.e. 33 % of the fully sampled data) synthetically generated noisefree measurements using the method presented in this article (top) and FBP (bottom).
example, in PAT with integrating detectors, the required analog to digital converters are among the most costly building blocks. In order to increase measurement speed and to minimize system costs, CS aims to reduce the number of measurements while preserving high resolution of the reconstructed image.
One main ingredient enabling CS in PAT is sparsity of the image to be reconstructed. To bring sparsity into play, in this paper we introduced a new approach based on the commutation relation ∂ 2 t W[f ] = W[c 2 ∆ r f ] between the PAT forward operator W and the the Laplacian. We developed a new reconstruction strategy for jointly reconstructing the pair [f, ∆ r f ] by minimizing (14) and thereby using sparsity of ∆ r f . The commutation relation further allows to rigorously study generalized Tikhonov regularization of the form 1 2 Mf − y 2 2 + β c∆ r f 1 + I C (f ) for CS PAT. Such an analysis as well as the development of more efficient numerical minimization schemes are subjects of further research.