Gradient Projection for Sparse Reconstruction
Some ideas I liked from this paper:
- A way to convert certain non-differentiable objective functions into differentiable problems with constraints
- Use of a projection method
1. Convex optimization problem
Our objective is sparse reconstruction, which can be formulated as a least-squares problem with L1 regularization:
\begin{align*} \min_x \frac 1 2 || y - Ax ||_2^2 + \tau ||x||_1 \end{align*}This is the same as maximum a posteriori (MAP) estimation of \(x\) from observations with Gaussian noise and a Laplacian prior on \(x\).
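To make the objective concrete, here is a minimal sketch of evaluating it with NumPy (the function name `objective` is my own, not from the paper):

```python
import numpy as np

def objective(A, y, x, tau):
    """L1-regularized least-squares objective: 0.5 * ||y - A x||_2^2 + tau * ||x||_1."""
    residual = y - A @ x
    return 0.5 * residual @ residual + tau * np.abs(x).sum()
```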
2. Reformulation
This problem can be reformulated as a bound-constrained quadratic program (BCQP) by splitting \(x\) into the difference of two non-negative vectors: \(x = u - v\) with \(u \geq 0\) and \(v \geq 0\).
\begin{align*} \min_{u,v} \frac 1 2 || y - A(u - v) ||_2^2 + \tau 1^T u + \tau 1^T v \end{align*}such that \(u, v \geq 0\)
The original L1-norm objective is not differentiable at zero, so standard gradient-based methods don't directly apply. The reformulated objective is a smooth quadratic, so its gradient is well defined everywhere.
This can be written in the standard BCQP form with the objective expressed as
\begin{align*} F(z) = \frac 1 2 z^T B z + c^T z \end{align*}with the constraint \(z \geq 0\), where \(z = \begin{bmatrix} u \\ v \end{bmatrix}\), the matrix \(B = \begin{bmatrix} A^T A & -A^T A \\ -A^T A & A^T A \end{bmatrix}\) encodes the quadratic (L2) term, and \(c = \tau 1 + \begin{bmatrix} -A^T y \\ A^T y \end{bmatrix}\) collects the L1 penalty and the linear part of the least-squares term (the constant \(\frac 1 2 y^T y\) is dropped).
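A small sketch of building \(B\) and \(c\) from \(A\), \(y\), and \(\tau\) (the helper name `bcqp_data` is mine, not from the paper); the BCQP value differs from the original objective only by the dropped constant \(\frac 1 2 y^T y\):

```python
import numpy as np

def bcqp_data(A, y, tau):
    """Build B and c for the BCQP form, with z = [u; v] and x = u - v.
    The BCQP objective 0.5 z^T B z + c^T z equals the original objective
    minus the constant 0.5 * ||y||^2."""
    AtA = A.T @ A
    Aty = A.T @ y
    B = np.block([[AtA, -AtA], [-AtA, AtA]])
    c = tau * np.ones(2 * len(Aty)) + np.concatenate([-Aty, Aty])
    return B, c
```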
3. GPSR-Basic method
The solution alternates between taking a gradient step along \(-\nabla F\) and projecting \(z\) back onto the non-negative orthant. [Eqn: 11, 12]
When taking the step, the step size \(\alpha\) is chosen to minimize \(F\) along the direction of the modified gradient \(g\). A component of \(g\) is set to 0 if moving along \(\nabla F\) would take that component negative (i.e., \(z_i = 0\) and \((\nabla F)_i > 0\)); otherwise it equals the corresponding component of \(\nabla F\).
Since \(F\) is quadratic, the exact minimizer along \(-g\) has the closed form \(\alpha = \frac{g^T g}{g^T B g}\).
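A minimal sketch of one such step (the function name `gpsr_basic_step` is mine; this simplified version applies the exact line-search \(\alpha\) directly and omits the backtracking safeguard the paper uses):

```python
import numpy as np

def gpsr_basic_step(z, B, c):
    """One projected-gradient step on F(z) = 0.5 z^T B z + c^T z, z >= 0."""
    grad = B @ z + c
    # Modified gradient g: zero out components where z_i = 0 and the
    # gradient would push z_i negative (i.e., keep only z_i > 0 or grad_i < 0).
    g = np.where((z > 0) | (grad < 0), grad, 0.0)
    # Exact minimizer of the quadratic along -g.
    denom = g @ (B @ g)
    alpha = (g @ g) / denom if denom > 0 else 1.0
    # Step and project back onto the non-negative orthant.
    return np.maximum(z - alpha * grad, 0.0)
```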
4. Barzilai-Borwein method
This is an alternative way to choose the step sizes. The idea is to update by the step \(-H_k^{-1} \nabla F\), where \(H_k = \eta_k I\) is a cheap scalar approximation of the Hessian built from the most recent step, making it a quasi-Newton method. The paper also combines this with a continuation scheme (the warm-starting idea below).
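A sketch of the Barzilai-Borwein step-size choice for this quadratic, where the Hessian is exactly \(B\) so the curvature along the last step \(s = z_k - z_{k-1}\) can be computed directly (function name and the clipping bounds are my own choices):

```python
import numpy as np

def bb_step_size(s, B, alpha_min=1e-30, alpha_max=1e30):
    """Barzilai-Borwein step size: approximate the Hessian B by (1/alpha) I,
    choosing alpha = (s^T s) / (s^T B s) so the scalar model matches the
    curvature of F along the most recent step s."""
    sBs = s @ (B @ s)
    if sBs <= 0:
        return alpha_max
    return np.clip((s @ s) / sBs, alpha_min, alpha_max)
```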
5. Warm starting
Warm starting uses an idea similar to, but not exactly the same as, penalty methods. In penalty methods, the penalty strength is gradually increased from a value that yields an easy solution to a value for which solving directly would be difficult.
Here too, the \(\tau\) parameter starts large and is then decreased (note: in penalty methods we increase it). For large \(\tau\) the solution is trivially \(x = 0\) (this holds whenever \(\tau \geq \|A^T y\|_\infty\)), and decreasing \(\tau\) makes the problem progressively harder, with each solution warm-starting the next.
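The continuation loop can be sketched generically over any solver; here `solve(A, y, tau, x0)` is an assumed interface (not the paper's API), and the halving `factor` is my own illustrative choice:

```python
import numpy as np

def continuation(solve, A, y, tau_final, factor=0.5):
    """Warm-started continuation: start from tau = ||A^T y||_inf (where the
    solution is exactly x = 0) and shrink tau toward tau_final, reusing each
    solution as the initial point x0 for the next, smaller tau."""
    tau = np.abs(A.T @ y).max()
    x = np.zeros(A.shape[1])
    while tau > tau_final:
        tau = max(tau * factor, tau_final)
        x = solve(A, y, tau, x)
    return x
```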