EURASIP Journal on Advances in Signal Processing
a SpringerOpen Journal

RESEARCH — Open Access

An augmented Lagrangian multi-scale dictionary learning algorithm

Qiegen Liu1, Jianhua Luo1*, Shanshan Wang1, Moyan Xiao1 and Meng Ye2

Abstract

Learning overcomplete dictionaries for sparse signal representation has attracted considerable attention in recent years, yet most existing approaches suffer from a serious drawback: they are prone to local minima. In this article, we present a novel augmented Lagrangian multi-scale dictionary learning algorithm (ALM-DL), which first recasts the constrained dictionary learning problem into an AL scheme and then updates the dictionary after each inner iteration of the scheme, during which a majorization-minimization technique is employed to solve the inner subproblem. Refining the dictionary from low scale to high makes the proposed method less dependent on the initial dictionary and hence helps it avoid local optima. Numerical tests on synthetic data and denoising applications on real images demonstrate the superior performance of the proposed approach.

Keywords: dictionary learning, augmented Lagrangian, multi-scale, refinement, image denoising.

1. Introduction

In the last two decades, more and more studies have focused on dictionary learning, the goal of which is to model signals as sparse linear combinations of atoms from a dictionary, below a certain error tolerance. Sparse representation of signals under a learned dictionary possesses significant advantages over pre-specified dictionaries such as the wavelet and discrete cosine transform (DCT) dictionaries, as demonstrated in many studies [1-3], and it has been widely used in denoising, inpainting, and classification, with state-of-the-art results [1-5]. Consider a signal b_l ∈ R^M; it can be represented by a linear combination of a few atoms, either exactly as b_l = Ax_l or approximately as b_l ≈ Ax_l, where A represents the dictionary and x_l denotes the representation coefficients. Given an input matrix B = [b_1, ..., b_L] ∈ R^{M×L} of L signals, the problem can then be formulated as a joint optimization over a dictionary A = [a_1, ..., a_J] ∈ R^{M×J} and the sparse representation matrix X = [x_1, ..., x_L] ∈ R^{J×L}, namely

* Correspondence: jhluo@sjtu.edu.cn

1College of Life Science and Technology, Shanghai Jiaotong University, 200240, Shanghai, P.R. China

Full list of author information is available at the end of the article

min_{A,X} Σ_{l=1}^{L} ||x_l||_0
s.t. ||b_l − Ax_l||_2 ≤ τ, l = 1, ..., L;
     ||a_j||_2 = 1, j = 1, ..., J        (1)

where ||·||_0 denotes the ℓ0 norm, which counts the number of nonzero coefficients of a vector, ||·||_2 stands for the Euclidean norm, and τ is the tolerable limit of the reconstruction error.

Most of the existing methods for solving Equation 1 can be essentially interpreted as different generalizations of the K-means clustering algorithm, because they are two-step iterative approaches consisting of a sparse coding step, where a sparse approximation X is found with A fixed, and a dictionary update step, where A is optimized based on the current X [1]. After initialization of the dictionary A, these algorithms keep iterating between the two steps until either a predefined number of alternating optimizations has been run or a specific approximation error is reached. Concretely, at the sparse coding step, the solution of Equation 1 with respect to a fixed dictionary A can be sought by optimizing over each x_l individually as follows:

min_x ||x_l||_0
s.t. ||b_l − Ax_l||_2 ≤ τ        (2)


© 2011 Liu et al; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table 1 The denoising results in dB for six test images with noise power in the range [5,100] gray values

| σ/PSNR | Method | 'Barbara' | 'House' | 'Boat' | 'Lena' | 'Peppers' | 'Cameraman' |
|---|---|---|---|---|---|---|---|
| 5/34.15 | ALM-DL | 37.91 | 39.50 | 37.89 | 38.13 | 38.08 | 38.22 |
| | K-SVD | 37.74 | 39.32 | 37.81 | 37.97 | 37.78 | 37.93 |
| 10/28.13 | ALM-DL | 34.19 | 36.05 | 33.58 | 34.30 | 34.59 | 34.00 |
| | K-SVD | 33.94 | 35.97 | 33.39 | 34.02 | 34.17 | 33.71 |
| 15/24.61 | ALM-DL | 32.10 | 34.46 | 31.33 | 32.10 | 32.54 | 31.67 |
| | K-SVD | 31.92 | 34.31 | 31.17 | 31.86 | 32.27 | 31.35 |
| 20/22.11 | ALM-DL | 30.63 | 33.35 | 29.78 | 30.58 | 31.03 | 30.27 |
| | K-SVD | 30.55 | 33.18 | 29.65 | 30.38 | 30.76 | 29.97 |
| 25/20.17 | ALM-DL | 29.33 | 32.18 | 28.60 | 29.45 | 29.88 | 29.04 |
| | K-SVD | 29.25 | 32.10 | 28.52 | 29.30 | 29.70 | 28.78 |
| 50/14.15 | ALM-DL | 24.57 | 27.76 | 24.98 | 25.83 | 26.20 | 25.73 |
| | K-SVD | 24.65 | 27.85 | 24.99 | 25.76 | 26.19 | 25.65 |
| 75/10.61 | ALM-DL | 21.50 | 25.19 | 22.67 | 23.55 | 23.69 | 23.38 |
| | K-SVD | 21.52 | 25.26 | 22.72 | 23.56 | 23.72 | 23.38 |
| 100/8.13 | ALM-DL | 20.19 | 23.46 | 21.66 | 22.02 | 21.90 | 21.70 |
| | K-SVD | 20.26 | 23.40 | 21.67 | 22.01 | 21.86 | 21.72 |

For each test setting, two results are provided: our ALM-DL algorithm (top) and the K-SVD algorithm (bottom). The best result in each set is highlighted.

Or, in the equivalent unconstrained form:

min_x λ||x_l||_0 + ||b_l − Ax_l||_2^2        (3)

where λ is the regularization parameter related to τ; it tunes the weight between the regularization term ||x_l||_0 and the fidelity term ||b_l − Ax_l||_2^2. Solving Equation 2 or 3 is an NP-hard problem [6]. One family of solution methods is the greedy pursuit algorithms, such as matching pursuit (MP) and its variants [7,8]; another commonly used approach is to relax the optimization problem convexly via basis pursuit [9], solved by methods such as iterated thresholding [10], FOCal Underdetermined System Solver (FOCUSS) [11], and the LARS-Lasso algorithm [12].
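As an illustration of the greedy pursuit family mentioned above, the following is a minimal orthogonal MP sketch for the error-constrained problem of Equation 2; this is our own illustrative implementation, not the code used in the paper, and it assumes unit-norm atoms:

```python
import numpy as np

def omp(A, b, tol):
    """Orthogonal matching pursuit sketch for Equation 2:
    min ||x||_0  s.t.  ||b - Ax||_2 <= tol.  Atoms are assumed unit-norm."""
    x = np.zeros(A.shape[1])
    support, coef = [], np.zeros(0)
    residual = b.astype(float).copy()
    while np.linalg.norm(residual) > tol and len(support) < A.shape[0]:
        j = int(np.argmax(np.abs(A.T @ residual)))  # most correlated atom
        if j in support:                            # no further progress possible
            break
        support.append(j)
        # least-squares refit on the current support
        coef = np.linalg.lstsq(A[:, support], b, rcond=None)[0]
        residual = b - A[:, support] @ coef         # orthogonal to chosen atoms
    x[support] = coef
    return x
```

The least-squares refit after each atom selection is what distinguishes OMP from plain MP: the residual stays orthogonal to all previously selected atoms.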

At the dictionary updating step, when the optimization problem Equation 1 is solved over bases A given fixed coefficients X, it reduces to a least squares problem with quadratic constraints as shown in Equation 4:

min_A Σ_{l=1}^{L} ||b_l − Ax_l||_2^2
s.t. ||a_j||_2 = 1, j = 1, ..., J        (4)

In general, this constrained optimization problem can be solved using several methods. One simple technique is gradient descent, as in maximum likelihood (ML) [13,14] and maximum a posteriori (MAP) estimation with iterative projection [15]; another is a dual version derived from the Lagrangian, proposed by Lee et al. [16]; the method of optimal directions (MOD) [17] proposed by Engan et al. is also common, solving the problem using the pseudo-inverse of X. Among all these methods, the most important breakthrough is the K-singular value decomposition (K-SVD) proposed by Aharon et al. [1]. K-SVD uses a different strategy: the columns of A are updated sequentially, one at a time, using an SVD to minimize the approximation error. Hence, the dictionary updating step becomes a true generalization of K-means, since each patch can be represented by multiple atoms with different weights.
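The two-step alternation and the K-SVD-style atom update described above can be sketched as follows. This is an illustrative reimplementation under our own naming: the sparse coding step is a crude top-k correlation stand-in for a real pursuit algorithm, and the update sweep refits each atom by a rank-1 SVD of the residual over the signals that use it:

```python
import numpy as np

def sparse_code(B, A, k=3):
    """Toy sparse coding step (stand-in for MP/OMP): pick the k most
    correlated atoms per signal and least-squares fit on that support."""
    X = np.zeros((A.shape[1], B.shape[1]))
    for l in range(B.shape[1]):
        support = np.argsort(-np.abs(A.T @ B[:, l]))[:k]
        X[support, l] = np.linalg.lstsq(A[:, support], B[:, l], rcond=None)[0]
    return X

def ksvd_update(A, X, B):
    """K-SVD-style dictionary update sweep: each atom and its coefficients
    are refit by a rank-1 SVD of the residual restricted to the signals
    currently using that atom."""
    for j in range(A.shape[1]):
        users = np.flatnonzero(X[j, :])
        if users.size == 0:
            continue
        # residual with atom j's own contribution removed, on its users only
        E = B[:, users] - A @ X[:, users] + np.outer(A[:, j], X[j, users])
        U, s, Vt = np.linalg.svd(E, full_matrices=False)
        A[:, j] = U[:, 0]              # best rank-1 left factor (unit norm)
        X[j, users] = s[0] * Vt[0, :]  # matched coefficients
    return A, X

def learn(B, J, n_iter=10, seed=0):
    """The two-step alternation: sparse coding, then dictionary update."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((B.shape[0], J))
    A /= np.linalg.norm(A, axis=0, keepdims=True)
    for _ in range(n_iter):
        X = sparse_code(B, A)
        A, X = ksvd_update(A, X, B)
    return A, X
```

Each rank-1 refit is optimal for its atom with the others fixed, so the approximation error is non-increasing over a sweep.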

Recently, much effort has been devoted to tightening or loosening the constraints on the dictionary. Parametric dictionary learning algorithms proposed in [18,19] optimize only the parameters of pre-specified atoms (e.g., Gabor-like atoms) instead of the dictionary itself, which reduces the dimensionality of the corresponding optimization problem; however, these algorithms depend heavily on selecting a proper parametric dictionary experimentally in advance and match only the structure of a specific class of signals. In contrast, non-parametric (Bayesian) approaches proposed in [20,21] learn the dictionary using a prior stochastic process, automatically estimating the dictionary size and making no explicit assumption on the noise variance; their drawback is the computational load. However, little attention in the literature has been paid to making the generalized clustering ability of the dictionary more stable.

Since the traditional dictionary learning methods can be viewed as various extensions of the K-means clustering method, a common drawback is that they are prone to local minima, i.e., the efficiency of the algorithms depends heavily on either the sample type or the initialization. Figure 1 shows a two-dimensional toy example in which the atom identification ability of the K-SVD algorithm is investigated for two different sample types, each comprising 1,000 samples, where each sample is a multiple of one of eight basis vectors plus additive noise. In one type the dictionary's atoms have a uniform angular distribution over the full circle; in the other, seven atoms are distributed uniformly over half a circle, with the last one outside it. We ran K-SVD over 50 times with different realizations of coefficients and noise and obtained an average identification rate of 87% for the former type and almost 100% for the latter. The main reason for this phenomenon is that the K-SVD algorithm is sensitive to initialization and has difficulty updating atoms in the correct direction when the samples are distributed in a non-directional, non-regular way. A natural way to alleviate this problem is to first update the dictionary on a low-resolution or


Figure 1 A two-dimensional toy example testing the atom identification ability of the K-SVD algorithm for two different sample types, each of size 1,000. Each sample is a multiple of one of eight basis vectors plus additive noise. (a) The dictionary's atoms have a uniform angular distribution over the full circle; (b) seven atoms have a uniform angular distribution over half a circle, with the last one outside it. One run is displayed, where the green vectors show the true atoms and the red vectors show the learned atoms.

smoothed version of the samples and then letting the smoothed samples converge asymptotically to the original samples while the dictionary is refined.

In this article, we propose a specific approach of this multi-scale strategy, the outline of which is to first transform the constrained dictionary learning problem into the augmented Lagrangian (AL) framework and then refine the dictionary from low scale to high. We name this approach the AL-based multi-scale dictionary learning (ALM-DL) algorithm. AL is a standard and elementary tool in the optimization field; it converges quickly, even superlinearly, when its penalty parameter is driven to infinity [22,23]. A closely related algorithm is the Bregman iterative method, originally proposed by Osher et al. [24] for the total variation regularization model; the two are identical when the constraint is linear [25]. Under the circumstances studied in this article, AL is equivalent to the Bregman iterative method; we follow the AL perspective, instead of Bregman iteration, only because AL is widely used in the optimization community. Usually, a "decoupling" strategy (e.g., the alternating direction method, ADM) is used to solve the subproblem of the AL scheme, which allows AL to be implemented efficiently in many inverse problems [26-29]. In this article, we resort to a variant of this idea: we employ a modified majorization-minimization (MM)

technique to tackle the subproblem, ensuring both solution accuracy and implementation efficiency.

The rest of the article is organized as follows: Section 2 describes the proposed method in two parts, the multi-scale dictionary learning framework and the inner minimization subproblem. In Section 3, we conduct experiments on synthetic data and compare the method's ability to recover the original dictionary with that of K-SVD and MOD. Its ability to denoise real images is then tested and compared with K-SVD in Section 4. Finally, Section 5 concludes the article with remarks.

2. The proposed method

This section introduces the ALM-DL algorithm for solving the dictionary learning problem. It is achieved by first recasting the constrained dictionary learning problem into an AL scheme and then updating the dictionary after each inner iteration of the scheme, during which an MM technique is employed to solve the inner subproblem.

2.1 A multi-scale dictionary learning framework

In this section, the ℓ1 norm is used instead of the ℓ0 norm to relax the minimization problem of Equation 2; the objective optimization problem over x_l is therefore given, with the subscript l omitted for the sake of clarity, as follows:

min_x ||x||_1
s.t. ||b − Ax||_2 ≤ τ        (5)

By reformulating the feasible set {x : ||b − Ax||_2 ≤ τ} as an indicator function δ_τ(b − Ax), the constrained problem of Equation 5 turns into an unconstrained one:

min_x ||x||_1 + δ_τ(b − Ax)        (6)

where δ_τ(z) = 0 if ||z||_2 ≤ τ, and +∞ otherwise. As in [26,30], the resulting unconstrained problem is then converted into a different constrained problem by applying a variable splitting operation, namely:

min_{x,z} ||x||_1 + δ_τ(z)
s.t. b − Ax − z = 0        (7)

We apply the AL method to this constrained problem: it is replaced by a sequence of unconstrained subproblems in which the objective function is the original objective of the constrained optimization plus additional "penalty" terms, formed from the constraint functions multiplied by a positive coefficient (for more details of the AL scheme, see [22]), i.e.,

L_β(x_l, z_l) = ||x_l||_1 + δ_τ(z_l) − ⟨y_l, Ax_l + z_l − b_l⟩ + (1/(4β))||Ax_l + z_l − b_l||_2^2

(x^{k+1}, z^{k+1}) = argmin_{x,z} L_β(x, z)        (8)

y^{k+1} = y^k + (1/(2β))(b − Ax^{k+1} − z^{k+1})        (9)

where ⟨·,·⟩ denotes the usual duality product.

In the conventional dictionary learning approach, the dictionary is updated only after the minimization of Equations 8 and 9 has been completed, and the whole learning procedure loops in an alternating way until some stopping condition is satisfied. In contrast, here we update the dictionary after each inner iteration of Equations 8 and 9, i.e., taking the derivative of the functional L_β with respect to A, we get the following gradient descent update rule:

A^{k+1} = A^k − μ[−Y^k + (1/(2β))(A^k X^{k+1} + Z^{k+1} − B)](X^{k+1})^T
        = A^k + μ Y^{k+1}(X^{k+1})^T        (10)

A merit of the AL methodology is its superior convergence property: Ax^k → Ax* = b − z* [22], where each iterate Ax^k can be viewed as a low-resolution or smoothed version of the true image patches Ax*.

If each iterative step is regarded as a scale, then the dictionary update, via summing products of the primal and dual variables (i.e., Equation 10), can be seen as a refinement process from low scale to high. As discussed in the introduction, this method can avoid local optima because only the main features of the image patches are present at the initial stage of the iteration. The general description of the proposed method is listed in Diagram 1.

Diagram 1. The general description of the ALM-DL algorithm
1: initialization: X^0 = 0; A^0
2: while stop-criterion not satisfied
3:   for l = 1, ..., L: (x_l^{k+1}, z_l^{k+1}) = argmin L_β(x_l, z_l)
4:   Y^{k+1} = Y^k + (1/(2β))(B − AX^{k+1} − Z^{k+1})
5:   A^{k+1} = A^k + μ Y^{k+1}(X^{k+1})^T
6: end while

2.2 The sub-problem of inner minimization

From the pseudocode of the proposed algorithm depicted in Diagram 1, it is obvious that the speed and accuracy of the proposed method depend heavily on how the subproblem over the variables x and z is solved, so a simple and efficient method is needed to make the whole algorithm efficient. The minimization of Equation 8 with respect to z can be computed analytically, so z can be eliminated:

min_{x,z} { ||x||_1 + δ_τ(z) − ⟨y^k, Ax + z − b⟩ + (1/(4β))||Ax + z − b||_2^2 }
= min_x ||x||_1 + min_z { δ_τ(z) + (1/(4β))||Ax + z − b − 2βy^k||_2^2 }
= min_x ||x||_1 + min_z { δ_τ(z) + (1/(4β))||z − b_1||_2^2 }        (11)

Denoting b_1 = b − Ax + 2βy^k, the minimizer of the second and third terms in Equation 11 with respect to z is obtained as:

z = (τ/||b_1||_2) b_1, if ||b_1||_2 > τ;    z = b_1, otherwise,

and the corresponding thresholding operator is

TH_τ(b_1) = b_1 − z = ((||b_1||_2 − τ)/||b_1||_2) b_1, if ||b_1||_2 > τ;    0, otherwise        (12)

Moreover, it follows that

y^{k+1} = y^k + (1/(2β))(b − Ax^{k+1} − z^{k+1})
        = (1/(2β))[−z^{k+1} + (b − Ax^{k+1} + 2βy^k)]
        = (1/(2β)) TH_τ(b − Ax^{k+1} + 2βy^k)
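The operator TH_τ of Equation 12 is straightforward to implement; a small sketch for a single vector (the dual update above is then just TH_τ(b − Ax^{k+1} + 2βy^k) scaled by 1/(2β)):

```python
import numpy as np

def th_tau(b1, tau):
    """The operator TH_tau of Equation 12: shrink the 2-norm of the vector
    b1 by tau, returning 0 when ||b1||_2 <= tau."""
    b1 = np.asarray(b1, dtype=float)
    n = np.linalg.norm(b1)
    if n <= tau:
        return np.zeros_like(b1)
    return (n - tau) / n * b1
```

By construction, z = b1 − TH_τ(b1) recovers the minimizer of Equation 12, and ||b1 − TH_τ(b1)||_2 = min(||b1||_2, τ).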

The next crucial problem is how to determine x. Equation 11 is hard to minimize because it is nonlinear in x, so we develop an iterative procedure to find an approximate solution. In this procedure, z is replaced by its last state z^m, and the MM technique is employed to add a proximal-like penalty at each inner step so as to cancel out the term ||Ax||_2^2 (for more details on the MM technique, see [31-33]). Since both variables x and z are updated at each inner step, a satisfactory solution is obtained after just a few steps; experimental verification is presented in Section 4.3.

x^{m+1} = argmin_x { 4β||x||_1 + ||Ax − b − 2βy^k + z^m||_2^2 + (x − x^m)^T(γI − A^TA)(x − x^m) }
        = argmin_x { ||x||_1 + (γ/(4β)) ||x − x^m − (1/γ)A^T(b + 2βy^k − z^m − Ax^m)||_2^2 }
        = Shrink( x^m + (1/γ)A^T(b + 2βy^k − z^m − Ax^m), 2β/γ )        (13)

where γ > eig(A^TA), the largest eigenvalue of A^TA, and the Shrink operator is defined componentwise as

Shrink(f, μ) = f − μ, if f > μ;    0, if −μ ≤ f ≤ μ;    f + μ, if f < −μ.

In summary, the proposed ALM-DL algorithm consists of a two-level nested loop: the outer loop updates the dual variables and the dictionary, while the inner loop minimizes over the primal variables to ensure the accuracy of the algorithm. The detailed description of the algorithm is listed in Diagram 2; the initial dictionary A^0 in line 1 can be any predefined matrix (e.g., the redundant DCT dictionary), and the operator TH_τ(Y) in line 4 is applied to each column of the matrix Y individually. Diagram 2. The detailed description of the ALM-DL algorithm

1: initialization: X^0 = 0; A^0
2: while stop-criterion not satisfied (loop in k):
3:   while stop-criterion not satisfied (loop in m):
4:     Y^{m+1} = (1/(2β)) TH_τ[2βC^k − (A^k X^{k,m} − B)]
5:     X^{k,m+1} = Shrink(X^{k,m} + (2β/γ)(A^k)^T Y^{m+1}, 2β/γ)
6:   end while
7:   C^{k+1} = Y^{m+1}; X^{k+1,0} = X^{k,m+1}
8:   A^{k+1} = A^k + μ C^{k+1}(X^{k+1,0})^T
9: end while
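A compact numerical sketch of Diagram 2 follows. The parameter values are illustrative choices of ours, and we renormalize the atoms after each dictionary update to enforce the unit-norm constraint ||a_j||_2 = 1, a step the diagram leaves implicit:

```python
import numpy as np

def th_tau_cols(V, tau):
    """Columnwise TH_tau of Equation 12 applied to each column of V."""
    norms = np.linalg.norm(V, axis=0, keepdims=True)
    return np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12) * V

def shrink(F, mu):
    """Componentwise soft thresholding (the Shrink operator of Equation 13)."""
    return np.sign(F) * np.maximum(np.abs(F) - mu, 0.0)

def alm_dl(B, A0, beta=0.5, mu=0.05, tau=0.1, outer=20, inner=5):
    """Sketch of Diagram 2.  C holds the dual variables Y between outer
    iterations; gamma must dominate the largest eigenvalue of A^T A."""
    A = A0 / np.linalg.norm(A0, axis=0, keepdims=True)
    X = np.zeros((A.shape[1], B.shape[1]))
    C = np.zeros_like(B)
    for _ in range(outer):
        gamma = 1.01 * np.linalg.norm(A, 2) ** 2     # gamma > max eig of A^T A
        for _ in range(inner):                        # lines 3-6 of Diagram 2
            Y = th_tau_cols(2 * beta * C - (A @ X - B), tau) / (2 * beta)
            X = shrink(X + (2 * beta / gamma) * (A.T @ Y), 2 * beta / gamma)
        C = Y                                         # line 7
        A = A + mu * C @ X.T                          # line 8 (Equation 10)
        A /= np.linalg.norm(A, axis=0, keepdims=True) + 1e-12
    return A, X
```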

2.3 A hybrid method for improving performance

At first glance, our proposed iterative scheme for x^{m+1} seems very similar to the iterative shrinkage/thresholding algorithm (ISTA), which has been intensively studied in the fields of compressed sensing and image recovery [10,11,25,26,28,29,32,34]. Various techniques have been applied to Equation 13 to improve the efficiency of ISTA; the simplest and fastest approaches in recent years include FPC [35], SpaRSA [36], and FISTA [34]. In fact, as noted in [32,34], the MM technique employed in Section 2.2 leads to ISTA (for details, see also our derivation in Appendix 1). The main novelty in our work is that we accelerate the ISTA iteration with respect to x^{m+1} by using the up-to-date z^m: at each inner ISTA iteration, x^{m+1} benefits from the latest value of z^m. As seen in Diagram 2, using the up-to-date z^m accelerates the convergence of both x and y, and therefore the corresponding dictionary update A^{k+1} = A^k + μC^{k+1}(X^{k+1,0})^T is accelerated accordingly.
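For reference, the FISTA-type acceleration mentioned above applies soft thresholding at an extrapolated point rather than the previous iterate. The following is a generic sketch on the lasso problem min_x ||Ax − b||_2^2 + λ||x||_1, not the paper's exact scheme; the momentum recursion is the same one appearing in lines 7-8 of Diagram 5:

```python
import numpy as np

def shrink(f, mu):
    """Componentwise soft thresholding."""
    return np.sign(f) * np.maximum(np.abs(f) - mu, 0.0)

def fista(A, b, lam, n_iter=200):
    """FISTA sketch for min_x ||Ax - b||_2^2 + lam * ||x||_1."""
    gamma = 1.01 * np.linalg.norm(A, 2) ** 2   # gamma > max eig of A^T A
    x = np.zeros(A.shape[1])
    q, t = x.copy(), 1.0
    for _ in range(n_iter):
        # proximal gradient step at the extrapolated point q
        x_new = shrink(q - (A.T @ (A @ q - b)) / gamma, lam / (2 * gamma))
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        q = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum extrapolation
        x, t = x_new, t_new
    return x
```

With t fixed at 1 (no extrapolation), the loop reduces to plain ISTA; the t-sequence is what yields FISTA's faster O(1/m^2) objective decay.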

After this paper was submitted for publication, we became aware of some very recent studies by Yang [29] and Ganesh [37]. The ADM framework adopted by these authors is very similar to ours: they first introduce auxiliary variables to reformulate the original problem into the form of an AL scheme and then apply alternating minimization to the corresponding AL functions. The main difference between these methods and ours lies in the application field: Ganesh's and Yang's methods, in the compressed sensing field, pursue the sparsest coefficient x under a predefined transform or dictionary, while our method is devoted to obtaining the optimal dictionary A.

Keeping this in mind, we can pinpoint the major distinctions between our method and Ganesh's and Yang's methods. First, Yang applies the basic ISTA to solve the inner minimization with respect to x [[29], p. 6]. Second, Ganesh applies FISTA, an accelerated variant of ISTA, for the same inner minimization [[37], pp. 15-16]. Both methods seek the sparsest solution under a fixed transform or dictionary [29,37]. Finally, our work aims at the optimal dictionary, whose update takes the form A^{k+1} = A^k + μC^{k+1}(X^{k+1,0})^T in the iterative process; this indicates that the convergence of updating A depends on both x and y. We therefore modified the naive ISTA with respect to x^{m+1} by taking advantage of the up-to-date z^m. Under these circumstances, both x and y are


Table 2 The denoising results in dB for six test images and a noise power in the range [5,100] gray values

| σ/PSNR | Method | 'Barbara' | 'House' | 'Boat' | 'Lena' | 'Peppers' | 'Cameraman' |
|---|---|---|---|---|---|---|---|
| 5/34.15 | ALM-DL | 37.86 | 39.40 | 37.75 | 38.07 | 38.08 | 38.02 |
| | K-SVD | 37.63 | 39.36 | 37.48 | 37.77 | 37.67 | 37.69 |
| 10/28.13 | ALM-DL | 34.29 | 36.10 | 33.61 | 34.28 | 34.54 | 33.92 |
| | K-SVD | 34.08 | 36.07 | 33.34 | 33.93 | 34.16 | 33.41 |
| 15/24.61 | ALM-DL | 32.10 | 34.47 | 31.30 | 32.22 | 32.61 | 31.64 |
| | K-SVD | 31.99 | 34.46 | 31.16 | 32.04 | 32.39 | 31.29 |
| 20/22.11 | ALM-DL | 30.61 | 33.31 | 29.75 | 30.62 | 31.12 | 30.15 |
| | K-SVD | 30.53 | 33.23 | 29.68 | 30.48 | 30.95 | 29.94 |
| 25/20.17 | ALM-DL | 29.27 | 32.31 | 28.62 | 29.41 | 29.89 | 29.07 |
| | K-SVD | 29.25 | 32.39 | 28.58 | 29.39 | 29.84 | 28.91 |
| 50/14.15 | ALM-DL | 24.82 | 28.20 | 25.07 | 25.83 | 26.29 | 25.70 |
| | K-SVD | 24.90 | 28.11 | 25.11 | 25.80 | 26.26 | 25.65 |
| 75/10.61 | ALM-DL | 21.80 | 25.24 | 22.76 | 23.46 | 23.52 | 23.61 |
| | K-SVD | 21.71 | 25.20 | 22.74 | 23.45 | 23.49 | 23.58 |
| 100/8.13 | ALM-DL | 20.24 | 23.78 | 21.73 | 22.22 | 21.65 | 21.80 |
| | K-SVD | 20.24 | 23.66 | 21.69 | 22.14 | 21.63 | 21.72 |

For each test setting, two results are provided: our ALM-DL algorithm (top) and the K-SVD algorithm (bottom). The best among each two results is highlighted.

accelerated, and thereby the update of the dictionary A is accelerated as well. As a by-product of this modification, the variable z is omitted and updated implicitly in the iterative scheme; the whole iterative procedure thus reduces to a very simple and compact form. It is worth noting that since the number of training samples in dictionary learning is very large (62,001 in the experiments of Section 4), a simple iteration formula is essential.

For comparison purposes, we modify and extend Yang's and Ganesh's methods to the dictionary learning problem by adding a dictionary update stage, i.e., we update the dictionary A as in Equation 10 of Section 2.1. We call the extended Yang's method ADM-ISTA-DL and the extended Ganesh's method ADM-FISTA-DL; detailed descriptions of the two methods are presented in Diagrams 3 and 4 in Appendix 2, respectively. Furthermore, since both our method and Ganesh's can be viewed as accelerated versions of ISTA, we can integrate them into a unified framework for the dictionary learning problem. Diagram 5 shows the pseudocode of the hybrid algorithm: lines 5 and 6 come from our method, which accelerates the variables x and y, while lines 7 and 8, belonging to FISTA, aim at accelerating

variable x. Compared with ADM-FISTA-DL shown in Appendix 2, the proposed hybrid algorithm has a simpler form and faster convergence of x and y; compared with our ALM-DL shown in Diagram 2, it inherits the strength of FISTA. We therefore expect the hybrid algorithm to perform better than both, with a computational cost between those of ALM-DL and ADM-FISTA-DL; a numerical comparison of the three approaches is conducted in Section 4.3. For real applications of dictionary learning such as image denoising, we still choose the primary ALM-DL because of its simple and compact form.

Diagram 5. The detailed description of the hybrid algorithm
1: initialization: X^0 = 0; A^0

2: while stop-criterion not satisfied (loop in k):
3:   W^1 = X^k; Q^1 = X^k; t_1 = 1
4:   while stop-criterion not satisfied (loop in m):
5:     Y^{m+1} = (1/(2β)) TH_τ[2βC^k − (A^k Q^m − B)]
6:     W^{m+1} = Shrink(Q^m + (2β/γ)(A^k)^T Y^{m+1}, 2β/γ)
7:     t_{m+1} = (1 + √(1 + 4t_m^2))/2
8:     Q^{m+1} = W^{m+1} + ((t_m − 1)/t_{m+1})(W^{m+1} − W^m)
9:   end while
10:  C^{k+1} = Y^{m+1}; X^{k+1} = W^{m+1}
11:  A^{k+1} = A^k + μ C^{k+1}(X^{k+1})^T
12: end while

3. Synthetic experiments

To evaluate the proposed ALM-DL method, we first try it on artificial data to test its ability to recover the original dictionary and then compare it with two other methods: K-SVD and MOD.

3.1 Test data and comparison criterion

The experiment described in [1] is repeated: first a basis A_orig ∈ R^{M×J} is generated, consisting of J = 50 basis vectors of dimension M = 20; then 1,500 data signals {b_1, b_2, ..., b_1500} are produced, each obtained by a linear combination of three basis vectors with uniformly distributed independent identically distributed (i.i.d.) coefficients in random and independent locations. We add Gaussian noise with varying SNR to the resulting data to obtain the test data.

For the comparison criterion, the learned bases were obtained by applying K-SVD, MOD, and ALM-DL to the data. As in [1], we compare the learned basis with the original basis using the maximum overlap between each original basis vector a_j^orig and the learned basis vectors a_i^learn: whenever 1 − |(a_j^orig)^T a_i^learn| is smaller than 0.01 for some learned atom, we count this as a success [1].
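The success criterion can be coded directly; a short sketch under our own naming, assuming unit-norm atoms in both dictionaries:

```python
import numpy as np

def recovered_atoms(A_orig, A_learn, thresh=0.01):
    """Count original atoms recovered by the learned dictionary using the
    1 - |<a_orig, a_learn>| overlap criterion (unit-norm atoms assumed);
    taking absolute values makes the test sign-invariant."""
    count = 0
    for j in range(A_orig.shape[1]):
        overlap = np.abs(A_learn.T @ A_orig[:, j])   # |inner products|
        if 1.0 - overlap.max() < thresh:
            count += 1
    return count
```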

Table 3 The computation time of the four methods for running 34 iterations

| | ADM-ISTA-DL | ADM-FISTA-DL | ALM-DL | Hybrid method |
|---|---|---|---|---|
| Boat, σ = 15, m = 4 | 174.08 s | 205.45 s | 168.48 s | 188.12 s |
| Boat, σ = 15, m = 7 | 273.39 s | 330.63 s | 279.12 s | 316.61 s |
| Lena, σ = 15, m = 4 | 174.25 s | 207.13 s | 168.54 s | 188.29 s |
| Lena, σ = 15, m = 7 | 271.75 s | 329.27 s | 278.75 s | 313.31 s |

3.2 The parameter of the algorithm

The impact of the parameter β on the ALM-DL algorithm is investigated in this section. For SNR = 10, we set β = 0.22, 0.44, 0.66, 0.88, respectively, and run the algorithm for 180 iterations. Along the iterations, we track the evolution of the number of successfully detected atoms (NSDA) and the root mean square error (RMSE), defined as RMSE = ||B − A^k X^k||_F / √(ML). As can be seen from Figure 2, the RMSE increases and the NSDA decreases with increasing β; an interesting phenomenon is that the larger the value of β, the less stable the NSDA, whereas the NSDA grows more gradually and stably when β is very small. However, when β is very small the algorithm needs more iterations. In a practical implementation, β should therefore be given a relatively small value. For the experiments conducted below, β is set to 0.45 and the number of iterations k to 100. For a fair comparison, the number of learning iterations of K-SVD and MOD is also set to 100, which is larger than in [1].
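Assuming the RMSE is the Frobenius residual normalized by √(ML), with B of size M × L, it is a one-liner:

```python
import numpy as np

def rmse(B, A, X):
    """RMSE = ||B - A X||_F / sqrt(M * L) for a data matrix B of size M x L."""
    M, L = B.shape
    return np.linalg.norm(B - A @ X, 'fro') / np.sqrt(M * L)
```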

3.3 Comparison results

The ability to recover the original dictionary is tested for the three methods K-SVD, MOD, and ALM-DL, and the comparison results are given in this section. We repeat this experiment 50 times with a varying SNR of 10, 20, 30,

40, and 50 dB. As in [1], for each noise level, we sort the 50 trials according to the number of successfully learned basis elements and order them in groups of 10 experiments. Figure 3 shows the results of K-SVD, MOD, and ALM-DL. As can be seen, our algorithm outperforms both of the others; especially when the noise level is low, ALM-DL recovers the atoms much more accurately. Since not only the test dictionary but also the coefficients are generated at random and in independent locations, this specific distribution of the sample data widens the performance gap between our proposed ALM-DL and K-SVD. This indicates that our method performs better on images with irregular objects, an advantage that will also be validated on real applications in the next section.

4. Numerical experiments of image denoising

This section presents the dictionary learned by the ALM-DL algorithm and demonstrates its behavior and properties in comparison with the K-SVD algorithm. We have tested our method on various denoising tasks for a set of six 8-bit grayscale standard images shown in Figure 4: "Barbara", "House", "Boat", "Lena", "Peppers", and "Cameraman". In the experiment, the whole process involves the following steps:

• Let Ĩ be a corrupted version of the image I (256 × 256) after the addition of white zero-mean Gaussian noise with power σ_n. Data examples {b_1, b_2, ..., b_62001} of 8 × 8 pixels are extracted from the noisy image Ĩ; some


Figure 2 The evolution of the RMSE (a) and the NSDA (b) for various β. The target RMSE is 0.1063.


Figure 3 Comparison results of K-SVD, MOD, and ALM-DL with respect to the reconstruction of the original basis on synthetic signals. For each of the tested algorithms and for each noise level, 50 trials were conducted and their results were sorted. The graph labels represent the mean number of successfully learned basis elements (out of 50) over the ordered tests in groups of ten experiments.

initial dictionary A0 is specially chosen for both of the training algorithms.

• In the sparse coding stage of the learning procedure, each patch is extracted and sparse-coded. For ALM-DL we set m = 7, β = 100, and the target error τ = Cσ_n√M with the default value C = 1.15; the iteration is repeated until the error criterion is satisfied. Meanwhile, an error-constrained orthogonal MP (OMP) implementation is used in the K-SVD algorithm [2,38] (the K-SVD code is available at http://www.cs.technion.ac.il/~elad/software/) to solve Equation 1 with the same target error as mentioned above, and K-SVD runs ten iterations. To enable a fair comparison, after the learning procedure the data samples are sparse-coded using OMP under the learned dictionary for both algorithms; these implementations yield approximate patches with reduced noise {b̂_1, b̂_2, ..., b̂_62001}.

• The output image Î is obtained by placing the patches {b̂_1, b̂_2, ..., b̂_62001} in their proper locations and averaging the contributions at each pixel; the implementation is the same as in [2].
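The patch extraction and averaging steps above can be sketched as follows (illustrative helper names of ours; note that a 256 × 256 image with 8 × 8 patches yields exactly (256 − 8 + 1)^2 = 62,001 patches, the count used above):

```python
import numpy as np

def extract_patches(img, p=8):
    """All overlapping p x p patches of img, flattened as columns."""
    H, W = img.shape
    P = [img[i:i + p, j:j + p].ravel()
         for i in range(H - p + 1) for j in range(W - p + 1)]
    return np.array(P).T

def average_patches(P, shape, p=8):
    """Place (denoised) patch columns back at their locations and average
    the overlapping contributions at each pixel."""
    H, W = shape
    out = np.zeros(shape)
    weight = np.zeros(shape)
    k = 0
    for i in range(H - p + 1):
        for j in range(W - p + 1):
            out[i:i + p, j:j + p] += P[:, k].reshape(p, p)
            weight[i:i + p, j:j + p] += 1.0
            k += 1
    return out / weight
```

Round-tripping an image through these two functions without modifying the patches reproduces it exactly, since the per-pixel averages of identical contributions are the original values.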

4.1 The learned dictionary

We investigate the sensitivity of the dictionaries generated by ALM-DL and K-SVD to initialization in this section. Two dictionaries are chosen as initializations: the redundant DCT dictionary (Figure 5a) and a random matrix whose atoms are randomly chosen from the training data (Figure 5b). Both dictionaries consist of J = 256 atoms, and each atom is shown as an 8 × 8 pixel image. ALM-DL and K-SVD are then used to denoise the image "Cameraman" with σ = 10, and the two sequences of dictionaries generated by the two methods are shown in Figures 6 and 7, respectively. From the top lines of these figures it can be seen that ALM-DL drastically changes the dictionary while K-SVD does not; the proposed algorithm thus has a good ability to recover the main prototypes within the first few stages. Moreover, these figures show that ALM-DL has another well-posed property: it is insensitive to initialization, because the final learned dictionaries are very similar to each other regardless of atom location (see Figures 6e and 7e), while K-SVD depends heavily on the initialization. Our proposed method thus largely avoids getting trapped in local optima.

4.2 Denoised results

In this section, ALM-DL is compared with K-SVD for image denoising applications. The six test images in Figure 4 can be classified into two categories based on the distributions of their overlapped patches, which can be distinguished by the patches' standard covariance and principal components. The first three images ("Barbara", "House", "Boat"), typically characterized by regular textures or edges, are classified as regular, while the latter three ("Lena", "Peppers", "Cameraman"), typically characterized by irregular objects, are classified as irregular. Figure 8a-b shows the standard covariance matrices of the 62,001 patch examples extracted from "Barbara" and "Cameraman", respectively, with standard deviation σ = 20; the entries of the 64 × 64 matrices lie between 0 and 1. As can be seen, the coordinates in the 64-dimensional space of the image "Barbara" are connected more closely than


Figure 5 Two initial dictionaries: DCT (a) and a random matrix chosen from the training data (b).

those of the image "Cameraman". The first two-dimensional projection of these patch examples (through PCA transform) presented in Figure 8c-d also demonstrates the different distribution forms.
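The patch statistics behind Figure 8 can be reproduced in a few lines. The sketch below is a hypothetical implementation (the function name and layout are ours, not the authors'): it assumes vectorized 8 × 8 patches stacked as rows, computes the 64 × 64 absolute correlation matrix (so that entries lie in [0, 1] as in the figure), and projects the patches onto their first two principal components via an SVD of the centered data.

```python
import numpy as np

def patch_stats(patches):
    """patches: (n, 64) array of vectorized 8x8 patches (one per row).
    Returns the 64x64 matrix of absolute correlations between pixel
    coordinates, and the projection of the patches onto their first
    two principal components (via SVD of the centered data)."""
    P = patches - patches.mean(axis=0)
    corr = np.abs(np.corrcoef(P, rowvar=False))   # entries in [0, 1]
    _, _, Vt = np.linalg.svd(P, full_matrices=False)
    proj2d = P @ Vt[:2].T                         # first two PCA coordinates
    return corr, proj2d
```

Images whose patches are strongly correlated (the "regular" category) produce a bright, structured correlation matrix and an elongated 2-D point cloud, which is the distinction the figure illustrates.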

We now present denoising results obtained by our ALM-DL approach and the K-SVD method for noise levels in the range [5, 100]. Every reported result is the average of five experiments with different noise realizations. Table 1 compares the PSNRs of our ALM-DL approach with those of K-SVD for the six test images when the redundant DCT is chosen as the initial dictionary, with the best result of the two methods highlighted; our method outperforms K-SVD at all noise levels below σ = 25. Table 2 shows that this conclusion still holds when the initial dictionary is a random matrix. To better visualize the comparison, Figure 9 plots the difference

between the denoising results of ALM-DL and K-SVD. Our proposed approach outperforms K-SVD at almost all noise levels, especially for the second type of images. Regardless of the initial dictionary, the PSNR obtained by ALM-DL gives an average advantage of 0.2 dB over K-SVD for all noise levels below σ = 25; as the noise increases, however, the advantage gradually weakens, and this is a direction for future research. Figure 10 plots the initial dictionary, the dictionaries trained by K-SVD and by our ALM-DL algorithm, and the corresponding denoised results for the image "Cameraman" with σ = 15. To facilitate visual assessment of image quality, small regions of the images in Figure 10d-f are boxed, in which the differences in edges and residual noise can be clearly observed. Figure 10e shows that K-SVD blurs the edge, while the proposed method keeps most of it; the small boxes in Figure 10e-f also show that the K-SVD result retains some noise while ours does not.
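For reference, the PSNR figures quoted above follow the usual definition for 8-bit images; a minimal sketch (the function name is ours):

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-size images."""
    diff = np.asarray(reference, float) - np.asarray(estimate, float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

A 0.2 dB PSNR gain corresponds to roughly a 4.5% reduction in mean squared error, since 10^0.02 ≈ 1.047.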

The above experiments were conducted with a fixed number of dictionary elements (J = 256); we now consider four different numbers of elements: 64, 128, 256, and 512. Figure 11 shows the PSNR values for the images "House" and "Peppers". The denoising performance of both ALM-DL and K-SVD improves as the number of dictionary elements increases, while the gap between the PSNR values of the two methods is larger when the number of elements is very small, which indicates that our proposed method is more robust.


Figure 6 The dictionary learning process with DCT as initialization. From left to right: the first row: the learned dictionaries generated by ALM-DL after 2, 6, 11, 17, 20 iterations; the second row: the learned dictionaries generated by K-SVD after 1, 3, 6, 8, 10 iterations.


Figure 7 The dictionary learning process with a random matrix as initialization. From left to right: the first row: the learned dictionaries generated by ALM-DL after 2, 6, 11, 17, 20 iterations; the second row: the learned dictionaries generated by K-SVD after 1, 3, 6, 8, 10 iterations.

4.3 Solving the inner sub-problem and its computational load

As mentioned in Section 2.2, the inner sub-problem is essential to our algorithm, so we test ALM-DL with three different numbers of inner iterations (m = 1, 4, 7). Figure 12 shows

the differences between the three denoising results of ALM-DL and those of K-SVD, which appears as the zero reference line. The comparison is presented for the images "Boat" and "Lena". As can be seen, the number of inner iterations strongly affects the accuracy of the solution for


Figure 8 An illustration of the distributions of overlapped patches from different images, with σ = 20. Top: the standard covariance matrices of the 62,001 patches extracted from "Barbara" (a) and "Cameraman" (b); each entry is between 0 (black) and 1 (white). Bottom: the projection onto the first two principal components of the 62,001 patches extracted from "Barbara" (c) and "Cameraman" (d).


Figure 10 The denoised result for the image "Cameraman" with σ = 15. Top: (a) the initial dictionary, (b) the dictionary trained by K-SVD, and (c) the dictionary trained by the ALM-DL algorithm. Bottom: (d) the reference image, (e) the image denoised by K-SVD, and (f) by our ALM-DL algorithm.


Figure 11 Effect of the number of dictionary elements on denoising. (a) Denoising the image "House" with σ = 15 and DCT as the initial dictionary; (b) denoising the image "Peppers" with σ = 10 and a random matrix as the initial dictionary.

noise levels lower than σ = 25: the larger the number of inner iterations, the better the denoising result. This again confirms the conclusion of Section 4.2 that our proposed method outperforms K-SVD most clearly at noise levels below σ = 25. In practice, therefore, more inner iterations often produce better results because the approximation is more accurate; on the other hand, a more accurate approximation requires more inner iterations and thus more computation. An appropriate value of m should be selected to trade off accuracy against efficiency; we suggest m = 7 as a good balance.

As analyzed in Section 2.3, our proposed method is closely related to the methods of Yang [29] and Ganesh [37] despite their different application fields. We have therefore extended them in Appendix 2, naming the extensions ADM-ISTA-DL and ADM-FISTA-DL, respectively, and we compare them with our ALM-DL and with the consequent hybrid algorithm. The four methods are evaluated by three criteria: RMSE, average L1 norm (ALN), and computation time; the evolution of RMSE and ALN


Figure 12 The difference between the denoising results of ALM-DL with inner iterations 1, 4, 7 and those of K-SVD for the images "Boat" (a) and "Lena" (b). The initial dictionary is DCT.


Figure 13 The RMSE (a) and average L1 norm (ALN) (b) of the four methods for the image "Boat" with σ = 15. The initial dictionary is DCT and m = 4.

reflect an algorithm's effectiveness, while the computation time measures its efficiency.
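The two effectiveness criteria are not formally restated in this section, so the sketch below encodes one plausible reading (our assumption, not a definition from the paper): RMSE as the per-entry root-mean-square of the representation residual B − AX, and ALN as the mean L1 norm of the coefficient columns of X.

```python
import numpy as np

def rmse(B, A, X):
    """Root-mean-square of the representation residual B - AX
    (assumed per-entry definition)."""
    return float(np.sqrt(np.mean((B - A @ X) ** 2)))

def aln(X):
    """Average L1 norm: mean of the L1 norms of the coefficient
    columns of X (assumed definition)."""
    return float(np.mean(np.abs(X).sum(axis=0)))
```

Under this reading, a falling RMSE curve means the learned dictionary fits the data better, while a low ALN means the codes stay sparse, which is why the two curves are reported together.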

Figures 13 and 14 show the RMSE and ALN for the image "Boat" with m = 4 and m = 7, respectively. First, compared with ADM-ISTA-DL, both ADM-FISTA-DL and our ALM-DL converge faster. Over the iterations, ADM-FISTA-DL exhibits a quicker reduction of ALN because it uses FISTA in the inner minimization, while our ALM-DL exhibits a faster decrease of RMSE owing to the accelerated updates of the variables z and y. Second, the hybrid method outperforms ADM-ISTA-DL, ADM-FISTA-DL, and ALM-DL. Figures 15 and 16 show the RMSE

and ALN for the image "Lena" with m = 4 and m = 7, respectively, where a similar phenomenon is observed. Finally, regarding computation time, Table 3 shows that our method requires the least time when m = 4; when the number of inner iterations is increased from m = 4 to m = 7, the computational cost of our method becomes slightly larger than that of ADM-ISTA-DL. Considering all three criteria, we conclude that our proposed approach is a very promising method.

5. Conclusions

In this article, we have developed a primal-dual-based dictionary learning algorithm under the AL framework. The dictionary is updated by summing the


Figure 14 The RMSE (a) and average L1 norm (ALN) (b) of the four methods for the image "Boat" with σ = 15. The initial dictionary is DCT and m = 7.


Figure 15 The RMSE (a) and average L1 norm (ALN) (b) of the four methods for the image "Lena" with σ = 25. The initial dictionary is DCT and m = 4.

multiplication of the primal and dual variables after each iteration of the AL scheme. The key advantage of this strategy is that the proposed algorithm depends little on the initialization, so it largely avoids getting trapped in local optima. Experiments on image denoising show that (1) our proposed approach outperforms the traditional alternating approaches, especially for "Cameraman"-like images whose constituent patches are distributed in a non-directional, irregular way; and (2) our proposed approach is more tolerant of the number of dictionary elements, which is often unknown in signal/image processing applications.

There are several research directions that we are currently considering. For instance, as proved in [22], updating the parameter β of the AL scheme in a non-increasing way, including the case where β is a constant, guarantees convergence. An automatic selection of β would certainly accelerate convergence, and how to achieve this remains an open question.

Appendix 1

In this appendix, we prove that the iterative scheme (13), derived by employing the MM technique, is essentially an ISTA [25,32]. As mentioned in [25], the standard ISTA for solving the general L1-minimization problem of the form $\min_x \{\mu\|x\|_1 + H(x)\}$ is

$$x^{m+1} = \arg\min_x \left\{ \mu\|x\|_1 + \frac{1}{2\delta}\left\|x - \left(x^m - \delta\nabla H(x^m)\right)\right\|^2 \right\}.$$

Setting $\mu = 4\beta$, $H(x) = \gamma\|Ax - b - 2\beta y_k + z^m\|^2$, and $\delta = 1/(2\gamma)$, we obtain the iterative scheme (13):

$$x^{m+1} = \arg\min_x \left\{ 4\beta\|x\|_1 + \gamma\left\|x - \left[x^m + A^T\left(b + 2\beta y_k - z^m - Ax^m\right)\right]\right\|^2 \right\},$$

which is the same scheme as obtained in Section 2.2.

Appendix 2

In this appendix, we modify and extend the methods of Yang [29] and Ganesh [37] to the dictionary learning problem by adding a dictionary-updating stage. The ADM framework adopted by these authors is very similar to ours: they first introduce auxiliary variables to reformulate the original problem into the form of an AL scheme, and then apply alternating minimization to the corresponding AL function. In particular, Yang applies ISTA to solve the inner minimization with respect to the variable x [[29], p. 6], while Ganesh applies the accelerated FISTA instead [[37], pp. 15-16]. Although both aim to find the sparsest solution under a fixed dictionary, we can modify and extend them to our dictionary learning problem for comparison purposes, i.e., we update the dictionary A exactly as in Equation 10. We call the extended Yang method ADM-ISTA-DL and the extended Ganesh method ADM-FISTA-DL. Detailed descriptions of the two methods are given in Diagrams 3 and 4, respectively.

Diagram 3. The detailed description of the ADM-ISTA-DL algorithm

1: initialization: X_0 = 0; A_0
2: while stop-criterion not satisfied (loop in k):
3:   for each column b_1 of B_1 = -A_k X_k + B + 2β Y_k: z_{k+1} = (τ/‖b_1‖_2) b_1 if ‖b_1‖_2 > τ, else b_1
4:   while stop-criterion not satisfied (loop in m):
5:     X_k^{m+1} = shrink(X_k^m + (1/γ) A_k^T(-Z_{k+1} - A_k X_k^m + B + 2β Y_k), 2β/γ)
6:   end while
7:   X_{k+1} = X_k^{m+1}; Y_{k+1} = Y_k + (1/(2β))(-Z_{k+1} - A_k X_{k+1} + B)
8:   A_{k+1} = A_k + Y_{k+1}(X_{k+1})^T
9: end while

Diagram 4. The detailed description of the ADM-FISTA-DL algorithm

1: initialization: X_0 = 0; A_0
2: while stop-criterion not satisfied (loop in k):
3:   for each column b_1 of B_1 = -A_k X_k + B + 2β Y_k: z_{k+1} = (τ/‖b_1‖_2) b_1 if ‖b_1‖_2 > τ, else b_1
4:   W^1 = X_k; Q^1 = X_k; t_1 = 1
5:   while stop-criterion not satisfied (loop in m):
6:     W^{m+1} = shrink(Q^m + (1/γ) A_k^T(-Z_{k+1} - A_k Q^m + B + 2β Y_k), 2β/γ)
7:     t_{m+1} = (1 + sqrt(1 + 4 t_m^2))/2
8:     Q^{m+1} = W^{m+1} + ((t_m - 1)/t_{m+1})(W^{m+1} - W^m)
9:   end while
10:  X_{k+1} = W^{m+1}; Y_{k+1} = Y_k + (1/(2β))(-Z_{k+1} - A_k X_{k+1} + B)
11:  A_{k+1} = A_k + Y_{k+1}(X_{k+1})^T
12: end while
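As a rough illustration, Diagram 3 can be sketched in NumPy as follows. This is an assumption-laden toy, not the authors' implementation: the threshold τ, the step scale γ (taken here as ‖A‖₂²), the column-wise form of the z-projection, and the atom renormalization are our choices where the diagram leaves details open.

```python
import numpy as np

def shrink(V, t):
    """Soft-thresholding (shrinkage) operator."""
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def adm_ista_dl(B, A0, beta=0.1, tau=1.0, n_outer=10, n_inner=4):
    """Toy sketch of ADM-ISTA-DL (Diagram 3): project the residual onto
    the tau-ball column-wise (Z), run ISTA sweeps for the sparse codes
    (X), take a dual ascent step (Y), then update the dictionary (A)."""
    A = A0.astype(float).copy()
    X = np.zeros((A.shape[1], B.shape[1]))
    Y = np.zeros_like(B, dtype=float)
    for _ in range(n_outer):
        gamma = np.linalg.norm(A, 2) ** 2            # step scale (our choice)
        B1 = -A @ X + B + 2 * beta * Y
        nrm = np.linalg.norm(B1, axis=0)
        scale = np.where(nrm > tau, tau / np.maximum(nrm, 1e-12), 1.0)
        Z = B1 * scale                               # column-wise projection
        for _ in range(n_inner):                     # ISTA inner loop
            X = shrink(X + A.T @ (-Z - A @ X + B + 2 * beta * Y) / gamma,
                       2 * beta / gamma)
        Y = Y + (-Z - A @ X + B) / (2 * beta)        # dual ascent step
        A = A + Y @ X.T                              # dictionary update
        A /= np.maximum(np.linalg.norm(A, axis=0), 1e-12)  # renormalize atoms
    return A, X
```

The last two lines inside the loop are the "summing the multiplication of primal and dual variables" update highlighted in the conclusions; ADM-FISTA-DL (Diagram 4) would differ only by replacing the inner ISTA loop with FISTA's momentum iterates W and Q.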

Endnote

a. We are grateful to a referee for pointing out to us Yang's [29] and Ganesh's [37] studies.

Acknowledgements

This study was partly supported by the High Technology Research Development Plan (863 plan) of P. R. China under 2006AA020805, the NSFC of China under 30670574, Shanghai International Cooperation Grant under 06SR07109, Region Rhone-Alpes of France under the project Mira Recherche 2008, and the joint project of Chinese NSFC (under 30911130364) and French ANR 2009 (under ANR-09-BLAN-0372-01). The authors are indebted to two anonymous referees for their useful suggestions and for having drawn the authors' attention to additional relevant references.

Author details

1College of Life Science and Technology, Shanghai Jiaotong University, 200240, Shanghai, P.R. China 2Department of Mathematics, Shanghai Jiaotong University, 200240, Shanghai, P.R. China

Competing interests

The authors declare that they have no competing interests.

Received: 29 July 2010 Accepted: 12 September 2011 Published: 12 September 2011

References

1. M Aharon, M Elad, AM Bruckstein, K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process. 54(11), 4311-4322 (2006)

2. M Elad, M Aharon, Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans Image Process. 15(12), 3736-3745 (2006)

3. J Mairal, M Elad, G Sapiro, Sparse representation for color image restoration. IEEE Trans Image Process. 17(1), 53-69 (2008)

4. M Aharon, M Elad, Sparse and redundant modeling of image content using an image-signature-dictionary. SIAM Imag. Sci. 1, 228-247 (2008). doi:10.1137/07070156X


5. J Mairal, F Bach, J Ponce, G Sapiro, Online dictionary learning for sparse coding, in International Conference on Machine Learning ICML' 09 (ACM, New York, 2009), pp. 689-696

6. DL Donoho, M Elad, V Temlyakov, Stable recovery of sparse over-complete representations in the presence of noise. IEEE Trans Inf. Theory. 50(1), 6-18 (2006)

7. S Mallat, Z Zhang, Matching pursuits with time-frequency dictionaries. IEEE Trans Signal Process. 41 (12), 3397-3415 (1993). doi:10.1109/78.258082

8. JA Tropp, Greed is good: algorithmic results for sparse approximation. IEEE Trans Inf. Theory. 50(10), 2231-2242 (2004). doi:10.1109/TIT.2004.834793

9. SS Chen, DL Donoho, MA Saunders, Atomic decomposition by basis pursuit. SIAM Rev. 43(1), 129-159 (2001). doi:10.1137/S003614450037906X

10. M Elad, Why simple shrinkage is still relevant for redundant representations? IEEE Trans Inf. Theory. 52(12), 5559-5569 (2006)

11. I Gorodnitsky, B Rao, Sparse signal reconstruction from limited data using FOCUSS: a re-weighted minimum norm algorithm. IEEE Trans Signal Process. 45(3), 600-616 (1997). doi:10.1109/78.558475

12. B Efron, T Hastie, I Johnstone, R Tibshirani, Least angle regression. Ann. Statist. 32(2), 407-499 (2004). doi:10.1214/009053604000000067

13. B Olshausen, D Field, Sparse coding with an overcomplete basis set: a strategy employed by V1? Vis Res. 37(23), 3311-3325 (1997). doi:10.1016/S0042-6989(97)00169-7

14. BA Olshausen, DJ Field, Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607-609 (1996)

15. K Kreutz-Delgado, J Murray, B Rao, K Engan, T Lee, T Sejnowski, Dictionary learning algorithms for sparse representation. Neural Comput. 15(2), 349-396 (2003). doi:10.1162/089976603762552951

16. H Lee, A Battle, R Raina, AY Ng, Efficient sparse coding algorithms, in Advances in Neural Information Processing Systems 19 (MIT Press, Cambridge, MA, 2007), pp. 801-808

17. K Engan, SO Aase, JH Husoy, Method of optimal directions for frame design, in IEEE International Conference Acoust., Speech, Signal Process. vol. 5, 2443-2446 (1999)

18. M Yaghoobi, L Daudet, M Davies, Parametric dictionary design for sparse coding. IEEE Trans Signal Process. 57(12), 4800-4810 (2009)

19. M Ataee, H Zayyani, MB Zadeh, C Jutten, Parametric dictionary learning using steepest descent, in Proc. ICASSP 2010 (Dallas, TX, March 2010), pp. 1978-1981

20. M Zhou, H Chen, J Paisley, L Ren, G Sapiro, L Carin, Non-parametric bayesian dictionary learning for sparse image representations, in Neural Information Processing Systems (NIPS), (2009)

21. N Dobigeon, JY Tourneret, Bayesian orthogonal component analysis for sparse representation. IEEE Trans Signal Process. 58(5), 2675-2685 (2010)

22. D Bertsekas, Constrained Optimization and Lagrange Multiplier Method (Academic Press, 1982)

23. RT Rockafellar, Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math Oper Res. 1(2), 97-116 (1976). doi:10.1287/moor.1.2.97

24. S Osher, M Burger, D Goldfarb, J Xu, W Yin, An iterative regularization method for total variation-based image restoration. SIAM JMMS. 4, 460-489 (2005)

25. W Yin, S Osher, D Goldfarb, J Darbon, Bregman iterative algorithms for l1-minimization with applications to compressed sensing. SIAM J Imag Sci. 1, 142-168 (2008)

26. M Afonso, J Bioucas-Dias, M Figueiredo, Fast image recovery using variable splitting and constrained optimization. IEEE Trans Image Process. 19(9), 2345-2356 (2010)

27. T Goldstein, S Osher, The split Bregman method for L1-regularized problems. SIAM J Imag Sci. 2(2), 323-343 (2009). doi:10.1137/080725891

28. E Esser, Applications of Lagrangian-based alternating direction methods and connections to split Bregman, CAM Report 09-31, UCLA. (2009)

29. J Yang, Y Zhang, Alternating direction algorithms for l1 problems in compressive sensing. Technical Report, Rice University (2009). http://www. caam.rice.edu/~zhang/reports/tr0937.pdf

30. R Tomioka, M Sugiyama, Dual augmented lagrangian method for efficient sparse reconstruction. IEEE Signal Process. Lett. 16(12), 1067-1070 (2009)

31. D Hunter, K Lange, A tutorial on MM algorithms. Am Statist. 58, 30-37 (2004). doi:10.1198/0003130042836

32. I Daubechies, M Defrise, C De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun Pure Appl Math. 57(11), 1413-1457 (2004)

33. J Oliveira, J Bioucas-Dias, MAT Figueiredo, Adaptive total variation image deblurring: a majorization-minimization approach. Signal Process. 89(9), 1683-1693 (2009). doi:10.1016/j.sigpro.2009.03.018

34. A Beck, M Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imag Sci. 2(1), 183-202 (2009). doi:10.1137/ 080716542

35. E Hale, W Yin, Y Zhang, A fixed-point continuation method for L1-regularized minimization with applications to compressed sensing. CAAM Technical report TR07-07, Rice University, Houston, TX (2007)

36. S Wright, R Nowak, M Figueiredo, Sparse reconstruction by separable approximation, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (October 2008)

37. A Ganesh, A Wagner, Z Zhou, AY Yang, Y Ma, J Wright, Face recognition by sparse representation (2010), http://www.eecs.berkeley.edu/~yang/paper/face_chapter.pdf

38. R Rubinstein, M Zibulevsky, M Elad, Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit. Technical Report, CS Technion (2008)

doi:10.1186/1687-6180-2011-58

Cite this article as: Liu et al.: An augmented Lagrangian multi-scale dictionary learning algorithm. EURASIP Journal on Advances In Signal Processing 2011 2011:58.
