
Procedia Computer Science 47 (2015) 272 - 281

Wavelet transform based steganography technique to hide audio signals in image

Hemalatha S (a,1), U. Dinesh Acharya (a), Renuka A (a)

(a) Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal University, Manipal 576104, India

Abstract

Information security is one of the most important factors to consider when secret information has to be communicated between two parties. Cryptography and steganography are the two techniques used for this purpose. Cryptography scrambles the information, but it reveals the existence of the information. Steganography hides the very existence of the information so that no one other than the sender and the recipient can recognize the transmission. In steganography the secret information to be communicated is hidden in some other carrier in such a way that the secret information is invisible. In this paper an image steganography technique is proposed to hide an audio signal in an image in the transform domain using the wavelet transform. The audio signal in any format (MP3, WAV or any other type) is encrypted and carried by the image without revealing its existence to anybody. When the secret information is hidden in the carrier, the result is the stego signal. In this work, the results show a good quality stego signal, and the stego signal is analyzed for different attacks. It is found that the technique is robust and can withstand the attacks. The quality of the stego image is measured by Peak Signal to Noise Ratio (PSNR), Structural Similarity Index Metric (SSIM) and Universal Image Quality Index (UIQI). The quality of the extracted secret audio signal is measured by Signal to Noise Ratio (SNR) and Squared Pearson Correlation Coefficient (SPCC). The results show good values for these metrics.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Peer-review under responsibility of organizing committee of the Graph Algorithms, High Performance Implementations and Applications (ICGHIA2014)

Keywords: Information security; Steganography; Wavelet transform; PSNR; SSIM; UIQI; SNR; SPCC.


1 Corresponding author. Tel.: +919481752532;

E-mail address: hema.shama@manipal.edu.

ISSN 1877-0509. doi: 10.1016/j.procs.2015.03.207

1. Introduction

Information security has long been a major challenge for researchers. Cryptography protects the content of a message but cannot conceal that a message exists, so steganography is used when the communication itself must go unnoticed. Steganography hides secret information in other objects known as cover objects. A cover object together with the hidden information is known as a stego object. The cover can be an image, audio or video. The secret can be a text message, image or audio. In this paper the cover is an image and the secret information is an audio file, and the steganography is performed in the transform domain. There are two main types of steganography techniques: temporal domain and transform domain. In the temporal domain, the actual sample values are manipulated to hide the secret information. In the transform domain, the cover object is converted to a different domain, such as the frequency domain, to obtain transformed coefficients. These coefficients are manipulated to hide the secret information, and the inverse transformation is then applied to the coefficients to obtain the stego signal. Temporal domain techniques are more prone to attacks than transform domain techniques because the actual sample values are modified directly. The transforms that can be used include the Fast Fourier Transform (FFT), Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT). In this paper the DWT is used because the wavelet transform gives the frequency content of a function f(t) as a function of time. The drawback of the FFT is that it gives frequency information but no information about when those frequencies occur. This is because the basis functions (sine and cosine) used by this transform are infinitely long: they pick up the different frequencies of f(t) regardless of where they are located. The DCT produces artifact problems [1].

1.1. Discrete Wavelet Transform (DWT)

In wavelet transformation, a mother wavelet is selected, a function that is nonzero in some small interval, and it is used to explore the properties of the function f(t) in that interval. The mother wavelet is then translated to another interval of time and used in the same way. Wavelet transforms can therefore approximate signals with sharp discontinuities, and they provide a time-frequency representation of the signal. Many wavelets have been proposed; the simplest one is the Haar wavelet. Information that is produced and analyzed in real-life situations is discrete: it comes in the form of numbers rather than a continuous function. This is why the discrete rather than the continuous wavelet transform is the one used in practice. When the input data consists of sequences of integers, as is the case for images, wavelet transforms that map integers to integers can be used. The Integer Wavelet Transform (IWT) is one such approach [2].
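As an illustration of an integer-to-integer wavelet transform, the following is a minimal Python sketch of one level of the Haar transform in lifting form (often called the S-transform). The function names `haar_iwt` and `haar_iiwt` are hypothetical, and the exact lifting steps are an assumption: the paper does not spell out which integer Haar variant it uses.

```python
def haar_iwt(signal):
    """One level of an integer Haar transform in lifting form.

    Assumes `signal` is a list of Python ints with even length.
    Returns (approximation, detail) coefficient lists, all integers.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    even, odd = signal[0::2], signal[1::2]
    # Predict step: the detail is the difference of each sample pair.
    detail = [o - e for e, o in zip(even, odd)]
    # Update step: the approximation is the pair mean, rounded down.
    approx = [e + (d >> 1) for e, d in zip(even, detail)]
    return approx, detail


def haar_iiwt(approx, detail):
    """Inverse of haar_iwt: undo the update step, then the predict step."""
    signal = []
    for a, d in zip(approx, detail):
        e = a - (d >> 1)
        signal.extend((e, e + d))
    return signal


samples = [12, 15, -7, -8, 100, 3, 0, 1]
approx, detail = haar_iwt(samples)
restored = haar_iiwt(approx, detail)
```

Because every step uses only integer additions and floor shifts, the inverse recovers the input exactly, which is the property that makes such transforms attractive for data hiding.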

One of the most popular cover objects used for steganography is an image. Cover images may be gray scale images or color images. Color images have more space for information hiding, and therefore color image steganography is more popular than gray scale image steganography. Color images can be represented in various formats such as RGB (Red Green Blue), HSV (Hue, Saturation, Value), YUV, YIQ and YCbCr (Luminance, Chrominance) [3]. Color image steganography can be done in any color space domain. When the wavelet transform is applied to a color image, the transformation coefficients are obtained for all three channels in the corresponding representation.

Audio signals are analog signals. To use digital signal processing methods on an analog signal, it is sampled periodically in time, which produces a sequence of samples. Audio files are stored in various file formats. The WAV file is the simplest format. Unlike MP3 and other compressed formats, WAV files store samples "in the raw", so no pre-processing is required. MP3 is a widely used audio format. The MP3 standard involves a coding technique that includes several methods, namely sub-band decomposition, filter bank analysis, transform coding, entropy coding, dynamic bit allocation and psychoacoustic analysis. The encoder operates on successive tracks of the audio signal. Each track contains 1152 samples, and one track is further divided into two pieces with 576 samples each. A hybrid filter bank is applied to enhance the frequency resolution [4].

When the wavelet transform is applied to an image, the image is decomposed into four sub-bands: LL, LH, HL and HH. LL is the low frequency sub-band and contains the approximation coefficients. The significant features of the image are contained in this sub-band. The other three sub-bands are high frequency sub-bands and contain less significant features. A close approximation of the image can be reconstructed from the LL sub-band alone. When audio samples are transformed, approximation and detail coefficients are produced. The approximation coefficients contain the most significant features. In this case also, a close approximation of the audio signal can be reconstructed from the approximation coefficients alone.
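To make the last point concrete, the sketch below reconstructs a short integer signal from its approximation coefficients only, with the detail coefficients set to zero. It assumes a one-level integer Haar transform in lifting form; the helper names and the sample values are hypothetical and are not taken from the paper.

```python
def haar_forward(x):
    """One-level integer Haar transform of an even-length list of ints."""
    even, odd = x[0::2], x[1::2]
    detail = [o - e for e, o in zip(even, odd)]
    approx = [e + (d >> 1) for e, d in zip(even, detail)]
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    out = []
    for a, d in zip(approx, detail):
        e = a - (d >> 1)
        out.extend((e, e + d))
    return out


samples = [100, 104, 98, 97, -20, -26, 5, 5]
approx, detail = haar_forward(samples)

# Exact reconstruction uses both coefficient sets.
exact = haar_inverse(approx, detail)

# Approximation-only reconstruction replaces every detail value by zero,
# so each sample pair collapses to its rounded-down mean.
lossy = haar_inverse(approx, [0] * len(approx))
max_error = max(abs(a - b) for a, b in zip(samples, lossy))
```

The approximation-only result follows the shape of the signal but is not identical to it; the error grows with the size of the discarded detail coefficients.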

1.2. Characteristics of Steganography

• Imperceptibility: It is the ability to go unnoticed by human beings.

• Capacity: The amount of secret information, in bits, bytes or samples, that can be hidden in the cover.

• Security: It is the measure of undetectability. It is assessed through the quality of the stego signal. For images it is measured in terms of Peak Signal to Noise Ratio (PSNR), Structural Similarity Index Metric (SSIM), Universal Image Quality Index (UIQI), Color Image Quality Measure (CQM) etc., and for audio it is measured in terms of Signal to Noise Ratio (SNR), Squared Pearson Correlation Coefficient (SPCC) etc. [5,6].

1.2.1. Peak Signal to Noise Ratio (PSNR)

It is given by equation (1).

$$PSNR = 10 \log_{10}\left(\frac{MAX^2}{MSE}\right) \tag{1}$$

MAX is the maximum value of pixels (255 for grey scale images). MSE is the mean square error between the original and stego images. It is given by equation (2).

$$MSE = \frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N}\left\| O(i,j) - D(i,j) \right\|^2 \tag{2}$$

O(i,j) is the original pixel, D(i,j) is the stego pixel and M × N is the image size. Greater PSNR values indicate better quality. PSNR is expressed in decibels (dB).
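Equations (1) and (2) can be sketched in a few lines of Python. The function names and the tiny 2 × 2 example images are hypothetical; images are assumed to be equally sized lists of rows of 8-bit gray values.

```python
import math


def mse(original, stego):
    """Mean square error between two equally sized 2-D pixel arrays."""
    rows, cols = len(original), len(original[0])
    total = sum((original[i][j] - stego[i][j]) ** 2
                for i in range(rows) for j in range(cols))
    return total / (rows * cols)


def psnr(original, stego, max_value=255):
    """PSNR in dB; returns infinity when the images are identical."""
    err = mse(original, stego)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)


cover = [[52, 55], [61, 59]]
stego = [[52, 54], [60, 59]]   # two pixels changed by one gray level
error = mse(cover, stego)
quality_db = psnr(cover, stego)
```

Here two of the four pixels differ by one gray level, so the MSE is 0.5 and the PSNR is a little above 51 dB.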

1.2.2. Structural Similarity Index Metric (SSIM)

SSIM is an objective image quality metric and is superior to traditional measures such as MSE and PSNR. PSNR estimates the perceived errors, whereas SSIM considers image degradation as perceived change in structural information. Structural information is the idea that the pixels have strong inter-dependencies especially when they are spatially close. These dependencies carry important information about the structure of the objects in the visual scene. The SSIM is given by equation (3).

$$SSIM = \frac{(2\bar{x}\bar{y} + C_1)(2\sigma_{xy} + C_2)}{(\sigma_x^2 + \sigma_y^2 + C_2)(\bar{x}^2 + \bar{y}^2 + C_1)} \tag{3}$$

where $C_1 = (k_1 L)^2$ and $C_2 = (k_2 L)^2$ are two constants used to avoid a null denominator. $L$ is the dynamic range of the pixel values (typically $2^{\text{bits per pixel}} - 1$). By default, $k_1 = 0.01$ and $k_2 = 0.03$.

The dynamic range of SSIM is between -1 and 1. Maximum value of 1 will be obtained for identical images.

Equation (3) can be written as the product of three terms: M1, M2 and M3 given by equations (4), (5) and (6) respectively.

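$$M_1 = \frac{2\bar{x}\bar{y} + C_1}{\bar{x}^2 + \bar{y}^2 + C_1} \tag{4}$$

$$M_2 = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \tag{5}$$

$$M_3 = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}, \quad \text{where } C_3 = \frac{C_2}{2} \tag{6}$$

M1 indicates luminance distortion, M2 indicates contrast distortion and M3 indicates structural distortion.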
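The following Python sketch evaluates equation (3) once over a whole image, using global means, variances and covariance. This is a simplifying assumption made for illustration: SSIM is commonly computed over local windows and then averaged, which this sketch does not do. The function name and the pixel values are hypothetical, and images are passed as flat lists of gray levels.

```python
def ssim_global(x, y, bits_per_pixel=8, k1=0.01, k2=0.03):
    """Single-window SSIM of two equally long flat pixel lists."""
    n = len(x)
    dynamic_range = 2 ** bits_per_pixel - 1
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sample variances and covariance (denominator n - 1).
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


cover = [52, 55, 61, 59, 79, 61, 76, 61]
stego = [52, 54, 60, 59, 80, 61, 77, 60]
same = ssim_global(cover, cover)
close = ssim_global(cover, stego)
inverted = ssim_global([0, 255, 0, 255], [255, 0, 255, 0])
```

Identical inputs give a value of 1, small perturbations give a value slightly below 1, and a pattern with inverted structure gives a negative value, consistent with the stated range of -1 to 1.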

1.2.3. Universal Image Quality Index (UIQI)

UIQI is also an objective image quality measure. It is given by equation (7).

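$$Q = \frac{4\sigma_{xy}\,\bar{x}\,\bar{y}}{(\sigma_x^2 + \sigma_y^2)(\bar{x}^2 + \bar{y}^2)} \tag{7}$$

$\bar{x}$, $\bar{y}$, $\sigma_x^2$, $\sigma_y^2$ and $\sigma_{xy}$ are given by equations (8), (9), (10), (11) and (12) respectively.

$$\bar{x} = \frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N} x(i,j) \tag{8}$$

$$\bar{y} = \frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N} y(i,j) \tag{9}$$

$$\sigma_x^2 = \frac{1}{M \times N - 1}\sum_{i=1}^{M}\sum_{j=1}^{N} \left(x(i,j) - \bar{x}\right)^2 \tag{10}$$

$$\sigma_y^2 = \frac{1}{M \times N - 1}\sum_{i=1}^{M}\sum_{j=1}^{N} \left(y(i,j) - \bar{y}\right)^2 \tag{11}$$

$$\sigma_{xy} = \frac{1}{M \times N - 1}\sum_{i=1}^{M}\sum_{j=1}^{N} \left(x(i,j) - \bar{x}\right)\left(y(i,j) - \bar{y}\right) \tag{12}$$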

This quality index represents any distortion as an amalgamation of three factors: loss of correlation, luminance distortion, and contrast distortion. To illustrate this, the definition of Q can be written as a product of three components: $Q = Q_1 \times Q_2 \times Q_3$. $Q_1$, $Q_2$ and $Q_3$ are given by equations (13), (14) and (15) respectively.

$$Q_1 = \frac{\sigma_{xy}}{\sigma_x\sigma_y} \tag{13}$$

$$Q_2 = \frac{2\bar{x}\bar{y}}{\bar{x}^2 + \bar{y}^2} \tag{14}$$

$$Q_3 = \frac{2\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2} \tag{15}$$

Q1 represents the correlation coefficient between x and y, which measures the degree of linear correlation between x and y. Q2 indicates luminance closeness between x and y. Q3 denotes contrast similarity between the two images. The dynamic range of UIQI is between -1 and 1. For identical images its value will be 1.
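A minimal Python sketch of the UIQI and of its three-factor decomposition is given below. It computes the statistics once over the whole image, which is an assumption made for brevity (the index is often applied over sliding windows). The function names and pixel values are hypothetical, and the index is undefined for images with zero variance or zero mean.

```python
import math


def _statistics(x, y):
    """Means, sample variances and sample covariance of two flat lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    return mean_x, mean_y, var_x, var_y, cov


def uiqi(x, y):
    """Universal Image Quality Index computed in a single expression."""
    mean_x, mean_y, var_x, var_y, cov = _statistics(x, y)
    return (4 * cov * mean_x * mean_y
            / ((var_x + var_y) * (mean_x ** 2 + mean_y ** 2)))


def uiqi_components(x, y):
    """Correlation, luminance and contrast factors of the index."""
    mean_x, mean_y, var_x, var_y, cov = _statistics(x, y)
    sigma_product = math.sqrt(var_x * var_y)
    q1 = cov / sigma_product
    q2 = 2 * mean_x * mean_y / (mean_x ** 2 + mean_y ** 2)
    q3 = 2 * sigma_product / (var_x + var_y)
    return q1, q2, q3


cover = [52, 55, 61, 59, 79, 61, 76, 61]
stego = [52, 54, 60, 59, 80, 61, 77, 60]
q = uiqi(cover, stego)
q1, q2, q3 = uiqi_components(cover, stego)
```

Multiplying the three factors gives the same value as the single expression, up to floating-point rounding.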

1.2.4. Color Image Quality Measure (CQM)

It is given by equation (16).

$$CQM = (PSNR_Y \times R_W) + \left(\frac{PSNR_U + PSNR_V}{2} \times C_W\right) \tag{16}$$

where $PSNR_Y$, $PSNR_U$ and $PSNR_V$ are the PSNR values of the Y, U and V components of the color image respectively. $C_W$ and $R_W$ are the weights on the human perception of cone and rod sensors respectively. In the HVS, cones are responsible for chrominance perception and rods are responsible for luminance perception. $C_W = 0.0551$ and $R_W = 0.9449$ as specified by the HVS. A greater CQM value indicates greater image similarity. CQM is expressed in dB.
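The weighting can be sketched as below, assuming the form in which the luminance PSNR is weighted by RW and the mean of the two chrominance PSNR values is weighted by CW. The function name and the example PSNR values are hypothetical; computing the per-channel PSNR values from a YUV image is outside this sketch.

```python
def cqm(psnr_y, psnr_u, psnr_v, cone_weight=0.0551, rod_weight=0.9449):
    """Weighted combination of per-channel PSNR values, in dB."""
    chroma_psnr = (psnr_u + psnr_v) / 2
    return psnr_y * rod_weight + chroma_psnr * cone_weight


# Hypothetical per-channel PSNR values in dB.
example = cqm(42.0, 38.0, 40.0)
uniform = cqm(40.0, 40.0, 40.0)
```

Because the two weights sum to one, equal PSNR values in all three channels give a CQM equal to that value, and the luminance channel dominates the result otherwise.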

1.2.5. Signal to Noise Ratio (SNR)

It is given by equation (17).

$$SNR = 10 \times \log_{10}\left(\frac{\sigma_x^2}{MSE}\right) \tag{17}$$

where $MSE = \frac{1}{N}\sum_{i=1}^{N}(x_i - y_i)^2$, $\sigma_x^2$ is the variance of the original signal, $x_i$ is the original sample and $y_i$ is the stego sample.

Signal to noise ratio measures the level of an audio signal compared to the level of noise present in that signal. The measurement is usually expressed in decibels (dB). A larger value of SNR implies better quality. However, it is a purely statistical quantity and so does not judge the perceived quality as a whole.
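A small Python sketch of the SNR computation follows. It assumes that the signal power is taken as the variance of the original samples (with denominator N) and that the noise power is the MSE between the two signals; both the function name and that choice of signal power are assumptions made for illustration.

```python
import math


def snr_db(original, processed):
    """SNR in dB between an original signal and its processed version."""
    n = len(original)
    mean = sum(original) / n
    # Assumed signal power: variance of the original samples.
    signal_power = sum((x - mean) ** 2 for x in original) / n
    # Noise power: mean square error between the two signals.
    noise_power = sum((x - y) ** 2 for x, y in zip(original, processed)) / n
    if noise_power == 0:
        return math.inf
    return 10 * math.log10(signal_power / noise_power)


original = [10, -10, 10, -10]
extracted = [9, -9, 11, -11]   # every sample is off by one
value = snr_db(original, extracted)
```

In this example the signal power is 100 and the noise power is 1, which corresponds to 20 dB.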

1.2.6. Squared Pearson Correlation Coefficient (SPCC)

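SPCC measures the similarity level between two signals [4]. The higher the SPCC, the higher the similarity level. Its range is between 0 and 1. It is given by equation (18).

$$SPCC = \frac{\left(\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})\right)^2}{\sum_{i=1}^{N}(x_i - \bar{x})^2 \sum_{i=1}^{N}(y_i - \bar{y})^2} \tag{18}$$

where $x_i$ and $y_i$ are the two signals, and $\bar{x}$ and $\bar{y}$ are their averages.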
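The squared Pearson correlation coefficient can be computed directly from its definition, as in the following Python sketch. The function name and the sample values are hypothetical, and the coefficient is undefined when either signal is constant.

```python
def spcc(x, y):
    """Squared Pearson correlation coefficient of two equally long signals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


signal = [1, 2, 3, 4]
identical = spcc(signal, signal)
negated = spcc(signal, [-s for s in signal])
unrelated = spcc(signal, [1, -1, -1, 1])
```

Because the correlation is squared, a signal and its negation also score 1; the measure captures the strength of the linear relation but not its sign.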

2. Literature review

Hiding in the frequency domain rather than the time domain has been reported to give better results in terms of image quality. According to the Human Visual System (HVS), the human eye is sensitive to small changes in luminance but not in chrominance. YCbCr is one of the representations where Y is the luminance and Cb, Cr are the chrominance components. The chrominance part can be modified without visually damaging the overall image quality [3]. Very few research papers are found in the literature where an audio file is hidden in an image. One such technique was proposed by M. I. Khalil [7]. In that paper a short audio message is embedded in the least significant bits of all the bytes of a pixel, so the maximum size of the secret audio is 3*W*H bits, where W is the width and H is the height of the cover image. Since LSBs are used for embedding, the data is more likely to be lost during compression, cropping, filtering etc. The authors did not perform any quality measurement of the stego image, which is essential to justify that the stego image is perceptually similar to the cover image.

MSE and PSNR alone are not reliable measures of image quality. They are acceptable as image similarity measures only when the images differ by an increasing amount of distortion of a single type, but they fail to capture image quality when used to compare across distortion types. SSIM and the Universal Image Quality Index (UIQI) are widely used methods for measuring image quality based on the Human Visual System (HVS) [5,6]. Yalman et al. [8] proposed a full-reference Color Image Quality Measure (CQM), based on a reversible YUV color transformation and the PSNR measure. It is based on the HVS and weights luminance and chrominance according to the human eye's perception of each. Using the CQM together with the traditional PSNR approach provides distinguishing results.

To increase the hiding capacity, an Orthogonal Frequency Division Multiplexing (OFDM) approach is used in [9], but it requires the original cover at the receiver. In its extension to blind steganography, the payload and quality are low. To test the robustness of a Discrete Wavelet Transform based steganography algorithm, Vijay Kumar et al. [10] evaluated the performance of stego images by subjecting them to different types of attacks and showed that the secret image can be retrieved. These attacks include Gaussian noise, sharpening, median filtering, Gaussian blur, histogram equalization and gamma correction. Ali Kanso et al. [11] tested their steganography algorithm against existing steganalytic attacks such as the histogram test, RS attack, Chi-square test, PSNR test and Structural Similarity Index Metric (SSIM) test. The RS attack is used to detect stego objects with LSB replacement and to estimate the size of the hidden message [12]. Difference expansion, histogram shifting and interpolation strategies are applied to increase the hiding capacity in image steganography [13]. Ki-Hyun Jung et al. [14] used image interpolation and edge detection to increase payload capacity and image quality.

3. Proposed method

In this paper a steganography technique to hide audio signals in an image is proposed. The image can be in any format such as .jpg or .bmp, and the audio can also be in any format such as .wav or .mp3. Since audio files contain a large number of samples even for a short duration, the cover image has to be considerably large. Color images are suitable because they offer enough hiding space. Since the YCbCr approach is more secure than the RGB approach, the YCbCr approach is used. The cover image is converted to YCbCr. Then the Cb and Cr components and the secret audio signal are transformed using the IWT. The approximation coefficients of the secret audio signal are hidden in the second and third bit planes of the high frequency coefficients of Cb and Cr. The procedure is as follows:

Embedding
Input: cover image C and secret audio S.wav. Output: stego image G.

Step 1: Read cover image C and secret audio S.

    C = imread('C.jpg')
    S = audioread('S.wav')

Step 2: Represent C in YCbCr and obtain the IWT of the Cb component to get four sub-bands CLL, CHL, CLH and CHH.

    LS = liftwave('haar', 'Int2Int')
    [CLL, CHL, CLH, CHH] = lwt2(double(Cb), LS)

Step 3: Obtain the IWT of the secret audio to get approximation and detail coefficients.

    [CA, CD] = lwt(double(S), LS)

Step 4: Encrypt the approximation coefficients of the secret audio and hide them in the second and third LSB planes of the CHH and CLH sub-bands.

    [C1, C2] = IWTencode(CA, CLH, CHH)

In this method two bits of the secret message are hidden in one byte of the cover image. Two bits from the secret are XORed with the 5th and 4th bits of the cover byte to get the encrypted secret bits. Suppose S1 and S0 are two secret bits; then S1' = S1 XOR b5 XOR b4 and S0' = S0 XOR b5 XOR b4, where b5 and b4 are the 5th and 4th bits of the cover byte respectively. The 3rd and 2nd bits of the cover byte are replaced by these encrypted secret bits. This type of dynamic encryption avoids the need for an encryption key. Embedding can be done in the Cr component in a similar fashion. Here C1 and C2 are the modified CLH and CHH.

Step 5: Obtain the inverse IWT to get the stego Cb. Then convert to RGB format and write the stego image.

    G = ilwt2(CLL, CHL, C1, C2, LS)
    G = ycbcr2rgb(YGCr)
    imwrite(G, 'stego.jpg')

Step 6: End embedding.

Extracting
Input: stego image G. Output: secret audio S.wav.

Step 1: Read stego image G and represent it in YCbCr format.

    G' = imread('G.jpg')
    YCb'Cr = rgb2ycbcr(G')

Step 2: Obtain the IWT of Cb' to get four sub-bands: GLL, GHL, GLH and GHH.

    LS = liftwave('haar', 'Int2Int')
    [GLL, GHL, GLH, GHH] = lwt2(double(Cb'), LS)

Step 3: Extract the encrypted secret audio bits from the second and third bit planes of GLH and GHH. Then decrypt.

    CAbin = IWTdecode(GHH, GLH)

In this method, two encrypted bits of the secret message are obtained from one byte of the stego image coefficient. Decryption is then done as follows: the two encrypted bits are XORed with the 5th and 4th bits of the stego byte to get the secret bits, i.e., S1 = S1' XOR b5 XOR b4 and S0 = S0' XOR b5 XOR b4.

Step 4: Convert to decimal to get the approximation coefficients of the secret audio.

    CA = bin2dec(CAbin)

Step 5: Obtain the inverse IWT of the approximation coefficients obtained in step 4, taking zeroes for the detail coefficients. The result is the secret audio.

    S = ilwt(CA, 0, LS)

Step 6: End extracting.
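The keyless XOR step used in embedding Step 4 and extraction Step 3 can be sketched in Python as follows. The sketch assumes that bits are numbered from 1 at the least significant bit and that the carrier value is a non-negative 8-bit integer; the function names are hypothetical, and handling of negative wavelet coefficients is not covered.

```python
def embed_two_bits(cover_byte, s1, s0):
    """Hide secret bits s1, s0 in bits 3 and 2 of an 8-bit cover value."""
    if not 0 <= cover_byte <= 255:
        raise ValueError("cover_byte must be in 0..255")
    b5 = (cover_byte >> 4) & 1          # 5th bit, counting from the LSB
    b4 = (cover_byte >> 3) & 1          # 4th bit
    key = b5 ^ b4
    enc1, enc0 = s1 ^ key, s0 ^ key     # S1' and S0'
    cleared = cover_byte & 0b11111001   # clear bits 3 and 2
    return cleared | (enc1 << 2) | (enc0 << 1)


def extract_two_bits(stego_byte):
    """Recover (s1, s0); bits 5 and 4 are unchanged by embedding."""
    key = ((stego_byte >> 4) & 1) ^ ((stego_byte >> 3) & 1)
    enc1 = (stego_byte >> 2) & 1
    enc0 = (stego_byte >> 1) & 1
    return enc1 ^ key, enc0 ^ key


stego_value = embed_two_bits(0b10110101, 1, 0)
recovered = extract_two_bits(stego_value)
```

Decryption works without a shared key because the 4th and 5th bits that form the XOR mask are never modified, so the receiver can recompute the same mask from the stego value.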

4. Experimental results and analysis

The algorithm is tested using a color image of size 512 × 512 and varying the number of secret audio samples. When the payload is increased to 131064 and 262128 samples, two levels of integer wavelet transformation are performed, so that the number of coefficients to be hidden is reduced to one fourth. While extracting, two levels of inverse wavelet transformation are performed. Since JPEG is the most commonly used format, JPEG images are considered. The image format has no influence on the performance evaluation metrics because both the cover and stego images are in the same format and data hiding is done in the transform domain. Fig. 1 shows the cover and stego images. Fig. 2 and Fig. 3 show the plots of the original secret and extracted secret audio signals respectively.

Fig. 1. Cover and stego images: (a) Cover, (b) Stego with 32768 samples, (c) Stego with 65536 samples, (d) Stego with 131064 samples

Fig. 2. Original secret audio signal (horizontal axis: samples)

Fig. 3. Extracted secret audio signal (horizontal axis: samples)

The performance evaluation metrics for the stego image and extracted secret audio are shown in Table 1. The stego image is evaluated using PSNR, SSIM and CQM. The UIQI values obtained are the same as the SSIM values, so they are not included in the table. The extracted secret audio is evaluated using SNR and SPCC. It is observed that when the number of secret audio samples is increased above 131064, the quality of the stego and extracted secret signals falls below the HVS and Human Auditory System (HAS) limits. This is because two levels of wavelet transformation are taken before hiding the secret message. In this case the extracted secret audio differs slightly from the original. Otherwise it is exactly the same as the original.

Table 1. Performance metrics for the stego and extracted secret signals

Cover image    Secret audio   Stego       Stego   Stego      Extracted   Extracted
(512 × 512)    (samples)      PSNR (dB)   SSIM    CQM (dB)   SNR (dB)    SPCC
lena.jpg       32768          41.6        0.954   43         38.3        0.9022
lena.jpg       65536          38.6        0.935   40         36.3        0.8922
lena.jpg       131064         38.7        0.935   40         32.4        0.8353
lena.jpg       262128         24          0.796   25         29          0.7067

Considering one cover image and varying the number of secret audio samples makes the results easy to analyze. In this technique, the maximum payload size is 262128 samples (each sample is 8 bits) with a 512 × 512 color image. If both the Cr and Cb components are used to hide the secret message, the quality of the stego image decreases, and the secret message cannot be hidden to the maximum extent possible without crossing the quality metric limits. It is not appropriate to compare this work with other image steganography techniques in which the secret message is not an audio signal. One paper in the literature [7] hides audio in an image, but it reports no performance measurements. This work is therefore compared with [11], where SSIM is measured. CQM is not evaluated in any of the steganography papers surveyed. Table 2 shows this comparison, which takes bits per pixel (BPP) as the reference. In Ali Kanso's work, even though the PSNR and SSIM are slightly higher, the BPP is very low.

Table 2. Performance comparison with that of other related published work

Technique               BPP     PSNR (dB)   SSIM     CQM (dB)
Ali Kanso et al. [11]   0.006   46          0.9975   Not calculated
Proposed                0.333   41.6        0.954    43
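The paper does not state how the BPP figure is computed. As a worked check, the Python sketch below assumes 8-bit audio samples and normalisation by the three colour bytes of each pixel; under that assumption a payload of 32768 samples in a 512 × 512 colour image gives the tabulated 0.333. The second helper counts the bits offered by two high frequency sub-bands of one chrominance channel at two bits per coefficient, as described in the proposed method. Both function names are hypothetical.

```python
def bits_per_pixel(num_samples, width, height,
                   bits_per_sample=8, channels=3):
    """Payload bits divided by the number of colour bytes in the cover."""
    return num_samples * bits_per_sample / (width * height * channels)


def subband_capacity_bits(width, height, subbands=2, bits_per_coeff=2):
    """Bits available in `subbands` one-level sub-bands of one channel."""
    coeffs_per_subband = (width // 2) * (height // 2)
    return coeffs_per_subband * subbands * bits_per_coeff


bpp = bits_per_pixel(32768, 512, 512)
capacity = subband_capacity_bits(512, 512)
```

With these assumptions the payload of 32768 eight-bit values (262144 bits) equals the number of bits available in the two sub-bands of a single chrominance channel.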

4.1. Analysis for common attacks

When a steganography algorithm is designed, it is necessary to test its performance by subjecting it to different types of attacks. It should be possible to retrieve the hidden information even if the stego image undergoes certain attacks. The common attacks that the stego image may experience are Gaussian noise, median filtering, JPEG compression, scaling, cropping etc. JPEG compression and scaling may not affect the stego image and the extraction process, because embedding is done in the wavelet transform domain. Here the two most common attacks are considered: Gaussian noise and median filtering. The Gaussian noise attack is performed with zero mean and a variance of 0.001. Median filtering is performed using a 3-by-3 neighborhood. In both cases the secret audio can be obtained with reasonable SNR and SPCC values. The stego image before and after the attacks is shown in Fig. 4.
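The two attacks can be simulated with the following Python sketch, which operates on a gray scale image stored as a list of rows. Two details are assumptions not stated in the paper: the noise variance of 0.001 is interpreted on a 0 to 1 intensity scale, and border pixels are filtered using only the neighbours that exist. The function names are hypothetical.

```python
import random
import statistics


def add_gaussian_noise(image, variance=0.001, seed=0):
    """Add zero-mean Gaussian noise to 8-bit pixels (variance on 0..1 scale)."""
    rng = random.Random(seed)
    sigma = variance ** 0.5
    noisy = []
    for row in image:
        new_row = []
        for pixel in row:
            value = pixel / 255 + rng.gauss(0.0, sigma)
            new_row.append(min(255, max(0, round(value * 255))))
        noisy.append(new_row)
    return noisy


def median_filter_3x3(image):
    """Replace each pixel by the median of its 3-by-3 neighbourhood."""
    rows, cols = len(image), len(image[0])
    filtered = [row[:] for row in image]
    for i in range(rows):
        for j in range(cols):
            window = [image[a][b]
                      for a in range(max(0, i - 1), min(rows, i + 2))
                      for b in range(max(0, j - 1), min(cols, j + 2))]
            # median_low always returns a value that occurs in the window.
            filtered[i][j] = statistics.median_low(window)
    return filtered


image = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
smoothed = median_filter_3x3(image)
noisy = add_gaussian_noise(image, seed=1)
```

The median filter removes the isolated bright pixel, which shows why it is a demanding test for data hidden in individual coefficient bits.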

Fig. 4. Effect of attacks: (a) No attack, (b) Gaussian noise, (c) Median filtering

Table 3 shows the SNR and SPCC of the extracted audio signal.

Table 3. SNR and SPCC of the extracted audio signal

Attack type        SNR (dB)   SPCC
No attack          38.3       0.9022
Gaussian noise     37         0.9022
Median filtering   36         0.9020

5. Conclusion

In this paper a secure, robust and high capacity image steganography technique is proposed. It gives good values for all the metrics and hence is an efficient way to send audio files without revealing their existence. The performance against some of the attacks is also good. The technique needs to be tested against other attacks such as histogram equalization, cropping, occlusion and translation. The experimental results show that the secret audio can be extracted without much distortion in most of the cases.

References

1. David Salomon, Data Compression- The Complete Reference, 3rd edn, Springer-Verlag 2004.

2. M. F. Tolba, M. A. Ghonemy, I. A. Taha, A. S. Khalifa. Using Integer Wavelet Transforms in Colored Image-Stegnography. International Journal on Intelligent Cooperative Information Systems, Volume 4, July 2004. pp. 75-85.

3. Shejul, A. A., Kulkarni, U.L. A Secure Skin Tone based Steganography (SSTS) using Wavelet Transform. International Journal of Computer Theory and Engineering, Vol.3, No.1, 2011. pp. 16-22.

4. Diqun Yan, Rangding Wang, Xianmin Yu, Jie Zhu. Steganography for MP3 audio by exploiting the rule of window switching, Computers & Security 31, 2012. Elsevier publications. pp 704-716.

5. Zhou Wang, Alan Conrad Bovik, Hamid Rahim Sheikh, Eero P. Simoncelli. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Transactions on Image Processing, Vol. 13, No. 4, 2004. pp. 600-612.

6. C. Sasi Varnan, A. Jagan, Jaspreet Kaur, Divya Jyoti, D. S. Rao. Image Quality Assessment Techniques in Spatial Domain. IJCST Vol. 2, Issue 3, 2011. pp. 177-184.

7. M. I. Khalil. Image steganography: Hiding short messages within digital images. JCS&T, Vol.11, No. 2. pp 68-73.

8. Yildiray YALMAN, İsmail ERTURK. A new color image quality measure based on YUV transformation and PSNR for human vision system, 2011. pp. 1-18.

9. Jose Juan Garcia-Hernandez, Ramon Parra-Michel, Claudia Feregrino-Uribe, Rene Cumplido. High payload data-hiding in audio signals based on a modified OFDM approach. Expert Systems with Applications 40, 2013. Elsevier publications. pp 3055-3064.

10. Vijay Kumar and Dinesh Kumar. Performance Evaluation of DWT based Steganography. IEEE 2nd International Advance Computing Conference, 2010. pp 223-228.

11. Ali Kanso, Hala S. Own. Steganographic algorithm based on a chaotic map. Communications in Nonlinear Science and Numerical Simulation, 17, 2012. pp. 3287-3302.

12. S. Geetha, V. Kabilan, S.P. Chockalingam, N. Kamaraj. Varying radix numeral system based adaptive image steganography. Information Processing Letters 111, 2011. pp 792-797.

13. Tzu-Chuen Lu, Chin-Chen Chang & Ying- Hsuan Huang. High capacity reversible hiding scheme based on interpolation, difference expansion, and histogram shifting, Springer 2013.

14. Ki-Hyun Jung, Kee-Young Yoo. Data hiding using edge detector for scalable images, Springer 2012.