2013 International Conference on Electrical Information and Communication Technology (EICT)
Design and Implementation of Fast FPGA Based Architecture for Reversible Watermarking
Sudip Ghosh1*, Bijoy Kundu2, Debopam Datta3, Santi P. Maity4, and Hafizur Rahaman1,4
1 School of VLSI Technology (Bengal Engineering and Science University at Shibpur, India)
2 Dept. of Electronics and Telecommunication (Bengal Engineering and Science University at Shibpur, India)
3 Dept. of Electrical and Computer Engineering (University of Illinois at Chicago, USA)
4 Dept. of Information Technology (Bengal Engineering and Science University at Shibpur, India)
* E-mail: sudip_etc@yahoo.co.in
Abstract: Diverse hardware realizations of digital watermarking for
multimedia have been proposed in the literature. This paper focuses on
the design and implementation of a fast FPGA (Field Programmable Gate
Array) based architecture using a reversible contrast mapping (RCM)
based image watermarking algorithm. The distinguishing feature of this
architecture is its clock-less encoder design and implementation, which
makes the design faster. The encoder module response time is
independent of clock frequency, so the watermark can be embedded as
soon as the input is fetched. The schematic-based design and
implementation of the VLSI architecture have been done with Xilinx ISE
14.1 on the Spartan 3E FPGA family. The encoder requires 528 4-input
LUTs and 303 slices, while the decoder requires 613 LUTs and 347
slices. The maximum clock frequency of the decoder is 45 MHz. The
results show the viability of the proposed VLSI architecture for
low-cost, high-speed, real-time use.
Keywords: FPGA, VLSI Architecture, Reversible Watermarking
I. INTRODUCTION
Digital watermarking [1] is an efficient tool to prevent unauthorized
use of data. Digital watermarks may be used to verify the authenticity
or integrity of the original data. Nowadays, watermarking is
prominently used for tracing copyright infringements and for banknote
authentication. Digital watermarking is broadly classified by the type
of signal, such as audio, image, video, and database watermarking. The
present work focuses on image watermarking.
In image watermarking, digital information (such as a digital image, a
digital signature, or a random sequence of binary numbers) is embedded
into an image. The embedded information may or may not be perceptible
after watermarking and therefore falls into the category of visible or
invisible watermarking, respectively. Depending on the robustness of
the watermark, it can also be categorized as robust or fragile
watermarking [2].
One limitation of watermarking-based authentication schemes is the
distortion inflicted on the host media by the embedding process.
Although the distortion is often insignificant, it may not be
acceptable for some applications, especially in medical imaging and
military applications. Therefore, a watermarking scheme capable of
removing the distortion and recovering the original media after passing
the authentication is desirable. Schemes with this capability are often
referred to as reversible watermarking schemes [3]. Various reversible
watermarking techniques have been proposed using different types of
algorithms [4]-[5]. Popular techniques of reversible watermarking are:
i) Difference Expansion, ii) Histogram Bin Shifting, iii) Data hiding
using the Integer Wavelet Transform, iv) Contrast Mapping, and v) the
Integer Discrete Cosine Transform. Usually, a reversible scheme
performs some type of lossless compression on the host media in order
to make space for hiding the compressed data and the Message
Authentication Code (MAC) (e.g., a hash, a signature, or some other
feature derived from the media) used as the watermark [6]. To
authenticate the received media, the hidden information is extracted
and the compressed data is decompressed to reveal the possible original
media. A MAC is then derived from the possible original media. If the
newly derived MAC matches the extracted one, the possible original
media is deemed authentic.
978-1-4799-2299-4/13/$31.00 © 2013 IEEE
In this paper, however, a reversible watermarking technique is
implemented using a specific transform reported by Coltuc et al. in
[5]. This technique was chosen for its low computational complexity and
robustness. The primary goal of the proposed design is to achieve a
high-speed, hardware-efficient VLSI architecture. The RCM technique was
first implemented in Matlab to verify the algorithm and analyze various
design constraints. The desired architecture was then realized on an
FPGA using Xilinx tools. The rest of the paper is organized as follows.
Section II describes the related works. Section III reports the
proposed VLSI architecture of reversible watermarking, followed by the
analysis and experimental results in Section IV. Finally, the work is
concluded in Section V.
II. RELATED WORKS
In the scheme proposed by Fridrich et al. [1], the Discrete Cosine
Transform (DCT) technique is used. A 128-bit hash of all the DCT
coefficients serves as the watermark. The extracted compressed
bit-stream is used for verification; however, the hash contains only
the signature of the image, with no local information. Therefore,
despite its simplicity and ability to detect inauthenticity, this
technique is unable to locate the position where the tampering has been
done.
Van Leest et al. [3] proposed another reversible watermarking scheme
based on a transformation function that introduces gaps in the
histogram of image blocks. One drawback of this scheme is its need for
the overhead information and the protocol to be hidden in the image.
Moreover, a potential security loophole in the scheme is that, because
the computational cost of extracting the watermark is insignificant, an
attacker can defeat the scheme by exhaustively trying all 256 possible
gray levels, assuming that the gray level being tried is the gap.
In [5], Coltuc et al. proposed a Reversible Contrast Mapping (RCM)
based algorithm for reversible watermarking in the spatial domain. It
provides a high data embedding bit-rate at a very low mathematical
complexity. The scheme does not need any additional data compression
but is able to recover the original image even after alterations in the
encoded data.
Over the last decade, a lot of research has been performed on
reversible watermarking; however, VLSI implementation of the RCM based
approach is still an area to be explored. In this paper, the advantages
of the RCM based watermarking technique have been explored and
implemented. The main focus is on developing a low-cost, high-speed
VLSI architecture that can be used for real-time applications. Some
significant hardware implementations of digital watermarking include
the works [8], [9], [10], [11]. Mohanty et al. [9] concentrated on
spatial-domain invisible-fragile watermarking and its architecture.
However, these designs are seriously constrained by their hardware
complexity. In [12], a hardware architecture that can insert two
visible watermarks in images in the spatial domain is introduced. The
main objective of that architecture was to decrease the hardware
complexity while keeping the performance intact. Employing the
advantages of the RCM technique, a low-cost, hardware-efficient VLSI
implementation of RCM based reversible watermarking (RW) is presented
in this paper.
III. PROPOSED VLSI ARCHITECTURE OF REVERSIBLE WATERMARKING
The watermarking algorithm is implemented using the Xilinx ISE Design
Suite for the Spartan 3E FPGA family. An FPGA is used for the hardware
implementation because of its advantages such as re-configurability,
low cost, and a simpler design process. The entire watermarking
architecture consists of two main blocks, the encoder and the decoder.
Each of the blocks is further divided into three sub-blocks named
module 1, module 2, and module 3. Each of these modules is designed
individually and later interfaced with the others. The encoder and
decoder were designed and simulated separately.
Both the encoder and decoder designs are described in detail with their
respective modules in the following subsections 1 and 2.
1. ENCODER:
In the proposed architecture, the encoder part is designed in
three stages as given in Fig. 1.
Image Acquisition and Pixel Transform:
The proposed architecture is implemented and optimized for 8-bit
gray-scale images. The source to the encoder, which is basically a
device providing image pixels as input, can be a storage device like a
RAM or direct external input by the user in 8-bit digital form. In the
proposed architecture, the original image data is stored in a 256-byte
RAM (eight 32-word by 8-bit SRAMs).
Fig. 1. Data flow path in encoder
As discussed in [5], a specific transformation is performed on pairs of
image pixels. Among the various ways of acquiring these pixels from the
source, sequential column-wise fetching from the memory is used in this
design (each element of a particular row and column is an 8-bit pixel
value addressed by an 8-bit address). The 8-bit pixel value read from
the memory is then extended to 10 bits (by adding zeros at the 9th and
10th bit positions) to provide the correct input format for the pixel
transformation. The transformation [5] is given by

$$A_{\mathrm{transform}} = 2A - B \tag{1}$$
$$B_{\mathrm{transform}} = 2B - A \tag{2}$$
where A and B are the pair of input pixels of the original image and
Atransform and Btransform are the pair of transformed pixels. This
transformation technique allows error-free transmission and detection
of the image at the transmitter and receiver ends, respectively [5].
Fig. 2 shows the data flow path in the image acquisition and pixel
transform module. The pixel transform is implemented using only two
subtractor modules for each pixel. The 10-bit subtractors perform
signed subtraction using two's complement logic and also prevent
overflow.
Fig. 2. Data flow path in image acquisition and pixel transform module
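The pixel transform described above can be illustrated with a short software model. This is a minimal sketch, not the authors' schematic: it assumes each transformed pixel is formed as A - (B - A) and B - (A - B), which is one way to obtain the transform with two subtractions per pixel. The helper names `sub10`, `to_signed10`, and `forward_transform` are hypothetical.

```python
MASK10 = 0x3FF  # 10-bit data path


def sub10(p, q):
    """10-bit two's complement subtraction p - q (wraps modulo 2**10)."""
    return (p - q) & MASK10


def to_signed10(word):
    """Interpret a 10-bit word as a signed two's complement integer."""
    return word - 0x400 if word & 0x200 else word


def forward_transform(a, b):
    """Model of the pixel transform using two subtractions per output pixel.

    Assumption: the subtractors are arranged as A - (B - A) and B - (A - B);
    the text only states that two subtractors are used for each pixel.
    """
    if not (0 <= a <= 255 and 0 <= b <= 255):
        raise ValueError("pixels must be 8-bit values")
    a10 = a & MASK10  # zero-extend 8-bit pixel to 10 bits
    b10 = b & MASK10
    a_t = sub10(a10, sub10(b10, a10))  # equals 2A - B
    b_t = sub10(b10, sub10(a10, b10))  # equals 2B - A
    return to_signed10(a_t), to_signed10(b_t)
```

Because the results lie between -255 and 510, they fit in a signed 10-bit word, which is consistent with the statement that the 10-bit subtractors prevent overflow.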
Control Signal:
As mentioned earlier, the watermarking (embedding of watermark image
data into the original image) is performed only on image pixels
constrained to a particular domain, Dc, of the transformed pairs [5].
The domain Dc of transformed pixels of the original image is defined
such that the pair of transformed pixels, Atransform and Btransform,
belong to [0, L], where L takes values from 0 to 254 leaving 1. The
domain Dc prevents underflow and overflow and removes ambiguous pairs.
It also ensures robust, error-free transmission of the watermarked
image. This module generates the control signals that are essential to
carry out the data embedding process, which include determining Dc
along with other essential control signals. As mentioned by Coltuc et
al. in [5], three distinct groups are formed, partially depending on
Dc, and are distinguished by the three control signals (X, Y, and Z) in
Fig. 3.
The generation of these control signals is summarized below.
A low logic level, 0, of X is generated when the pair Atransform and
Btransform belongs to Dc and each of them is even.
A high logic level, 1, of Y is generated when the pair Atransform and
Btransform belongs to Dc and is odd.
A high logic level, 1, of Z is generated when the pair does not belong
to Dc.
These control signals are generated entirely with logic gates, as shown
in Fig. 3. The 10th bit determines the polarity (positive or negative)
of the transformed pair, while the 9th bit determines whether the
transformed pair is below 255. The LSB determines whether it is even or
odd.
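The bit tests described above can be sketched in software as follows. This is a simplified illustration under stated assumptions, not the gate-level circuit of Fig. 3: the domain check is reduced to a range test on the 10th and 9th bits (the removal of ambiguous pairs from Dc is not modeled), and pairs that are in range but not both odd are treated as transformable, following the grouping in [5]. The function name `control_signals` is hypothetical.

```python
def control_signals(a_t10, b_t10):
    """Derive (X, Y, Z) from two 10-bit two's complement transformed pixels.

    Assumptions: Dc is approximated by a plain range check, and mixed-parity
    pairs are grouped with the transformable pairs as in [5].
    """
    def in_range(word):
        sign_bit = (word >> 9) & 1      # 10th bit: 1 means negative
        overflow_bit = (word >> 8) & 1  # 9th bit: 1 means value above 255
        return sign_bit == 0 and overflow_bit == 0

    in_dc = in_range(a_t10) and in_range(b_t10)
    both_odd = (a_t10 & 1) == 1 and (b_t10 & 1) == 1  # LSB gives parity

    x = 0 if (in_dc and not both_odd) else 1  # active low: transform and embed
    y = 1 if (in_dc and both_odd) else 0      # odd pair: embed without transform
    z = 0 if in_dc else 1                     # outside the domain: no embedding
    return x, y, z
```

For example, the transformed pair (6, 3) falls in the first group, (3, 9) in the odd-pair group, and a pair with a negative component in the third group.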
Data Embedding:
The circuit that embeds the watermark image in the original image is
given in Fig. 4. This module takes the control signals as input and
performs watermarking selectively, depending on them. The control
signals determine whether the pixels are to be transformed before
embedding the watermark sequence into the original image. The LSB of
the 2nd pixel of the pair is used to embed the watermark image, while
the transformation information (whether the pixels are transformed or
not) is embedded into the LSB of the 1st pixel. The watermarking
algorithm in terms of the control signals is given in pseudo code as
follows.
Watermarking based on control signals:
When X=0, pass the pair Atransform and Btransform for watermarking. Set
the LSB of Atransform to 1 and replace the LSB of Btransform with the
watermark bit.
When Y=1, pass the pair A and B for watermarking. Set the LSB of A to 0
and replace the LSB of B with the watermark bit.
When Z=1, the watermarking step is skipped. Set the LSB of A to 0 and
transmit the original image pixels.
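The three cases above can be written as a small software model. This is a sketch for illustration only; the function names `set_lsb` and `embed_pair` are hypothetical, the control signals are taken as given inputs, and the handling of the displaced LSB of A in the third case (which [5] saves separately) is not modeled.

```python
def set_lsb(value, bit):
    """Return value with its least significant bit replaced by bit."""
    return (value & ~1) | (bit & 1)


def embed_pair(a, b, a_t, b_t, x, y, z, w):
    """Select the output pair (WIA, WIB) from the control signals.

    a, b     : original 8-bit pixels
    a_t, b_t : transformed pixels
    x, y, z  : control signals (X is active low, Y and Z are active high)
    w        : watermark bit to embed
    """
    if x == 0:
        # Transformed pair: flag the transform in the LSB of the 1st pixel
        return set_lsb(a_t, 1), set_lsb(b_t, w)
    if y == 1:
        # Odd pair: keep the original pixels, flag "not transformed"
        return set_lsb(a, 0), set_lsb(b, w)
    if z == 1:
        # Outside the domain: no watermark bit is embedded
        return set_lsb(a, 0), b
    raise ValueError("inconsistent control signals")
```

With A = 5, B = 4 (transformed pair 6, 3) and watermark bit 1, the model outputs the pair (7, 3).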
2. DECODER:
The decoder block is structured similarly to the encoder block. It
comprises three modules: the control signal generation block, the
inverse transform block, and the image and watermark extraction block.
The entire decoder architecture is given in Fig. 5. The following
subsections (i)-(iii) give a detailed hardware description of the
individual modules of the decoder.
Control Signal:
Similar to the encoder, the control signals are generated from the
8-bit input data received from the transmitter, i.e., the encoder. The
watermark image data as well as the transformation information have
been embedded into the LSBs of the transmitted pairs. Therefore, the
preliminary task of this module is to extract the LSBs of both received
pixels. The LSB of the received signal WIA contains the transformation
information and the LSB of WIB contains the watermark data. The primary
task of this module is the generation of the signal labeled Dc, which
checks whether the corresponding pair at the encoder input belonged to
the domain Dc. The image transform module previously used in the
encoder, followed by the Dc check block and a few logic blocks,
generates this signal. The LSB of WIA determines whether the received
pair was transformed. If this LSB, WIA(0), is equal to logical 1, then
the pair was transformed and is fed to the inverse transform module;
otherwise it is passed on to the Dc check block. This routing is
achieved by the 8-bit 2-to-1 MUX and DEMUX pair. If the generated
signal Dc is satisfied, then the pair corresponds to one of the odd
pairs transmitted by the encoder and is passed forward for further
processing.
Fig. 3. Circuit diagram of the control signal module of encoder
Fig. 4. Circuit diagram of the watermark data embedding module of encoder
Inverse Transform:
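This module is the most important part of the decoder and consumes the
major share of the processing time. The inverse transform is applied to
the received transformed pairs to recover the original image pixels. As
mentioned in [5], the inverse transform is given by:

$$A = \left\lceil \tfrac{2}{3} A_{\mathrm{transform}} + \tfrac{1}{3} B_{\mathrm{transform}} \right\rceil \tag{3}$$
$$B = \left\lceil \tfrac{1}{3} A_{\mathrm{transform}} + \tfrac{2}{3} B_{\mathrm{transform}} \right\rceil \tag{4}$$

where $\lceil x \rceil$ denotes the ceil function (the smallest integer
greater than or equal to x).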
From (3) and (4), the inverse transform can be executed by addition and
division without using a multiplier. Addition is performed twice,
followed by division by 3. All of these tasks are executed by the
inverse transform block in Fig. 5. Keeping the cost constraint in mind,
the division is performed by the repetitive subtraction method, which
limits the overall decoding speed. However, the delay due to the other
combinatorial blocks is low enough to allow a high-frequency clock. The
divider used in this module has to be 10-bit because the upper limit of
Atransform and Btransform (inputs WIA and WIB at the decoder) is 255,
which can result in a 10-bit input at the divider. The ceil function is
achieved by a check operation performed on the two LSBs, the 0th and
1st bits, followed by an adder block. In Fig. 5, the check is performed
by a single OR gate, and its output is added to the counter output (the
quotient) of the divider block.
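The multiplier-free inverse transform can be sketched in software as follows. This is an illustrative model under stated assumptions: the OR-gate check is assumed to act on the two bits of the division remainder, and the inputs are assumed to have had their embedded LSBs cleared beforehand, as in [5]. The function names are hypothetical.

```python
def divide_by_3(n):
    """Divide a non-negative integer by 3 using repeated subtraction.

    Returns (quotient, remainder); the quotient plays the role of the
    counter output mentioned in the text.
    """
    quotient = 0
    while n >= 3:
        n -= 3
        quotient += 1
    return quotient, n


def ceil_div3(n):
    """Ceiling of n / 3: add 1 to the quotient when the remainder is nonzero.

    Assumption: the single OR gate combines the two remainder bits.
    """
    quotient, remainder = divide_by_3(n)
    nonzero = ((remainder >> 1) | remainder) & 1  # OR of bit 1 and bit 0
    return quotient + nonzero


def inverse_transform(a_t, b_t):
    """Recover (A, B) with two additions and a division by 3 per pixel."""
    a = ceil_div3(a_t + a_t + b_t)  # ceil((2*A_t + B_t) / 3)
    b = ceil_div3(a_t + b_t + b_t)  # ceil((A_t + 2*B_t) / 3)
    return a, b
```

For the pair A = 5, B = 4, the transformed pixels are (6, 3); after the LSBs are cleared to (6, 2), the model returns (5, 4), showing how the ceil function compensates for the cleared bits.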
Image and Watermark Extraction:
As mentioned earlier, watermark extraction is performed entirely on the
basis of the control signals. The watermark image, embedded into the
LSB of WIB, is extracted by using the signal labeled Watermark_seq_sig
in Fig. 5. This signal identifies the WIB values that contain the
embedded watermark sequence. The watermark sequence is stored in a
128-byte RAM (four 32-word by 8-bit SRAMs). The size of the storage
device depends on the size of the watermark image. The signal labeled
Watermark_seq_sig is applied to the Write Enable (WE) input of the RAM,
as shown in Fig. 5. The A and B output signals from the 8-bit MUX form
the extracted image. The select line of this MUX is generated from the
LSB of WIA and the carry out (Cout) of the inverse transform block, as
shown in Fig. 5.
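The decoder decisions described in this section can be summarized in a short software model. This is a hedged sketch rather than the circuit of Fig. 5: the Dc check is approximated by a range test on the re-transformed pair, the odd pair is restored by setting both LSBs to 1 as in [5], and restoring the displaced LSB of A for pairs outside the domain is not modeled. The function name `decode_pair` is hypothetical, and a returned bit of `None` stands for Watermark_seq_sig being inactive.

```python
def decode_pair(wia, wib):
    """Return (A, B, watermark_bit) for one received pair.

    watermark_bit is None when the pair carries no watermark data.
    """
    def in_range(v):
        return 0 <= v <= 255

    if wia & 1:
        # LSB of WIA is 1: the pair was transformed, so invert the transform
        a_t, b_t = wia & ~1, wib & ~1        # clear the embedded LSBs
        a = -((-(2 * a_t + b_t)) // 3)       # ceil((2*A_t + B_t) / 3)
        b = -((-(a_t + 2 * b_t)) // 3)       # ceil((A_t + 2*B_t) / 3)
        return a, b, wib & 1

    # LSB of WIA is 0: candidate odd pair, restore LSBs and run the Dc check
    a, b = wia | 1, wib | 1
    if in_range(2 * a - b) and in_range(2 * b - a):
        return a, b, wib & 1

    # Pair outside the domain: pixels pass through, nothing is extracted
    return wia, wib, None
```

For instance, the received pair (7, 3) decodes to the original pixels (5, 4) with watermark bit 1, and (4, 6) decodes to the odd pair (5, 7) with watermark bit 0.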
IV. ANALYSIS AND EXPERIMENTAL RESULTS
The simulation and implementation of the entire architecture are
carried out in the ISE Design Suite and other tools of Xilinx.
The hardware is optimized in terms of hardware cost. A
size binary watermark image is used to perform the
watermark embedding and its extraction. The watermarking is
performed on an 8 bit gray image of size . The
experimental results and their analysis are summarized in the
following part of this section.
1. Encoder Results:
As discussed earlier, the watermark encoding process is carried out by
the transform block, the control signal block, and the final watermark
image embedding block. Considering the hardware complexity of these
blocks, they were designed and implemented separately and then
integrated to perform the desired operation. The multiplication
required by the transform of the pair of pixels is realized with four
subtractors in order to maintain hardware efficiency. The complex
watermark embedding operation is performed very efficiently by using
the control signals generated from combinational logic gates. Finally,
the watermark embedding is realized using customized multiplexers and
logic blocks. The implementation of the entire encoder required 303
slices and 528 four-input LUTs. The hardware utilization of the encoder
along with its sub-blocks is given in Table I.
Fig. 5. Circuit diagram of the decoder comprised of all necessary modules
TABLE I: DEVICE UTILIZATION SUMMARY (ESTIMATED VALUES)
Logic utilization            Pixel transform   Control signal   Watermark embedding   Watermarking encoder
Number of 4-input LUTs       432               17               19                    528
Number of occupied slices    268               10               -                     303
Number of bonded IOBs        152               23               50                    230
The encoder module is practically devoid of any clock signals, as the
module is realized using only combinatorial and logic elements. As a
result, the watermarking process is fast even though some intensive
operations are being performed. The clock-less architecture can be
extended by incorporating parallel processing and pipelining, enabling
the system to be highly effective in real-time applications such as
digital cameras, printers, and medical and military applications.
A concern with the implemented encoder is the combinational path delay,
whose maximum value is found to be 31.642 ns. However, there is
considerable scope for reducing this delay by employing a pipelined
architecture.
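As a rough worked figure, the reported path delay can be turned into an upper bound on the encoder's evaluation rate. This assumes that one pixel pair is processed per combinational evaluation and that memory access and I/O add no further delay; neither assumption is stated in the text, and the symbol $t_{\mathrm{pd}}$ is introduced here only for illustration.

```latex
\[
  \frac{1}{t_{\mathrm{pd}}}
  = \frac{1}{31.642\ \mathrm{ns}}
  \approx 31.6 \times 10^{6}\ \text{evaluations per second}
\]
```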
2. Decoder results:
The decoder module has a higher complexity than the encoder, mainly
because of the pixel inverse transform and the recovery of the original
pixels. As a result, its hardware requirement is also higher than that
of the encoder, as detailed in Table II. The divider (division by 3)
used for the pixel inverse transformation is application specific and
is of the subtraction followed by right shifting type. The decoder
requires only 613 4-input LUTs, 347 slices, and 56 slice flip-flops.
The transform module also involves 4 adders, 1 subtractor, an 8-bit
counter, and multiplexers.
TABLE II: DEVICE UTILIZATION SUMMARY (ESTIMATED VALUES)
Logic utilization             Pixel inverse transform   Control signal   Watermark extraction (decoder)
Number of slice flip-flops    48                        -                56
Number of 4-input LUTs        217                       13               613
Number of occupied slices     125                       -                347
Number of bonded IOBs         42                        20               37
Because of its hardware complexity, the implemented architecture is
prone to a slower response. Incorporating a pipelined architecture
helped reduce the delay to a minimum of 22.663 ns. The maximum clock
frequency of the watermark extraction module is 45 MHz.
V. CONCLUSION
This paper focuses on the design and implementation of a fast FPGA
(Field Programmable Gate Array) based architecture using a reversible
contrast mapping (RCM) based image watermarking algorithm. To the best
of our knowledge, prior research on the VLSI implementation of RCM
based watermarking is very limited. This restricts the comparison of
this hardware implementation with others; hence only its own
significance has been summarized. The encoder requires 528 4-input LUTs
and 303 slices, while the decoder requires 613 LUTs and 347 slices. The
encoder module is practically independent of the clock, so the
watermark can be embedded as soon as the input is fetched. This
feature, along with the low hardware cost, supports its use in
real-time applications such as digital cameras and medical and military
applications. The hardware complexity of the decoder module is higher
than that of the encoder module because of the division followed by the
ceil function in the inverse transform module. The maximum clock
frequency of the decoder is 45 MHz. The design is fast, low cost, and
easily implementable for real-time watermarking.
REFERENCES
[1] Fridrich, J., Goljan, M., & Du, R. (2001). Invertible authentication
watermark for JPEG images. Proceedings of the IEEE International
Conference on Information Technology, 223-227.
[2] Cox, I., Miller, M., & Bloom, J. (2002). Digital watermarking:
Principles and practice. Morgan Kaufmann.
[3] Van Leest, A., Van der Veen, M., & Bruekers, F. (2003). Reversible
image watermarking. Proceedings of the IEEE International Conference
on Image Processing, II, 731-734.
[4] J. Tian. Wavelet-based reversible watermarking for authentication. In E.
J. Delp III and P. W. Wong, editors, Security and Watermarking of
Multimedia Contents, volume 4675 of Proc. of SPIE, pages 679-690, Jan.
2002.
[5] Coltuc, D., Chassery, J.M.: Very Fast Watermarking by Reversible
Contrast Mapping. IEEE Signal Processing Letters 14, 255-258 (2007).
[6] Juergen Seitz. Digital Watermarking for digital media, University of
Cooperative Education Heidenheim, Germany, 2005.
[7] M. U. Celik, G. Sharma, A. M. Tekalp, and E. Saber. Reversible data
hiding. In Proc. of International Conference on Image Processing,
volume II, pages 157-160, Sept. 2002.
[8] Mohanty SP, Ranganathan N, Namballa RK. VLSI implementation of
invisible digital watermarking algorithms towards the development of a
secure JPEG encoder. In: Proceedings of the IEEE workshop on signal
processing systems; 2003. p. 183-8.
[9] Mohanty SP, Kougianos E, Ranganathan N. VLSI architecture and chip
for combined invisible robust and fragile watermarking. IET Comput
Digital Tech (CDT) 2007;1(5):600-11.
[10] Mohanty SP, Nayak S. FPGA based implementation of an invisible-robust
image watermarking encoder. In: Lecture notes in computer
science, vol. 3356; 2004. p. 344-53.
[11] A. Garimella, M. V. V. Satyanarayan, R. S. Kumar, P. S. Murugesh, and
U. C. Niranjan, VLSI Implementation of Online Digital Watermarking
Techniques with Difference Encoding for the 8-bit Gray Scale Images,
in Proceedings of the International Conference on VLSI Design, 2003,
pp. 283-288.
[12] S. P. Mohanty, N. Ranganathan, and R. K. Namballa, A VLSI
Architecture for Visible Watermarking in a Secure Still Digital Camera
(S2DC) Design, IEEE Transactions on Very Large Scale Integration
Systems, vol. 13, no. 8, pp. 1002-1012, August 2005.
