
Engineering Applications of Artificial Intelligence 28 (2014) 64–77

Contents lists available at ScienceDirect

Engineering Applications of Artificial Intelligence


journal homepage: www.elsevier.com/locate/engappai

Hardware opposition-based PSO applied to mobile robot controllers


Daniel M. Muñoz a,*, Carlos H. Llanos b, Leandro dos S. Coelho c,d, Mauricio Ayala-Rincón e,f
a Electronics Engineering Graduate Program, Faculty of Gama, University of Brasilia, 72444-240 Gama, DF, Brazil
b Department of Mechanical Engineering, University of Brasilia, 70910-900 Brasilia, DF, Brazil
c Industrial and Systems Engineering Graduate Program, Pontifical Catholic University of Parana, 80215-901 Curitiba, PR, Brazil
d Department of Electrical Engineering, Federal University of Parana (UFPR), 81531-980 Curitiba, PR, Brazil
e Department of Mathematics, University of Brasilia, 70910-900 Brasilia, DF, Brazil
f Department of Computer Sciences, University of Brasilia, 70910-900 Brasilia, DF, Brazil

ARTICLE INFO

Article history:
Received 15 February 2012
Received in revised form 27 September 2013
Accepted 2 December 2013
Available online 25 December 2013

Keywords:
Swarm intelligence
Evolvable hardware
FPGAs
Mobile robots
Learning from demonstration
Fault tolerance

ABSTRACT

Adaptation of mobile robot controllers commonly requires the computation of optimal points of operation. Specifically, for miniature mobile robots with serious computational limitations, that are typical of embedded systems, one of the main challenges is the adaptation of efficient computational methods in order to find solutions of complex optimization problems, which demand large execution times. This drawback compels the design of high-performance parallel optimization algorithms which must run over embedded system platforms. This paper describes how adequate hardware implementations of the Particle Swarm Optimization (PSO) algorithm can be useful for real time adaptation of mobile robot controllers. For achieving this, a new architecture is proposed, which is based on an FPGA implementation of the opposition-based learning (OBL) approach applied to the PSO (for short HPOPSO), and which explores the intrinsic parallelism of this algorithm in order to adjust the weights of a neural robot controller in real time according to desired behaviors. The proposed HPOPSO was applied to the learning-from-demonstration problem in which a teacher performs executions of the desired behavior. Effectiveness of the proposed architecture was demonstrated by numerical simulations and the feasibility of the adaptive behavior of the neural robot controller was confirmed for two obstacle avoidance case studies that were preserved when one or more failures on the distance sensors occur. The HPOPSO, which uses the OBL technique, improves the quality of the solutions in comparison with the standard PSO. Comparisons of the adaptation time between hardware and software approaches have demonstrated the suitability of the FPGA implementation of the proposed HPOPSO for attending specific requirements of embedded system applications.
© 2013 Elsevier Ltd. All rights reserved.

* Corresponding author.
E-mail addresses: damuz@unb.br (D.M. Muñoz), llanos@unb.br (C.H. Llanos), leandro.coelho@pucpr.br (L.d.S. Coelho), ayala@unb.br (M. Ayala-Rincón).
0952-1976/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.engappai.2013.12.003

1. Introduction

Designing efficient autonomous mobile robot controllers is a difficult and time-consuming process which demands considerable resources and effort. Commonly, the robot navigation is based on continuous interaction between the robot and the environment. However, in real world situations the environment is constantly changing, which makes it desirable to develop robots that are capable of evolving new behaviors, adapting the control strategies to the new situations. Providing the required degree of flexibility and maintaining robustness is one of the relevant challenges when designing models for robot controllers because of the mathematical complexity involved in the necessity of taking into account any possible emerging change in the environment.

Commonly, the adaptation process requires the refinement of the solution using optimization stages for adjusting the coefficients of the mobile robot controller. However, this is a challenging task since solving optimization problems using small robot platforms with computational limitations is time-consuming. Generally, in optimization problems developed over embedded platforms, the dimensionality (number of decision variables) is smaller than in conventional ones because of several restrictions in performance and power consumption related to embedded systems. However, even under these conditions, embedded optimization algorithms are required to find a near-optimal solution in a short elapsed time (order of milliseconds) (Boyd, 2009), above all in cases with real-time requirements.

Among other approaches, Artificial Intelligence (AI) techniques have been studied for designing autonomous robot controllers with adaptation characteristics (Pugh and Martinoli, 2009; Chatterjee et al., 2005; Knudson and Tumer, 2011). In order to achieve online adaptation characteristics, the proposed solutions must take into account the following aspects: (a) the involved

algorithms must solve non-linear and multimodal optimization problems; (b) the optimization must be carried out in time scales of milliseconds or a few seconds; (c) mobile robot platforms usually make use of embedded solutions with computational limitations; (d) embedded systems are usually designed for operating under several constraints such as portability, hardware resource consumption, performance and low energy consumption.

The above-mentioned restrictions suggest the necessity of proposing efficient hardware solutions for embedded optimization problems which achieve real-time requirements. In this context, Field Programmable Gate Arrays (FPGAs) are a feasible and cheap technology for exploiting the intrinsic parallelism of optimization algorithms, allowing the solutions to be achieved within a short elapsed time. In addition, FPGAs provide several hardware resources such as DSP (digital signal processing) blocks, clock management units, RAM blocks, embedded software processors, among other facilities, which can be used for achieving hardware solutions with a low operational frequency (Sass and Schmidt, 2010). All these characteristics make FPGAs suitable for embedded applications, such as those of mobile robotics, in which the adaptation processes or optimization stages should be executed in a short elapsed time and with low power consumption.

Recently, evolutionary computing techniques have been applied to nonlinear embedded optimization problems, in which the gradient-based and the exhaustive search methods are impractical because of the assumptions about the existence of the first derivative and because of the large execution time of the involved algorithms (Rao, 1996). The Particle Swarm Optimization (PSO) is a swarm intelligence algorithm bio-inspired by models of the social behavior of schooling fish and flocking birds. The PSO is a stochastic population-based optimization technique which provides several desired attributes, such as simplicity, easy implementation, low computational requirements and parallel capabilities (Eberhart and Kennedy, 1995; Kennedy and Eberhart, 1995; Poli et al., 2007; Banks et al., 2007). Therefore, FPGAs can be properly used for exploring the inherent parallelism of the PSO algorithm not only by implementing parallel particles, but also by performing as many simultaneous computations as possible, allowing for a performance improvement in terms of execution time.

An additional characteristic of the PSO algorithm is its rapid convergence towards optimal points, which implies that after a small number of iterations the particles are clustered in sub-optimal solutions (Riget and Vesterstrom, 2002; Pant et al., 2007). This drawback, namely premature convergence (PC), becomes evident when solving problems with many local optimal points. The loss of diversity among the particles is the main reason for the PC problem. Many efforts have been made to solve the PC problem, most of them making use of artificial diversity for guiding the swarm (Pant et al., 2007). However, for embedded applications, especially when hardware solutions are needed, it is important to use simple operations for introducing diversity, saving resources and power consumption.

This paper proposes a novel hardware architecture for exploring the parallel capabilities of the PSO algorithm applied to the online training process of an Artificial Neural Network (ANN) robot controller. The proposed hardware architecture is based on the parallel version of the opposition-based PSO algorithm (for short OPSO), abbreviated here as HPOPSO. Slight modifications were made to the original OPSO (Wang et al., 2007) in order to simplify its hardware implementation. The architecture was mapped onto a Virtex5 FPGA and operates using a specialized floating-point arithmetic, allowing the operations to be computed with high precision and large dynamic range.

A special feature of the architecture is the application of the opposition-based learning (OBL) technique (Tizhoosh, 2005), which is implicit in the OPSO algorithm. The OBL technique is based on logical operations, and is used to preserve the swarm diversity and to improve the global search capabilities of the PSO algorithm. The main objective of this work was to develop a suitable FPGA embedded implementation for the OPSO that was focused on the efficient exploration of the intrinsic parallelism of the OPSO algorithm over FPGAs. Achieving this objective is of great interest due to the growth and diversity of applications of embedded systems in several areas such as medicine, automation, control, robotics, among others.

The HPOPSO architecture developed in this work is used to solve the learning-from-demonstration (LfD) problem (Billard et al., 2008), in which the training dataset is composed of example executions of two obstacle avoidance behaviors (performed by a demonstration teacher), and stored using on-chip RAM memory blocks. The HPOPSO architecture is used to solve the optimization problem of the training process of an ANN robot controller, in which the goal is to minimize the approximation error of the desired behaviors. In addition, the proposed HPOPSO architecture has been used to preserve the desired behavior when one or more failures on the distance sensors occur.

Synthesis results point out that the proposed HPOPSO architecture can feasibly be mapped onto commercial FPGAs, requiring around 63% of the available hardware resources of the selected device and achieving an operational frequency of 130 MHz. The architecture achieves a speedup of 3.6 times over a desktop Intel Core Duo processor at 1.6 GHz and a speedup factor of three orders of magnitude in comparison with an embedded MicroBlaze software processor implementation. Simulation results demonstrate that by using the HPOPSO architecture it is possible to train the ANN robot controller. This fact allows the robot to learn new behaviors according to practical applications. In addition, the proposed solution provides a fault-tolerant mechanism, allowing the robot to automatically adapt its controller when a sensor malfunction occurs. In this case, the suitability of the opposition-based approach was demonstrated by a performance comparison with the standard PSO algorithm.

The main contributions of this work are summarized as follows: (a) The hardware implementation of the opposition-based technique applied to the PSO algorithm, which provides improved solutions and achieves significant speedup factors (due to the exploration of the intrinsic parallelism) in comparison with common software implementations; (b) The use of the FPGA specialized floating-point arithmetic, which allows the operations to be computed with high precision and large dynamic range; and (c) A case study demonstrating the facilities of the proposed architecture for solving the online training process of an ANN robot controller.

The remainder of this paper is organized as follows: Section 2 presents the related works. Section 3 outlines the standard PSO algorithm and the opposition-based approach. Section 4 describes the implementation of the proposed HPOPSO algorithm and points out several assumptions and decisions that were adopted for simplifying the hardware implementation. Section 5 describes the neural model of the mobile robot controller as well as the LfD and fault-tolerant problems used for validating the proposed architecture. Section 6 details the hardware implementations. Section 7 presents the synthesis and simulation results as well as execution time comparisons and, before concluding, Section 8 discusses the implications of the reported results.

2. Related works

Most of the previous works regarding parallel PSO approaches in software make use of a networked array of master/slave Central

Processing Units (CPUs) and are tailored for large-scale optimization problems (Schutte et al., 2004; Jin and Rahmat-Samii, 2005; Poli et al., 2007; Koh et al., 2006).

On the other hand, different approaches have been proposed for FPGA implementations of the PSO algorithm. For instance, a discrete PSO has been developed using a hardware-software co-design in Farmahini-Farahani et al. (2010). In this approach the fitness functions were implemented on Altera NIOS II microprocessors and the update process of the particles was performed in hardware. Although the software implementation of the fitness functions simplifies the implementation, the performance decreases due to the sequential way of evaluating the cost functions (von Neumann bottleneck). In general, the process that consumes most of the time during the treatment of the optimization problem is the evaluation of the fitness functions. Therefore, a pure hardware implementation is a suitable solution for real-time embedded applications.

A population-oriented hardware architecture for PSO with Discrete Recombination was presented in Pena and Upegui (2007). In this approach the architecture was validated using 32-dimensional benchmark functions. Mehmood et al. (2008) describe an FPGA parallel version of the PSO algorithm called Hardware Oriented PSO (HPSO), which presents a modular approach separating the swarm block from the fitness function block. The HPSO is applied to an object detection problem showing that the algorithm can be easily reconfigured for different detection tasks, either by setting up the parameters of the fitness functions or by defining new fitness functions. An FPGA implementation of the PSO algorithm and its application to coefficient adaptation of Infinite Impulse Response (IIR) filters were presented in Gupta and Mehra (2011). In these hardware implementations PSO-based adaptive filters with adjustable parameters are effectively used for identifying unknown IIR systems and demonstrate a speed enhancement of the execution time.

Most of the above-mentioned previous works, implemented in hardware, are based on a fixed-point arithmetic representation. There are a few publications reporting FPGA implementations of the PSO algorithm using floating-point arithmetic. For instance, Tewolde et al. (2009, 2012) presented a single precision floating-point serial PSO with interfaces to swarm memories and fitness evaluation modules, showing speed-up factors of 359 and 653, respectively, for the Sphere and Rosenbrock benchmarks, in comparison with a 16-bit microcontroller. Muñoz et al. (2009, 2010a) presented two hardware architectures of the PSO (respectively, full-parallel and partially parallel). In this approach, the authors introduced the application of parameterizable floating-point arithmetic and trigonometric libraries which allow the user to select the adequate bit width representation according to the application.

Fig. 1. (a) General FPGA block structure, (b) switching block and (c) connection block.

There are few works reporting FPGA implementations of the PSO algorithm for training neural networks and most of them propose the use of a fixed-point arithmetic representation. In this case, Reynolds et al. (2005) implemented a PSO algorithm in FPGAs for inversion of large neural networks and showed that the PSO computing time is approximately six times faster than in conventional computers. An FPGA architecture of a wavelet neural network with PSO has been applied to a prediction problem showing that the performance of the PSO algorithm improves by working with a suitable number of particles (Lin and Tsai, 2007). Duren et al. (2007) have presented a real-time inversion of a multilayer perceptron neural network using the PSO algorithm and applied the circuit architecture for estimating the performance of a sonar system. The proposed circuits were mapped on the SRC-6e computer which makes use of multiple FPGAs.

Efficient FPGA based solutions for implementing population based optimization algorithms must be able to provide high performance computations and to preserve the swarm diversity, producing good quality solutions and avoiding the premature convergence problem. Several works were developed focusing on providing these characteristics. In particular, Chowdhury et al. (2009) proposed an FPGA based solution of an adaptive perceptive PSO (APPSO) for optimizing a fuzzy inference engine applied to medical diagnosis. This APPSO approach allows particles to vary their perception radius and/or number of sampling directions. Also focusing on obtaining these characteristics, hardware parallel implementations of the attractive-repulsive PSO (HParPSO) and the passive congregation PSO (HPPSOpc) (introduced in Muñoz et al., 2010b, 2011, respectively) were developed for preserving the swarm diversity. For both these approaches, convergence results using two benchmark test functions have demonstrated their suitability for embedded applications in terms of execution time. Cavuslu et al. (2012) have implemented on FPGAs a neural network with improved PSO learning for identification of dynamic systems. The implementations are based on a floating-point arithmetic and the improved PSO is obtained by modifying the velocity update function, adding an extra random term for reducing the possibility of getting stuck in a local minimum.

Since the computational resources in FPGAs are limited, a hardware-friendly architecture of the PSO must consider easily implementable operators. This work proposes a novel hardware parallel PSO architecture, which makes use of the opposition

based learning (OBL) approach and applies it to training a neural network mobile robot controller. The OBL approach, explained in the next section, is an easily hardware implementable technique (based on a NOT operator) which allows particles to improve the quality of the solutions and to preserve the swarm diversity. Additionally, the proposed approach makes use of a floating-point arithmetic representation allowing the computations to be performed with a large dynamic range and high precision.

3. Background

This section presents the necessary concepts and nomenclature. Initially, a brief description of FPGA devices is presented, focusing on the advantages of using field programmable devices for implementing hardware parallel computations. Finally, the basic PSO operation and the opposition-based learning algorithms are described.

3.1. FPGA concepts

FPGAs are field-programmable logic devices which contain a matrix of Configurable Logic Blocks (CLBs) interconnected by an array of routing resources implemented in CMOS technology. Fig. 1 details the FPGA block structure. CLB features depend on both the manufacturer and the device family; however, they typically contain small tables with 4, 5 or 6-bit inputs, namely Lookup Tables (LUTs), D flip-flops and several multiplexers, allowing the truth table of basic Boolean functions to be implemented in hardware. In order to implement complex circuits, CLBs are connected by a programmable network of connection and switching blocks. The connection block (see Fig. 1c) allows logic block inputs and outputs to be assigned to horizontal or vertical tracks. The switching block (see Fig. 1b) allows a signal on a track to connect to another track.

The connections in the switching and connection blocks are made by programmable points. Commonly, a programmable point consists of a pass transistor controlled by a static random access memory cell (SR) to hold the user defined configuration values. Fig. 1b depicts a planar switching box topology in which a wire in track number 0 (point A) connects only to wires in track number 0 (points B, C or D) in adjacent channel segments. Modern FPGA devices contain embedded DSP blocks, RAM blocks, dedicated processors and digital clock managers allowing the implementation of more complex designs (Hauck and Dehon, 2008).

One important issue in circuit design with FPGA platforms is that the above-mentioned logic resources can be accessed in parallel, which gives the designer the necessary flexibility to improve performance of the algorithms (while programming FPGAs to the desired application or functionality requirements) by exploring different options of parallel architectures (Kilts, 2007; Hauck and Dehon, 2008). Therefore, FPGAs are flexible enough to implement adequately specific solutions when compared with application specific integrated circuits (ASICs), where the devices are designed in a customized manner for the specific applications.

3.2. The standard PSO algorithm

In the PSO algorithm, the population is called swarm and individuals are called particles (mass-less and volume-less). Each particle $i$ has a current velocity vector $v_i$, a personal best position vector $y_i$ in the search space and a position vector $x_i$ that represents a possible solution of the optimization problem. Considering an N-dimensional evaluation function and a swarm size of S particles, the position of the ith particle in the jth dimension is updated by executing the following equations:

$$x_{ij}^{t+1} = x_{ij}^{t} + v_{ij}^{t+1} \qquad (1)$$

$$v_{ij}^{t+1} = w v_{ij}^{t} + c_1 r_1 (y_{ij}^{t} - x_{ij}^{t}) + c_2 r_2 (y_{sj}^{t} - x_{ij}^{t}) \qquad (2)$$

where $r_1$ and $r_2$ are uniformly generated random numbers in the range [0,1], $y_{ij}$ is the personal best position found by the particle $i$ in the jth dimension and $y_{sj}$ is the global best position among all the particles in the jth dimension. The velocities $v_{ij}$ are clamped to the range $[-v_{max}, v_{max}]$, preventing the particles from leaving the search space.

There are three parameters: the inertia ($w$), the cognitive ($c_1$) and the social ($c_2$) coefficients. Large values for the cognitive coefficient ($c_1$) indicate particles with a high self-confidence in their experience and large values for the social coefficient ($c_2$) provide a particle with a high confidence in the swarm (van den Bergh, 2002). At each iteration ($t$) a new velocity $v_i$ and a new position $x_i$ for each particle in the swarm are computed. If the fitness value of the current position $x_i$ improves on the fitness value of the personal best position $y_i$, then the personal best position is replaced with the current position of the particle. If the fitness value of the current position $x_i$ is lower than the fitness evaluated at the global best position $y_s$, then the global best position is replaced with the current position $x_i$ of the particle. The inertia weight coefficient $w$ can be configured to decrease linearly from 1 to 0 until the stopping criteria are met.

3.3. The opposition-based learning approach

The opposition-based learning (OBL) approach, first introduced by Tizhoosh (2005), is a simple technique which allows the population-based algorithms to search for an optimal point in the opposite direction of the current search. The basic idea is that whenever a solution is being explored in a direction, it is beneficial to consider the opposite direction as well (Al-Qunaieer and Tizhoosh, 2010). The OBL approach is based on the definition of opposite number, given by the equation below:

$$\breve{x} = a + b - x \qquad (3)$$

where $x$ is a real number defined in the range $[a, b]$ and $\breve{x}$ is the opposite number of $x$. This definition is also valid for an N-dimensional point $x_j$ defined in the range $[a_j, b_j]$, $j = 1, \ldots, N$.

The OBL technique was initially applied to genetic algorithms in which anti-chromosomes allow the search process to be accelerated (Tizhoosh, 2005). The OBL was also applied to neural computing in which the concepts of opposite-weight and opposite-network can be used to improve the results (Tizhoosh, 2005). Rahnamayan et al. (2008) demonstrated formally that, in the case of an unknown function in an N-dimensional space, $\breve{x}_j$ has a higher chance to be closer to the solution than $x_j$. Additionally, an empirical verification of these mathematical proofs was performed, demonstrating the feasibility of the OBL approach (Rahnamayan et al., 2008).

3.4. The opposition-based learning applied to the PSO

Recently the OBL approach has been applied to enhance the solutions of the PSO algorithms. Wang et al. (2007) proposed an opposition-based PSO with Cauchy mutation operation. The OBL approach is applied at random iterations (generation jumping) to the entire population and the Cauchy operator is applied to the global best particle in order to avoid local optimal solutions. The proposed modification achieves a better global search ability for multimodal functions and has a faster convergence for unimodal functions when compared with the standard PSO.
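As a minimal illustration of the ingredients that these opposition-based variants combine, the Python sketch below implements one particle update following Eqs. (1) and (2), including the velocity clamping, and the opposite point of Eq. (3). The function names, the list-based representation and the choice of drawing r1 and r2 once per dimension are assumptions made for illustration only; they are not taken from any of the cited implementations.

```python
import random


def pso_step(x, v, y, y_s, w, c1, c2, v_max, rng=random):
    """One update of a single particle following Eqs. (1) and (2).

    x, v, y: position, velocity and personal best of the particle.
    y_s: global best position. All are lists of length N.
    rng: any object with a random() method returning a value in [0, 1).
    """
    new_x, new_v = [], []
    for j in range(len(x)):
        # Assumption: r1 and r2 are drawn independently for each dimension.
        r1 = rng.random()
        r2 = rng.random()
        vj = w * v[j] + c1 * r1 * (y[j] - x[j]) + c2 * r2 * (y_s[j] - x[j])
        # Clamp the velocity to [-v_max, v_max].
        vj = max(-v_max, min(v_max, vj))
        new_v.append(vj)
        new_x.append(x[j] + vj)  # Eq. (1)
    return new_x, new_v


def opposite_point(x, a, b):
    """Opposite point of Eq. (3), applied per dimension: a_j + b_j - x_j."""
    return [aj + bj - xj for xj, aj, bj in zip(x, a, b)]
```

For a symmetrical domain, with a_j = -b_j in every dimension, opposite_point reduces to a sign change of each coordinate.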

Jabeen et al. (2009) proposed a PSO algorithm with opposition based initialization. Then, the initial population is provided as input to the standard PSO. The modified PSO was applied to four benchmark functions, outperforming the solutions of several PSO approaches.

Lin and Xingshi (2007) proposed the PSO with the OBL at three stages: initialization, generation jumping and local improvement of the global best individual. The authors applied the proposed modification to noisy benchmark functions and demonstrated that the OBL approach enhances the solutions of the standard PSO algorithm.

Wang et al. (2011) proposed a variant of the opposition-based PSO in which a generalized OBL (GOBL) approach transforms the search space of candidate solutions in order to overcome the PC problem. The GOBL is applied at the initialization process and with generation jumping. Also, the Cauchy operator is applied to the global best particle in order to avoid getting stuck in a local minimum. The proposed algorithm was applied to benchmark problems and results showed that this approach improves solutions for rotated multimodal problems, but performs badly on shifted and large scale problems.

Zhang et al. (2009) proposed a quasi-oppositional comprehensive learning PSO algorithm. The quasi-opposite particles are generated from the interval between the median and the opposite position of the particles. The proposed algorithm was applied to benchmark functions, demonstrating faster convergence and better global search ability than non-quasi-oppositional approaches.

Kaucic (2013) proposed an opposite-based PSO with adaptive velocity applied to bounded constrained problems. The proposed algorithm uses a differential evolution scheme for updating the particle's velocity. In addition, it applies the OBL during the initialization phase as well as during a re-initialization phase in which a super-oppositional approach is used to re-initialize particles in the swarm. The proposed modification was applied to several benchmark functions, demonstrating that the OBL approach applied to the initialization phase and to the restarting process prevents the algorithm from premature convergence.

4. The proposed version of the OPSO algorithm

In order to simplify the hardware implementation of the OBL approach applied to the standard PSO algorithm, symmetrical search spaces are assumed, which are appropriate for several applications, including those proposed in this work. Thus, the two additions of Eq. (3) are saved, because for a symmetrical range $[-b, b]$ the opposite number reduces to $-x$. In contrast to previous approaches of the OPSO (Wang et al., 2007, 2011; Jabeen et al., 2009; Lin and Xingshi, 2007), in this work the OBL is applied only when the algorithm completes several iterations without fitness improvement, indicating a possible stagnation. In addition, a uniformly distributed random number in the range $[-1, 1]$ is used for applying the OBL to randomly selected dimensions. The same random number is used for producing a small random modification of the opposite position of the particles.

The pseudocode of the proposed opposition-based PSO algorithm is listed in Algorithm 1.

Algorithm 1. Pseudocode for the proposed opposition-based PSO algorithm.

 1: Set swarm size S, dimensionality N, search space domain [a_j, b_j], maximum number of iterations without fitness improvement maxFNC, maximum number of iterations of the algorithm maxITER, the fitness threshold Thres and constants w0, c1, c2, xmax, vmax
 2: Initialize randomly the swarm (uniform distribution): x_ij, y_ij, v_ij
 3: repeat
 4:   for i = 1 to S do
 5:     if f(x_i) < f(y_i) then
 6:       y_i = x_i
 7:       f(y_i) = f(x_i)
 8:     end if
 9:   end for
      {Find the minimum among all the fitness function values}
10:   f_s = min(f(y_i))
11:   Set y_s as the particle with best fitness value in the swarm (f_s)
      {Check if the fitness value has a noticeable improvement}
12:   if lastf_s - f_s < 0.001 then
13:     FNC = FNC + 1
14:   else
15:     FNC = 0
16:     lastf_s = f_s
17:   end if
      {Check if fitness does not change during maxFNC iterations}
18:   if FNC = maxFNC then
19:     for i = 1 to S do
20:       for j = 1 to N do
21:         r = U[-1, 1]
22:         if abs(r) < 0.5 then
23:           x_ij = -x_ij + r/2   {apply OBL to random dimensions}
24:         end if
25:       end for
26:     end for
27:     FNC = 0
28:   else
29:     for i = 1 to S do
30:       for j = 1 to N do
31:         v_ij = w v_ij + c1 r1 (y_ij - x_ij) + c2 r2 (y_sj - x_ij)
32:         x_ij = x_ij + v_ij
33:       end for
34:     end for
35:   end if
36:   Update the value of weight factor w
37: until f_s <= Thres OR iter = maxITER

It can be observed in lines 12–17 that if the fitness value f_s does not have a noticeable improvement the counter FNC is incremented. Otherwise, the FNC variable is set to zero and the last fitness (lastf_s) is updated to the current fitness value f_s. In addition, lines 18–28 show that the OBL approach is applied to random dimensions when the FNC counter reaches the maximum number of iterations without fitness improvement maxFNC. The U[-1, 1] function in line 21 is a uniformly distributed random number generated in the range [-1, 1]. The abs function indicates the absolute value.

It is important to take into account that the extension of the proposed algorithm to non-symmetrical search spaces can be achieved easily by using one addition and one subtraction operation in the computation of the opposite number.

5. The problem description

This section initially describes two embedded optimization problems which are addressed to solve the online training process of a neural network controller of small mobile robot platforms. The

Fig. 2. (a) Mobile robot overview. (b) Model of the neural network controller. The atan function is used as activation function for each neuron.

first problem is related to the learning-from-demonstration (LfD) process of desired mobile robot behaviors. The second problem is related to a fault tolerance design which allows the robot to adapt the neural controller when a sensor malfunction occurs. Afterwards, the simulator environment used to acquire the training data as well as to validate the behavior of the mobile robot is described. Finally, a well known neural model used for controlling a synchronous driven mobile robot and the fitness function model used for the optimization problems are explained.

5.1. LfD problem

The LfD problem is a subset of supervised learning, in which the agent is presented with labeled training data and learns an approximation to the function which produces the data (Billard et al., 2008; Argall et al., 2009). In this work, the demonstration process is provided by a teleoperation approach of the mobile robot. The training dataset is composed of example executions of the task by a demonstration teacher. Thus, the robot is operated by a human operator, who is the teacher, and records the state/action pairs experienced during the execution.
In the LfD problem, the proposed opposition-based PSO algorithm and its respective hardware implementation are applied to adapt the weights of a neural network robot controller for two obstacle avoidance behaviors: (1) performing trajectories in the middle of the free space configuration and (2) performing trajectories close to the external walls.

5.2. Fault tolerance problem

This problem involves the online training process of the neural network controller when one or more faults of the robot sensors are detected and isolated. In this situation, the robot must preserve a pre-defined behavior. Therefore, it is necessary to specify a fault tolerant system with portability and small execution time requirements, allowing the robot controller to be adjusted in order to accomplish a specific task.
It is important to point out that in this work the fault detection and isolation algorithms are not implemented. We assume that sensors have a well-known observation model and that fault detection and identification techniques can be implemented.

5.3. The mobile robot

This work uses a simulated mobile robot, called Eve Robot, which has a synchronous drive actuator and is equipped with a ring of seven infrared proximity sensors (three left side sensors lf0, lf1 and lf2, one frontal sensor fr and three right side sensors rg0, rg1 and rg2), as shown in Fig. 2a.
The EyeSim simulator environment (Bräunl, 2006) has been used for validating the proposed HPOPSO for solving the LfD and the fault tolerance problems. This simulator environment provides the mathematical models of the robot kinematics, motor engines, position sensors and inertial sensors, among other functionalities. It runs on Windows OS and is programmed in C.
The EyeSim simulator tool also provides a velocity control for the Eve robot, allowing the users to specify the rotational and linear speed of the robot. Thus, during the teleoperation process the robot records the state of the distance sensors (inputs) and the angular (wr) and linear (vr) velocities (actions or desired outputs). The individual wheel speeds ($\dot{\theta}_{R,L}$) can be obtained through the inverse kinematics equation (4) of a differential drive mobile robot (Bräunl, 2006)

$$\begin{bmatrix} \dot{\theta}_L \\ \dot{\theta}_R \end{bmatrix} = \frac{1}{2\pi r} \begin{bmatrix} 1 & -d/2 \\ 1 & d/2 \end{bmatrix} \begin{bmatrix} v_r \\ w_r \end{bmatrix} \tag{4}$$

where d is the distance between the two wheels and r is the wheel radius.

5.4. The neural model

The robot controller is a single-layer neural network of two neurons (perceptron model with sigmoid activation function), one for each velocity output. The proximity sensors are used as input neurons (see Fig. 2b), giving a total of 14 weights that must be adjusted during the training process.

5.5. The fitness function specification

During the training process of the robot controller each particle of the proposed opposition-based PSO algorithm moves randomly in the 14-dimensional search space (14 weights to be adjusted). The goal of the optimization process is to minimize the approximation error of the desired robot behavior, provided by a teleoperation process. According to Algorithm 1, at each iteration the particles evaluate their performance using a fitness function
model which compares the state/action pairs between the current solution and the desired outputs.
In this work, the proposed fitness function is based on the sum of square errors between the simulated outputs and the desired outputs, as shown in Eq. (5). The simulated outputs are computed by evaluating the current particle position using the neural model. The desired outputs were previously recorded using the example executions of the task by the demonstration teacher

$$f(x) = \sum_{i=1}^{NTD} (wr_i - wrd_i)^2 + \sum_{i=1}^{NTD} (vr_i - vrd_i)^2 \tag{5}$$

where NTD is the number of training data samples, wr and vr are the simulated outputs for the angular and linear velocities, respectively, and wrd and vrd are the desired angular and linear velocities, respectively, obtained from the training dataset.

6. FPGA implementations

The hardware implementation of the opposition-based PSO algorithm (see Algorithm 1) was specified in the VHDL hardware description language. This architecture, named HPOPSO, makes use of a 27 bit precision floating-point arithmetic. As previously remarked, this choice of using floating-point arithmetic is justified given the large dynamic range required during the optimization process in which small and large real numbers are computed. In previous works, several arithmetic and trigonometric operators were developed in hardware, using the IEEE754 standard, and were validated on FPGAs (Muñoz et al., 2010c,d).
A tradeoff analysis, previously performed, demonstrates that the 27 bit width representation (8 bits for the exponent word and 18 bits for the mantissa word) allows the arithmetic and trigonometric operators to save 50% of the embedded FPGA DSP blocks. This representation also provides a similar dynamic range if compared with the single precision representation (32 bits). As expected, a small reduction of precision was verified; however the associated precision is satisfactory for the application presented in this work.

6.1. Floating-point uniform random number generator

As explained in Section 2, the stochastic behavior of the PSO algorithm requires the implementation of a random number generator (RNG) for the particles movement process. In this work a RNG based on a 20 bit linear feedback shift register (LFSR) was used (see Fig. 3). The LFSR component is a shift register based technique, in which several bits, called taps, are chosen as a feedback function (logic XOR function) obtaining a new state.
The LFSR operates in fixed-point arithmetic; therefore, a fixed to float converter is used to represent the random number in floating-point arithmetic. This methodology constitutes a drawback in terms of efficiency given that very small numbers are represented with less resolution after the fixed to float conversion. This can be explained because the Probability Mass Function (PMF) of uniform floating-point numbers increases its resolution for numbers close to zero (Thomas and Luk, 2008). However, for the proposed applications the RNGs are used for creating randomness during the particles movement; thus, the loss of uniformity, which is only verified for small numbers (less than 2^-3), does not represent a relevant problem for the PSO operation.

6.2. Particle architecture

The U1[0,1] and U2[0,1] RNGs of the update equation (2) can be replaced by the U1[0,c1] and U2[0,c2] RNGs, saving two floating-point multiplications. Thus, the update process of the position of each particle requires the calculation of two random numbers, five add/sub and three multiplications. Fig. 4 shows the hardware architecture for the update process of the particle position. The particle architecture is based on a Finite State Machine (FSM) approach. All the operations in each state are executed in parallel. When the opposition number must be computed, the FSM goes directly to the opposition state, otherwise the FSM computes the new position using the update equations (2) and (1). Notice that the opposite number is easily computed by changing the sign bit of the floating-point representation.

Fig. 4. The opposition-based PSO particle architecture.
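As an illustration of the LFSR-based generator described in Section 6.1, the following Python sketch models a 20 bit Fibonacci LFSR followed by a fixed to float step. The tap positions (20 and 17), the seed and the class name `Lfsr20` are assumptions made for this example; the paper does not state which taps the hardware uses. The `upper` argument mimics the idea of drawing directly from [0, c1] instead of [0, 1].

```python
class Lfsr20:
    """Software model of a 20-bit LFSR (taps 20 and 17 are assumed, not from the paper)."""

    MASK = (1 << 20) - 1

    def __init__(self, seed=0xACE1):
        if seed & self.MASK == 0:
            raise ValueError("seed must be non-zero: the all-zero state never changes")
        self.state = seed & self.MASK

    def step(self):
        # Feedback bit: XOR of the tapped bits (bit 19 and bit 16, zero-indexed).
        feedback = ((self.state >> 19) ^ (self.state >> 16)) & 1
        # Shift left by one position and insert the feedback bit at bit 0.
        self.state = ((self.state << 1) | feedback) & self.MASK
        return self.state

    def uniform(self, upper=1.0):
        # Fixed-to-float step: read the 20-bit state as a fraction in (0, 1)
        # and scale it, e.g. upper = c1 folds the coefficient into the RNG.
        return upper * self.step() / (1 << 20)


rng = Lfsr20(seed=1)
samples = [rng.uniform(2.0) for _ in range(5)]
print(samples)
```

Because the state is never zero, every sample lies strictly between 0 and `upper`. In hardware the scaling by a power of two reduces to an exponent adjustment, which is one reason a range such as [0, 2] is cheap to generate.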

Fig. 3. Floating-point pseudo-random number generator.
Fig. 5. (a) ANN model architecture and (b) fitness function evaluation.
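The neural model and the fitness function of Eq. (5) can be sketched in software as follows. This is a minimal sketch under stated assumptions: the weight ordering follows the row order of the weight tables (seven weights for the angular velocity neuron, then seven for the linear velocity neuron), the activation defaults to `math.atan` as in the caption of Fig. 2, and the function names and the toy dataset are hypothetical.

```python
import math

# Sensor order assumed from the weight tables: lf0, rg0, lf1, rg1, lf2, rg2, fr.
N_SENSORS = 7


def controller(weights, sensors, activation=math.atan):
    """Return (wr, vr): weights[0:7] feed the angular neuron, weights[7:14] the linear one."""
    w_in = sum(weights[j] * sensors[j] for j in range(N_SENSORS))
    v_in = sum(weights[N_SENSORS + j] * sensors[j] for j in range(N_SENSORS))
    return activation(w_in), activation(v_in)


def fitness(weights, dataset):
    """Sum of squared errors over (sensors, wrd, vrd) samples, as in Eq. (5)."""
    total = 0.0
    for sensors, wrd, vrd in dataset:
        wr, vr = controller(weights, sensors)
        total += (wr - wrd) ** 2 + (vr - vrd) ** 2
    return total


# Hypothetical two-sample training set: (sensor readings, desired wr, desired vr).
toy_data = [
    ([0.5] * 7, 0.1, 0.2),
    ([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], -0.3, 0.4),
]
print(fitness([0.0] * 14, toy_data))
```

Each particle position is one 14-element `weights` vector, so evaluating a swarm amounts to calling `fitness` once per particle over the whole training set.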
6.3. The evaluation unit

During the training process, each particle must simulate the ANN robot controller for evaluating its performance. Therefore, it was necessary to implement in hardware the neural model shown in Fig. 2b (single layer of two neurons). According to Algorithm 1 the fitness evaluation process depends on the previously updated particle positions (synchronous PSO approach). This fact suggests that the evaluation unit can share the same resources as the particle unit. Fig. 5a shows the hardware architecture of the neural model used in the fitness function evaluation. It uses two FPmul (floating-point multiplication), two FPadd (floating-point addition) operators and two comparators for implementing the linear activation function of each neuron.
The fitness functions were implemented using the same FPmul and FPadd units of the ANN output computation, see Fig. 5b.

6.4. General architecture

The general HPOPSO architecture is shown in Fig. 6. It is composed of a swarm unit with S parallel particles and an evaluation unit with S parallel fitness functions for evaluating the performance of each particle. A ROM memory stores the training dataset. After the evaluation process, the individual detection unit compares the current fitness values with the respective best fitness found previously, and updates the individual best position. The global detection unit computes the global best fitness and updates the global best position. Also, the global detection unit increments the FNC counter when no fitness improvement is detected. If the FNC counter equals the maximum number of iterations without fitness improvement, then the opp signal indicates that the opposition number must be computed.
The RS-232 block is used to communicate with the simulator environment, sending the global best position (ys) when the maximum number of iterations is reached. Additionally, the simulator environment sends to the HPOPSO architecture information about a failure in one or more sensors. In the case of a sensor malfunction, the architecture discards the respective information from the ROM memory. To do so, the fault vector signal, which indicates a failure in a sensor, selects through a multiplexer either the respective training data or a zero value. This allows the HPOPSO to adjust the ANN weights using only the available information provided by the operational sensors.

7. Results

This section summarizes the synthesis and simulation results of the proposed circuit architecture. The HPOPSO was synthesized in the Xilinx ISE10.1 development tool and was validated for a swarm composed of 10 particles (S = 10) optimizing the 14-dimensional problem (N = 14) of the neural model. Table 1 shows the parameters used for the HPOPSO implementation.

7.1. The validation environment

Fig. 7 depicts the validation environment which is composed of the EyeSim simulator running on a PC, a serial communication interface and the HPOPSO architecture implemented on the FPGA device. The teacher performs the robot teleoperation process using the keyboard and the training dataset is recorded from the distance sensors (inputs) and the linear and angular velocities (outputs). After the training process, the global best particle position (ys), which contains the weights of the neural controller, is sent through the serial communication module to the PC in order to validate the learned behavior of the mobile robot.
As explained in Section 5, two main problems are solved by the HPOPSO optimization engine: the LfD problem, in which the robot must learn a specific task, and the fault tolerance problem, in which the robot must perform a task correctly whenever sensor malfunctions occur. For both problems, the training dataset was recorded during the teacher demonstration of two desired obstacle avoidance behaviors: (1) performing trajectories in the middle of the free space configuration and (2) performing trajectories close to the external walls. Thus, the goal of the optimization process is to minimize the approximation error during the robot learning process. This learning process is performed by adjusting the weights of the neural network robot controller.

Table 1
Parameter settings of the HPOPSO architecture.

Parameter                   Value
Swarm size                  10
Dimensionality              14
Maximum of iterations       5000
maxFNC iterations           30
Search space domain         [-15, 15]
Fitness threshold           1E-7
Inertia weight              [0.9 to 0.1]
Cognitive coefficient c1    2.0
Social coefficient c2       2.0
Maximum velocity            [-7.0, 7.0]
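To show how the settings of Table 1 fit together, the sketch below implements a simplified software version of an opposition-based PSO loop. It is not the authors' Algorithm 1: the standard PSO velocity update is used, the inertia weight is assumed to decrease linearly from 0.9 to 0.1, and the opposition step simply negates each coordinate with probability 0.5 once the stagnation counter reaches maxFNC (the small random perturbation mentioned in the paper is omitted). The function name `opso` and the `sphere` test function are hypothetical.

```python
import random


def opso(fitness, n_dim=14, swarm=10, max_iter=5000, max_fnc=30,
         x_max=15.0, v_max=7.0, c1=2.0, c2=2.0, w_start=0.9, w_end=0.1,
         threshold=1e-7, seed=0):
    """Minimise `fitness` with a simplified opposition-based PSO (assumptions in the lead-in)."""
    rnd = random.Random(seed)
    x = [[rnd.uniform(-x_max, x_max) for _ in range(n_dim)] for _ in range(swarm)]
    v = [[rnd.uniform(-v_max, v_max) for _ in range(n_dim)] for _ in range(swarm)]
    y = [p[:] for p in x]                       # individual best positions
    fy = [fitness(p) for p in x]                # individual best fitness values
    g = min(range(swarm), key=fy.__getitem__)
    ys, fys = y[g][:], fy[g]                    # global best position and fitness
    fnc = 0                                     # iterations without improvement
    for it in range(max_iter):
        if fys <= threshold:
            break
        # Assumed linear inertia schedule from w_start down to w_end.
        w = w_start + (w_end - w_start) * it / max(1, max_iter - 1)
        use_opposition = fnc >= max_fnc
        for i in range(swarm):
            for j in range(n_dim):
                if use_opposition:
                    # Opposite number in a symmetric domain: a sign change.
                    if rnd.random() < 0.5:
                        x[i][j] = -x[i][j]
                else:
                    u1 = rnd.uniform(0.0, c1)   # U1[0, c1] instead of c1 * U1[0, 1]
                    u2 = rnd.uniform(0.0, c2)
                    vel = w * v[i][j] + u1 * (y[i][j] - x[i][j]) + u2 * (ys[j] - x[i][j])
                    v[i][j] = max(-v_max, min(v_max, vel))
                    x[i][j] = max(-x_max, min(x_max, x[i][j] + v[i][j]))
        improved = False
        for i in range(swarm):                  # synchronous evaluation of all particles
            f = fitness(x[i])
            if f < fy[i]:
                fy[i], y[i] = f, x[i][:]
            if f < fys:
                fys, ys, improved = f, x[i][:], True
        fnc = 0 if (improved or use_opposition) else fnc + 1
    return ys, fys


def sphere(p):
    return sum(t * t for t in p)


best, best_f = opso(sphere, max_iter=200, seed=1)
print(best_f)
```

The inner update uses two subtractions, three multiplications and three additions per dimension, which matches the operation count quoted for the particle architecture in Section 6.2.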

Fig. 6. The general HPOPSO architecture. The clk and reset signals are not drawn, for the sake of clarity.
Fig. 7. Validation environment.
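The fault vector mechanism of Section 6.4 can be illustrated with a small software analogue of the multiplexer. This is a sketch only: the sensor ordering, the function names and the sample readings are assumptions introduced for the example.

```python
SENSORS = ("lf0", "rg0", "lf1", "rg1", "lf2", "rg2", "fr")


def fault_vector_for(failed):
    """Build one boolean flag per sensor, True where the sensor is reported faulty."""
    return [name in failed for name in SENSORS]


def mask_sample(sensor_values, fault_vector):
    """Multiplexer analogue: pass the stored reading, or zero if the sensor is faulty."""
    return [0.0 if faulty else value
            for value, faulty in zip(sensor_values, fault_vector)]


# Hypothetical stored readings (one row per training sample).
rom = [
    [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3],
]
fv = fault_vector_for({"lf2", "rg2"})
masked = [mask_sample(row, fv) for row in rom]
print(masked)
```

A zeroed input contributes nothing to either neuron's weighted sum, so the weights attached to a failed sensor no longer influence the fitness value and the optimizer has to fit the demonstration with the remaining sensors.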
Fig. 8. Scenario used for the demonstration process. (a) Behavior 1: trajectory in the middle of the free space configuration. (b) Behavior 2: trajectory close to the external walls. A training dataset of 820 samples was recorded for each behavior.

Fig. 8 shows the trajectories performed during the demonstration process for both the obstacle avoidance behaviors.

7.2. Convergence comparison between O-PSO and PSO

In order to demonstrate the suitability of the PSO with the OBL operator, several experiments were conducted for solving the training process of a mobile robot neural controller for the behaviors explained above. Table 2 shows a convergence comparison between the opposition-based PSO and the standard PSO algorithms.
It can be observed that in some cases, the PSO with OBL achieves similar results to the standard PSO. However, the OBL improves the results of the standard PSO in the case of Behavior 1 with failure in the frontal sensor. In addition, a convergence improvement was achieved in the case of Behavior 2 without sensor malfunction as well as for simultaneous failures of the sensors lf0 and rg0.

Table 2
Comparison of the convergence results between the proposed opposition-based PSO and the standard PSO algorithms. Convergence is reported in terms of the minimum fitness value.

Algorithm         No fault    Sensors failures
                              fr        lf2,rg2    lf0,rg0
OPSO-behavior1    11.40       13.58     11.24      10.70
PSO-behavior1     11.40       179.65    11.23      10.65
OPSO-behavior2    9.95        10.31     37.84      10.90
PSO-behavior2     10.88       10.31     10.39      26.50

7.3. Synthesis results

Table 3 shows the synthesis results of the HPOPSO architecture for a Xilinx Virtex5 family device (chip xc5vlx110t). The cost in logic area is reported in flip-flops (FFs), LookUp Tables (LUTs), RAM and DSP blocks consumption. The performance is presented in megahertz.

Table 3
Synthesis results 10 particles, 14 dimensions, 27 bits.

Implemented core    FF (69120)    LUTs (69120)    RAM (148)    DSP48E (64)    Freq. (MHz)
HPOPSO              20 499        43 595          9            20             130.7
                    27.7%         63.1%           6.1%         31.2%

These results point out that the proposed HPOPSO architecture is feasible to be mapped on the selected FPGA device. The area cost is satisfactory in terms of implemented combinatorial logic (LUTs) and registers (FF) consumption. Around 36% of the available LUTs remain for future implementations. The maximum operational frequency of the circuit is around 130 MHz.
Another important aspect is the DSP block consumption. In the case of a 27 bit width representation, the 10 parallel particles implementation consumes around 31% of the available DSPs and, although not shown here, the same architecture using a 32 bit width representation requires 62.5% of the DSPs. This fact can be a drawback when implementing more parallel particles or using large bit-width arithmetic representations (for example a double precision representation). This would imply that other solutions should implement the floating-point multipliers using the available logic area of the FPGA.
It is important to take into account that the selected FPGA chip is not the largest device from the Virtex5 family. There are FPGA devices with more hardware resources which allow for implementing more parallel particles in order to solve more complex optimization problems.

7.4. Behavior learning test

Table 4 shows the connection weights obtained for obstacle avoidance behaviors. Neuron 1 controls the robot angular velocity w, whereas neuron 2 controls the robot linear velocity v.
In the case of Behavior 1, the weight connections to the lfi and rgi sensors (i ∈ {0, 1, 2}) have a major contribution to the angular velocity computation. In contrast, the weight connection to the frontal sensor fr has a major contribution for computing the linear velocity. This model indicates that the robot increments or decrements the linear velocity according to the frontal distance to the obstacles whereas it turns according to the lateral distance to the obstacles.
In the case of Behavior 2, the contribution of the left side sensors (lf0, lf1, lf2) is smaller for the angular and linear velocities estimation. In contrast, the weight connections to the right side sensors (rg0, rg1, rg2) have a major influence on the computation of the velocities. This fact allows the robot to perform counter-clockwise trajectories close to the external walls.
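Returning to the bit-width trade-off of Section 7.3, the 27 bit format (1 sign bit, 8 exponent bits, 18 mantissa bits) can be emulated in software to get a feeling for its precision and for the sign-bit opposition used by the particle unit. The sketch assumes that the format is an IEEE 754 single precision word with the 5 least significant mantissa bits truncated; the rounding mode actually used in the hardware is not stated in the paper, and the function names are hypothetical.

```python
import struct


def to_bits32(x):
    """Bit pattern of x as an IEEE 754 single precision word."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def from_bits32(bits):
    """Float value of a 32-bit IEEE 754 single precision word."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def to_fp27(x):
    """Assumed 27-bit word: float32 with the 5 low mantissa bits truncated."""
    return to_bits32(x) >> 5


def from_fp27(word):
    """Expand a 27-bit word back to a Python float (dropped bits read as zero)."""
    return from_bits32((word & 0x7FFFFFF) << 5)


def opposite_fp27(word):
    """Opposite number in a symmetric domain: flip only the sign bit (bit 26)."""
    return word ^ (1 << 26)


value = 0.1
word = to_fp27(value)
print(from_fp27(word), from_fp27(opposite_fp27(word)))
```

Under this assumption the relative truncation error stays below 2^-18, while the exponent field, and therefore the dynamic range, is identical to single precision.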
Fig. 9 depicts the trajectories obtained for both the obstacle avoidance behaviors. The solid lines are the trajectories used by the teacher demonstration during the teleoperation process. The dotted lines are the trajectories obtained after the training process. It can be observed that the proposed HPOPSO architecture satisfactorily performs the training process of the robot neural controller for both the simulated behaviors.
In order to validate the effectiveness of the training process, the learned behaviors were analyzed for unknown environments. Fig. 10 shows the trajectories performed by the robot in different scenarios. It can be observed that the validation test was performed in scenarios with different degrees of complexity. In general, the robot performs the task correctly. However, some collisions were detected in the case of Behavior 2 (right column), specifically in scenarios with bordering obstacles (see Fig. 10h and i). Notice that this situation was not considered during the demonstration process (see Fig. 8).

7.5. Fault tolerance test

One of the advantages of using hardware parallel architectures for training an ANN robot controller is the possibility of achieving good quality solutions in a short elapsed time. This fact allows the robots to adapt the control strategies in an online fashion. The robustness and flexibility of the proposed HPOPSO architecture is explored by adapting the robot controller when a sensor malfunction occurs.
A sensor failure can be detected by using the sensor model or analyzing the variance of the associated measurements. However, in this work, the sensor malfunction is emulated using some user buttons available on the simulator environment. It sends the sensor ID number to the FPGA, which starts the ANN training process disregarding the training data of failed sensors. Once the new weights have been adjusted to the new situation the FPGA device returns the ys values through the serial communication.
Table 5 shows the weights obtained for a normal sensor operation and three sensor failures. These results were obtained for trajectories performed in the middle of the free space configuration (Behavior 1). The first seven rows correspond to the weight connections of the first neuron (angular velocity output) and the last seven rows to the weight connections of the second neuron (linear velocity output).
It can be observed that under the normal operation of sensors (no fault column) the ANN imitated the teacher demonstrations used for acquiring the training dataset. Notice that except for the fr sensor, all the other sensors contribute to the angular velocity. Only the fr sensor contributes to the linear velocity. All the weights connecting input data of failed sensors are equal to zero. In the case of a failure in the frontal sensor (fr column), one can observe that the sensors lf2 and rg2 (ys41 and ys51 weights) are used for estimating the linear velocity. In the case of simultaneous failures of the sensors lf2 and rg2, the training process has increased the contribution of the sensors lf1 and rg1 (ys20 and ys30 weights). Finally, in the case of failures of the sensors lf0 and rg0, no major modifications in the neural controller were conducted by the HPOPSO architecture.
Fig. 11 shows the trajectories obtained for each failure case problem. It can be observed that the trajectory obtained after the compensation of the failure of the sensor fr presents some collisions, specifically for curves in reduced spaces.
In the case of failures on sensors lf2 and rg2, collision-free trajectories were achieved for the first two scenarios. However, some collisions were observed for complex scenarios in which border obstacles are present (Fig. 11g and h). In that case, one can conclude that the compensation performed by incrementing the contribution of sensors lf1 and rg1 is not enough for estimating the diagonal distance to the obstacles. Finally, in the case of failures in sensors lf0 and rg0 the weights of the neural controller were effectively adjusted, obtaining collision-free trajectories.

Table 4
Connection weights obtained by the HPOPSO architecture.

Connection weight    Behavior 1    Behavior 2
ys00 (lf0-w)          0.069869      0.024279
ys10 (rg0-w)          0.041551      0.855401
ys20 (lf1-w)          0.182169      0.307474
ys30 (rg1-w)         -0.188934     -0.378247
ys40 (lf2-w)          0.149396      0.052946
ys50 (rg2-w)         -0.031821     -0.504158
ys60 (fr-w)          -0.079617     -0.064920
ys01 (lf0-v)         -0.003519     -0.057201
ys11 (rg0-v)          0.006841     -0.912649
ys21 (lf1-v)         -0.005563      0.032133
ys31 (rg1-v)         -0.007534      1.471020
ys41 (lf2-v)         -0.011730      0.002326
ys51 (rg2-v)         -0.006720      0.361089
ys61 (fr-v)           0.368095      0.019588
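To make the role of the learned weights concrete, the following sketch evaluates the Behavior 1 controller of Table 4 for one sensor reading and converts the resulting velocities into wheel speeds with the inverse kinematics of Eq. (4). Several points are assumptions: the atan activation (taken from the caption of Fig. 2), the 1/(2πr) factor of Eq. (4) in the form given by Bräunl (2006), which yields wheel speeds in revolutions per second, and the sensor reading, wheel distance d and wheel radius r, which are invented for the example.

```python
import math

# Behavior 1 weights from Table 4, ordered lf0, rg0, lf1, rg1, lf2, rg2, fr.
W_ANGULAR = [0.069869, 0.041551, 0.182169, -0.188934, 0.149396, -0.031821, -0.079617]
W_LINEAR = [-0.003519, 0.006841, -0.005563, -0.007534, -0.011730, -0.006720, 0.368095]


def controller_outputs(sensors, activation=math.atan):
    """Angular (wr) and linear (vr) velocity outputs of the two-neuron controller."""
    wr = activation(sum(w * s for w, s in zip(W_ANGULAR, sensors)))
    vr = activation(sum(w * s for w, s in zip(W_LINEAR, sensors)))
    return wr, vr


def wheel_speeds(vr, wr, d, r):
    """Inverse kinematics (assumed form of Eq. (4)): (left, right) in rev/s."""
    k = 1.0 / (2.0 * math.pi * r)
    return k * (vr - 0.5 * d * wr), k * (vr + 0.5 * d * wr)


def body_velocities(left, right, d, r):
    """Forward kinematics, used here only to check the inverse mapping."""
    c = 2.0 * math.pi * r
    return c * (left + right) / 2.0, c * (right - left) / d


# Hypothetical reading: only the frontal sensor reports free space.
reading = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
wr, vr = controller_outputs(reading)
left, right = wheel_speeds(vr, wr, d=0.1, r=0.02)
print(wr, vr, left, right)
```

With only the frontal input active, the large ys61 weight dominates and the linear velocity output is clearly positive, which matches the observation that the frontal sensor drives the forward speed in Behavior 1.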

Fig. 9. Simulation results for the obstacle avoidance behaviors. (a) Behavior 1 and (b) Behavior 2.

Fig. 10. Simulation results for unknown environments. Left and right columns correspond to the trajectories for the behaviors 1 and 2, respectively. Points indicate collisions.

7.6. Execution time comparison

The large computational cost of population-based optimization algorithms is a drawback for portable systems applications, in which the algorithms must be executed with a high performance and low power consumption. In order to demonstrate the feasibility of the proposed HPOPSO architecture it is important to compare the execution time between hardware and common software solutions for mobile robotics.
Table 6 shows a comparison of the execution time per iteration between the proposed HPOPSO architecture and two software implementations. The first one is based on a C code implementation using an Intel Core 2 Duo, at 1.6 GHz, 2 GB RAM, Windows XP O.S. The second one is based on the Microblaze embedded soft processor, operating at the same frequency as the HPOPSO architecture and 64 kB program memory.
Simulation results using the ModelSim simulator tool show that one iteration of the proposed HPOPSO architecture requires
106,118 clock cycles. According to the synthesis results reported in Table 3 the maximum operational clock frequency of the HPOPSO circuit is around 130 MHz, whereas the timing comparison of Table 6 was obtained with the circuit operating at 100 MHz. Therefore, for a total of 5000 iterations, one can expect an execution time of 5.3 s. On the other hand, the Desktop solution requires around 19.29 s to execute the same number of iterations. Therefore, the proposed hardware architecture achieves a speedup factor of 3.6 in comparison with the Desktop software solution. In addition, the proposed HPOPSO achieves a speedup factor of 6248 in comparison with the Microblaze implementation. This fact demonstrates the suitability of the hardware based solution for training/adapting neural robot controllers on portable applications.
It is important to stress here that more efficient software implementations might be achieved in assembler code, and these numbers are only valid as a rough approximation to the expected speedup factor. However, a fair performance comparison between hardware and software solutions should use the same computational platform, operating at the same clock frequency. From this point of view, it was demonstrated that the proposed hardware approach reaches a significant acceleration in comparison with common microcontroller based miniature mobile robot platforms.

8. Discussion of the results

According to the previously mentioned results, one can conclude that a hardware implementation of the parallel opposition-based PSO algorithm is, in general, suitable for embedded optimization problems in terms of quality of the solution and performance.
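The execution-time figures of Section 7.6 can be cross-checked with a few lines of arithmetic. The cycle count, the iteration count and the per-iteration software times are taken from the text and from Table 6; the 100 MHz clock is the operating frequency listed for the HPOPSO in Table 6, and the variable names are only illustrative.

```python
CYCLES_PER_ITERATION = 106_118     # clock cycles per HPOPSO iteration (ModelSim)
CLOCK_HZ = 100e6                   # operating frequency listed in Table 6
F_MAX_HZ = 130.7e6                 # maximum frequency from the synthesis results
ITERATIONS = 5000

t_iter_hw = CYCLES_PER_ITERATION / CLOCK_HZ      # seconds per iteration
t_total_hw = ITERATIONS * t_iter_hw              # total training time
t_total_fmax = ITERATIONS * CYCLES_PER_ITERATION / F_MAX_HZ

t_iter_pc = 3.858e-3               # Intel Core 2 Duo, seconds per iteration
t_iter_mb = 6.63                   # MicroBlaze, seconds per iteration

speedup_pc = t_iter_pc / t_iter_hw
speedup_mb = t_iter_mb / t_iter_hw

print(f"hardware: {t_iter_hw * 1e3:.3f} ms/iteration, {t_total_hw:.2f} s total")
print(f"desktop total: {ITERATIONS * t_iter_pc:.2f} s, speedup {speedup_pc:.1f}")
print(f"MicroBlaze speedup: {speedup_mb:.0f}")
print(f"total at maximum frequency: {t_total_fmax:.2f} s")
```

The reported 5.3 s therefore corresponds to the 100 MHz operating point; clocking the same design at the maximum synthesized frequency would shorten the 5000 iterations to roughly 4.06 s.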
Table 5
Best solutions obtained by the HPOPSO after the training process when failures in the IR sensors are emulated.

Connection    No fault      Sensors failures
weights                     fr           lf2, rg2     lf0, rg0
ys00           0.069869     0.075708     0.107307     0.000000
ys10           0.041551     0.014642     0.103043     0.000000
ys20           0.182169     0.186576     0.203626     0.196303
ys30          -0.188934    -0.193240    -0.221292    -0.151724
ys40           0.149396     0.106532     0.000000     0.160072
ys50          -0.031821    -0.074847     0.000000    -0.056022
ys60          -0.079617     0.000000     0.009205     0.070009
ys01          -0.003519     0.035967     0.006154     0.000000
ys11           0.006841     0.100241    -0.053956     0.000000
ys21          -0.005563    -0.002861     0.000085     0.000499
ys31          -0.007534     0.030455     0.020045     0.002083
ys41          -0.011730     0.153950     0.000000    -0.009881
ys51          -0.006720     0.145633     0.000000    -0.000868
ys61           0.368095     0.000000     0.359959     0.368050

Table 6
Comparison of the execution time per iteration.

Hardware            Software            Software
HPOPSO              Intel Core2 Duo     MicroBlaze
FPGA xc5vlx110t     2 GB RAM, 1.6 GHz   64 kB, 100 MHz
100 MHz             ANSI C code         ANSI C code
1.061 ms            3.858 ms            6.63 s

Fig. 11. Sensor fault tolerance results. Solid lines are trajectories without sensor malfunction. Dashed lines correspond to the robot trajectories after the sensor failures compensation. Points indicate collisions with obstacles.

Synthesis results (Table 3) pointed out that the proposed HPOPSO architecture is effectively implementable on commercial FPGAs. The circuits spend around 27% of the available flip-flops and 63% of the available LUTs for implementing the combinatorial logic.
The hardware resources consumption depends not only on the proposed architecture but also on the bit-width of the arithmetic representation. The Virtex5 family FPGA uses DSP48E blocks which perform 25 × 18 bit multiplications. In this work, an efficient hardware implementation of the HPOPSO architecture was achieved using a 27 bit width floating-point representation,
which allows the representation of real numbers using 18 bits for the mantissa word. This fact allows for saving logic area and DSP blocks in comparison with a single precision implementation.
The HPOPSO architecture for training the neural robot controller has been implemented entirely in hardware. This choice is justified because of the high computational cost required for the neural network training process. However, several considerations must be made for improving the performance of the proposed architecture. Additional parallel computations, especially during the evaluation of the fitness functions which is the most time consuming process, will decrease the number of clock cycles per iteration of the algorithm. However, it will require FPGA devices with more hardware resources than those applied in the given experiments. The use of embedded soft-processors can provide a flexible platform for partitioning the navigation algorithms and executing functions that are not computationally complex.
Simulation experiments have demonstrated the learning capabilities of the robot controller which allows the robot to accomplish different tasks as well as to maintain different desired behaviors when one or more faults occur in the distance sensors.
Another important aspect is that the proposed hardware architecture takes advantage of parallel processing for accelerating the training process of the robot controller. Execution time results point out that the training process requires 5.3 s (1.06 ms per iteration). Although these results are not tailored for real time mobile robot applications, it is important to take into account that the proposed architecture achieves a speedup factor of 3.6 in comparison with a software implementation running on a desktop. Therefore, the proposed HPOPSO architecture can be a useful solution for small mobile robot platforms with high performance and low power consumption requirements.

9. Conclusions

This work presents an FPGA implementation of the PSO algorithm with opposition based learning approach (HPOPSO). The proposed HPOPSO architecture takes advantage of simple operators for improving the quality of the solutions, preserving swarm diversity and avoiding the problem of premature convergence. The entire architecture is described in VHDL and it makes use of a suitable floating-point arithmetic, allowing the optimization process to operate with high precision over a large dynamic range.
The HPOPSO has been applied to the learning from demonstration problem in which the training dataset is composed of example executions of the task by a demonstration teacher. Thus, the HPOPSO architecture adjusts the weights of a neural network robot controller in order to perform the desired task as well as to preserve the desired behavior when malfunctions on the distance sensors occur.
Two different obstacle avoidance behaviors were used as case study. The mobile robot behavior has been analyzed for different scenarios and test results demonstrate that the robot successfully avoids obstacles according to the desired behavior demonstrated

As future work, it is intended to implement the training process on a real small mobile robot as well as to include an infrared sensor model in order to automatically detect sensor failures.

Acknowledgments

The authors would like to thank the National Council of Scientific and Technological Development of Brazil - CNPq (Process 142033/2008-1), the PRONEX program (FAPDF/MCT/CNPq - Process 193.000.580/2009) for its financial support and the Xilinx University Program.

References

Al-Qunaieer, H.R.S., Tizhoosh, F.S., 2010. Opposition based computing - a survey. In: Proceedings of the International Joint Conference on Neural Networks. Barcelona, Spain, pp. 1098-7576.
Argall, B., Chernova, S., Veloso, M., Browning, B., 2009. A survey of robot learning from demonstration. Robot. Auton. Syst. 47 (5), 469-483.
Banks, A., Vincent, J., Anyakoha, C., 2007. A review of particle swarm optimization. Part I. Background and development. Int. J. Nat. Comput. 6 (4), 467-484.
Billard, A., Callinon, S., Dillmann, R., Schaal, S., 2008. Robot programming by demonstration. In: Siciliano, B., Khatib, O. (Eds.), Handbook of Robotics. Springer, New York, USA. (Chapter 59).
Boyd, S. Real time embedded convex optimization. In: International Symposium on Mathematical Programming, online (August 2009). URL http://ismp2009.eecs.northwestern.edu/Plenaries/.
Bräunl, T., 2006. Embedded Robotics. Springer-Verlag, Germany.
Cavuslu, M., Karakuzu, C., Karakaya, F., 2012. Neural identification of dynamic systems on FPGA with improved PSO learning. Appl. Soft Comput. 12 (9), 2707-2718.
Chatterjee, A., Pulasinghe, K., Watanabe, K., Izumi, K., 2005. A particle-swarm-optimized fuzzy-neural network for voice-controlled robot systems. IEEE Trans. Ind. Electron. 52 (6), 1478-1489.
Chowdhury, S., Chakrabarti, D., Saha, H., 2009. Medical diagnosis using adaptive perceptive particle swarm optimization and its hardware realization using field programmable gate array. J. Med. Syst. 33 (6), 447-465.
Duren, R., Marks, R., Reynolds, P., Trumbo, M., 2007. Real-time neural network inversion on the SRC-6e reconfigurable computer. IEEE Trans. Neural Netw. 18 (3), 889-900.
Eberhart, R., Kennedy, J., 1995. A new optimizer using particle swarm theory. In: Proceedings of the International Symposium on Micro Machine and Human Science. IEEE, Nagoya, Japan, pp. 39-43.
Farmahini-Farahani, A., Vakili, S., Fakhraie, S., Safari, S., Lucas, C., 2010. Parallel scalable hardware implementation of asynchronous discrete particle swarm optimization. Eng. Appl. Artif. Intell. 23 (2), 177-187.
Gupta, L., Mehra, R., 2011. Modified PSO based adaptive IIR filter design for system identification on FPGAs. Int. J. Comput. Appl. 22 (5), 1-7.
Hauck, S., Dehon, A., 2008. Reconfigurable Computing. The Theory and Practice of FPGA-based Computing. Elsevier Inc., Burlington, MA, United States.
Jabeen, H., Jalil, Z., Baig, A., 2009. Opposition based initialization in particle swarm optimization (O-PSO). In: Proceedings of the ACM Conference on Genetic and Evolutionary Computation, Montreal, Canada, pp. 2047-2052.
Jin, N., Rahmat-Samii, Y., 2005. Parallel particle swarm optimization and finite-difference time-domain (PSO/FDTD) algorithm for multiband and wide-band patch antenna designs. Int. J. Numer. Methods Eng. 53 (11), 3459-3468.
Kaucic, M., 2013. A multi-start opposition-based particle swarm optimization algorithm with adaptive velocity for bound constrained global optimization. J. Glob. Optim. 55 (1), 165-188.
Kennedy, J., Eberhart, R., 1995. Particle swarm optimization. In: Proceedings of the International Conference on Neural Networks, Perth, Australia, pp. 1942-1948.
Kilts, S., 2007. Advanced FPGA Design: Architecture, Implementation and Optimization. John Wiley & Sons, NJ, United States.
Knudson, M., Tumer, K., 2011. Adaptive navigation for autonomous robots. Robot.
by the teacher. Additionally, the HPOPSO architecture obtained Auton. Syst. 59 (6), 410420.
satisfactory solutions adapting the weights of the neural controller Koh, B., George, A., Haftka, R., Fregly, B., 2006. Parallel asynchronous particle swarm
when failures on one or more infrared sensors were simulated. optimization. Int. J. Numer. Methods Eng. 67 (4), 578595.
Lin, C., Tsai, H., 2007. FPGA implementation of a wavelet neural network with
Synthesis results demonstrate that the proposed hardware particle swarm optimization learning. Trans. Math. Comput. Model. 47 (9),
architecture requires around 63% of the available LUTs of the FPGA 982996.
device and achieves an operational frequency around 130 MHz. Lin, H., Xingshi, H., 2007. A novel opposition-based particle swarm optimization for
noisy problems. In: Third International Conference on Natural Computation
Execution time comparisons demonstrate an acceleration factor of
(ICNC 2007), vol. 3, pp. 624629.
3.6 in comparison with a software implementation running on a Mehmood, S., Cagnoni, S., Mordonini, M., Matrela, G., 2008. Hardware-oriented
common desktop platform operating at 1.6 GHz. In addition, the adaptation of a particle swarm optimization algorithm for object detection. In:
proposed architecture achieves a speed up factor of 6248 in Proceedings of the IEEE Euromicro International Conference on Digital System
Design, Parma, Italy, pp. 904911.
comparison with an embedded Microblaze software processor, Muoz, D.M., Llanos, C., Coelho, L.S., Ayala-Rincn, M., 2009. Hardware architecture
operating at 100 MHz. for parallel particle swarm optimization using oating-point arithmetic. In:
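
As a rough illustration of the opposition-based learning idea that
HPOPSO builds on, the following Python sketch computes the opposite of
a particle position, which for a coordinate x bounded in [a, b] is
a + b - x, and keeps whichever of the two points has the lower cost.
It is a software sketch under stated assumptions, not the VHDL
architecture described in this work: the function names
(opposite_position, apply_opposition), the sphere cost function, the
search bounds, and the greedy selection rule are all illustrative, and
the condition under which HPOPSO triggers the opposition step is not
restated in this section.

```python
import random


def opposite_position(position, lower, upper):
    """Return the opposite point: each coordinate x maps to lower + upper - x."""
    return [lo + hi - x for x, lo, hi in zip(position, lower, upper)]


def apply_opposition(swarm, cost, lower, upper):
    """Keep, for each particle, the better of the position and its opposite.

    Assumption: minimization with greedy selection. This rule is a
    hypothetical choice for illustration, not the paper's exact scheme.
    """
    result = []
    for position in swarm:
        candidate = opposite_position(position, lower, upper)
        result.append(candidate if cost(candidate) < cost(position) else position)
    return result


def sphere(x):
    """Illustrative cost function (sum of squares); not taken from the paper."""
    return sum(v * v for v in x)


# Hypothetical, deliberately asymmetric search bounds for a 2-D problem.
lower = [-10.0, -10.0]
upper = [20.0, 20.0]

rng = random.Random(0)  # fixed seed so the run is repeatable
swarm = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(5)]
improved = apply_opposition(swarm, sphere, lower, upper)
```

With these bounds, the position [1.0, -4.0] maps to [9.0, 14.0], and
applying the mapping a second time returns the original point. The
operation needs only one addition and one subtraction per coordinate,
which is consistent with the remark above that the architecture relies
on simple operators.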
Proceedings of the International Conference on Intelligent Systems Design and Proceedings of the Swarm Intelligence Symposium. IEEE, Pasadena, CA, USA,
Applications. IEEE, Pisa, Italy, pp. 243248. pp. 389392.
Muoz, D.M., Llanos, C., Coelho, L.S., Ayala-Rincn, M., 2010a. Comparison between Riget, J., Vesterstrom, J., 2002. A Diversity-Guided Particle Swarm OptimizerThe
two FPGA implementations of the particle swarm optimization algorithm for ARPSO (Technical Report). EVALife, Aarhus, Denmark.
high performance embedded applications. In: Proceedings of the International Sass, R., Schmidt, A., 2010. Embedded Systems Design with Platform FPGAs
Conference on Bio-Inspired Computing, Theories and Applications, Liverpool, Principles and Practices. Morgan Kaufmann.
UK, pp. 16371645. Schutte, J., Reinbolt, J., Fregly, B., Haftka, R., George, A., 2004. Parallel global
Muoz, D.M., Llanos, C., Coelho, L.S., Ayala-Rincn, M., 2010b. Hardware particle optimization with the particle swarm algorithm. Int. J. Numer. Methods Eng.
swarm optimization based on the attractive-repulsive scheme for embedded 6 (13), 22962315.
applications. In: Proceedings of the International Conference on Recongurable Tewolde, G., Hanna, D., Haskell, R., 2009. Accelerating the performance of particle
Computing and FPGAs. IEEE, Cancn, Mxico, pp. 5560. swarm optimization for embedded applications. In: Proceedings of the Inter-
Muoz, D.M., Snchez, D., Llanos, C., Ayala-Rincn, M., 2010c. FPGA-based oating-
national Congress on Evolutionary Computation. IEEE, Trondheim, Norway,
point library for CORDIC algorithms. In: Proceedings of the International
pp. 22942300.
Southern Programmable Logic Conference, Porto de Galinhas, Brazil, pp. 5560.
Tewolde, G., Hanna, D., Haskell, R., 2012. A modular and efcient hardware
Muoz, D.M., Snchez, D., Llanos, C., Ayala-Rincn, M., 2010d. Tradeoff of FPGA
architecture for particle swarm optimization algorithm. Microprocess. Micro-
design of a oating-point library for arithmetic operators. J. Integr. Circuits Syst.
5 (1), 4252. syst. 36 (4), 289302.
Muoz, D.M., Llanos, C., Coelho, L.S., Ayala-Rincn, M., 2011. Hardware particle Thomas, D., Luk, W., 2008. Resource efcient generators for the oating-point
swarm optimization with passive congregation for embedded applications. In: uniform and exponential distributions. In: Proceedings of the International
Proceedings of the International Southern Programmable Logic Conference, Conference on Application-specic Systems, Architecture and Processors. IEEE,
Crdoba, Argentina, pp. 173178. Leuven, Belgium, pp. 102107.
Pant, M., Radha, R., Singh, V., 2007. A simple diversity guides particle swarm Tizhoosh, H., 2005. Opposition-based learning a new scheme for machine intelli-
optimization. In: Proceedings of the IEEE Conference on Evolutionary Compu- gence. In: Proceedings of the International Conference on Computational
tation, Singapore, pp. 32943299. Intelligence for Modelling, Control and Automation, Vienna, Austria, pp. 695
Pena, J., Upegui, A., 2007. A population-oriented architecture for particle swarms. 701.
In: Proceedings of the International Conference Adaptive Hardware and van den Bergh, F., 2002. An Analysis of Particle Swarm Optimizers (Ph.D. Thesis).
System. NASA/ESA, Edinburgh, Scotland, pp. 563571. Department of Computer Science, University of Pretoria, South Africa.
Poli, R., Kennedy, J., Blackwell, T., 2007. Particle swarm optimization. Swarm Intell. Wang, H., Li, H., Liu, Y., Li, C., Zeng, S., 2007. Opposition-based particle swarm
1 (1), 3357. algorithm with cauchy mutation. In: Proceedings of the IEEE Congress on
Pugh, J., Martinoli, A., 2009. Distributed scalable multi-robot learning using particle Evolutionary Computation, Singapore, pp. 47504756.
swarm optimization. Swarm Intell. 3 (3), 203222. Wang, H., Wu, Z., Rahnamayan, S., Liu, Y., Ventresca, M., 2011. Enhancing particle
Rahnamayan, S., Tizhoosh, H., Salama, M., 2008. Opposition versus randomness in swarm optimization using generalized opposition-based learning. Inf. Sci. 181
soft computing techniques. Appl. Soft Comput. 8 (2), 906918. (20), 46994714.
Rao, S., 1996. Engineering Optimization, Theory and Practice. John Wiley & Sons, Zhang, C., Z.N, Wu, Z., Gu, L., 2009. A novel swarm model with quasi-oppositional
USA. particle. In: Proceedings of the IEEE Forum on Information Technology and
Reynolds, P., Duren, R., Trumbo, M., Marks, R., 2005. FPGA implementation of
Applications, Chengdu, China, pp. 325330.
particle swarm optimization for inversion of large neural networks. In:
