
Procedia Technology 7 (2013) 416 - 423

The 2013 Iberoamerican Conference on Electronics Engineering and Computer Science

A Real-time 3D Pose Based Visual Servoing Implementation for an Autonomous Mobile Robot Manipulator

Jose R. Sanchez-Lopez (a), Antonio Marin-Hernandez (a,b,*), Elvia R. Palacios-Hernandez (c,d), Homero V. Rios-Figueroa (a), Luis F. Marin-Urias (a)

(a) Department of Artificial Intelligence, Universidad Veracruzana, Sebastián Camacho No. 5, Xalapa, Ver., CP 91000, Mexico
(b) CNRS; LAAS; Université de Toulouse; 7 avenue du Colonel Roche, F-31077 Toulouse Cedex, France
(c) Facultad de Ciencias, Universidad Autónoma de San Luis Potosí, San Luis Potosí, Mexico
(d) Sección de Mecatrónica, CINVESTAV-IPN, Mexico, D.F.

Abstract

Today, the manipulation of objects by mobile robots is still a challenging task. This task is commonly decomposed into three stages: a) approaching the objects, b) path planning and trajectory execution of the manipulator arm, and finally c) fine tuning and grasping. In this work we present an implementation of 3D pose based visual servoing for an autonomous mobile manipulator, dealing with the last stage of the manipulation task (fine tuning and grasping). The proposed methodology consists of three steps: a) a fast monocular image segmentation, followed by b) 3D model registration and finally c) pose estimation to feed back the fine tuning manipulation control loop. The objects and the end-effector are colored in different colors and their models are assumed to be known. Our mobile manipulator prototype consists of a stereo camera in a binocular stand-alone configuration and an anthropomorphic 7 DoF arm with a parallel end-effector (gripper). Our methodology runs in real time and is suitable for continuous visual servoing. Experimental results are reported.

© 2013 The Authors. Published by Elsevier Ltd.

Selection and peer-review under responsibility of CIIECC 2013

Keywords: Autonomous Mobile Manipulator, Visual Servoing, Robotic Arm Control, Mobile Robotics

1. Introduction

In recent years, advances in mobile robotics research have been very successful. Many of the original problems have been solved, at least partially. Nowadays, it is possible to have autonomous robots with many abilities. For example, mobile robots can build and maintain maps of their environments [1], plan and execute collision-free paths in dynamic environments [2], or plan how to approach humans in a safe way [3].

Mobile robotics research has evolved to include new challenges. One of the areas offering a wide source of problems and applications is service robotics. To carry out many of its tasks, a service robot requires

* Corresponding author. Tel:+52-228-817-4200 Ext. 10204

Email addresses: sanchezlopezjr@gmail.com (Jose R. Sanchez-Lopez), anmarin@uv.mx (Antonio Marin-Hernandez), epalacios@fciencias.uaslp.mx (Elvia R. Palacios-Hernandez), hrios@uv.mx (Homero V. Rios-Figueroa),

luisfelipe.marin@gmail.com (Luis F. Marin-Urias)

2212-0173 © 2013 The Authors. Published by Elsevier Ltd. Selection and peer-review under responsibility of CIIECC 2013. doi:10.1016/j.protcy.2013.04.052

Fig. 1. Binocular stand-alone configuration system, composed of a stereo-vision camera under a pan/tilt unit and a 7 DoF robotic arm. Both camera and arm are fixed on a common base in order to incorporate the system over a mobile platform.

working very closely with humans. This collaboration implies the manipulation of different objects in human environments. Nevertheless, mobile robot manipulation is still a challenging task, mainly because these kinds of environments involve uncontrolled and highly dynamic scenes.

The problem of mobile robot manipulation is commonly decomposed into three stages. The first one involves moving the robot itself into the proximity of the object to manipulate; this task is considered accomplished when the desired object lies inside the configuration space of the robot's manipulator. The second stage consists broadly of motion planning and execution to drive the robotic arm near the object. Finally, the third stage consists of fine tuning between the final position reached in the previous stage and the correct position, in order to effectively complete the grasp.

The uncertainty in the mobile robot localization and in the position of each joint of the manipulator means that methods commonly used in manufacturing robotics are not appropriate in these situations. In order to deal with such uncertainties, a servoing scheme must be applied to carry out the last stage of the manipulation task efficiently. Visual servoing schemes are common in this area; they can be roughly divided into image-based and pose-based servoing.

In this paper, we present a visual servoing scheme based on the position difference between the end-effector and the object to manipulate. The proposed methodology consists of three steps: a) color image segmentation, b) 3D model registration, and c) 3D pose estimation. The proposed algorithm works in real time under a binocular stand-alone camera manipulator configuration [4] (Fig. 1) and is robust enough to deal with the illumination changes commonly produced in the environment.

This paper is organized as follows. Some of the most representative related works are presented in section 2. In section 3, we present and describe the proposed methodology. Section 4 contains results and a discussion of the proposed image-processing approach. Finally, section 5 presents the conclusions and future work.

2. Related Work

Many research groups around the world have already addressed diverse parts of mobile robot manipulation. For example, Fedrizzi et al. [5] deal with the problem of planning under uncertainty; their proposed method is called ARPlace (Action-Related Place). In [6], a coupling between feedback and visual servoing is proposed to allow a mobile robot to insert small objects into holes.

When the robot is guided by planning algorithms to grasp objects, these methods can cause unwanted motions. In [7] this problem is addressed by means of a coordinated and reactive speed control. The problem of generating correct grasping postures has received attention from different groups, for example in [8].

Very useful surveys on visual servoing can be found in [9, 10]. As described in [11], the visual servoing control problem can be addressed by means of the kinematic model of the manipulator in conjunction with the Jacobian matrix of visual features extracted from the images. However, one of the main problems in computer

Fig. 2. Colors considered for diverse objects to manipulate and robotic arm end-effector.

vision is still the robust extraction of features under dynamic conditions. The problem of varying illumination conditions in color segmentation can be mitigated by discarding the corresponding component of the HSI color space [12].

As described in [11], visual servoing is divided into image-based visual servoing and pose-based visual servoing. The first scheme uses measures in image space to control the robot, while pose-based visual servoing uses real pose estimates to feed back the control loop. In this work, we address the problem of pose estimation from a pair of images.

We are interested in solving the problem of fast segmentation to compute the position of the end-effector of a 7 DoF (Degrees of Freedom) robotic arm and the positions of the objects. 3D pose estimation can be done with a variety of sensors, e.g. Time of Flight (ToF) cameras, RGB-D (Kinect-like) cameras, or stereo vision systems.

Stereo vision systems commonly suffer from blurred edge detection, especially when the objects to detect are small; in human environments this can be a real problem, as described in [13]. When dealing with close objects, most of these sensors have similar limitations. For example, ToF and RGB-D cameras only detect objects efficiently at distances greater than about 60 cm.

For a visual servoing system under a stand-alone configuration, i.e. with the visual sensors in the head and not on the arm, this effective depth detection range restricts the usability of the configuration. In order to avoid this problem, a stereo camera can be adjusted to detect closer objects. However, the visual field is then reduced, and so is the capability to correctly detect the arm configuration.

In this paper, we propose a methodology that segments colored objects in both cameras of a stereo vision system in order to recover the 3D pose that feeds back the controller, rather than working directly on disparity images as is commonly done.

3. 3D Visual Pose Estimation

The three stages of the proposed methodology to recover the 3D positions of the end-effector and the objects are described below.

3.1. Color Image Segmentation

In order to feed back the controller of the robotic arm efficiently, the processing involved must be fast enough. Images are acquired from the stereo camera at a frequency of 25 Hz, which bounds the time available to solve the problem.

Color spaces having an explicit illumination component can be used to obtain invariance to illumination conditions. In this work the HSV color space is used, keeping only the H and S components and leaving the value component (V) out of our procedure. The end-effector and the diverse objects to manipulate have different colors, as shown in figure 2.

The segmentation problem can be solved using machine learning approaches. In this kind of solution, classifying a pixel requires two things: a database of training information and a similarity measure for comparison.

Table 1. Corresponding values of H, S and radius for every color

i   Color    H     S    Radius
0   Purple   100   59   15
1   Blue     102   88   15
2   Yellow   37    27   30
3   Red      143   19   35
4   Orange   5     98   20

The color segmentation of this stage is achieved using HS color pixel classification with the Euclidean distance as similarity measure. Using a machine learning approach, the color classification space has been constructed as shown in Figure 3. The segmentation process and object classification are strongly correlated.

To construct the color database, n = 100 representative samples of each color have been randomly selected. The centroid of each color cluster is defined by:

$$\mathrm{cen}_{HS}(i) = \frac{1}{n} \sum_{j=1}^{n} \bigl( H(j),\, S(j) \bigr) \qquad (1)$$

where cen_HS(i) represents the mean (H, S) value of the i-th color, with i taking values in [0, 4] for the different colors; H(j) is the j-th sampled value of the H component, and similarly for S(j).

To simplify the classification, we have modeled the feature space using circles of radius r(i) centered at cen_HS(i). A specific radius r(i) for each color has been selected, using the Euclidean distance from cen_HS(i), so as to cover most of the samples. The results of this process are shown in Table 1. Finally, pixels are classified using this information, resulting in a binary image for each color in the working universe.
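As an illustration, the following Python/OpenCV sketch shows how such a Euclidean HS classifier could be implemented. This is a minimal sketch rather than the authors' code: the function names are ours, and the (H, S) centers and radii are taken from Table 1 under the assumption that OpenCV's 8-bit HSV convention (H in [0, 179], S in [0, 255]) applies.

```python
import cv2
import numpy as np

# (H, S) cluster centers and radii from Table 1 (assumed 8-bit OpenCV HSV range).
COLOR_MODELS = {
    "purple": ((100, 59), 15),
    "blue":   ((102, 88), 15),
    "yellow": ((37, 27), 30),
    "red":    ((143, 19), 35),
    "orange": ((5, 98), 20),
}

def train_center(h_samples, s_samples):
    """Cluster center as in Eq. (1): mean of the n sampled (H, S) values."""
    return (float(np.mean(h_samples)), float(np.mean(s_samples)))

def segment_color(bgr_image, color):
    """Binary mask of pixels whose (H, S) values fall inside the Euclidean
    circle of the given color; the V channel is deliberately ignored."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    h = hsv[:, :, 0].astype(np.float32)
    s = hsv[:, :, 1].astype(np.float32)
    (ch, cs), radius = COLOR_MODELS[color]
    dist = np.sqrt((h - ch) ** 2 + (s - cs) ** 2)  # Euclidean distance in the HS plane
    return np.where(dist <= radius, 255, 0).astype(np.uint8)
```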


Fig. 4. Freeman chain point sets for the segmented images of the gripper: a) and b) corresponding left and right images, and c) vertex alignment.

3.2. 3D model registering

As mentioned above, the 3D models of the objects and the gripper are known; in order to recover their positions it is enough to match their upper planar patches in both images coming from the stereo camera. To obtain these planar patches, the binary images must be processed. In some cases, the segmentation algorithm does not produce good results, or produces some misclassified pixels. To filter this noise, we first use the connected component algorithm implemented in [14] and then reject the small areas, keeping only the biggest area in the binary image. This area represents the object we are looking for.
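A minimal sketch of this filtering step is given below, assuming OpenCV (whose cv2.findContours implements the border-following algorithm of [14]); the function name and return convention are ours.

```python
import cv2
import numpy as np

def keep_biggest_area(binary_mask):
    """Reject misclassified pixels by keeping only the largest connected
    region of the binary mask, extracted via border following [14]."""
    contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)  # OpenCV >= 4 signature
    if not contours:
        return None, None
    biggest = max(contours, key=cv2.contourArea)
    clean = np.zeros_like(binary_mask)
    cv2.drawContours(clean, [biggest], -1, 255, thickness=-1)  # fill the region
    return clean, biggest
```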

The edges of the biggest area are then extracted and matched against the known models of the objects and the gripper, depending on the color. The objects (cylinders) are easily matched by circular planar patches, but in the case of the gripper the process is a little more complicated.

In order to register the gripper planar patch model, the vertices of the obtained contours must be aligned. The extracted contours can be treated as curves. A curve is represented by a point sequence, and some of these points may be vertices. To extract the vertices, we use the Freeman chain implementation [15] and then group together the points belonging to a straight line (vertical or horizontal). This representation compacts the contour and lets us perform more complex calculations, mainly because the number of points has been reduced (Figure 4).
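The exact Freeman-chain processing is not detailed in the paper; as an approximation, the sketch below uses OpenCV's CHAIN_APPROX_SIMPLE contour mode, which likewise collapses runs of horizontal, vertical and diagonal chain points into their end points, yielding a compact vertex list.

```python
import cv2

def contour_vertices(binary_mask):
    """Compact vertex list for the largest contour: CHAIN_APPROX_SIMPLE keeps
    only the end points of straight (horizontal/vertical/diagonal) segments,
    playing a role similar to the straight-line grouping of the Freeman chain."""
    contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return []
    biggest = max(contours, key=cv2.contourArea)
    return [tuple(int(v) for v in p[0]) for p in biggest]  # list of (x, y) vertices
```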

3.3. Pose estimation

In order to obtain the 3D pose of the gripper and the objects, the centroids of the registered patches in the image reference frame must be computed. This is done with the following equations:

$$\mathrm{area}(cc) = \sum_{(x,y)} f(x, y) \qquad (2)$$

$$f_a = \begin{cases} 1, & \mathrm{area}(cc) > th \\ 0, & \mathrm{area}(cc) \le th \end{cases} \qquad (3)$$

$$c(\bar{x}, \bar{y}) = \left( \frac{\sum_{(x,y)} x\, f(x, y)}{\mathrm{area}(cc)},\; \frac{\sum_{(x,y)} y\, f(x, y)}{\mathrm{area}(cc)} \right) \qquad (4)$$

where area(cc) is the area of the connected component in pixels and f(x, y) is the binary image. The function f_a is used to reject small areas; typically th = 15 pixels. c(x̄, ȳ) is the center of mass of the patch in the 2D binary image.
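A direct translation of Eqs. (2)-(4) into NumPy could look as follows; this is a sketch under the stated threshold, with a function name of our own choosing.

```python
import numpy as np

def patch_centroid(binary_image, th=15):
    """Centroid of a binary patch following Eqs. (2)-(4): the area is the
    number of foreground pixels, and patches with area <= th are rejected."""
    ys, xs = np.nonzero(binary_image)   # coordinates where f(x, y) = 1
    area = xs.size                      # Eq. (2)
    if area <= th:                      # Eq. (3): f_a = 0, patch rejected
        return None
    return (xs.mean(), ys.mean())       # Eq. (4): center of mass (x_bar, y_bar)
```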

Using the intrinsic and extrinsic parameters of the stereo vision camera, it is possible to obtain the 3D position of an object. A single point in the left camera can be projected into 3D coordinates if the disparity and the projection matrix are known, so it is necessary to compute the disparity between the points obtained in both cameras. At this stage of the work, the 3D pose estimation of the objects is obtained by projecting the model centroid onto the flat surface, in this case the table, obtained as described in [13].
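For reference, the standard back-projection of a rectified stereo pair is sketched below; it illustrates the disparity-to-3D step in general terms and is not specific to the table-plane projection of [13]. All parameter names are generic placeholders.

```python
import numpy as np

def backproject(u, v, disparity, f, cx, cy, baseline):
    """Back-project a left-image pixel (u, v) with a known disparity (pixels)
    into camera-frame coordinates, using rectified pinhole stereo geometry.
    f: focal length in pixels, (cx, cy): principal point, baseline: in meters."""
    if disparity <= 0:
        return None
    z = f * baseline / disparity   # depth from disparity
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    return np.array([x, y, z])
```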

4. Results

The vision algorithms were implemented on a single-core Intel Pentium Centrino at 1.8 GHz with 512 MB of RAM, using the Open Computer Vision library (OpenCV) under the Ubuntu 8.04 operating system. The images obtained from the stereo camera have a size of 640x480 pixels, but we are only interested in the cases where the gripper is close to the objects. Under this assumption, we crop the images to create a region of interest, referenced to the center point of the gripper model in the image, with a fixed size of 320x240 pixels.

The objects we are interested in segmenting are colored cylinders (imitation wood) taken from a children's toy set. The end-effector carries an orange planar patch.

Figure 5 shows the results of the color segmentation process and of the model registration for the four colored objects and the gripper. Images (a), (c), (e) and (g) show the color segmentation results for each colored object, and (b), (d), (f) and (h) show the contours against which the models are matched. The gripper segmentation and model matching are presented in (i) and (j).

The results of the 3D pose estimation are shown in Figure 6, where the 3D positions have been obtained using the projection matrix. The time to obtain the final position of the objects and the gripper is on the order of 50 ms (> 15 Hz). The proposed methodology has been tested with two controllers: a PID and a fuzzy scheme. The feedback period is fast enough for such controllers; however, a complete analysis is outside the scope of this paper.
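For illustration only, a per-axis PID acting on the gripper-object position error could be wired to this roughly 50 ms feedback as sketched below; the gains are hypothetical and this is not the controller evaluated in the paper.

```python
import numpy as np

class PositionPID:
    """Minimal per-axis PID on the 3D position error (hypothetical gains)."""
    def __init__(self, kp=0.8, ki=0.05, kd=0.1, dt=0.05):  # dt ~ 50 ms vision loop
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = np.zeros(3)
        self.prev_error = np.zeros(3)

    def step(self, gripper_pos, object_pos):
        """Cartesian velocity command driving the gripper towards the object."""
        error = np.asarray(object_pos, float) - np.asarray(gripper_pos, float)
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```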

5. Conclusions

In this paper, we have presented a methodology to compute the 3D positions of simple colored objects and of the end-effector of an anthropomorphic 7 DoF arm. The methodology starts with a fast color segmentation performed by a Euclidean classifier in the HSV color space. This segmentation is executed in both images taken from a stereo camera. After the segmentation is carried out, the next step is to extract information from the binary images: statistical measures such as the centroid are obtained, and vertices are extracted using a compressed representation of the contours of the binary images. At this point some of the noise is filtered out using a simple condition that acts like a low-pass filter.

Once the information extraction is done, the process continues by solving the matching problem. When this process ends, we can compute the disparity of a point and then apply the projection matrix to obtain the 3D point measured in the camera reference frame.

For visual servoing purposes, we need to translate this set of points into the arm reference frame. To compensate for vision errors, we apply a single transformation to the translated points and obtain more accurate results. This system runs in real time and is adequate for implementing a visual servoing approach for grasping objects. Because all references are obtained relative to the camera and the arm, this prototype can easily be mounted on a mobile platform without affecting its performance. Another important aspect is that, since we use only the H and S components of the HSV color space, the process is not greatly affected by illumination variations.
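As a sketch, expressing a camera-frame point in the arm base frame amounts to applying the camera-to-arm rigid transformation, assumed known from calibration; the error-compensating transformation mentioned above can be composed in the same way. The names and calibration values below are placeholders.

```python
import numpy as np

def camera_to_arm(point_cam, R_cam_to_arm, t_cam_to_arm):
    """Express a 3D point measured in the camera frame in the arm base frame,
    given the rotation (3x3) and translation (3,) of the rigid transformation."""
    return R_cam_to_arm @ np.asarray(point_cam, float) + np.asarray(t_cam_to_arm, float)
```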

In future work, we need to implement a more sophisticated module for shape recognition in order to deal with more complex objects. The extraction of more 3D points or features will let us test approaches such as the one proposed by Chaumette et al. in [11]. Evaluations with different control schemes are underway.

6. Acknowledgments

This work was partially supported by CONACyT grants No. 106812 and No. 61375.

Fig. 5. Binary images and information extraction

Fig. 6. 3D point cloud of the four colored objects and of the end-effector

References

[1] H. Durrant-Whyte, T. Bailey, Simultaneous localization and mapping: part i, Robotics Automation Magazine, IEEE 13 (2) (2006) 99 -110. doi:10.1109/MRA.2006.1638022.

[2] S. Petti, T. Fraichard, Safe motion planning in dynamic environments, in: Intelligent Robots and Systems, 2005. (IROS 2005). 2005 IEEE/RSJ International Conference on, 2005, pp. 2210 - 2215. doi:10.1109/IROS.2005.1545549.

[3] S. Satake, T. Kanda, D. Glas, M. Imai, H. Ishiguro, N. Hagita, How to approach humans?-strategies for social robots to initiate interaction, in: Human-Robot Interaction (HRI), 2009 4th ACM/IEEE International Conference on, 2009, pp. 109 -116.

[4] D. Kragic, H. Christensen, A framework for visual servoing, in: J. Crowley, J. Piater, M. Vincze, L. Paletta (Eds.), Computer Vision Systems, Vol. 2626 of Lecture Notes in Computer Science, Springer Berlin / Heidelberg, 2003, pp. 345-354.

[5] A. Fedrizzi, L. Mosenlechner, F. Stulp, M. Beetz, Transformational planning for mobile manipulation based on action-related places, in: Advanced Robotics, 2009. ICAR 2009. International Conference on, IEEE, 2009, pp. 1-8.

[6] B. Hamner, S. Koterba, J. Shi, R. Simmons, S. Singh, An autonomous mobile manipulator for assembly tasks, Autonomous Robots 28 (2010) 131-149. doi:10.1007/s10514-009-9142-y.

URL http://dx.doi.org/10.1007/s10514-009-9142-y

[7] B. Siciliano, O. Khatib, Springer Handbook of Robotics, Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2008.

[8] K. Harada, K. Kaneko, F. Kanehiro, Fast grasp planning for hand/arm systems based on convex model, in: Robotics and Automation, 2008. ICRA 2008. IEEE International Conference on, IEEE, 2008, pp. 1162-1168.

[9] F. Chaumette, S. Hutchinson, Visual servo control. ii. advanced approaches [tutorial], Robotics & Automation Magazine, IEEE 14(1) (2007) 109-118.

[10] F. Chaumette, S. Hutchinson, Visual servo control. i. basic approaches, Robotics & Automation Magazine, IEEE 13 (4) (2006) 82-90.

[11] F. Chaumette, S. Hutchinson, Visual servoing and visual tracking, Springer Handbook of Robotics, Springer-Verlag (2008) 563-583.

[12] S. Vuppala, S. Grigorescu, D. Ristic, A. Graser, Robust color object recognition for a service robotic task in the system friend ii, in: Rehabilitation Robotics, 2007. ICORR 2007. IEEE 10th International Conference on, IEEE, 2007, pp. 704-713.

[13] L. Morgado-Ramirez, S. Hernandez-Mendez, L. Marin-Urias, A. Marin-Hernandez, H. Rios-Figueroa, Visual data combination for object detection and localization for autonomous robot manipulation tasks, Research in Computing Science: Advances in Soft Computing Algorithms 54 (1) (2011) 285-293.

[14] S. Suzuki, K. Abe, Topological structural analysis of digitized binary images by border following, Computer Vision, Graphics, and Image Processing 30 (1) (1985) 32 - 46. doi:10.1016/0734-189X(85)90016-7.

[15] H. Freeman, On the encoding of arbitrary geometric configurations, Electronic Computers, IRE Transactions on 10 (2) (1961) 260- 268.