Procedia Computer Science 52 (2015) 218 - 225

The 6th International Conference on Ambient Systems, Networks and Technologies (ANT 2015)

Multimodal Fusion, Fission and Virtual Reality Simulation for an Ambient Robotic Intelligence

Omar Adjali, Manolo Dulva Hina, Sebastien Dourlens, Amar Ramdane-Cherif*

LISV Laboratory, Université de Versailles St-Quentin-en-Yvelines, 10-12 avenue de l'Europe, 78140 Vélizy, France

Abstract

In this paper, we present an architecture that demonstrates multimodal fusion and fission using semantic agents and web services that interact with worldwide-web computers, services and users. This solution extracts the meaning of a situation using the semantic memory of agents to manage the interaction process involved. The fusion of values from different sensors produces an event that needs implementation. The fission process then suggests a detailed set of actions to be implemented. Before such actions are carried out by actuators, they are first evaluated in a virtual environment that mimics the real-world environment. If no danger arises from this virtual evaluation, then implementation is feasible; otherwise, one or more smaller actions may need to be added to render the action safe and free from danger. Our work presents the following contributions: (i) a design of agent memory and a model of the world environment using a knowledge representation language that is compatible with current standards, (ii) creation of a pervasive architecture with several scenarios of composition and adaptation, (iii) presentation of how agents and services interact to provide support in a real-world environment, and (iv) simulation of an event in a virtual environment to assess the feasibility of the event's implementation.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Peer-review under responsibility of the Conference Program Chairs

Keywords: multimodality; multimodal fusion; multimodal fission; virtual reality; ambient intelligence; robotic intelligence

1. Introduction

The interaction of robots in a human environment using agents is complex because robots/machines do not think by themselves. Hence, we look for solutions to reinforce understanding and disambiguation of such interaction. As a solution, we propose the design of a pervasive architecture of generic agent components, taking into consideration several existing standard technologies and integrating various domains.

* Corresponding author. Tel.: (+33) 06.63.76.65.49; Fax: (+33) 01.39.25.49.85 E-mail address: omar.adjali@lisv.uvsq.fr


doi: 10.1016/j.procs.2015.05.060

A pervasive multimodal interaction for ambient robotic intelligence system consists of humans, robots and smart components cooperating with one another to perform a specific task. In such a system, a rich symbolic representation is needed to facilitate interaction and storage of knowledge. The use of a narrative knowledge representation language (KRL)6,7 that resembles human natural language (NL) is preferred. This KRL can be used to communicate events between agents, between agents and services, and also to represent knowledge. Such knowledge should represent entities, actions and general knowledge of what is happening in the environment in order to make decisions10.

Robotic interaction is a multimodal interaction that involves several input/output modalities2. Sensors and actuators are driven by services; services are driven by agents. Hence, our architecture must be able to represent service and agent composition. Consider, for example, a man living alone in a house. In a pervasive multimodal interaction for ambient robotic intelligence system, his house walls, doors, windows and equipment are connected to a network. Both the robots and the house have semantic software agents and services. Robots communicate with sensors and actuators, such as video cameras and vocal synthesis, to perform face and object recognition. Every agent memory contains its own set of event models to recognize/use other agents and services in the network. In the cited example, events and concepts related to objects in the house and human activities are stored in the agents' memory as a priori knowledge.

The multimodal interaction architecture for ambient robotic intelligence contains a simulation component that assesses the feasibility of executing the result of a decision process. It simulates the situation in a virtual environment that is similar to the real-world environment. Its importance becomes more explicit if we consider that the system is deployed in an environment where risky or dangerous situations exist. This component validates the safety of the different entities before an event is called for actual implementation.

2. Related Works

We made a non-exhaustive comparison of multi-agent system (MAS) architectures5,8 with our own architecture. These architectures possess some properties similar to those of ours: pervasiveness, multimodality, types of agents (rational or logical), ways of communication, a knowledge base containing formal ontologies, managing entities as concepts, types of models, a knowledge base stored in the agent or shared and accessed over the network, closeness to natural language, and consistency checking.

Our preference is a pervasive and multimodal MAS architecture that contains the most generic agents, with the same semantic language for all operations of agents (fact storage, events, inference and communication) for easy understanding of agent behaviors and decisions. Our environment knowledge representation language (EKRL) is used as a form of simplified natural-language exchange between agents. Our EKRL models are narrative (the agent knows and tells what is happening) and performative (the agent sends orders to be executed), and they permit organizing and storing knowledge in an ontological manner. Finally, the main difference is that our architecture reasons on facts by manipulating concepts, just like what human beings do. The virtual reality simulation component, which validates the feasibility of implementing an action in the real-world environment before actually doing the action, makes our architecture design stand out.

3. Architecture of Multimodal Interaction for Ambient Robotic Intelligence

Multimodal interaction3 is that part of human-robot interaction12 that involves event acquisition and context awareness, interpretation and execution.

Multimodal interaction refers to two processes of interaction: fusion and fission. Multimodal fusion covers low-level integration (signal information) up to high-level storage of its meaning (semantic information) by composing and correlating data coming from multiple sources (sensors, interaction context, software services or web services)4. Hence, information fusion may be a function, an algorithm, or a method/procedure for combining data. Multimodal fission, on the other hand, is the process of physically acting or reacting to the data just provided. While the result of the fusion process provides the action to be implemented, the fission process splits that action into smaller details sent to the actuators that carry out the action or event.

Multimodal fusion provides advanced human-computer interaction using complementary or redundant modalities. Multimodal fission determines the best modalities and actions based on the given context and evaluation of events.

The essential requirements for multimodal fusion and fission engines are: (i) synchronization of the modalities sending events, (ii) cognitive algorithms, including formal logic, to obtain the meaning of the fusion process, (iii) context representation, considering all concepts and actions, (iv) data transfer bandwidth, necessary for efficient application and real-time constraints, (v) determination of the best modalities to implement the intended action, as obtained from the fusion process, and (vi) validation that the chosen modalities to implement the intended action are feasible. Requirement (i) is illustrated by the sketch below.
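
To make requirement (i) concrete, the following sketch shows one way a fusion engine might synchronize modality events inside a time window before composing them. C# is used here and in the sketches below because it is the scripting language the paper cites in Section 6.5; the types ModalityEvent and FuseWindow are our own illustration, not the authors' implementation.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Three events arriving close together in time, as in the "Put That There" scenario (Section 6).
    var now = DateTime.Now;
    var events = new List<ModalityEvent>
    {
        new("gesture", "Behave:ArmShowsObject",   now),
        new("gesture", "Behave:ArmShowsPosition", now.AddMilliseconds(400)),
        new("voice",   "Behave:SpeechOrder",      now.AddMilliseconds(800)),
    };
    Console.WriteLine(Fusion.FuseWindow(events, TimeSpan.FromSeconds(2)));

    // An event emitted by a modality (a sensor service), reduced to its essentials.
    record ModalityEvent(string Modality, string Predicate, DateTime At);

    static class Fusion
    {
        // Requirement (i): fuse only events that fall inside one synchronization
        // window; the result is a single composite EKRL-style slot.
        public static string FuseWindow(List<ModalityEvent> events, TimeSpan window)
        {
            var batch = events.OrderBy(e => e.At).ToList();
            if (batch.Count == 0 || batch[^1].At - batch[0].At > window)
                return "no fusion: events not synchronized";
            return $"SUBJECT: COORD({string.Join(", ", batch.Select(e => e.Predicate))})";
        }
    }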

Fig. 1(a) shows the fusion and fission agents connected to input/output services of a network. Here, a service represents any smart or reactive component of a pervasive environment. Services are ambient communicating entities (step 1). The meaning of the situation is extracted by the fusion agent in order to take a reactive decision (step 2). The extraction of the meaning to understand what is happening, as well as ontological storage of the events, are essential for interpretation.

Events are stored under related classes of behavioral models. The aim of the architecture is to perform fusion and fission processes using all situational knowledge (i.e. past and present states of entities) stored in the autonomous agent's memory (step 3).

To assess whether the fission process will yield an action that poses no danger to the entities involved, this set of fission actions is evaluated through a virtual simulation environment (step 4). Its result is returned to the fission component for validation or modification (step 5). A valid result means the set is ready for implementation and is sent to the actuators (step 6). Step 7 is the implementation of the action in the pervasive environment.
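
The seven steps can be read as a closed control loop. A minimal sketch of that loop's shape follows; every helper here (Perceive, Fuse, Fission, Simulate) is a hypothetical stub standing in for an agent, service or simulator from the architecture, not the authors' code.

    using System;
    using System.Collections.Generic;

    var sensed  = Perceive();                            // step 1: ambient services emit events
    var meaning = Fuse(sensed);                          // step 2: the fusion agent extracts meaning
    Console.WriteLine($"stored in memory: {meaning}");   // step 3: situational knowledge is stored
    var actions = Fission(meaning);                      // the fission agent proposes actuator actions
    while (!Simulate(actions))                           // step 4: evaluate in the virtual environment
        actions.Insert(0, "Move:OpenDoor");              // step 5: re-adapt the plan (see Section 6.4)
    Console.WriteLine(string.Join(" -> ", actions));     // steps 6-7: send the validated plan to actuators

    // Stubs standing in for real agents and the simulator; all of them are hypothetical.
    static List<string> Perceive() => new() { "Behave:ArmShowsObject", "Behave:SpeechOrder" };
    static string Fuse(List<string> events) => $"COORD({string.Join(", ", events)})";
    static List<string> Fission(string meaning) => new() { "Move:Walk", "Move:GrabObject" };
    static bool Simulate(List<string> actions) => actions.Contains("Move:OpenDoor"); // pretend a closed door blocks the plan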

To satisfy architectural requirements, we design two autonomous components, namely semantic agents and services. For multimodal interaction, we created two types of agents: fusion agents and fission agents.

Semantic services are standard web services that create messages and communicate with agents using EKRL. A semantic service may send information to agents using sensors, or execute orders to control actuators. Services can be viewed as reactive agents with no cognitive part.

Semantic agents are also web services, but they are cognitive and possess capabilities to achieve goals11. A semantic agent contains an embedded inference engine capable of processing matching operations and answering queries. Scenarios or execution schemes are stored in a semantic agent's knowledge base. Semantic agents differ from semantic services in that they have knowledge stored in their memory; the execution codes of the two components are, however, similar.

Fig. 1 (b) differentiates the structure of a semantic agent from a semantic service. An agent contains its knowledge base (i.e. memory), its inference engine and its communication module. A service, on the other hand, has only code and standard memory, a communication module and a hardware controller which enables the service to receive information from a sensor or to drive an actuator in the environment.
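
The distinction in Fig. 1(b) can be summarized by two type skeletons. This is a hedged sketch under our own naming assumptions, chosen to mirror the figure rather than the authors' code.

    using System;
    using System.Collections.Generic;

    var agent = new SemanticAgent { Name = "FusionAgent", KnowledgeBase = { "Behave:", "Exist:" } };
    agent.Receive("Behave:SpeechOrder, SUBJECT: put that there");   // matches a stored model
    agent.Receive("Own:Battery, SUBJECT: level");                   // no model: silently ignored

    // A semantic service: code and standard memory, a communication module, a hardware controller.
    class SemanticService
    {
        public string Name = "";
        public object? HardwareController;                  // receives from a sensor or drives an actuator
        public void Send(string ekrlEvent) => Console.WriteLine($"{Name} -> {ekrlEvent}");
    }

    // A semantic agent: knowledge base (memory), inference engine, communication module.
    class SemanticAgent
    {
        public string Name = "";
        public List<string> KnowledgeBase { get; } = new(); // event/query models and facts
        public void Receive(string ekrlEvent)               // minimal "inference": prefix matching
        {
            if (KnowledgeBase.Exists(m => ekrlEvent.StartsWith(m)))
                Console.WriteLine($"{Name} stored: {ekrlEvent}");
            // an event matching no stored model is simply ignored (Section 4.1)
        }
    }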

The query models stored in a semantic agent's memory define the agent's program (i.e. the role of agent in the organization). Semantic agents receive, filter and attach facts to their own event model. In our work, we program agents by reusing default concepts and models, and by adding specific query models in the agent memory using our memory editor.

Fig. 1. (a) Multimodal interaction architecture; (b) Semantic agents and services architecture

4. Environment Knowledge Representation Language (EKRL)

4.1. EKRL and Agent Memory

The EKRL is a semantic formal language that describes events in a narrative way; it resembles human natural language. EKRL is used to build event messages and store facts into the models of an agent's memory. An agent has a memory containing all models of events that it recognizes; hence, when it encounters an event that does not match any model stored in its memory, it simply ignores the event. The agent uses semantic inference to extract the meaning of a particular situation, and ontologies are used as structures to store events and extract meaning. In EKRL, frames are predicates with slots that represent pieces of information. A slot is represented by a role associated with an argument. A predicate P is a semantic n-ary relationship between roles and arguments and represents a simple/composed event; it is given by the expression:

P((R_1 A_1) ... (R_n A_n))

where R_i denotes a predicate role and A_i a list of arguments; R_n is the last of the possible roles (dedicated variables that store arguments) in the event model, and A_n the maximum combination of values in a stored model.

"<RootPredicate>:<PredicateName>, <Role 1>:<Argument 1>, <Role2>: <Argument2>, <Role3>: <Argument3>" is a sample model written using EKRL syntax. "RootPredicate" may be one of the following: "Exist", "Move", "Receive", "Behave", and "Own". "Role" can be any of the following: "Objective", "Source", "Beneficiary", "Modality", "Topic", and so on.

"Exist: Available Service, SUBJECT: Composition, SENDER: Services, DATE1: Start date, DATE2: end date, LOCATION: location" is a predicate model of "Available Service" event. "Exist" is one of the root predicates of the model. "Exist" is a general event model expressing a creation or discovery of something. "Subject", "Sender", "Datel", "Date2", and "Location" are roles of this predicate. This event model permits an agent to find new service available for use. This event is sent by services to other agents to inform them of their existence. Using an instance of this model, various agents will be able to know the instance's service name, start date, end date of existence, and location of the service.

4.2. Agent Memory

An agent memory is essential for a semantic agent to be cognitive. As stated, agents have the ability to store and retrieve events, understand the meaning of a situation and create new events to be sent to other semantic agents and services. The meta-concepts, event models, query models and instances are stored as ontologies7,9. Instances themselves are facts, scenarios and context knowledge.

The agent memory is a knowledge database that stores all events coming from the network; it is used for cognitive operations by recalling previously stored facts and reasoning and/or acting on them following the stored event models.

4.3. Agent Memory Editor

Using the editor, a user can build and import concepts, models, term definitions and related media links, with no programming required on the part of the user. The intent of having an editor is to build and modify model frames of the model ontology while respecting the EKRL syntax. As shown, the agent memory is composed of three ontologies; an agent's memory is implemented as an SQL database of eight tables.
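
The paper does not enumerate the eight tables, so the sketch below is only a hedged illustration of how EKRL frames could flatten into a relational store, with one tuple list per table; all table and column names are our assumptions.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Frames table: (FrameId, Root, Name). Slots table: (FrameId, Role, Argument).
    var frames = new List<(int Id, string Root, string Name)>
    {
        (1, "Exist", "AvailableService"),
    };
    var slots = new List<(int FrameId, string Role, string Argument)>
    {
        (1, "SUBJECT", "Composition"), (1, "SENDER", "Services"), (1, "LOCATION", "location"),
    };

    // Equivalent of "SELECT Role, Argument FROM Slots WHERE FrameId = 1":
    foreach (var (_, role, argument) in slots.Where(s => s.FrameId == 1))
        Console.WriteLine($"{role}: {argument}");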

5. The Semantic Agent's Inference Engine

The functionalities of an inference engine are discussed below.

We want a well-organized symbolic memory that can fulfill robotic interaction requirements. Models are used for the following reasons:

• The agent can store various events.

• The agent can read its program. This program is in the form of query models stored in the memory. Note that the task of a fusion or fission agent is to produce events that are already stored in the memory.

• A user may query the memory to check instances of a concept, an event and/or a fact.

• A user may program an agent: he may add or modify concepts, event models or query models to store facts directly under the models of the corresponding events.

An agent's inference engine processes information: it stores events using event models, queries the memory, and finds direct answers (direct matching) or indirect answers (matching that requires executing operations) using operations on concepts as arguments or operations on events.
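
A hedged sketch of the two matching modes follows: direct matching compares arguments literally, while indirect matching walks a small is-a concept hierarchy before comparing. The hierarchy contents and the Subsumes helper are our assumptions, not the engine's actual operations.

    using System;
    using System.Collections.Generic;

    // A toy is-a hierarchy over concepts (a fragment of an ontology); contents are illustrative.
    var isA = new Dictionary<string, string> { ["Glass"] = "Container", ["Container"] = "Object" };

    Console.WriteLine(Subsumes("Glass", "Glass"));    // True: direct matching (literal equality)
    Console.WriteLine(Subsumes("Object", "Glass"));   // True: indirect matching via Container

    // Indirect matching: climb the concept hierarchy until the query concept is reached.
    bool Subsumes(string concept, string argument)
    {
        string? c = argument;
        while (c != null)
        {
            if (c == concept) return true;
            c = isA.TryGetValue(c, out var parent) ? parent : null;
        }
        return false;
    }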

6. Virtual Reality Simulation

We elucidate details of the virtual reality simulator in this section.

6.1. Specimen Case: "Put That There"

Assume that a person asks assistance from our humanoid robot, Nao. The task is to move an object from one place to another. To trigger the request, our actor orally commands Nao, "Put that there"1, pointing first at the object and then at the destination of the object. The input modalities are voice and vision. The robot uses speech recognition to understand the order, gesture recognition to determine the object's position and its destination, and object recognition to identify the object. The robot thus receives three events (speech recognition, gesture recognition and object recognition) from the environment and performs multimodal fusion to combine them. The fusion result indicates that Nao must get the object pointed at by the person and move it to its destination. Nao then performs the fission process to determine which actuators must be used to implement the task, and in what order.
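
To illustrate how the three recognition events complement one another, the sketch below resolves the deictic words "that" and "there" against the two pointing events. The event shapes, object names and coordinates are made-up assumptions for illustration only.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // The recognized speech and the two pointing events, in arrival order.
    var speech  = "put that there";
    var pointed = new Queue<string>(new[]
    {
        "glass at (2.0, 1.5)",   // gesture + object recognition resolve "that"
        "desk at (4.0, 0.5)",    // gesture recognition resolves "there"
    });

    // Fusion: each deictic word consumes the next pointing event, in order.
    var grounded = string.Join(" ", speech.Split(' ')
        .Select(w => w is "that" or "there" ? pointed.Dequeue() : w));
    Console.WriteLine(grounded);   // put glass at (2.0, 1.5) desk at (4.0, 0.5)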

6.2. Fusion and Fission

The fusion process uses the following events: "Behave:ArmShowsObject", "Behave:ArmShowsPosition" and "Behave:SpeechOrder". Fig. 2(a) shows the predicate of the event model in EKRL and an instance of that event model. Given the event received, the fission agent has to select an order (a task) to send into the network, with the name of the event in the role Subject (Behave:Grasp). The roles of the task are collected from the arguments of "Behave:PutThatThere". See Fig. 2(a).
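
The sketch below mimics this role collection: the fission agent emits the ordered task frames and copies the fused fact's arguments into each. The argument values and the ordering are taken from Section 6.3 and Fig. 2(b); the code itself is our illustration.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Arguments carried by the fused fact "Behave:PutThatThere" (example values).
    var fused = new Dictionary<string, string>
    {
        ["TARGET"] = "entities", ["MODALITY"] = "NaoProxyArms",
        ["DATE"] = "09/04/2010 10:04", ["LOCATION"] = "room 5",
    };

    // The ordered tasks of the fission, as listed in Section 6.3 and Fig. 2(b).
    string[] tasks = { "Move:Walk", "Move:MoveArm", "Move:GrabObject", "Move:DropObject" };

    foreach (var task in tasks)   // each task frame reuses the fused fact's arguments
        Console.WriteLine($"{task}, " + string.Join(", ", fused.Select(kv => $"{kv.Key}: {kv.Value}")));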

6.3. Case 1: Fission Successful

The fission process yielded all the necessary actions (Walk, Move Arm, Grab Object, Drop Object) to accomplish the task. In effect, when the robot performed the various tasks as per the fission result in virtual simulation, a result is sent informing of the success of the execution of the different sub-actions that compose the task. As a result, the execution of the task by a real robot is possible.

6.4. Case 2: Fission Failed and Fission Re-adaptation

Consider a case where a closed door separates the place where the object is located from the place where it is to be deposited. In the virtual reality simulation, Nao cannot proceed to its destination because the door is closed. Due to the presence of this obstacle, the result of the fission process is inadequate; this is a case where an intervention is needed. The feedback control is essential, especially when the virtual simulation result is not successful. The fission control process takes note of the obstacle and of the action that cannot be implemented. Hence, a new action must be added before (or after) the failed action for a successful simulation. As a consequence, the fission agent introduces a new event before (or after) the failed event: OpenDoor(). In this case, the action Open Door is introduced after the Walk() action. In effect, a new series of events is produced given the presence of the obstacle (the closed door). This new list of events is sent back to the virtual environment simulator for re-evaluation; this time, it yields a positive result (Okay). The result is then sent into the real-world environment; indeed, with re-adaptation, the new sequence of events can be implemented without creating any problem. See Fig. 3(a).
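
A sketch of this re-adaptation loop follows: the simulator reports the first failing action, and the fission agent inserts a recovery action after it before retrying. The obstacle model is a toy stand-in for the virtual environment, not the V-REP simulation itself.

    using System;
    using System.Collections.Generic;

    var plan = new List<string> { "Move:Walk", "Move:MoveArm", "Move:GrabObject", "Move:DropObject" };
    var doorClosed = true;

    int failed;
    while ((failed = Simulate(plan)) >= 0)           // the simulator reports the first failing action
    {
        plan.Insert(failed + 1, "Move:OpenDoor");    // re-adaptation: add the recovery action after it
        doorClosed = false;                          // in this toy world, opening the door clears the path
    }
    Console.WriteLine(string.Join(" -> ", plan));    // Okay: validated plan, ready for the actuators

    // Stand-in for the virtual environment: walking fails while the door is closed.
    int Simulate(List<string> actions) => doorClosed ? actions.IndexOf("Move:Walk") : -1;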

6.5. Simulation via Virtual Robotic Experimental Platform

We use the Virtual Robotic Experimental Platform (V-REP PRO EDU), an application designed for robot simulation†, to virtually simulate the two cases of the "Put That There" scenario: one without an obstacle and another with an obstacle. The inputs to the simulator are the EKRL events and objects. As shown in Fig. 3(b), the 3D scene models the scenario "Put That There", taking into account all actions (Walk, Open Door, etc.) and objects (robot, desk, glass, door, etc.) involved. All objects behave according to the script attached to them; in our work, scripting is done using the C# programming language. Fig. 3(b) is a snapshot of a scene.

Behave:PutThatThere event model:
SUBJECT: COORD(ArmShowsObject, ArmShowsPosition, SpeechOrder)
SENDER: COORD(GestureSensors, VocalSensors)
DATE: date time
LOCATION: location

Behave:PutThatThere fact (instance):
SUBJECT: COORD(ArmShowsObject, ArmShowsPosition, SpeechOrder)
SENDER: COORD(GestureSensor1, GestureSensor1, VocalRecognition1)
DATE: 09/04/2010 10:04
LOCATION: room 5

Fig. 2. (a) Fusion: "Put That There".


† http://v-rep-pro-edu.software.informer.com/

Move:Walk
SUBJECT: ALTERN(Behave:PutThatHere, Behave:DoWalk)
OBJECT: entities
TARGET: entities
MODALITY: NaoProxyArms
DATE: date time
LOCATION: location

Move:MoveArm
SUBJECT: ALTERN(Behave:PutThatHere, Behave:DoMoveArm)
OBJECT: entities
TARGET: entities
MODALITY: NaoProxyArms
DATE: date time
LOCATION: location

Move:GrabObject
SUBJECT: ALTERN(Behave:PutThatHere, Behave:OpenHand; Behave:CloseHand)
TARGET: entities
MODALITY: NaoProxyArms
DATE: date time
LOCATION: location

Move:DropObject
SUBJECT: ALTERN(Behave:PutThatHere, Behave:OpenHand)
TARGET: entities
MODALITY: NaoProxyArms
DATE: date time
LOCATION: location

Fig. 2. (b) Fission: "Put That There".

6.6. Results and Analysis

We have stored knowledge about objects, signals and behavioural scenarios in the memory of a few agents, and we evaluated scenarios by querying that memory. Agents evaluate the meaning of a situation using previously recorded events. After validation of the event models, we check the correlation between expected outputs and inputs. We also check the robustness of the inference engine by increasing the number of events in a given time. Our results show that consistency decreases when too many events occur; the performance of agents decreases due to processing speed, but robustness remains good. The efficiency of our multimodal interaction architecture for ambient robotic intelligence can be extended by adding more components (services and agents) and by extending the knowledge in the agents' memory, making the architecture scalable.

The virtual simulator can likewise be extended by adding new definitions, scenarios and events. For example, the object to be transferred from one place to another can be a liquid: in the case of water, the robot needs to fill a glass with water, and it is the glass of water that is moved from one place to another. The domain of this application can be healthcare services; for instance, the intended destination of the "Put That There" scenario could be a hospital patient.

Fig. 3. (a) "Put That There": Fission readapted; open door is added into the fission process; (b) Virtual robotic simulation using V-REP PRO

EDU software application.

7. Conclusion

In this paper, we presented our multimodal interaction architecture for ambient robotic intelligence, which uses semantic agents and services to solve some interaction problems. The design of the architecture is reliable, and the discovery of and communication between agents and services have been tested to be effective. This ambient and robotic architecture is suitable for multimodal interaction, and it is well designed for ambient intelligence, cognitive analysis and understanding of an environment. The architecture's virtual environment simulator is a validating component that tells us whether a sequence of events can be implemented without risk. The concept is that every scenario is simulated virtually. Generally, an event model is built on an ideal situation (i.e. no obstacle); a modified event model is created when an obstacle is introduced, and this new event model includes the actions and events needed to get over the obstacle. We need to create various event models in the agents' memory for an agent to function like a human being capable of doing tasks with little intervention. We are currently developing more event models, introducing various kinds of obstacles into each event model, and introducing resolutions to get over those obstacles.

References

1. Bolt, R., "'Put That There': Voice and Gesture at the Graphics Interface," in 7th Annual Conference on Computer Graphics and Interactive Techniques, 1980.

2. Daghan Lemi Acay, Pasquier Philippe, Sonenberg Liz, "Extrospection: Agents Reasoning About the Environment," presented at the 3rd IET International Conference on Intelligent Environments (IE'07), Germany, 2007.

3. Landragin, Frédéric, "Physical, Semantic and Pragmatic Levels for Multimodal Fusion and Fission," presented at the Seventh International Workshop on Computational Semantics, Tilburg, The Netherlands, 2007.

4. Landragin, Frédéric, Denis A., Ricci A., Romary L., "Multimodal Meaning Representation for Generic Dialogue Systems Architectures," in Language Resources and Evaluation (LREC 2004), 2004, pp. 521-524.

5. Milind Tambe, David V. Pynadath, Nicolas Chauvat, Abhimanyu Das, Gal A. Kaminka, "Adaptive Agent Integration Architectures for Heterogeneous Team Members," in Fourth International Conference on MultiAgent Systems, Boston, MA, USA, 2000, pp. 301-308.

6. Minsky, Marvin, "Matter, Mind and Models," Washington, DC, 1965, pp. 45 - 49.

7. Zarri, Gian Piero, Representation and Management of Narrative Information: Theoretical Principles and Implementation, vol. 302. London: Springer-Verlag, 2009.

8. R. J. Bayardo Jr., W. Bohrer, R. Brice, A. Cichocki, J. Fowler, A. Helal, V. Kashyap, T. Ksiezyk, G. Martin, M. Nodine, M. Rashid, M. Rusinkiewicz, R. Shea, C. Unnikrishnan, A. Unruh, and D. Woelk, "InfoSleuth: Agent-Based Semantic Integration of Information in Open and Dynamic Environments," in ACM SIGMOD International Conference on Management of Data, 1997.

9. Quillian, M. Ross, "Semantic Memory," in Semantic Information Processing, Massachusetts Institute of Technology, 1966.

10. Sébastien Dourlens, Amar Ramdane-Cherif, "Semantic Modeling and Understanding of Environment Behaviors," presented at the IEEE Symposium Series on Computational Intelligence, Symposium on Intelligent Agents, Paris, France, 2011.

11. Sébastien Dourlens, Amar Ramdane-Cherif, Eric Monacelli, "Tangible Ambient Intelligence with Semantic Agents in Daily Activities," Journal of Ambient Intelligence and Smart Environments, vol. 5, pp. 351-368, 2013.

12. Terrence Fong, Illah Nourbakhsh, Kerstin Dautenhahn, "A Survey of Socially Interactive Robots," Robotics and Autonomous Systems, vol. 42, pp. 143 - 166, 2003.