
Procedia Computer Science

Volume 88, 2016, Pages 155-162

7th Annual International Conference on Biologically Inspired Cognitive Architectures, BICA 2016

Evaluation of Cognitive Architectures Inspired by Cognitive Biases

Christoph Doell and Sophie Siebert

Otto von Guericke University of Magdeburg, Germany
doell@ovgu.de, sophie.siebert@st.ovgu.de

Abstract

Cognitive architectures are frequently built to model naturally intelligent behavior. This serves two primary goals. On the one hand, these architectures model human behavior in order to give a better understanding of the human thought process. On the other hand, cognitive architectures are an approach to modeling artificial intelligence. These two goals may conflict, as humans sometimes act irrationally, e.g. because they are cognitively biased. In this work, we analyze on a theoretical level whether cognitive architectures can be biased as well. To this end, we first abstract more general behavior from cognitive fallacies. We then evaluate to what extent the architectures Clarion, Leabra and Lida can be biased.

Keywords: Cognitive Fallacy, Evaluation, Cognitive Architecture

1 Introduction

Most cognitive architectures are built to model specific aspects of human cognition and are therefore evaluated on the tasks they were created for. The Chrest architecture [1] aims at modeling human perception and successfully simulates human eye movements during chess play. In contrast, the cognitive architecture 'Adaptive Control of Thought Rational' (ACT-R) [2] models the human thought process. It can, for example, simulate the processing of cognitive arithmetic [3]. The BICA Society gives an overview of 26 cognitive architectures [4] with their different goals and approaches. Comparing these architectures is difficult, as they process different kinds of data in different ways. Therefore, questions like 'Which of them models human cognition best?' cannot be clearly answered. In this work, we propose to compare cognitive architectures by a general measure that can be applied even if the architectures are built to process different kinds of inputs. The key to this measure is cognitive biases.

When solving tasks, humans do not always act rationally. They tend to rely on learned heuristics and make systematic errors - known as cognitive biases, e.g.:

• When humans were asked to estimate their own skill, Kruger and Dunning [5] showed that experts tend to underestimate it, while unskilled individuals tend to overestimate it.

• When people are asked to value furniture, self-made products achieve higher values, even if all the parts were bought [6]. This is known as the IKEA effect.

• People were asked two questions in a row, both on estimating a number. First, they had to decide whether or not the number to be estimated was bigger than a given value. Then they were asked for the exact value. The results showed that the given value strongly influenced the exact estimate: the answers were close to the anchoring value [7].

Selection and peer-review under responsibility of the Scientific Programme Committee of BICA 2016. © The Authors. Published by Elsevier B.V. doi: 10.1016/j.procs.2016.07.419

Many of these effects are known, but most of them apply only to one specific task. Thus, they can hardly be applied directly to evaluate cognitive architectures. This is why we generalize them, so that they become applicable to measuring cognitive architectures.

Cognitive fallacies are often judged as a weakness in human behavior, as the resulting actions are considered irrational. In contrast to this view, Gigerenzer [8] showed evidence that in real-life situations with limited knowledge, human behavior often leads to better, more robust results than rationally acting algorithms. Following this argument, it is possible that the very processes which cause cognitive fallacies or biases also result in generally intelligent behavior.

Suppose a natural cognitive system serves as inspiration for the construction of an artificial cognitive architecture. When modeling, the focus can lie on the functionality or on the structure of the original. Functionality is important for the performance of the system, while the structure might give hints on the internal processes and possibly lead to better explanations [9]. Thus, this work investigates how structural differences can lead to functional differences. Further, it builds a theoretical basis for later empirical studies that compare cognitive architectures. We analyze the internal processes of three exemplary cognitive architectures to predict whether they can be biased. In the following section, we give more details on cognitive biases and summarize their effects abstractly, so that the effects can be measured in a cognitive architecture. In Section 3 we describe three exemplary cognitive architectures: Clarion, Leabra and Lida. Their behavior with respect to the abstracted features is explained in Section 4. Finally, in Section 5, the results are summarized and an outlook on future empirical evaluations is given.

2 Cognitive Biases

Cognitive biases are effects indicating a systematic deviation from rational judgment. In an early work, Kahneman and Tversky [7] presented 13 different cognitive biases. For example, they provided evidence that humans' natural way of decision making ignores prior probabilities. The availability heuristic shows up when humans estimate the probability of an event: instead of the probability itself, they tend to judge how easily instances can be retrieved from memory.

Participants were asked to decide between two mathematically identical alternatives. One was described with mostly positive words, the other with negative words [10]. People significantly preferred the positive formulation. The corresponding effect is called the framing effect.

Another long-known psychological effect is the halo effect. A few attributes of a person were read to the participants of a study. When the participants were later asked to characterize the person, the result was strongly influenced by the order of the given attributes [11].

Another study described the priming effect: people were asked to complete the letters 'so_p' such that an existing word is created. The result varied: when people had the context of cleaning or washing in mind, the word they came up with was soap. When the context was set to food, people answered soup [12, p. 72].

Abstracting from these examples, for a cognitive bias to be visible, we need a system with three properties:

1. The system has prior knowledge

2. The system has a current problem task to be solved

3. The system has received recent inputs before being confronted with the current problem task

It is to be expected that people with different experiences, or systems trained on different learning examples, differ in their behavior. So in all our examples, we keep the prior knowledge fixed and vary only the recent inputs or the current problem.

In the example of the priming effect, the prior knowledge is the knowledge of words from daily life. In this case, it can be assumed that each participant knew several fitting words. The given problem task was to complete the given letters such that an existing word is created. The recent inputs were given by setting the context to either food or cleaning. This yields the first characteristic to be measured for a cognitive architecture: the result varies depending on recent inputs.

The example of the halo effect showed that participants gave higher priority to information that was presented earlier in the current problem. The task was to characterize a person by the previously given attributes, and the problem description varied only in their order. So the second characteristic to be measured is: given an ordered problem task, the result varies depending on the order of the problem.

For the framing effect, the task was to choose between two alternatives. The prior knowledge is that the participant understands and can evaluate both alternatives. The participant is then asked to choose the preferred one. We do not consider any recent inputs, only the variation in the given problem task. People chose the formulation that used more positive words. In order to measure this effect, we abstract it to: the result varies depending on the problem formulation. Note that this effect can be considered an issue of communication or of internal representation. We focus on internal representation.
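The three characteristics can be read as three comparisons on an otherwise unchanged system. The following is a minimal sketch of such a probe, not part of the original study. It assumes a hypothetical interface in which a system is any callable that takes the recent inputs and the problem and returns an answer. The function names and the toy system are illustrative assumptions only.

```python
from typing import Callable, Sequence

# Hypothetical interface (an assumption of this sketch): a system maps
# (recent_inputs, problem) to an answer. Prior knowledge stays fixed
# because the same callable is used for both runs of each comparison.
System = Callable[[Sequence[str], Sequence[str]], str]


def varies_with_recent_inputs(system: System, problem, recent_a, recent_b) -> bool:
    """Characteristic 1: same problem, different recent inputs."""
    return system(recent_a, problem) != system(recent_b, problem)


def varies_with_order(system: System, problem, recent=()) -> bool:
    """Characteristic 2: same problem parts, presented in reversed order."""
    return system(recent, list(problem)) != system(recent, list(reversed(problem)))


def varies_with_formulation(system: System, problem_a, problem_b, recent=()) -> bool:
    """Characteristic 3: two formulations of the same underlying problem."""
    return system(recent, problem_a) != system(recent, problem_b)


def toy_system(recent, problem):
    """Toy stand-in for an architecture: completes 'so_p' from the latest context."""
    candidates = {"cleaning": "soap", "food": "soup"}
    for context in reversed(list(recent)):
        if context in candidates:
            return candidates[context]
    return "soap"  # arbitrary default when no context was given


primed = varies_with_recent_inputs(toy_system, ["so_p"], ["cleaning"], ["food"])
ordered = varies_with_order(toy_system, ["so_p", "filler"])
```

In this toy case `primed` is true and `ordered` is false: the stand-in system reacts to recent inputs but ignores the order of the problem parts.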

3 Cognitive Architectures

One purpose of cognitive architectures is to model general intelligence. Unlike algorithms, which are designed to solve a specific task, cognitive architectures should be able to present solutions to a wide range of problems. Like humans, they should provide a solution even if they have never encountered the problem before. This solution may be suboptimal, but it should at least be reasonable. In the following, three architectures are presented and briefly explained.

3.1 Clarion

CLARION means 'Connectionist Learning with Adaptive Rule Induction ON-line'. It was designed to distinguish explicitly between an explicit and an implicit level and to model their interaction.

It consists of four subsystems: the Action-Centered Subsystem (ACS), the Non-Action-Centered Subsystem (NACS), the Meta-Cognitive Subsystem (MCS) and the Motivational Subsystem (MS). The Action-Centered Subsystem processes procedural knowledge, for example which action to carry out in a given situation. The Non-Action-Centered Subsystem handles declarative knowledge, such as facts, associations and memories. The Motivational Subsystem handles the basic needs and derives goals from them. It sends the goals to the ACS for processing. The Meta-Cognitive Subsystem combines goals by reinforcement, observes the system and chooses between different algorithms and parameters.

Each of these subsystems consists of an implicit and an explicit level. The explicit levels are networks of chunks. Chunks are nodes which represent a concept and can be connected. The implicit level is a neural network, which is trained with different algorithms such as back-propagation, Q-Learning and Top-Down-Assimilation. Corresponding levels are connected and represent similar knowledge. The neurons of the implicit level serve as attributes of the chunks of the explicit level. Because the two levels represent and process information differently, their results need to be combined [13, p. 11-13][14, p. 5-10].

Action-Centered Subsystem

The chunks of the ACS represent concepts, which are combined into rules. A rule consists of a condition and an action. If the condition fits the current situation, the action is recommended. There are three kinds of rules: Fixed Rules (FR), Rule-Extraction-Refinement rules (RER) and Independent Rule Learning rules (IRL). While fixed rules model evolutionary reflexes and moral basics which cannot be changed, the others can be generated, changed, generalized and specialized. For example, two existing rules can be combined based on either the base-level activation, the utility or the support. Base-level activation expresses how often the rule was used recently. Utility is the ratio between the successful application of a rule and its costs. Support specifies how well the situation fits the condition. The choice between these measures is up to the MCS [14, p. 24-44].
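Two of these measures can be sketched numerically. The sketch below is illustrative only and rests on assumptions that the text above does not state: base-level activation is taken to follow a power-law decay over past uses, and utility is read as a plain benefit-to-cost ratio. The parameter names (`initial`, `scale`, `decay`) are hypothetical.

```python
def base_level_activation(use_times, now, initial=0.0, scale=1.0, decay=0.5):
    """Assumed power-law form: initial + scale * sum((now - t) ** -decay).

    Each past use contributes less the longer ago it happened, so a rule
    that was used recently ends up with a higher value.
    """
    return initial + scale * sum((now - t) ** -decay for t in use_times if t < now)


def utility(benefit, cost):
    """Assumed reading of 'relation of success and cost': a plain ratio."""
    if cost <= 0:
        raise ValueError("cost must be positive in this sketch")
    return benefit / cost


# Two rules with the same number of uses, at different points in time.
recently_used = base_level_activation(use_times=[8, 9], now=10)
long_ago_used = base_level_activation(use_times=[1, 2], now=10)
```

Under these assumptions `recently_used` exceeds `long_ago_used`, which matches the description of base-level activation as a measure of recent use.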

Non-Action-Centered Subsystem

Chunks of the NACS represent facts, associations and sets. In contrast to the processing in the ACS, they are not combined into rules but into associations. A chunk can be activated in five ways: because it is a subset of an active chunk, through actions of the ACS, by an association, by its attributes on the implicit level, or by similarity. The strength of a single chunk is the sum of all these sources. When the NACS receives a chunk or single attributes from the ACS, it activates them and spreads activation on both levels. The NACS also handles episodic memories by saving every action, calculation and experience together with the time of occurrence. Every chunk in this memory has a decreasing activation and is deleted when it has been inactive for too long. Furthermore, the NACS stores frequency distributions of the situations, the executed actions and their reinforcement in order to improve further learning processes [14, p. 62-89].

ACS and NACS cooperate as follows: when the ACS receives the information 'X', it searches for rules with the condition 'X'. If none is present, the request is passed on to the NACS, where 'Y', a superset of 'X', is found and returned. The ACS then finds rules with the condition 'Y ∧ Z' and processes them [14, p. 84-86].
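This fallback can be sketched as a two-stage lookup. The data structures below (a dictionary of rule conditions and a dictionary of superset relations) and all names are hypothetical simplifications for illustration, not Clarion's actual representation.

```python
# Hypothetical toy knowledge for this sketch.
# ACS: rule conditions are sets of concepts, mapped to an action name.
acs_rules = {frozenset({"Y", "Z"}): "act_on_Y_and_Z"}
# NACS: declarative knowledge that 'Y' is a superset of 'X'.
nacs_supersets = {"X": "Y"}


def find_rules(concept, rules):
    """Return the actions of all rules whose condition mentions the concept."""
    return [action for condition, action in rules.items() if concept in condition]


def acs_lookup(concept):
    """Search the ACS first; if nothing matches, ask the NACS for a superset."""
    matches = find_rules(concept, acs_rules)
    if matches:
        return matches
    superset = nacs_supersets.get(concept)
    if superset is None:
        return []  # neither subsystem knows anything usable
    return find_rules(superset, acs_rules)
```

Here `acs_lookup("X")` finds no rule for 'X' directly, obtains 'Y' from the declarative store, and returns the action of the rule with condition 'Y ∧ Z'.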

Motivational and Meta-Cognitive Subsystems

The Motivational Subsystem derives goals from primal existential needs (e.g. food and sleep) and psychological needs (e.g. society and self-awareness) as well as secondary needs. The Meta-Cognitive Subsystem models self-awareness by regulating the calculations of Clarion. It chooses algorithms and parameters and decides which inputs are considered in order to optimize itself. To this end, it can cancel processes and prioritize the goals to pursue by a majority vote of the different needs. To prevent inconsistent behavior, a potential new goal is compared to the current goal. If the difference is bigger than a specified threshold, the goal is updated. A further task of the MCS is to generate reinforcement, which is defined by the degree to which needs are fulfilled. It weights the implicit and explicit levels to obtain different outcomes [14, p. 100-116].

3.2 LEABRA

LEABRA means 'Local Error-driven and Associative, Biologically Realistic Algorithm' and is based on an adapted layered neural network. Its main purpose is not to model cognition but to model learning in a more biologically plausible way than standard neural networks, and it therefore dispenses with some abstractions. It has an adjusted activation function and uses local errors for learning. Because of these interesting learning properties, we consider it nonetheless [15, p. 12].

Structure

The neural network of Leabra is an auto-encoder with bidirectional and symmetric weights. Since there is no back-propagation in the biological archetype, Leabra learns with local errors. For this purpose, the principle of Boltzmann machines is used. In addition to the two learning phases of Boltzmann machines, a third phase is used, which tries to reconstruct the input signal. The difference between the original and the reconstructed signal is used as an additional error. This construction allows for both associative and error-driven learning [15, p. 11-14, 34, 113-114].
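The two-phase principle can be illustrated with the standard Boltzmann-machine style update, in which each weight changes in proportion to the difference between the co-activation of its two units in the clamped (plus) phase and in the free (minus) phase. The sketch below is a deterministic simplification under that assumption and is not O'Reilly's exact formulation. The function names and the learning rate are illustrative.

```python
def outer(senders, receivers):
    """Co-activation products for every sender/receiver pair."""
    return [[s * r for r in receivers] for s in senders]


def contrastive_update(weights, minus, plus, lr=0.1):
    """Local two-phase update: w += lr * (plus co-activation - minus co-activation).

    minus and plus are (sender_activations, receiver_activations) pairs
    recorded in the free phase and in the clamped phase respectively.
    Only quantities local to each connection are used.
    """
    co_minus = outer(*minus)
    co_plus = outer(*plus)
    return [
        [w + lr * (p - m) for w, p, m in zip(w_row, p_row, m_row)]
        for w_row, p_row, m_row in zip(weights, co_plus, co_minus)
    ]


def reconstruction_error(original, reconstructed):
    """Additional error signal of the third phase: input minus its reconstruction."""
    return [o - r for o, r in zip(original, reconstructed)]


# One sender, two receivers: the network expected [0.2, 0.8], the target was [1.0, 0.0].
updated = contrastive_update(
    weights=[[0.0, 0.0]],
    minus=([1.0], [0.2, 0.8]),
    plus=([1.0], [1.0, 0.0]),
)
```

In the example the weight to the first receiver grows and the weight to the second one shrinks. When both phases agree, the weights stay unchanged.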

ReBel

ReBel means 'Relative Belief Framework'. It regulates the overall activity of the neural network by a soft k-winners-take-all principle. This means that only up to k neurons in one layer of the neural network can be active simultaneously. This leads to a sparse distributed representation, in which each category is represented by only a few neurons and categories do not overlap. So the activation of a neuron depends not only on its parameters and inputs, but also on the overall activity. These limitations ensure that small changes in the input lead to small changes in the calculations, which results in robust behavior [15, p. 83-97].
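The effect of limiting a layer to k active neurons can be sketched with a hard k-winners-take-all function. This is a deliberate simplification: the soft variant described above does not simply cut off the remaining neurons, and the function below is an illustration rather than the ReBel mechanism itself.

```python
def k_winners_take_all(activations, k):
    """Hard kWTA: keep the k largest activations, set all others to zero."""
    if k <= 0:
        return [0.0] * len(activations)
    # Indices sorted by activation, largest first.
    ranked = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    winners = set(ranked[:k])
    return [a if i in winners else 0.0 for i, a in enumerate(activations)]


layer = [0.1, 0.9, 0.4, 0.7]
sparse = k_winners_take_all(layer, k=2)
```

With `k=2` only the two most active neurons of `layer` remain active, which yields the kind of sparse representation described above.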

GausSig

The sigmoid function models the gradual implication between two connected neurons. It can be used for filtering, covers large value spaces and thus generalizes well. The Gaussian function can compare the output of one neuron with the weights of the following one. The activation of a neuron then depends on the similarity between the weights and the corresponding input signals. This is why the Gaussian function defines the relationship between two neurons. It creates a sparse and disjoint representation and is therefore better suited for categorization and specialization. Leabra combines the sigmoid and the Gaussian function into the GausSig function, which covers both properties, specialization and generalization [15, p. 101-107].

MaxIn

Shorter training phases require fewer training samples, which increases the risk of an unlucky sampling in which the result is not representative of the rest of the learning samples. To mitigate this problem, Leabra maximizes the quality of the input signal by reducing the learning rate for input signals far away from the expected range [15, p. 108-112].

3.3 LIDA

LIDA is short for 'Learning Intelligent Distribution Agent' [16]. Its cognitive process is based on the iterative processing of cognitive cycles. This happens asynchronously and in parallel, such that cycles can overlap, process different tasks and share the memory they use. Several kinds of memory are modeled explicitly, e.g. sensory memory, procedural memory and perceptual associative memory [17]. In the latter, there are different kinds of nodes, for example feeling nodes, which represent emotions.

Cycles mainly go through three different phases: perception, attention adjustment and action selection. Perception combines several sensory stimuli into objects until the situation is determined. The current situation is compared to memorized situations, and similar situations are loaded into the working memory. Since the working memory is limited, Lida permanently focuses its attention and removes unimportant pieces.

To select an action, the procedural memory comes into play. It stores schemas, which consist of actions and their results. Each schema has a base activation, which determines how likely an action is to produce the expected outcome, and a current activation, which indicates the relevance of the schema in the current context. The best-fitting action is handed to the motor memory. At the end, the learning phase takes place. Perceptual learning saves new objects and categories, episodic learning saves events, and procedural learning saves new actions and their execution.
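Action selection over schemas can be sketched as picking the schema with the highest combined score. How the two activations are combined is not specified above. The product used here is an assumption made purely for illustration, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Schema:
    action: str
    base_activation: float     # how reliably the action yields the expected result
    current_activation: float  # how relevant the schema is in the present context


def select_action(schemas):
    """Pick the action of the schema with the highest combined score.

    Assumption of this sketch: reliability and relevance are combined
    by multiplication.
    """
    best = max(schemas, key=lambda s: s.base_activation * s.current_activation)
    return best.action


candidates = [
    Schema("grasp", base_activation=0.9, current_activation=0.2),
    Schema("push", base_activation=0.6, current_activation=0.8),
]
chosen = select_action(candidates)
```

In this example `chosen` is "push": the schema is less reliable than "grasp" but far more relevant in the present context.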

Cognitive Processes

There are three different kinds of processes: reactive, deliberative and meta-cognitive processes. Reactive processes directly map a situation to an action. Deliberative processes are decision processes, responsible for planning and problem solving. Meta-cognitive processes reflect on the system's own mind. Deliberative and meta-cognitive processes are represented by several cycles, which build on one another. Control is passed on by the fact that a cycle saves its results in the working memory, where they can be accessed by the next cycle.

Emotions

Goals are derived from emotions, which are thus the motivation for calculations. Emotions are represented by feeling nodes, which are unique and have a positive or negative value. They are handled like all other nodes and can, for example, be part of a situation or be loaded into the working memory. In addition to the feeling itself, they also contain the reason for this feeling.

4 Results

In the following, we check for each architecture whether its output varies when 1) the recent information given to the system is different, 2) the order of the problem task is changed, 3) the representation of the problem task is changed.

Clarion

1) Clarion can 'remember' different recent inputs through the base-level activation of chunks. After an input is received, the corresponding chunks are activated and the activation is propagated, which increases their base-level activation. This value expresses how often a rule or chunk was used recently. The higher the activation, the more likely its selection becomes. If different recent inputs were shown, different chunks would vary in their base-level activation, resulting in the choice of different rules or chunks as output.

2) To process the order of the problem task, the architecture needs to be able to treat early parts of the problem task differently from later ones. As no such mechanism is present in Clarion, it cannot be biased in this way.

3) In Clarion there are several ways to achieve different representations of the same problem. First, both layers, the chunk network and the neural network, represent the situation in different ways. This is not only because the storage of information differs, but also because the neural network is more specific than the chunk network. The neurons define the chunks in the upper layer. A chunk can be active even if only a subset of its neurons is active, while a neuron can be active without activating a chunk. The general meaning of the situation is the same, but it is represented differently in the two layers. Besides this separation into layers, the representation of one problem might differ over time. The Meta-Cognitive Subsystem can filter the input, set weights to zero and permit information to flow. It can thereby control the representation of a situation in the system, which might differ from time to time. So one situation can be represented differently in the system, resulting in different activation flows and thus different outcomes.

Leabra

1) For Leabra, the dependence on recent inputs has already been shown [15, p. 199-201].

2) Leabra can produce different outputs depending on the order of the inputs: it can weight earlier inputs higher or even dismiss later ones. Leabra might produce this behavior due to categorization and the characteristics of its neural network. The categorization is produced by two mechanisms. First, ReBel generates a sparse representation. It regulates the overall activity, resulting in only a few active neurons at a time. Second, the cluster-forming Gaussian part of the GausSig function results in an information loss and a generalization. The cluster centers are initialized by the first inputs, while later inputs can influence them only to a smaller degree, depending on the decay of the learning rate. Therefore, later inputs are either pressed into the categories formed first or handled as outliers and thus not considered.

3) The third possible bias concerns different representations and is not present in Leabra. It would require different representations of information, which do not exist because of the single neural network. A filtering mechanism that could achieve this goal is not present either.
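The order dependence described for Leabra in 2) can be illustrated with a single cluster center that is initialized by the first input and then updated with a decaying learning rate. This is a toy sketch under assumed parameters (`lr0`, `decay`), not a simulation of Leabra.

```python
def final_center(inputs, lr0=0.5, decay=0.5):
    """Track one cluster center over a sequence of scalar inputs.

    The first input initializes the center. Every later input pulls the
    center towards itself, but with a learning rate that shrinks by the
    factor `decay` after each step, so later inputs count for less.
    """
    center = inputs[0]
    lr = lr0
    for x in inputs[1:]:
        center += lr * (x - center)
        lr *= decay
    return center


samples = [0.0, 1.0, 1.0]
forward = final_center(samples)
backward = final_center(list(reversed(samples)))
```

The same three values yield a different center depending on their order (`forward` is 0.625, `backward` is 0.75), and in both cases the result is shifted from the plain mean towards the value that was presented first.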

Lida

1) Lida's cycles share the same memory. Therefore, recent information is present when a problem task occurs. So the cycles processing the problem compete with other cycles for active memories. This can lead to different results when the recent inputs differ.

2) Lida's reaction to different orders of inputs depends on whether or not the task is presented such that one single cycle processes it. If this is the case, the order does not play a role, as every part of the task gets the same attention. If many successive cycles process it, the order plays an important role: the first cycles change the early memory states, such that later parts of the problem statement are processed differently. So depending on the time needed to present the task, we expect this effect to occur or not.

3) Lida is not designed to build separate representations of the same problem. As in Leabra, there is no filtering mechanism which might provide this feature.

These results are summarized in Table 1. For each cognitive architecture, it is shown whether or not the presented variation can cause a change in the result.

                             Clarion   Leabra   Lida
different recent inputs      ✓         ✓        ✓
different order of inputs    X         ✓        ✓/X
different representation     ✓         X        X

Table 1: Overview of the properties of the architectures

5 Conclusion and Future Work

In this work we have investigated how three structurally different cognitive architectures process information and to what extent this leads to functional changes in their output, namely to cognitive biases. To this end, we have abstracted three characteristics which cognitive architectures should show in order to be biased just as humans are. The results should depend on 1) recent inputs, even if they are independent of the current problem, 2) the order of the given problem task, 3) the representation of the given problem task.

We have evaluated the three exemplary cognitive architectures Clarion, Leabra and Lida with respect to these characteristics. All architectures are influenced by recent inputs and could return different results. Clarion cannot distinguish between situations in which the problem task is given in a different order. Leabra weights earlier inputs more strongly and can therefore produce different results here. For Lida it depends: if the task is given during one cycle, it cannot create different solutions, otherwise this is possible. The representation of the problem can influence Clarion because of the filtering capability of the Meta-Cognitive Subsystem. The combination of chunk network and neural network can also generate different representations. Leabra and Lida lack the filtering mechanisms to do so.

In future work, more cognitive architectures could be evaluated in order to make them comparable as well. Further, practical evaluations with these architectures can be performed to verify to what extent the expected effects actually occur. As a first step, test data sets with the corresponding attributes need to be generated. For a given architecture and characteristic, two sets of test data are created that vary, for example, only in the representation of the problem. They are then given independently to two identical instances of the cognitive architecture, and the results are compared. Thus, the current theoretical measure will become a practical metric for measuring a cognitive architecture's ability to be biased.

References

[1] F. Gobet, P. C. Lane, The CHREST architecture of cognition: The role of perception in general intelligence, in: Procs 3rd Conf on Artificial General Intelligence, Atlantis Press, 2010.

[2] J. R. Anderson, Human symbol manipulation within an integrated cognitive architecture, Cognitive Science 29 (3) (2005) 313-341.

[3] C. Lebiere, The dynamics of cognition: An ACT-R model of cognitive arithmetic, Kognitionswissenschaft 8 (1) (1999) 5-19.

[4] Comparative table of cognitive architectures, last retrieved: 2016-20-04. URL http://bicasociety.org/cogarch/architectures.htm

[5] J. Kruger, D. Dunning, Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments, Journal of Personality and Social Psychology 77 (6) (1999) 1121.

[6] M. I. Norton, D. Mochon, D. Ariely, The 'IKEA effect': When labor leads to love, Harvard Business School Marketing Unit Working Paper (11-091).

[7] A. Tversky, D. Kahneman, Judgment under uncertainty: Heuristics and biases, Science 185 (4157) (1974) 1124-1131.

[8] G. Gigerenzer, D. G. Goldstein, Reasoning the fast and frugal way: Models of bounded rationality, Psychological Review 103 (4) (1996) 650.

[9] A. Lieto, D. P. Radicioni, From human to artificial cognition and back: New perspectives on cognitively inspired AI systems, Cognitive Systems Research 39 (2016) 1-3.

[10] A. Tversky, D. Kahneman, The framing of decisions and the psychology of choice, Science 211 (4481) (1981) 453-458.

[11] S. E. Asch, Forming impressions of personality, The Journal of Abnormal and Social Psychology 41 (3) (1946) 258.

[12] D. Kahneman, Thinking, fast and slow, Macmillan, 2011.

[13] R. Sun, The importance of cognitive architectures: An analysis based on CLARION, Journal of Experimental & Theoretical Artificial Intelligence 19 (2) (2007) 159-193.

[14] R. Sun, A tutorial on CLARION 5.0, Cognitive Science Department, Rensselaer Polytechnic Institute.

[15] R. C. O'Reilly, The Leabra model of neural interactions and learning in the neocortex, Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA (1996).

[16] S. Franklin, F. Patterson Jr, The LIDA architecture: Adding new modes of learning to an intelligent, autonomous, software agent, pat 703 (2006) 764-1004.

[17] U. Faghihi, C. Estey, R. McCall, S. Franklin, A cognitive model fleshes out Kahneman's fast and slow systems, Biologically Inspired Cognitive Architectures.