URL: http://www.elsevier.nl/locate/entcs/volume67.html 12 pages

States of Knowledge

Rohit Parikh 2,1

Department of Computer Science Graduate Center City University of New York

365 Fifth Avenue New York, NY 10016-4309 USA

Abstract

When a group of people are thinking about some true formula A, one of them may know it, all of them may know it, or it might be common knowledge. But there are also many other possible states of knowledge and they can change when communication takes place. What states of knowledge are possible? What is the dynamics of such changes? And how do they affect possible co-ordinated action? We describe some developments in this domain.

Key words: knowledge, states of knowledge, social software, distributed knowledge

A travelling salesman found himself spending the night at home with his wife when one of his trips was accidentally cancelled. The two of them were sound asleep, when in the middle of the night there was a loud knock at the front door. The wife woke up with a start and cried out, 'Oh my God! It's my husband!' Whereupon the husband leapt out of bed, ran across the room and jumped out the window.

Schank and Abelson, 1977, p. 59.

1 Introduction

Wimmer and Perner begin their paper [19] on Beliefs about beliefs with this story from Schank and Abelson which may seem amusing to some and disturbing to others. But the point of the story seems to be that husband and wife each have their own scenario and neither corresponds to the actuality.

1 Email: rparikh@gc.cuny.edu

2 Research supported by grants from the NSF and the CUNY-FRAP program.

©2002 Published by Elsevier Science B. V.

Wimmer and Perner themselves are concerned primarily with the perception by children of other people's mindsets. The following quote from [19] is a story (in Austrian English) about Maxi which they told a group of children:

Mother returns from her shopping trip. She has bought chocolate for a cake, and Maxi may help her put away the things. He asks her, 'Where should I put the chocolate?' 'In the blue cupboard,' says the mother.

Later, with Maxi gone out to play, the mother transfers the chocolate from the blue cupboard to the green cupboard. Maxi then comes back from the playground, hungry, and he wants to get some chocolate.

In Wimmer and Perner's experiment, little children who were told the Maxi story were then asked the BELIEF question, "Where will Maxi look for the chocolate?"

Children at the age of 3 or less invariably got the answer wrong and assumed that Maxi would look for the chocolate in the green cupboard where they knew it was. Even children aged 4-5 had only a one third chance of correctly answering this question or an analogous question involving Maxi and his brother (who also wants the chocolate and whom Maxi wants to deceive). Children aged 6 or more were by contrast quite successful in realizing that Maxi would think the chocolate was in the blue cupboard, where he had put it, and that if he wanted to deceive his brother, he should lead his brother towards the green cupboard.

Thus it seems that representation of other people's mindsets comes fairly late in childhood, well after children have learned to deal with notions of belief and belief based action for themselves and for others who share their own view of reality. In [18] Chris Steinsvold investigates modal logics which are intended to represent the states of mind of young children. See also [17].

Older children are not much better. In an experiment in my daughter's 7th grade class, I found that they were unable to deal with the muddy children puzzle beyond the first one or two levels.

In this, by now well known puzzle, a number of children are playing in the mud and some of them get their foreheads dirty. At this the father comes on the scene and announces, "At least one of you has got her forehead dirty."

Scenario 1: Suppose there is only one child, say Amy, who is dirty. Then she will realize that her own forehead must be dirty since she can see that the others are clean.

Scenario 2: Suppose now that there are two dirty children, Sarah and Amy, who are asked in turn, "Do you know if your forehead is dirty?" Now when Sarah is asked, she can see Amy's dirty forehead and she replies, "I don't know." However, when Amy is asked, she is able to reason, "If my forehead were clean, Sarah would have known that hers must be dirty, since all the others are clean. But Sarah did not know. So my forehead must be dirty."

This reasoning on Amy's part requires a representation by Amy of Sarah's state of mind, and clearly Amy must be at least six for this to work. However,

Sarah herself must have some reasoning ability and Amy must know that she has such abilities. It is not enough for Amy to know Sarah's view of reality, she must also represent Sarah's logical abilities in her own mind.
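The rounds of reasoning in the puzzle can be mechanized as elimination of possible worlds. The following sketch (our own illustration, not taken from the paper) simulates the simultaneous-rounds variant in which all children are asked at once: with k dirty foreheads, everyone answers "I don't know" for k − 1 rounds, and the dirty children know at round k.

```python
from itertools import product

def muddy_children(actual):
    """Simulate the muddy children puzzle by elimination of possible
    worlds.  `actual` is a tuple of booleans (True = dirty forehead);
    at least one child must be dirty.  Returns the round at which
    someone first knows, and the list of children who then know."""
    n = len(actual)
    # A world assigns dirty/clean to every forehead.  The father's
    # announcement eliminates the all-clean world.
    worlds = [w for w in product([False, True], repeat=n) if any(w)]

    def knows(i, w):
        # In world w, child i sees every forehead but her own; she knows
        # her status iff it agrees across all worlds matching her view.
        view = [v for v in worlds
                if all(v[j] == w[j] for j in range(n) if j != i)]
        return all(v[i] == view[0][i] for v in view)

    for rnd in range(1, n + 1):
        if any(knows(i, actual) for i in range(n)):
            return rnd, [i for i in range(n) if knows(i, actual)]
        # Everyone publicly answers "I don't know": eliminate the
        # worlds in which some child would have known her status.
        worlds = [w for w in worlds
                  if not any(knows(i, w) for i in range(n))]
```

In Scenario 2 (Sarah and Amy both dirty) the simulation reports that both children know at the second round, matching the reasoning above.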

As the number of dirty children goes up, there is a need for higher and higher levels of "I know that he knows that she knows that…". Common knowledge is at the end of this road and has been offered as the explanation of co-ordinated behaviour ([8,6,2]). For instance Halpern and Moses in [6] show that the co-ordinated attack problem requires common knowledge between the two generals, and that given the means of communication they have, such common knowledge is impossible to attain. Clark and Marshall indicate similar difficulties with the referent of "the movie playing at the Roxy today".

While it is true that co-ordinated actions and, supposedly, common knowledge do happen, it may also be relevant to consider other levels of knowledge, short of the infinite, common knowledge, level. 3 Such levels also arise in certain pragmatic situations, e.g. with email or snailmail or messages left on telephones as voice mail. Thus the purpose of this paper is to study levels other than common knowledge.

In typical co-operative situations, even if a certain level of knowledge is needed, a higher level would also do. If Bob wants Ann to pick up the children at 4 PM, it is enough for him to know that she knows. Thus if he sends her email at 2 PM and knows that she always reads hers at 3 PM, he can be satisfied. In such a situation Bob knows that Ann will know about the children in time, or symbolically K_b(K_a(C)), and he may feel this is enough. However, if he telephones her at 3 PM instead, this will create common knowledge of C, much more than is needed. But no harm is done, since in this context Ann and Bob have the same goals. Halpern and Zuck [7] also state a knowledge level requirement for the sequence transmission problem, which suffices as a minimum, but since the parties are co-ordinating, a higher level does no harm.

But in other contexts one may wish for just a particular level of knowledge, no lower, and no higher. Suppose for instance that Bob wants Ann to know about a seminar talk he is giving, just in case she wants to come, but he does not want her to feel pressured to come - she should come only out of interest and not from politeness. In that case he will want to arrange that K_b(K_a(S)) (he himself knows that she knows about the seminar) but not K_a(K_b(K_a(S)))

3 The following, possibly apocryphal, story about the mathematician Norbert Wiener, who was well known for his absent-mindedness, illustrates something even more subtle. At one time the Wieners were moving and in the morning as he was going to work, Mrs. Wiener said to him, "Now don't come home to this address in the evening." And she gave him a piece of paper with the new address. However, in the evening Wiener found himself standing in front of the old address and not knowing what to do - he had in the meanwhile lost the slip of paper with the new address. He went to a little girl standing nearby and said, "Little girl, do you know where the Wieners have moved to?" The little girl replied, "Daddy, Mom knew what would happen so she sent me to fetch you." The moral of the story, for us, is that common knowledge works only if the memory of all parties involved is reliable.

(Ann knows that Bob knows that Ann knows about the seminar), for in the latter case she would feel pressured. Instead of telling her about his talk, which would create common knowledge, he may arrange for some other method, perhaps for a student to tell her, but without saying that it is a message from Bob.

Suppose a pedestrian is crossing the street and sees a car approaching him. It happens in many cities, Boston, Naples, etc., that the pedestrian will pretend not to notice the car, thereby preventing K_dK_p(C), with C representing the presence of the car, d being the driver and p the pedestrian. If the driver knew that the pedestrian knew, he might drive aggressively and try to bully the pedestrian into running or withdrawing. But if he does not know that the pedestrian knows, he will be more cautious.

While the social questions are fascinating and are addressed elsewhere (cf. [13]), in this paper we shall concentrate on the technical aspects of knowledge, where it is assumed that everyone involved is logically perfect. One can still ask, what are the various levels of knowledge which can arise under various circumstances of communication?

2 Model of a distributed system

Note: most of the results which follow are joint with Paul Krasucki, except where indicated, and full proofs are available in [14].

We assume that there are a finite number of processes, 1, …, n, which compute and communicate with each other either by asynchronous messages or by broadcasts. Our network is assumed to be fully connected 4 (there is a channel from every process to every other process).

Asynchronous communication consists of two phases: send and receive. All messages sent are ultimately delivered (and they are delivered in the order in which they were sent) but the delay (transmission time) may be arbitrarily long.

Broadcasts are fully reliable, synchronous communications 5 where all processes involved simultaneously receive the message sent by one of them.

Now we formally specify our class of models. Let N = {1, …, n} be the set of all processors. Every processor i has infinitely many possible initial states

4 If the network is not fully connected then some levels of knowledge may be impossible to realize due to the lack of communication capabilities, e.g. if a processor is isolated (cannot communicate with anyone) then the other processes cannot learn anything from that process. Interesting questions arise in the case of a directed network where every process may communicate with every other process but some communications are necessarily indirect (go through other processes). We will not analyze this case here.

5 The two kinds of communications can be looked at as two kinds of communication media, e.g. a mailing system (asynchronous) and telephone lines (synchronous). Since we allow for synchronous communication between more than two processes at a time, our telephone system must have "conference call" capability.

v_i. Every initial state is a string of 0's and 1's (v_i ∈ {0,1}*). The set of initial states for i we denote by V_i. The set of global initial states is V = V_1 × ⋯ × V_n.

From now on we will use lower case letters to denote everything pertaining to a single process. Capitals will be used where all the processes are involved (e.g. v_i is an initial state of processor i, while V is an initial configuration of the whole system: V = (v_1, …, v_n)).

Events: Ei denotes the set of all events in which processor i can participate (events local to i). There are the following types of events (or actions):

(i) L_i: Local computation steps.

(ii) s(i, j, m): Sending a message m to a processor j, j ∈ N.

(iii) r(j, i, m): Receiving a message m from a processor j, j ∈ N.

(iv) bc(i, U, m): Sending a broadcast m to a group of processors U, i ∈ U ⊆ N.

The same event is also the receipt of the broadcast m by the group of processes U.

E_i = L_i ∪ {s(i, j, m) | m ∈ M, j ∈ N} ∪ {r(j, i, m) | m ∈ M, j ∈ N}

∪ {bc(j, U, m) | m ∈ M, i, j ∈ U ⊆ N} ∪ {bc(i, U, m) | m ∈ M, i ∈ U ⊆ N}

We define the set of global events G in our system, G ⊆ ∏_{i=1}^{n} (E_i ∪ {null}) (a cartesian product), s.t. if (e_1, …, e_i, …, e_n) ∈ G for some i and e_i = bc(j, U, m), then for all i′ ∈ U, e_{i′} = bc(j, U, m). If e_i = null for some i, it means that there is no local event at i at this point. Note that null is not local to any process. We use the notation (G)_i to denote the ith coordinate of G, so ((e_1, …, e_i, …, e_n))_i = e_i.

Histories: A history (a run) is an input value followed by a sequence of events. Let's call the set of all possible histories of the system a protocol P. So P ⊆ V; G*. Protocols are always closed under taking an initial segment of a history: H ∈ P implies that every H′ which is an initial segment of H is in P.

We will require that for every receive in every history in every protocol there is exactly one corresponding send and that it occurs before the receive (we will call this condition time-consistency).

We say that two histories H and H' are compatible iff they start with the same input values.

We can define the concatenation of compatible histories: if H_1 = V; G_1; …; G_k and H_2 = V; G′_1; …; G′_l, then H is the concatenation of H_1 and H_2 iff H = V; G_1; …; G_k; G′_1; …; G′_l.

Local histories are the projections of global histories onto the sets of local events of the processors. They are "time-forgetting".

We assume that a global event - the ticking of the clock - takes place even if no local events take place at a particular moment. Given i and the global history H, the local history h_i, consisting of the events seen by i, is uniquely defined, and we let Φ_i be the map which takes us from H to h_i.

The local history is everything the processor sees, so all the global histories which correspond to the same local history h_i look the same to processor i. Note that the length of Φ_i(H) is less than or equal to the length of H. In fact length(Φ_i(H)) = length(H) iff there are no null events on i in H.

For every i we can define an equivalence relation on the set of global histories:

H ≈_i H′ iff Φ_i(H) = Φ_i(H′)

This relation is extended to groups U by letting H ≈_U H′ iff there exists a chain H = H_1, H_2, …, H_m = H′ such that for all i < m there is a j ∈ U with H_i ≈_j H_{i+1}.
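As a concrete illustration (a minimal sketch of our own, with hypothetical event names: global events are modelled as tuples, with None standing for null), Φ_i and the relation ≈_i can be coded directly:

```python
def project(history, i):
    """Phi_i: the time-forgetting projection of a global history onto
    process i -- keep i's coordinate of each global event, drop nulls."""
    return [event[i] for event in history if event[i] is not None]

def looks_same(h1, h2, i):
    """H ~_i H': the two global histories have the same local history at i."""
    return project(h1, i) == project(h2, i)

# A run of three processes (0-indexed) in which process 2 sends m to
# process 0, received one tick later; process 1 sees nothing either tick.
H  = [(None, None, ("send", 0, "m")), (("recv", 2, "m"), None, None)]
# The run in which nothing at all happens.
H2 = [(None, None, None)]
```

Process 1 cannot distinguish the two runs (both project to the empty local history), while processes 0 and 2 can; note also that length(Φ_i(H)) ≤ length(H), with equality iff i has no null events in H.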

We use capital letters to denote global histories, events etc., lower case letters to denote local histories, events etc.

Closure Conditions for the Protocol: We impose some additional conditions on the protocol P. We want to ensure that the initial state of i (v_i) cannot be known to any other process j in any run of the system, unless j learns about v_i from some communication. We want to exclude the possibility that something is common knowledge "accidentally". To achieve that we will make sure that all the initial states are possible. Moreover, if v_i is the initial state of i, all other strings v′_i will remain possible for j as initial states of i, unless j gets some message from i to the contrary (directly or via some other processors).

1) All vectors of input values are possible: for all V s.t. V = (v_1, …, v_n), where every v_i is a sequence of 0's and 1's, there is some H ∈ P s.t. for some H′, H = V; H′.

2) No sequence of local events on some group of processes can influence the possible actions of some other group of processes unless there are some communications (assuming, of course, that the two groups are disjoint).

For that we need some closure conditions on the set of all protocols. The first condition we use is due to [2] (it is the first of their principles of computation extension).

We need one definition: let G = (e_1, …, e_n). G is on U if U = {i | (G)_i ≠ null} (so U is the set of processes which have some local events in G).

Closure conditions:

(i) Extension Rule: Suppose that H ≈_i H′ for all i ∈ U, that G is on U, and that no (G)_i is a receive r(j, i, m) for any j not in U. Then

(H′ ∈ P and H; G ∈ P) ⇒ H′; G ∈ P

The extension rule guarantees that if we have a protocol P, some history H in P and some action of a group of processes U is possible in H, then the same action must be possible in every history H′ which looks the same to all processes in U, unless it violates time-consistency. In order to explain why (G)_i cannot be a receive from a processor outside of U let us examine an example:

Let N = {1, 2, 3}, U = {1, 2}, H = (null, null, s(3, 1, m)), H′ = (null, null, null). Clearly H ≈_1 H′ and H ≈_2 H′. If we take G = (r(3, 1, m), null, null) s.t. H; G ∈ P, then requiring H′; G to be in P would violate time-consistency.

The following conditions ensure that no process can get any additional information about the other processes by observing its own local events (no hidden synchronization). These conditions are necessary because (unlike [2]) we allow local events at different sites at the same instant of time. Condition (ii) says that if some local events have occurred in parallel, and the sets of participating processes were disjoint, they could have occurred in sequence. We'll call it the splitting rule.

(ii) Splitting Rule:

Let G = (e_1, …, e_n) be on U. Given U_1, U_2 s.t. U_1 ∪ U_2 = U and U_1, U_2 disjoint, we can "split" any G into G_1 and G_2:

(H; G ∈ P) ⇒ H; G_1; G_2 ∈ P

where (G)_i = (G_1)_i for i ∈ U_1, (G)_i = (G_2)_i for i ∈ U_2, and (G_1)_j = null = (G_2)_k for j ∉ U_1, k ∉ U_2, provided that we don't split any broadcasts: (G)_i = bc(i, V, m) ⇒ V ⊆ U_1 ∨ V ⊆ U_2.

Condition (iii) says that if some local events have occurred in sequence, the sets of participating processes were disjoint, and there was no send-receive pair in them, they could have occurred in parallel.

(iii) Joining Rule:

Given U_1, U_2 s.t. U_1 ∪ U_2 = U and U_1, U_2 disjoint, let G_1 be on U_1 and G_2 on U_2, and suppose that if (G_1)_i = s(i, j, m) then (G_2)_j ≠ r(i, j, m). Then

(H; G_1; G_2 ∈ P) ⇒ (H; G ∈ P)

where (G)_i = (G_1)_i for i ∈ U_1 and (G)_i = (G_2)_i for i ∈ U_2.

Systems: We consider three kinds of systems. Asynchronous systems are the systems as described above but without broadcasts. So in asynchronous systems the only communications are via send and receive. Synchronous systems are the systems in which all the communications are done using broadcasts where we don't have the events send and receive. Finally, we use the name mixed communications systems for the systems with both kinds of communications available.

2.1 Language and Semantics

Let L_0 be a language which describes properties of the global histories in a protocol P. So for every sentence A in L_0, and for every history H ∈ P, A is either true or false in H.

We want to make sure that in every history initially every processor has some "private" information not known to any other processor. To accomplish that we assume that we have in our language a countable set of propositions L_1 = {Q_{i,j}}_{i,j ∈ N}. Q_{i,j} is the proposition that the jth input value of i is 1.

All Q_{i,j} are independent. The private information of i in H consists of the P_{i,j}, where P_{i,j} is Q_{i,j} or its negation depending on whether Q_{i,j} is true in H or not. Note that the private information is not the truth value of any formula, but which formula we're looking at.

L is the closure of L_0 under truth functional connectives. L can be extended to a larger language L_C which is the closure of L under common knowledge operators C_U (for U ⊆ N) and the usual truth functional connectives. C_U(A) means that there is common knowledge of A among the processes from U.

The knowledge of a single process corresponds to C_{{i}}. We will then use the notation 6 K_i for C_{{i}}. When we restrict ourselves to a subset of L_C in which all common knowledge operators are in fact knowledge operators (the sets U in C_U are always singletons), then we use the notation L_K.

The class of all models we consider is the class of all protocols P as described in the previous section. Fix P. Now we define the notion H |= A for A in L_C by recursion on the complexity of A.

0) If A is from L_0 then the semantics is given.

1) If A is Q_{i,j} then A is true in H if the jth bit of the input of processor i in H is 1:

H |= A iff H = (v_1, …, v_n); H′ and (v_i)_j = 1

2) If A is ¬A′ then

H |= A iff H ⊭ A′

If A is B ∨ C then

H |= A iff (H |= B or H |= C)

3) If A is of the form K_i(B) then

H |= K_i(B) iff ∀H′ ∈ P: H ≈_i H′ ⇒ H′ |= B

4) If A is of the form C_U(B), then

H |= A iff for all H′ with H′ ≈_U H, H′ |= B

Also, if U is empty, then C_U(A) iff A.
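Clauses 3) and 4) can be turned into a tiny model checker. The sketch below is our own illustration, not the paper's: it takes the degenerate protocol in which two processes each hold one private input bit and never communicate, so that each history is just its global initial state.

```python
from itertools import product

# Histories of a toy protocol: two processes, one private input bit
# each, and no communication at all.
histories = list(product([0, 1], repeat=2))

def indist(h1, h2, i):
    # With no communication, i's local view is exactly its own bit.
    return h1[i] == h2[i]

def K(i, prop, h):
    # Clause 3): H |= K_i(B) iff B holds in every H' with H ~_i H'.
    return all(prop(h2) for h2 in histories if indist(h, h2, i))

def C(U, prop, h):
    # Clause 4): H |= C_U(B) iff B holds in every history reachable
    # from H by a finite chain of ~_j steps with j in U.
    reach, frontier = {h}, [h]
    while frontier:
        cur = frontier.pop()
        for h2 in histories:
            if h2 not in reach and any(indist(cur, h2, j) for j in U):
                reach.add(h2)
                frontier.append(h2)
    return all(prop(h2) for h2 in reach)

Q11 = lambda h: h[0] == 1   # Q_{1,1}: the first input bit of process 1 is 1
```

At the history (1, 0), process 1 knows Q_{1,1} and process 2 does not; and since there is no communication, Q_{1,1} is common knowledge of no group, in line with the closure conditions above.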

Theorem 2.1 Let Σ_C be the alphabet whose symbols are {C_U}_{U ⊆ N}. For all x, y in Σ*_C, all formulae A, all H, and all V ⊆ U ⊆ N, H |= xC_UC_VyA iff H |= xC_VC_UyA iff H |= xC_UyA.

Corollary 2.2 Let Σ_K be the alphabet whose symbols are {K_1, …, K_n}. For all a in Σ_K, for all x, y in Σ*_K, and all formulae A,

⊢ xayA ↔ xaayA

6 The fact that C_{{i}} = K_i was noticed earlier; compare e.g. [4]. It is important that we assume that L_K and L_C are S5 (we need at least S4).

and hence for all H, H |= xayA iff H |= xaayA. I.e. repeated occurrences of a are without effect, and if xay ∈ L_K(A, H) then for all n, xa^n y ∈ L_K(A, H).

Definition 2.3 Given a formula A and a history H, the level of A at H, L(A, H), is the set of x in Σ*_C such that H |= xA and x contains no substrings C_UC_V or C_VC_U for any V ⊆ U ⊆ N.

If H is clear from the context, or not important, then we shall drop it as a parameter. If we restrict ourselves to the K_i operators, we denote the level of A in H by L_K(A, H).

3 Embeddability

Now we will try to characterize levels of knowledge. First we need to introduce the embeddability ordering on strings which turns out to be important here.

Definition 3.1 Given two strings x and y, we say that x is embeddable in y (x ≤ y) if all the symbols of x occur in y, in the same order, but not necessarily consecutively. Formally:

1) x ≤ x and e ≤ x for all x (where e is the empty string);

2) x ≤ y if there exist x′, x″, y′, y″ (y′, y″ ≠ e) such that x = x′x″, y = y′y″, x′ ≤ y′ and x″ ≤ y″;

and ≤ is the smallest relation satisfying (1) and (2).

Thus the string aba is embeddable in itself, in aaba and in abca, but not in aabb.

Properties of the embeddability relation ≤

Fact 3.2 Embeddability is a well partial order, i.e. it is not only well founded, but every linear order that extends it is a well order (an equivalent condition: it is well founded and every set of mutually incomparable elements is finite).

Fact 3.3 Embeddability can be tested in linear time, e.g. by a nondeterministic finite automaton with two input tapes.

For a proof of Fact 3.2 see [?]. Fact 3.3 is straightforward.
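A deterministic one-pass scan also suffices in practice: the following sketch (our own illustration) tests x ≤ y with a single pointer into y, in time O(|x| + |y|).

```python
def embeddable(x, y):
    """x <= y: the symbols of x occur in y in the same order, not
    necessarily consecutively.  `in` on an iterator consumes it up to
    and including the match, so the whole test is one left-to-right
    scan of y."""
    it = iter(y)
    return all(symbol in it for symbol in x)
```

So embeddable("aba", "aaba") and embeddable("aba", "abca") hold, while embeddable("aba", "aabb") fails, matching the example above.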

We also need a stronger relation defined on Σ*_C, which we call C-embeddability.

Definition 3.4 Given two strings x and y, we say that x is C-embeddable in y (x ≼ y) if

1) If V ⊆ U then C_V ≼ C_U;

2) x ≼ y if there exist x′, x″, y′, y″ (y′, y″ ≠ e) such that x = x′x″, y = y′y″, x′ ≼ y′ and x″ ≼ y″;

and ≼ is the smallest relation satisfying (1) and (2).

Fact 3.5 For any x, y ∈ Σ*_K, x ≤ y iff x ≼ y.

Fact 3.6 C-embeddability is a well partial order.

Fact 3.5 is easy. It is also easy to check that C-embeddability is a partial order. It is well founded, because regular embeddability is well founded and for a given x ∈ Σ*_C there are only finitely many y ∈ Σ*_C s.t. |x| = |y| and y ≤ x. There are only finitely many mutually incomparable elements in Σ*_C with respect to ≤, and there are more incomparable elements with respect to ≤ than with respect to ≼, so ≼ is a well partial order. □

If < is a partial order on S, we can define a notion of a downward closed subset of S:

Definition 3.7 R ⊆ S is downward closed iff x ∈ R implies ∀y ≤ x, y ∈ R.

We will look at downward closed sets with respect to embeddability and C-embeddability.

Theorem 3.8 Let Σ_C be the alphabet whose symbols are {C_U}_{U ⊆ N}. Then for all strings x, y ∈ Σ*_C, if x ≼ y then for all histories H, if H |= yA then H |= xA.

4 The Main Results

Corollary 4.1 Every level of knowledge is a downward closed set with respect to ≼. □

Theorem 4.2 There are only countably many levels of knowledge, and in fact all of them are regular subsets of Σ* (where Σ is either Σ_K or Σ_C). □

Fact 4.3 Eric Pacuit of the CUNY Graduate Center and ourselves have shown that, in contrast with knowledge, there are uncountably many possible levels of rational belief. This is curious, as truth is the only condition which (formally) separates knowledge from rational belief. These results will appear elsewhere.

Corollary 4.4 The membership problem for a level of knowledge can be solved in linear time.

Theorem 4.5 If L is a non-empty finite subset of Σ*_K, then L is downward closed iff for some k,

L = ⋃_{i=1}^{k} dc({x_i})

where x_i ∈ Σ*_K. This theorem reiterates the fact that the finite levels are characterized by their maximal elements (x_1, …, x_k are maximal).
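Writing K_i simply as the digit i, Theorem 4.5 and Corollary 4.4 can be illustrated as follows (a sketch of our own under that encoding: dc({x}) is just the set of subsequences of x, and membership in a finite level reduces to embeddability tests against its maximal elements):

```python
from itertools import combinations

def dc(x):
    # Downward closure of the single string x under embeddability:
    # the set of all of its subsequences.
    return {"".join(x[i] for i in idx)
            for r in range(len(x) + 1)
            for idx in combinations(range(len(x)), r)}

def level(maximal):
    # Theorem 4.5: a finite downward closed set is the union of the
    # downward closures of its maximal elements.
    return set().union(*(dc(x) for x in maximal))

def member(x, maximal):
    # Corollary 4.4: test x against each maximal element with the
    # linear-time embeddability scan.
    def emb(x, y):
        it = iter(y)
        return all(s in it for s in x)
    return any(emb(x, m) for m in maximal)
```

For example, level(["12"]) is {"", "1", "2", "12"}: in the email example, Bob's message realizes K_1K_2(C) and everything below it, but not "21" (i.e. K_2K_1(C)).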

Definition 4.6 A formula A is persistent if whenever H |= A and H′ extends H, then H′ |= A.

Theorem 4.7 If A is persistent then so is Ki(A) for any i.

Theorem 4.8 Every formula A which is a boolean combination of the P_{i,j} is persistent. □

Theorem 4.9 Every formula of the form xA, where A is a boolean combination of the P_{i,j} and x is a string of knowledge operators, is persistent. □

Theorem 4.10 (Chandy, Misra) If communication is purely asynchronous, and for some histories H, H′ s.t. H is an initial segment of H′:

H′ |= K_1K_2…K_nA and H ⊭ K_nA

then in H′ − H there must be a sequence of messages m_{n−1}, …, m_1 s.t. m_{n−1} is sent by n and reaches n − 1 (maybe via some other processes), …, m_1 is sent by 2 and (maybe indirectly) reaches 1 (the messages may be different but they all must imply A). Moreover, if A does not depend on any local event of n (its truth value depends on some event e ∉ E_n), then there must be some event of the form r(i, n, m) occurring after H but before s(n, n − 1, m_{n−1}).

Theorem 4.11 Every finite downward closed set is the set L(A,H) for an appropriate A and H in some asynchronous protocol.

Theorem 4.12 Every downward closed set L of strings without repetitions is L(A,H) for suitable A and H in a synchronous system with at least 3 processors.

Theorem 4.13 In a two processor system with only synchronous communication available, no finite level containing strings of length > 2 can be achieved for any formula A.

Theorem 4.14 In a system with k-casts, i.e. with broadcasts involving at most k processors, it is impossible to achieve common knowledge of any new fact in a group of size > k.

References

[1] J. Barwise, Three Views of Common Knowledge, in TARK-2, Ed. M. Vardi, Morgan Kaufmann, 1988, pp. 369-380.

[2] M. Chandy and J. Misra, "How Processes Learn", Proceedings of the 4th ACM Conference on Principles of Distributed Computing (1985), pp. 204-214.

[3] H. H. Clark and C. R. Marshall, Definite Reference and Mutual Knowledge, in Elements of Discourse Understanding, Ed. Joshi, Webber and Sag, Cambridge U. Press, 1981.

[4] M. Fischer and N. Immerman, "Foundations of Knowledge for Distributed Systems", Yale Univ. Tech. Report YALEU/DCS/TR-450, December 1985.

[5] J. Hintikka, Knowledge and Belief Cornell U. Press, 1962.

[6] J. Halpern and Y. Moses, Knowledge and Common Knowledge in a Distributed Environment, Proc. 3rd ACM Symposium on Distributed Computing, 1984, pp. 50-61.

[7] J. Halpern and L. Zuck, A Little Knowledge goes a Long Way, Proc. 6th PODC, 1987, pp. 269-280.

[8] D. Lewis, Convention, a Philosophical Study, Harvard U. Press, 1969.

[9] R. Marvin, M. Greenberg and D. Mossier, "The Early development of conceptual perspective thinking", Child Development, 47 (1976) 511-514.

[10] Y. Moses and M. Tuttle, Programming Simultaneous Actions using Common Knowledge, Research Report MIT/LCS/TR-369 (1987).

[11] R. Parikh, "Knowledge and the Problem of Logical Omniscience", ISMIS-87, North Holland, pp. 432-439.

[12] R. Parikh, "Finite and Infinite Dialogues", Proceedings of a Workshop on Logic and Computer Science, ed. Moschovakis, Springer 1991, pp. 481-498.

[13] R. Parikh, "Social Software", to appear in Synthese September 2002.

[14] R. Parikh and P. Krasucki, "Levels of knowledge in distributed computing", Sadhana - Proc. Ind. Acad. Sci. 17 (1992), pp. 167-191.

[15] R. Parikh and R. Ramanujam, Distributed Computing and the Logic of Knowledge, Logics of Programs 1985, Springer LNCS 193, 256-268.

[16] R. Schank and R. Abelson, Scripts, Plans, Goals, and Understanding, Erlbaum Hillsdale, NJ (1977).

[17] C. Steinsvold and R. Parikh, "A Modal analysis of some phenomena in child psychology", Bulletin of Symbolic Logic, Mar 2002, Logic Colloquium '01, page 158.

[18] C. Steinsvold, "Trust and other modal phenomena", research report, CUNY Graduate Center, February 2002.

[19] H. Wimmer and J. Perner, "Beliefs about beliefs: representation and constraining function of wrong beliefs in young children's understanding of deception", Cognition, 13 (1983) 103-128.