Extremism and Social Learning

Journal of Legal Analysis


Edward L. Glaeser1 and Cass R. Sunstein2


When members of deliberating groups speak with one another, their predeliberation tendencies often become exacerbated as their views become more extreme. The resulting phenomenon — group polarization — has been observed in many settings, and it bears on the actions of juries, administrative tribunals, corporate boards, and other institutions. Polarization can result from rational Bayesian updating by group members, but in many contexts, this rational interpretation of polarization seems implausible. We argue that people are better seen as Credulous Bayesians, who insufficiently adjust for idiosyncratic features of particular environments and put excessive weight on the statements of others in situations of (1) common sources of information; (2) highly unrepresentative group membership; (3) statements that are made to obtain approval; and (4) statements that are designed to manipulate. Credulous Bayesianism can produce extremism and significant blunders — the folly of crowds. We discuss the implications of Credulous Bayesianism for law and politics, including media policy and cognitive diversity on administrative agencies and courts.


Many people have celebrated the potential value of deliberation, including its uses in democracy (Habermas 1998), and it is tempting to think that group decision-making will both produce wiser decisions and average out individual extremism. In many settings and countries, however,

1 Fred and Eleanor Glimp Professor of Economics, Harvard University.

2 Felix Frankfurter Professor of Law, Harvard Law School. For valuable comments and suggestions, we are grateful to Daniel Benjamin, David Laibson, Eric Posner, Richard Posner, Andrei Shleifer, Adrian Vermeule, and participants in a workshop at Harvard Law School. We are also grateful to participants in a workshop at the National Bureau of Economic Research, above all our commentator Oliver Hart, and to an anonymous referee for an exceedingly careful report.

Winter 2009: Volume 1, Number 1 ~ Journal of Legal Analysis ~ 263

researchers have found that group deliberation leads people to take more extreme positions (Brown 1986). The increased extremism, often called group polarization, is usually accompanied by greater confidence and significantly decreased internal diversity, even when individual opinions are given anonymously (Schkade, Sunstein & Kahneman 2000; Brown 1986, 207). These facts, which are summarized in Section 2 of this article, appear to cast doubt on the wisdom, and certainly the moderation, of crowds.

If deliberation leads liberals to become more liberal, and conservatives to become more conservative, the effects of deliberation are unlikely to be desirable in both cases. Deliberation might account for the folly, not the wisdom, of crowds.

Group polarization has evident implications for many issues in law and politics. It suggests, for example, that like-minded jurors, judges, and administrative officials will move to extremes. If group members on a corporate board or in a political campaign are inclined to engage in risk-taking behavior, group deliberation will produce increased enthusiasm for taking risks. But the mechanisms behind group polarization remain inadequately understood, and it is difficult to make predictions or to offer prescriptions without identifying those mechanisms.

In Section 3 of this article, we show that group polarization is predicted by a highly rational process of Bayesian inference. If individuals have independent information, which is shared in the deliberative process, then Bayesian learning predicts that ex post opinions will be both more homogeneous within the group and more extreme than individual opinions. Bayesian inference suggests that individuals with access only to their own private information will recognize their ignorance and hew towards the center. The information of the crowd provides new data, which should lead people to be more confident and more extreme in their views. Because group members are listening to one another, it is no puzzle that their post-deliberation opinions are more extreme than their pre-deliberation opinions. The phenomenon of group polarization, on its own, does not imply that crowds are anything but wise; if individual deliberators tend to believe that the earth is round rather than flat, nothing is amiss if deliberation leads them to be firmer and more confident in that belief.

While group polarization may reflect perfect Bayesian inference, there are other facts, summarized in Section 4, that cast doubt on this rosy rational interpretation. Often group deliberation produces greater confidence and greater extremism when little or nothing has been learned. Group polarization occurs even when little information is exchanged (Brown 1986). People appear to attend to the stated opinions of others even when those opinions are patently wrong (Asch 1955). Individuals often fail to give sufficient weight to the possibility that offered opinions are distorted by private incentives to mislead (Camerer, Ho & Chong 2004) or that people's actions reflect private information (Eyster & Rabin 2005). Outside the laboratory, professional persuaders, such as advertisers, political leaders, and clerics, have successfully led people to hold disparate religious beliefs that cannot all be true (Glaeser 2004), and to think, falsely, that they prefer the taste of Coke to Pepsi (Shapiro 2006) and that Mossad was responsible for the attacks of September 11, 2001 (Gentzkow & Shapiro 2004).

In Sections 5-8 of this article, we suggest that social learning is often best characterized by what we call Credulous Bayesianism. Unlike perfect Bayesians, Credulous Bayesians treat offered opinions as unbiased and independent and fail to adjust for the information sources and incentives of the opinions that they hear. There are four problems here. First, Credulous Bayesians will not adequately correct for the common sources of their neighbors' opinions, even though common sources ensure that those opinions add little new information. Second, Credulous Bayesians will not adequately correct for the fact that their correspondents may not be a random sample of the population as a whole, even though a non-random sample may have significant biases.3 Third, Credulous Bayesians will not adequately correct for any tendency that individuals might have to skew their statements towards an expected social norm, even though peer pressure might be affecting public statements of view. Fourth, Credulous Bayesians will not fully compensate for the incentives that will cause some speakers to mislead, even though some speakers will offer biased statements in order to persuade people to engage in action that promotes the speakers' interests. Our chief goal in Sections 5-8 is to show the nature and effects of these mistakes, which can make groups error-prone and anything but wise, especially if they lack sufficient diversity.

3 It is possible, of course, that a non-random sample will be unbiased; consider a non-random sample of neutral experts on the question whether, say, DDT imposes serious health risks. We use the term "representative sample" to mean relevantly representative, in the sense of lacking any kind of bias or skew.

In Section 5 of the article, we assume that errors in private signals are correlated across individuals. Credulous Bayesians overestimate the extent to which these signals are independent. The first proposition of the paper shows that when individuals are Credulous Bayesians, their post-deliberation beliefs become more erroneous and they acquire more misplaced confidence in those erroneous beliefs. This proposition helps explain why socially formed beliefs, like those about religion, politics, and constitutional law (and sometimes science as well), can be quite strongly held, despite a lack of evidence and an abundance of other groups holding opposing beliefs.

Our second proposition shows that when individuals are Credulous Bayesians, accuracy may decline as group size increases. As group size increases, mistakes can become more numerous and more serious. After all, the essence of Credulous Bayesianism is that people misuse the information of their neighbors, so more neighbors means more errors. This finding suggests that in some settings individuals may be wiser as well as less extreme than crowds (compare Surowiecki 2004; Page 2006).

We then turn to the possibility that an individual's friends and social networks are not a random sample of the population. A group of people might have skewed views on questions of policy or fact, and group members may not sufficiently adjust for that fact. We formalize this possibility by assuming that noise terms in the sample are correlated, rather than independent, as they could be if the group has been selected on some attribute or taste. Credulous Bayesians underestimate the correlation of the signals and act as if their neighbors are a random sample of the population as a whole. In this case, Credulous Bayesianism again causes more extremism and more error. Here too, larger group sizes (so long as they do not produce representativeness) can make decision-making less accurate. For a wide range of parameter values, more correlation decreases accuracy. This is our first result favoring intellectual diversity.
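The effect of correlated signals on a naively pooling group can be illustrated with a small Monte Carlo sketch. This is our own illustrative construction, not the paper's formal specification: each signal's noise is split into a common component (which the Credulous Bayesian ignores) and an idiosyncratic component, and the signals are pooled as if they were independent. For these illustrative parameter values, the error variance of the pooled estimate rises with group size.

```python
# Sketch (our own): Credulous Bayesians pool signals that share a common
# error component as if the signals were independent. The error variance
# of their pooled estimate can then GROW with group size I.
import random

def credulous_error_var(I, rho0, rho1, common_sd, trials=40_000, seed=1):
    """Sampled variance of (naive posterior - true D) with a shared error term."""
    rng = random.Random(seed)
    sd_D = (1 / rho0) ** 0.5          # D ~ N(0, 1/rho0)
    sd_n = (1 / rho1) ** 0.5          # idiosyncratic noise ~ N(0, 1/rho1)
    errors = []
    for _ in range(trials):
        D = rng.gauss(0, sd_D)
        c = rng.gauss(0, common_sd)   # common error, ignored by the agents
        signals = [D + c + rng.gauss(0, sd_n) for _ in range(I)]
        naive_posterior = rho1 * sum(signals) / (rho0 + I * rho1)  # treats signals as independent
        errors.append(naive_posterior - D)
    mean = sum(errors) / trials
    return sum((e - mean) ** 2 for e in errors) / trials

rho0, rho1 = 4, 1
for I in (1, 5, 25):
    print(I, round(credulous_error_var(I, rho0, rho1, common_sd=1.0), 3))
```

With these parameters the analytic error variance is \( \frac{I^2\rho_1^2\sigma_c^2 + I\rho_1 + \rho_0}{(\rho_0 + I\rho_1)^2} \), which increases in I toward \( \sigma_c^2 \): more neighbors, more error, as in the second proposition.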

We then model intellectual diversity more formally as mixing people whose information reflects different group-specific error terms. Intellectual diversity is much more valuable for Credulous Bayesians than it is for perfect Bayesians. When a group moves from being totally homogeneous to being formed out of two equally sized populations with different sources of common information, then the variance of the post-deliberation error falls more quickly for Credulous Bayesians than it does for perfect Bayesians. Intellectual diversity always has value (Page 2006), but it becomes particularly important when decision-makers are Credulous Bayesians.

A large body of research has discussed the human tendency to give statements that conform to an expected community norm. For group deliberation, the problem is that people may discount this tendency and think, wrongly, that public statements actually convey information. In Section 6, we model conformism by assuming that individuals' statements reflect a combination of private information and an expectation of what individuals think that the group wants to hear. Credulous Bayesians fail fully to adjust for the fact that statements are skewed to the norm. The combination of conformism and Credulous Bayesianism creates error, tight homogeneity within groups, and greater heterogeneity across groups. If people utter politically correct statements, with the aim of avoiding the wrath of others, then Credulous Bayesianism could help explain both the blue state/red state phenomenon of ideological homogeneity within areas and heterogeneity across areas (Glaeser & Ward 2006).

In Section 7, we assume that some individuals, like legal advocates or politicians, have incentives to report misleading information in their quest to change people's decisions. "Polarization entrepreneurs," in law and politics, might attempt to do exactly that. This claim is in a similar spirit to Sendhil Mullainathan, Joshua Schwartzstein, and Andrei Shleifer (2007), who examine the interaction between persuasion and categorical thinking. In this case, Credulous Bayesians fail fully to correct for the motives of those around them. The combination of incentive-created misstatements and Credulous Bayesianism always leads to less accurate assessment and can lead to bias as well. The degree of bias depends on the imbalance of resources or incentives across persuaders, not the persuasion per se.

Section 8 of the paper briefly considers some policy implications of Credulous Bayesianism. Of course it is true that identification of potential group errors cannot lead to any particular set of institutional arrangements, which must be chosen after considering many variables. But in the legal setting, our results help to explain the longstanding practice of requiring a degree of political diversity on the independent regulatory commissions, such as the National Labor Relations Board, and also cast light on the current debate over intellectual diversity on the federal judiciary.

For public and private institutions, unbiased decision-making may depend more on maintaining balance across decision-makers than on the elimination of misleading statements. We also offer some brief remarks on media policy, with particular reference to the now-abandoned fairness doctrine. Our question here involves the likely consequences if people are exposed to beliefs from sources with a defined set of ideological convictions.


The original psychological experiments on the effects of deliberation involved risk-taking behavior, with a demonstration that risk-inclined people become still more risk-inclined after they deliberate with one another (Stoner 1961). Risky decisions include taking a new job, investing in a foreign country, escaping from a prisoner-of-war camp, or running for political office (Hong 1978). With respect to many such decisions, members of deliberating groups became significantly more risk-inclined after a brief period of collective discussion. On the basis of this evidence, it became standard to believe that deliberation produced a systematic "risky shift" (Brown 1986).

Later studies drew this conclusion into serious question. On many of the same questions on which Americans displayed a risky shift, Taiwanese subjects showed a "cautious shift." Deliberation led citizens of Taiwan to become significantly less risk-inclined than they were before they started to talk (Hong 1978). Among American subjects, deliberation sometimes produced a cautious shift as well, as risk-averse people became more averse to certain risks after they talked with one another (Moscovici & Zavalloni 1969). The principal examples of cautious shifts involved the decision whether to marry and the decision whether to board a plane despite severe abdominal pain, possibly requiring medical attention. In these cases, the members of deliberating groups shifted not toward risk but toward greater caution (Moscovici & Zavalloni 1969).

A straightforward interpretation was able to reconcile these competing findings: the pre-deliberation median is the best predictor of the direction of the shift (id.; Brown 1986, 210-212). When group members are disposed toward risk-taking, a risky shift is observed. Where members are disposed toward caution, a cautious shift is observed. Thus, for example, the striking difference between American and Taiwanese subjects is a product of a difference in the pre-deliberation medians of the different groups on the relevant questions (Hong 1978). Thus the risky shift and the cautious shift are both subsumed under the rubric of group polarization.

In the behavioral laboratory, group polarization has been shown in a remarkably wide range of contexts (Brown 1986; Turner et al. 1987, 142-170). Group deliberation produces more extreme judgments about the attractiveness of people shown in slides (Turner et al. 1987, 153). It also occurs for obscure factual questions, such as how far Sodom (on the Dead Sea) is below sea level (Turner et al. 1987). In a revealing finding at the intersection of cognitive and social psychology, groups have been found to make more, rather than fewer, conjunction errors (believing that A and B are more likely to be true than A alone) than individuals when individual error rates are high — though fewer when individual error rates are low (Kerr, MacCoun & Kramer 1996). Even burglars show a shift in the cautious direction when they discuss prospective criminal endeavors (Cromwell et al. 1991).

There is pervasive evidence of group polarization on issues that bear directly on politics and political behavior. With respect to affirmative action, civil unions for same-sex couples, and climate change, liberals become significantly more liberal as a result of discussion, while conservatives become significantly more conservative (Schkade, Sunstein & Hastie 2007). One experiment, conducted in Colorado, found that internal discussions among liberals in Boulder produce a strong shift to the left, resulting in both less internal diversity and more extremism; conservatives in Colorado Springs show a similar shift to the right (Schkade, Sunstein & Hastie 2007). In the same vein, white people who are not inclined to show racial prejudice show less prejudice after deliberation than before; but white people who are inclined to show such prejudice show more prejudice after deliberation (Myers & Bishop 1970). After deliberation, French people become more distrustful of the United States and its intentions with respect to foreign aid (Brown 1986, 224). Similarly, feminism becomes more attractive to women after internal discussions, at least if the relevant women are antecedently inclined to favor feminism (Myers 1975).4

4 There is a parallel literature that follows Lord, Ross & Lepper (1979) and that shows that initial views become more extreme after reading the same research. This finding — that people

In the domain of law, there is considerable evidence of group polarization as well. Group polarization occurs for judgments of guilt and sentencing in criminal cases (Myers & Kaplan 1976; Kaplan 1977). In punitive damage cases, deliberating juries have been found to polarize, producing awards that are typically higher than those of the median juror before deliberation begins (Schkade, Sunstein & Kahneman 2000). When individual jurors begin with a high degree of moral outrage about a defendant's conduct, juries become more outraged, after deliberation, than their median member had been; but when jurors begin with little outrage about a defendant's conduct, juries become less outraged, after deliberation, than their median juror had been. Dollar awards are often as high as or even higher than the highest award favored, before deliberation, by any individual juror (Schkade, Sunstein & Kahneman 2000).

With respect to purely legal questions, panels of appellate judges polarize too. In ideologically contested areas (involving, for example, disability discrimination, sex discrimination, affirmative action, environmental protection, and campaign finance regulation), Republican appointees show especially conservative voting patterns when sitting on panels consisting entirely of Republican appointees, and Democratic appointees show especially liberal voting patterns when sitting solely with other Democratic appointees (Sunstein et al. 2006; Sunstein, Schkade & Ellman 2004). This pattern has proved highly robust; it has been found in over twenty areas of substantive law. As a result, panels consisting of three Republican appointees show radically different voting patterns from those of three Democratic appointees, in a way that seems to ensure that political judgments compromise ideals associated with the rule of law (Sunstein, Schkade & Ellman 2004). In short, federal judges show the same polarization effects as do liberals and conservatives in Colorado.


with different beliefs disagree more after being exposed to the same information — is often said to involve "biased assimilation" of new material; it is harder to reconcile with standard Bayesian learning than the finding that group discussion increases polarization.

Why does group polarization occur? In this section, we show that the preceding facts about group polarization are not only compatible with Bayesian inference, but are primary predictions of the standard Bayesian model of social learning. We assume that individuals are trying to form an assessment of an unknown parameter, D, which might reflect the damages in a civil trial, the proper resolution of a constitutional dispute, the right response to climate change, or the dishonesty of a political candidate. For some readers, the exposition will seem somewhat technical, but the intuition should not be obscure:

If group members are listening to and learning from one another, their discussions will produce greater confidence. As confidence increases, convictions become firmer and individuals are more comfortable moving away from the center in the same direction as their predeliberation tendencies.

Throughout this paper, we will make use of the normal signal extraction formula and therefore assume that all random variables are normally distributed. The true value of D has mean zero (an arbitrary normalization) and variance \( \frac{1}{\rho_0} \). Each individual receives a private signal, denoted \( S_i \) for individual i, which equals D plus a noise term, denoted \( n_i \), which is also mean zero and has variance \( \frac{1}{\rho_1} \). Bayes' rule tells us that if an individual had nothing but this private signal, that individual's estimate of D would equal \( \frac{\rho_1 S_i}{\rho_0 + \rho_1} \).

Our interest lies in social settings where I individuals communicate a share of their private signals. In this first setting, we assume that people just relay their signals accurately to the group and that all of the signals are independent. The signal extraction formula then tells us that after communication all I individuals will share the same posterior assessment of D, which equals

\[ \frac{\rho_1 \sum_i S_i}{\rho_0 + I\rho_1} \quad \text{or} \quad \frac{I\rho_1 D + \rho_1 \sum_i n_i}{\rho_0 + I\rho_1} \]

(see, e.g., Thomas Sargent 1979). This equals D plus an error term,

\[ \frac{\rho_1 \sum_i n_i - \rho_0 D}{\rho_0 + I\rho_1}. \]

One of the most empirically problematic implications of the Bayesian model is that people will end up with the same beliefs, even if they start with different signals. Still the Bayesian model delivers two standard facts about the heterogeneity of posterior beliefs:

Claim #1: As the number of people that communicate independent signals increases, the posterior will become more accurate, in the sense that the variance of the error will fall, and the variance of posterior beliefs will rise.

The variance of the error term equals \( \frac{1}{\rho_0 + I\rho_1} \), which is obviously declining with I. The variance of the posterior equals \( \frac{I\rho_1}{\rho_0(\rho_0 + I\rho_1)} \), which is increasing with I. The ex post evaluation equals \( \frac{I(\rho_0 + \rho_1)}{\rho_0 + I\rho_1} \) times the average evaluation ex ante, which is obviously greater than one. Group assessments are a simple multiple of average individual assessments; it follows that just as in the experiments discussed above, we should expect to see groups head towards extremes in the direction suggested by individual opinions.
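Claim #1 can be checked numerically. The sketch below is our own (parameter values are illustrative): it simulates the model as stated, drawing D, drawing I independent signals, forming the pooled posterior, and comparing the sampled error variance with the closed form \( \frac{1}{\rho_0 + I\rho_1} \).

```python
# Monte Carlo check of Claim #1: with I independent signals, the variance
# of the posterior error is 1/(rho0 + I*rho1), so accuracy improves with I.
# (Our own sketch; parameter values are illustrative.)
import random

def simulated_error_var(I, rho0, rho1, trials=50_000, seed=0):
    """Sampled variance of (pooled posterior - true D)."""
    rng = random.Random(seed)
    sd_D = (1 / rho0) ** 0.5      # D ~ N(0, 1/rho0)
    sd_n = (1 / rho1) ** 0.5      # noise ~ N(0, 1/rho1)
    errors = []
    for _ in range(trials):
        D = rng.gauss(0, sd_D)
        signals = [D + rng.gauss(0, sd_n) for _ in range(I)]
        posterior = rho1 * sum(signals) / (rho0 + I * rho1)
        errors.append(posterior - D)
    mean = sum(errors) / trials
    return sum((e - mean) ** 2 for e in errors) / trials

rho0, rho1 = 4, 1
for I in (1, 3, 10):
    print(I, round(simulated_error_var(I, rho0, rho1), 4), 1 / (rho0 + I * rho1))
```

The simulated variances track the closed form and fall monotonically in I, matching the claim that independent signals make the crowd more accurate.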

Moreover, it is quite easy to think of examples, especially when I is small, where the ex post opinion of the group is more extreme than even the most extreme antecedent opinion among the group's members. Assume that there were three people with private signals of 0, 1, and 2, and assume that \( \rho_0 = 4 \) and \( \rho_1 = 1 \). Before the information exchange, the most extreme individual assessment of D was 0.4, which equals two times \( \frac{\rho_1}{\rho_0 + \rho_1} \), or two times 0.2. After the information exchange, everyone believes that the expected value of D is 0.429, which equals one (the average signal) times \( \frac{I\rho_1}{\rho_0 + I\rho_1} \), or \( \frac{3}{7} \). The group is not only more extreme than the average individual opinion but also more extreme than the most extreme individual opinion.
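The worked example can be verified directly with the two formulas above (a short sketch; the function names are our own):

```python
# Numerical check of the worked example: three agents with private signals
# 0, 1, and 2; prior precision rho0 = 4; signal precision rho1 = 1.

def individual_estimate(s, rho0, rho1):
    """Bayesian estimate of D from a single private signal s."""
    return rho1 * s / (rho0 + rho1)

def group_posterior(signals, rho0, rho1):
    """Shared posterior mean after all signals are pooled."""
    I = len(signals)
    return rho1 * sum(signals) / (rho0 + I * rho1)

signals = [0, 1, 2]
rho0, rho1 = 4, 1

estimates = [individual_estimate(s, rho0, rho1) for s in signals]
posterior = group_posterior(signals, rho0, rho1)

print(estimates)   # [0.0, 0.2, 0.4]: the most extreme prior view is 0.4
print(posterior)   # 3/7 = 0.4285..., beyond every individual estimate
```

As the text notes, the pooled posterior (3/7) lies outside the range of all three pre-deliberation estimates.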

These findings rationalize the results of David Schkade, Cass Sunstein & Reid Hastie (2007), and show that information exchange should lead to more extremism and less internal diversity, even when people are perfectly rational. Recall that when residents of Colorado Springs, a relatively conservative place, are brought together for group discussion, their opinions become more conservative on climate change policy, affirmative action, and civil unions for same-sex couples. When residents of Boulder, a relatively liberal place, come together, their opinions become more liberal on all three issues. In the context of the model, this experiment is akin to selecting a group that has positive signals and a group that has negative signals. If the two groups believe that they have been randomly formed, and not selected on the basis of their signals, then both groups will become more extreme. This type of extremism from the exchange of information and opinions is exactly what a rational model with group learning would predict.

The standard explanations of group polarization show an intuitive

appreciation of some of the more formal analysis here. One of the most prominent explanations refers to "informational influences" (Brown 1986). This account is essentially Bayesian. The central idea is that group members will be aware of some, but not all, of the arguments that support the antecedent tendency within the group. As information is pooled, learning occurs, and the learning will predictably tend to intensify the antecedent tendency. It is evident that polarization often occurs as a result of such learning.

Other explanations of group polarization, invoking the effects of confidence and corroboration on people's preexisting views, are also quite Bayesian. If rational people lack confidence, they will tend toward the middle and hence avoid the extremes. As people gain confidence after hearing their views corroborated by others, they usually become more extreme in their beliefs (Baron et al. 1996, 557-559). This is exactly what the Bayesian model suggests. In a wide variety of experimental contexts, people's opinions have been shown to become more extreme simply because their views have been corroborated, and because they have become more confident after learning that others share their views (Baron et al. 1996). The Bayesian model predicts this process.

If group polarization merely reflects standard Bayesian inference, then it is hard to think that there is anything about polarization that challenges the conventional view that groups make better decisions than individuals. If people become firmer in their conviction that cigarettes cause cancer, that it is risky to drink and drive, or that the earth rotates around the sun, nothing is amiss. In the next section, we suggest that there are facts that are less compatible with rational Bayesian inference and more compatible with less perfect social learning. We will then suggest that a more realistic model of social learning presents a somewhat different and far less positive view about the accuracy of group decision-making. In this way, we can obtain a better understanding of the occasional folly of crowds.

4. Credulous Bayesians

The most obvious problem with the Bayesian framework is that group polarization is found even when people learn essentially nothing. To be sure, people must learn the bare fact that other people hold certain positions,

and one can learn something from that fact, so long as others are not unreliable. But a significant shift in the domains of law and politics — toward, for example, enthusiasm for affirmative action policies or an international agreement to control greenhouse gases — should not be expected when people have discussions with others who lack independent information.

Return once more to the Colorado experiment. Both liberals and conservatives regularly polarized on questions involving climate change, affirmative action, and civil unions for same-sex couples, even though much of the time the discussion produced little or no new information. The same is true in the standard risky-shift experiments: People become more risk-inclined simply after learning that others are risk-inclined, without learning much, if anything, about why it makes sense to take risks. Indeed, group polarization occurs when people are merely exposed to the conclusions of others, rather than to the reasons for those conclusions (Brown 1986).

In many of the discussions in Colorado Springs and Boulder, the individuals who talked to each other were not bringing new data to the table. The residents of both of these places had been exposed to the same basic influences throughout most of their lives. For at least two of these issues (affirmative action and same-sex unions), there has been a huge amount of public discourse, and the discussions tended to add essentially nothing. For one of them (climate change), the issues are technical and complex, and few participants were actually able to provide new information. True, we have emphasized that people learn from others on technical and nontechnical issues alike; most people begin with little memory of relevant facts, and something is conveyed by the mere opinion of another human being who seems reliable. Bare conclusions can themselves be informative, especially but not only if they come from experts. But it seems heroic to suggest that the dramatic movements we observe are solely or even mostly a product of new knowledge. So while the move to extremes is exactly what one might have predicted with a Bayesian model of social learning, the actual situation suggests a somewhat different scenario: people are treating each other's opinions as offering new information even when there actually is nothing new.

Similar problems occur with models of group polarization that do not emphasize acquisition of information through discussion. According to the "social comparison" account, offered as an alternative to accounts that emphasize information, people want to maintain both their preferred self-conception and their preferred self-presentation (Brown 1986). Suppose, for example, that group members believe that they are somewhat more likely to take investment risks than other people, or that they are somewhat more critical of the incumbent president than other people. If such people find themselves in a group of people who are inclined to take investment risks, or to criticize the incumbent president, they might shift, not because they have learned anything, but because they want to see themselves, and present themselves, in the preferred way. The social comparison account makes a place for conformism, taken up in detail below. What that account misses is that in many settings, people shift in the belief that others are saying what they think or know, even though the others are actually moved by social motivations. Sometimes people do move because they desire to conform, but sometimes they move because they believe, falsely, that the views of others provide valuable information.

In the remainder of the article, we explore an alternative explanation of group polarization: the hypothesis that individuals are Credulous Bayesians, who treat proffered opinions as having significant information value, even when those opinions are biased or based on no new information. Of course most heuristics work reasonably well (Tversky & Kahneman 1971; Gigerenzer 2007), and there is nothing inherently unreasonable about Credulous Bayesianism, which is essentially a Bayesian heuristic. Most of the opinions that we hear over the course of a day are given to us by well-meaning people who are in fact sharing new information, on topics like the quality of a restaurant or whether or not it is raining. It is quite sensible to take those opinions seriously. It is even more sensible for children to be attuned to take the advice given to them by their parents, on, say, not playing with knives or not eating poisonous food. In fact, the human ability to learn from one another is a cornerstone of our species and its civilizations.

With this background, we can see that Credulous Bayesianism is essentially a failure to fine-tune learning for each individual setting, and a failure to correct for the motives of others. We may, on average, assign the right amount of weight to the opinions of friends or strangers, but in some settings we should be more skeptical, yet we do not bother to think through the appropriate degree of skepticism. A sensible model of costly cognition could easily explain such a tendency, especially in settings where getting it right has little private value. It may well be that the tendency to be suspicious imposes costs beyond mere thinking, because suspicion may also be linked to uncomfortable emotions and an inability to enjoy a pleasant conversation. We suspect that going through life intensely suspicious of every new statement would impose enormous cognitive and emotional costs. Credulous Bayesians avoid such costs.

Credulous Bayesianism is linked to the cognitive hierarchy model of Colin Camerer, Teck-Hua Ho & Kuan Chong (2004). Camerer, Ho & Chong show that behavior in many games can be rationalized with a model in which only a subset of the population is able to think about the motives and implied actions of those around them. Credulous Bayesianism is quite similar in that when people use it, they are essentially failing to think through why opinions that they hear may not be unbiased or informative. There is also a link between Credulous Bayesianism and the Cursed Equilibrium concept of Erik Eyster and Matthew Rabin (2005). In a Cursed Equilibrium, individuals make a naïve inference based on people's actions; Credulous Bayesians make a naïve inference based on people's statements.

We believe that Credulous Bayesianism makes more sense of the Colorado experiment than perfect Bayesian inference. Little new information was proffered in this setting, so it is hard to see how Bayesian learning occurred, but opinions were given. If Credulous Bayesianism makes us prone to put weight on the opinions of others (even when they are based on the same sources as our own views), then we would expect to see something that looks like learning, in that opinions become more extreme and more tightly held, even though little or no real information is being exchanged.

In fact, an abundance of psychological evidence suggests that people do use a "follow the majority" heuristic that puts great weight on the opinions of others (Gigerenzer 2007). Psychologists have often shown that people follow the views of others even when those others are palpably wrong. In the most famous experiments, Solomon Asch explored whether people would be willing to overlook the apparently unambiguous evidence of their own senses (Asch 1955). In these experiments, the subject was placed into a group of seven to nine people who seemed to be other subjects in the experiment but who were actually Asch's confederates. The simple task was to "match" a particular line, shown on a large white card, to the one of three "comparison lines" that was identical to it in length. Asch's striking finding was that when confronted with the obviously wrong but unanimously held views of others, most people end up yielding to the group at least once in a series of trials. Indeed, in a series of twelve questions, no less than 70 percent of people went along with the group, and defied the evidence of their own senses, at least once (Asch 1955).

It might seem jarring to think that people would yield to a unanimous group when the question involves a moral, political, or legal issue on which they have great confidence. But additional experiments, growing out of Asch's basic method, find huge conformity effects for many judgments about morality, politics, and law (see R.S. Crutchfield 1955). Such effects were demonstrated for issues involving civil liberties, ethics, and crime and punishment. Consider the following statement: "Free speech being a privilege rather than a right, it is proper for a society to suspend free speech when it feels threatened." Asked this question individually, only 19 percent of the control group agreed. But confronted with the shared opinion of four others, 58 percent of people agreed (Krech, Crutchfield & Ballachey 1962, 509).

Non-experimental evidence also supports the view that people put weight on opinions that they hear, even when those opinions are biased by incentives or based on very limited information. Billions of dollars are spent on advertising, only some of which is informative, and much of that advertising contains obviously biased statements about a product's qualities. If these statements were not effective, it would be hard to believe that firms would spend so much on them. There is also more direct evidence on the impact of advertising. People generally say that they think they prefer the taste of Coke to Pepsi, and their buying habits suggest that they believe that to be true, yet in blind taste tests people generally prefer the taste of Pepsi to Coke (Shapiro 2006).

Other settings also seem to suggest that people are Credulous Bayesians. There is a paucity of hard information about many religious topics, including the afterlife, the nature of any deities that may exist, and the lives or even the existence of many historical religious figures. Rational Bayesians should have very weak beliefs about things on which so little tangible evidence exists. Yet individuals are strongly committed to their religious beliefs and are often willing to die or kill for them. These beliefs are greatly influenced by statements of parents or religious leaders, and both of these groups often have little direct knowledge of core religious beliefs, but strong incentives to shape those beliefs in a particular direction. A more perfect Bayesian would presumably treat these opinions with a great deal of skepticism, especially upon learning that millions of others hold very different beliefs. The strong beliefs about religion that are transferred from person to person look to us more like Credulous Bayesianism.

In political settings, there is also evidence that people accept statements that they hear with relatively little critical appraisal. Matthew Gentzkow and Jesse Shapiro (2004), for example, document the remarkable range of beliefs held in the Islamic world about Israel and the United States. Most striking among these is the view that Mossad was responsible for the 9/11 terrorist attacks on America. Within the United States as well, there is abundant evidence that many people have acquired incorrect beliefs, such as a vastly overinflated view of the percentage of the national budget that the U.S. government spends on foreign aid or the connection between Saddam Hussein and the 9/11 attacks.5 Again, the heterogeneity in these views seems incompatible with any kind of perfect Bayesianism, but it is much easier to reconcile with individuals who are Credulous Bayesians.


We now turn to our theoretical exploration of Credulous Bayesianism. To reflect the core rationality of social learning, we begin with the Bayesian learning model described in Section 3, where individuals are trying to infer the truth based on a combination of their private information and the information sent by others. The critical deviation from standard Bayesian learning models is that we assume that people do not accurately assess the degree to which social signals actually represent new information. We are essentially following Peter DeMarzo, Dimitri Vayanos and Jeffrey Zwiebel (2003) by assuming that individuals have trouble correcting the degree to which they should down-weight social signals in any given setting.

5 An alternative interpretation of Americans' routine overestimation of the percentage of spending on foreign aid is that people routinely overestimate the percentage of spending on small budget items.

In this first model, we follow DeMarzo, Vayanos and Zwiebel (2003) closely and consider the case where individual opinions share a strong common component, but the degree of common noise is underestimated by the participants in the model. As in Section 3, we think of a group of I individuals sharing information and forming a posterior assessment of the value of D. This setting could capture the deliberations of a judicial panel or any group of people that exchanges ideas and then makes a decision.

As in Section 3, we assume that individuals receive a private signal equal to D plus a noise term denoted n_i. However, unlike in Section 3, we now assume that the noise term n_i is the sum of a common noise term θ and an individual-specific term ε_i. The total variance of the noise term remains 1/P_1: the variance of θ is v/P_1 and the variance of ε_i is (1 − v)/P_1. This change does not alter the Bayesian formula for pre-deliberation opinions, which remains

\frac{P_1 S_i}{P_0 + P_1}.

If individuals shared their signals and used the correct Bayesian updating formula, their posterior view would equal

\frac{P_1 \sum_i S_i}{P_0(1 + (I-1)v) + I P_1}.

This formula puts less weight on the signals of others as the share of the noise that is common increases. Since common noise makes each new signal less informative, people appropriately put less weight on each new signal when much of the noise is common.

Throughout the remainder of this paper, we assume one common core deviation from perfect Bayesianism: individuals underestimate the degree to which information that they hear from others is biased or based on common sources. In this context, we model this underestimation by assuming that people underestimate the extent to which the noise terms are common. Specifically, every one of the agents in the model believes that the variance of θ is λv/P_1 and the variance of ε_i is (1 − λv)/P_1. The parameter λ measures the extent to which individuals are Credulous Bayesians and underestimate the amount of common noise. The parameter ranges from one, which would mean perfect rationality, to zero, which would mean a complete failure to assume any common error terms. We also assume that individuals update as if they knew this parameter perfectly, so that the new formula for posterior beliefs is

\frac{P_1 \sum_i S_i}{P_0(1 + (I-1)\lambda v) + I P_1} = \frac{P_1\left(\sum_i \varepsilon_i + I(D + \theta)\right)}{P_0(1 + (I-1)\lambda v) + I P_1}.

We first discuss a simple example where people are completely credulous, that is, λ is zero. The average belief of I people who have not shared their information will equal

\frac{P_1 \sum_i S_i}{I(P_0 + P_1)}.

The average belief after signals are shared will equal

\left(1 + \frac{(I-1)(1-\lambda v)P_0}{P_0(1 + (I-1)\lambda v) + I P_1}\right)

times that pre-deliberation average. Whatever initial opinions existed in the group will become more extreme as the group size expands. If D = 0, then groups that initially erred in thinking that D is positive will become more extreme in that belief, and groups that initially erred in thinking that D is negative will become more extreme in that belief.
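The post-sharing belief can thus be read as the pre-sharing average times an amplification factor. A minimal Python sketch of that factor (the function name and all parameter values are ours and purely illustrative, not drawn from the article):

```python
# Amplification of the group's average belief after signals are shared,
# following the model's posterior formula. lam = 1 is a perfect Bayesian;
# lam = 0 is complete credulity. All parameter values are illustrative.

def amplification(I, P0, P1, v, lam):
    """Ratio of the post-sharing belief to the average pre-sharing belief."""
    return 1 + (I - 1) * (1 - lam * v) * P0 / (P0 * (1 + (I - 1) * lam * v) + I * P1)

I, P0, P1, v = 10, 1.0, 1.0, 0.5
perfect = amplification(I, P0, P1, v, lam=1.0)    # rational polarization
credulous = amplification(I, P0, P1, v, lam=0.0)  # credulous polarization

# Both exceed 1, so sharing moves the average away from zero either way,
# but credulity amplifies more, and the gap grows with group size I.
assert 1 < perfect < credulous
assert amplification(20, P0, P1, v, 0.0) > credulous
```

On these illustrative numbers the perfectly rational group amplifies its average belief by roughly 30 percent, while the completely credulous group amplifies it by over 80 percent, and the credulous amplification keeps growing with group size.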

The experiments in which conservatives and liberals are grouped together can be seen as an example of that phenomenon. In this case, the experimenter has structured the two groups so that one group's signals are positive and the other group's signals are negative. The model predicts that both groups will get more extreme: the liberals will get more liberal and the conservatives will get more conservative. Of course, to use this model at all, we must assume that individuals are unaware that their group has been chosen on the basis of its signals.

This result holds for sophisticated Bayesians as well. However, Credulous Bayesianism increases the tendency of priors to become extreme. In either case, if an individual is thrown in with a random group, the expected impact of that group on the individual's bias is zero. However, if the group starts with an average bias that is positive or negative, then communication across that group will make this bias more extreme. We are particularly interested in the impact of Credulous Bayesianism on three different variance terms associated with ex post beliefs. First, we are interested in the actual variance of ex post beliefs, which equals

\frac{I P_1 \left(P_0(1 + (I-1)v) + I P_1\right)}{P_0\left(P_0(1 + (I-1)\lambda v) + I P_1\right)^2}.

Higher levels of variance of posterior beliefs mean more polarization of beliefs. Second, we are interested in the variance of the true error associated with those beliefs, or

\frac{(1-\lambda)(I-1)v P_1 I}{\left(P_0(1 + (I-1)\lambda v) + I P_1\right)^2} + \frac{1 + (I-1)\lambda v}{P_0(1 + (I-1)\lambda v) + I P_1}.

The variance of this error term captures the extent to which people are incorrect in their posterior assessments. Finally, we are interested in the variance of the error term that people believe to be associated with their posterior beliefs, or

\frac{1 + (I-1)\lambda v}{P_0(1 + (I-1)\lambda v) + I P_1}.

This error term reflects the degree of certainty that people have ex post.

The following proposition gives the impact of Credulous Bayesianism on the variance of posterior beliefs, on the variance of the error associated with those beliefs, and on the perceived variance of the error associated with those beliefs:

Proposition 1:

As the value of λ falls, the variance of ex post beliefs increases, the variance of the error in those beliefs also rises, and the degree to which people underestimate their true level of error rises as well. As the share of the noise term that is common increases, the difference between the actual precision of the posterior and the perceived precision of the posterior will rise as long as P_0 + I P_1 > (I-1)\lambda v P_0.

Proposition 1 shows that Credulous Bayesianism causes the heterogeneity of posterior beliefs to rise, which means that group polarization will be more pronounced among Credulous Bayesians. Belief in the independence of information makes it more likely that groups will converge on a more extreme belief, essentially because they believe that they have better information than they do. In fact, the evidence is consistent with this finding (Brown 1986; Schkade, Sunstein & Kahneman 2000).

While the extremism associated with standard Bayesian learning accompanies an increase in accuracy, Credulous Bayesianism increases extremism while decreasing accuracy. The fact, stated in Proposition 1, that Credulous Bayesianism causes the variance of the error term to increase means that on average posterior beliefs will be less accurate. This is not surprising; excessive credulity is, after all, going to produce errors. It is perhaps somewhat more surprising that the perceived variance of the posterior error falls with Credulous Bayesianism. This means that Credulous Bayesianism can explain the phenomenon of people holding onto erroneous opinions with a great deal of intensity.

The final sentence of the proposition shows that the degree of erroneous confidence may rise as the noise terms become more highly correlated across people. This effect is not automatic, because increasing the amount of common noise creates two opposing effects. First, the increase in common noise does cause individuals to put less weight on social signals. Second, as the amount of common noise rises, the tendency to think that each person has independent information becomes costlier. If λ is sufficiently low, so that the bias is sufficiently severe, then the second effect must dominate, and more common noise means more false confidence. This fact suggests that Credulous Bayesians will make particularly bad decisions when there is common noise.
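These comparative statics can be checked by evaluating the three closed-form variances directly. A minimal Python sketch (function names and parameter values are our own, purely illustrative):

```python
# The three variance expressions from the text, evaluated numerically to
# illustrate Proposition 1: as lam falls, posterior beliefs spread out and
# become less accurate, while people *believe* they are more accurate.
# Parameter values are illustrative only.

def posterior_variances(I, P0, P1, v, lam):
    K = P0 * (1 + (I - 1) * lam * v) + I * P1          # common denominator
    var_beliefs = I * P1 * (P0 * (1 + (I - 1) * v) + I * P1) / (P0 * K**2)
    var_error = ((1 - lam) * (I - 1) * v * P1 * I / K**2
                 + (1 + (I - 1) * lam * v) / K)         # true error variance
    perceived = (1 + (I - 1) * lam * v) / K             # believed error variance
    return var_beliefs, var_error, perceived

I, P0, P1, v = 10, 1.0, 1.0, 0.5
b1, e1, perc1 = posterior_variances(I, P0, P1, v, lam=1.0)
b0, e0, perc0 = posterior_variances(I, P0, P1, v, lam=0.2)

assert b0 > b1        # more polarization under credulity
assert e0 > e1        # larger true error
assert perc0 < perc1  # yet smaller perceived error: confident and wrong
assert abs(e1 - perc1) < 1e-12  # a perfect Bayesian's perceived error is exact
```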

Proposition 1 tells us that decision-making gets worse as people mistakenly fail to correct for the common sources of information, but it does not show that group decision-making can actually be worse than individuals acting alone. Proposition 2 gives conditions in which the level of error, as measured by the variance of the error of the posterior, can rise with group size:

Proposition 2:

The variance of the error of the posterior estimate will always decline with I if λ is sufficiently close to one. If

I > \frac{(1+v)P_0}{2vP_0 - (1-v)P_1}

and λ is sufficiently close to zero, then the variance of the error of posterior beliefs will rise with I.

If people are sufficiently prone to Credulous Bayesianism, that is, if λ is sufficiently close to zero, and if common noise is sufficiently important, then group size decreases accuracy. These conditions do not imply that the folly of groups is a general condition. In many cases these conditions will fail, and groups will be wiser than individuals, but the possibility that the conditions of Proposition 2 will sometimes be met should warn us that groups will sometimes make worse decisions than individuals. And in fact, a great deal of evidence suggests that groups often perform worse than their best member or even their median member (Gigone & Hastie 1995).
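The group-size result can be illustrated numerically. In the sketch below (parameter values are our own choices, not the article's), common noise is heavy and private signals are weak, so larger groups make a completely credulous updater worse off while a perfect Bayesian keeps improving:

```python
# Proposition 2, illustrated: with heavy common noise (v = 0.8) and weak
# private signals (P1 = 0.2), a completely credulous group (lam = 0) gets
# *less* accurate as it grows, while a perfect Bayesian group (lam = 1)
# keeps improving. Values are illustrative, not from the article.

def error_variance(I, P0, P1, v, lam):
    K = P0 * (1 + (I - 1) * lam * v) + I * P1
    return (1 - lam) * (I - 1) * v * P1 * I / K**2 + (1 + (I - 1) * lam * v) / K

P0, P1, v = 1.0, 0.2, 0.8
threshold = (1 + v) * P0 / (2 * v * P0 - (1 - v) * P1)  # group-size threshold

credulous = [error_variance(I, P0, P1, v, 0.0) for I in (2, 5, 10)]
bayesian = [error_variance(I, P0, P1, v, 1.0) for I in (2, 5, 10)]

assert threshold < 2                                  # the condition binds for I >= 2
assert credulous[0] < credulous[1] < credulous[2]     # folly of crowds
assert bayesian[0] > bayesian[1] > bayesian[2]        # wisdom of crowds
```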

We acknowledge a glaring failure of the Credulous Bayesian model as it now stands, which is that it predicts that when two disparate groups are brought together, their views will converge. Certainly, such convergence can occur, but it is not the general rule. In many cases, members of a deliberating group will shift toward a more extreme direction in line with the predeliberation median, even if those members come from two disparate groups (Sunstein 2003). At least if people's views are not entrenched, a group consisting of three people who tend to think that climate change is a serious problem, and two who tend to reject that view, will often shift toward more concern about climate change (a prediction consistent with group polarization). But there are also abundant examples of cases in which group members stick to their extreme opinions when they connect with each other (Brown 1995; Sunstein 2003). In other words, a group that consists of equally opposed subgroups might well show neither convergence nor polarization but simply entrenchment of members' antecedently held views.

For example, connections between different religions rarely lead to a merging of religious beliefs. And if (certain) Palestinians meet with (certain) Israelis, convergence is not expected; entrenchment and continuing conflict are at least as likely. We conjecture that the model could generate permanent disagreement if people put a high weight on the views of "insiders" but believe that the opinions of "outsiders" are essentially worthless. If there is sufficiently little trust across groups, then it may be impossible to communicate information effectively. In that case, insiders may share views and extremes may persist even after communication (Brown 1986). If insiders in each subgroup listen to one another, and discount the views of those in the other group, then members of each group will polarize, producing an even greater split between the two.

5.1. Biased Samples

To illustrate the impact of biased samples of correspondents, we now assume that individual signals include a common and an idiosyncratic component. This represents a special case of the correlated errors discussed above, but it is a case that makes it particularly easy to discuss sample selection. Correlation of error terms within a group would occur if the group were not a random sample of the world, but rather a specific subset with highly similar backgrounds or biases. If people sorted into communities or conversed with people who had similar views, then we should certainly expect to see correlation in the error terms of people who regularly communicate with one another. The intuition here is straightforward: people who find themselves in a certain group often do not give sufficient thought to the possibility that the group is unrepresentative in a way that ought to matter for purposes of social learning. People in left-leaning or right-leaning groups listen to fellow group members, without discounting what they learn in light of the dispositions of those members.

Perfect Bayesians should strongly discount the opinions of their highly selected acquaintances. Even if they are fairly committed to their group identity, rational people should know that there is a skew in the information held within groups (Democrats, Republicans, union members, Americans, and so forth). Credulous Bayesians, however, imperfectly correct for the unrepresentative nature of the opinions of their friends and acquaintances. They assume that their friends and acquaintances are representative of the world, not a very unusual set of informants.

To address this possibility, we assume that individuals continue to receive a private signal equal to D plus a noise term denoted n_i. The variances of these terms are the same as before, but now we assume that the covariance of any two noise terms in a person's group is μ/P_1. With this assumption, and assuming that people know this covariance term, the correct Bayesian inference formula is

\frac{P_1 \sum_i S_i}{P_0(1 + (I-1)\mu) + I P_1}.

As the degree of correlation rises, individuals should put less weight on the opinions of their neighbors.

In this instance, Credulous Bayesians underestimate the degree to which their neighbors' error terms are correlated with their own, which is equivalent to assuming that one's neighbors are more representative of the overall population than they actually are. To capture this, we assume that people believe that the true covariance of neighbors' signals is λμ/P_1. Credulous Bayesians therefore use an updating formula of

\frac{P_1 \sum_i S_i}{P_0(1 + (I-1)\lambda\mu) + I P_1},

which puts too much weight on the signals received from a biased sample of neighbors.

The impact of Credulous Bayesianism when there are correlated noise terms is almost identical to the impact of Credulous Bayesianism when there is a common noise term.

Proposition 3:

(a) As the value of λ falls, the variance of ex post beliefs increases and the variance of the error in those beliefs also rises.

(b) The variance of the posteriors will always rise with I.

(c) The variance of the error of the posterior estimate will decline with I as long as λ is close enough to one. If

I > \frac{(1+\mu)P_0}{2\mu P_0 - (1-\mu)P_1},

then the variance of the error of the posterior estimate will rise with I as long as λ is close enough to zero.

Just as before, individuals who fail to understand the degree to which their group is not representative will tend to have more extreme and more erroneous posterior beliefs. Credulous Bayesians have excessive faith in the views of their neighbors and so underestimate the extent to which their neighbors make mistakes that are similar to their own. The folly of crowds is a likely result.

The third part of the proposition shows that when individuals are close to being perfect Bayesians, a larger group will always lead to views that are more accurate, as well as more extreme. However, if individuals are close to being completely naive Bayesians, who do not think that there is any correlation in their signals, then as long as

\left(2 - \frac{1}{I}\right)\frac{\mu}{1-\mu} - \frac{1}{I(1-\mu)} > \frac{P_1}{P_0},

accuracy will decline with group size. These are sufficient conditions for crowds to be more foolish than individuals. The intuition behind the condition is that this somewhat perverse result is more likely when signals are highly correlated, when group size is already large, and when the signal-to-noise ratio in individual signals is quite low.
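A small Monte Carlo makes the biased-sample logic concrete. The sketch below generates equicorrelated noise by giving every group member a shared noise component (statistically equivalent to pairwise correlation μ) and compares a credulous updater with a correct one; all names and parameter values are ours and purely illustrative:

```python
# Monte Carlo sketch: signals whose noise terms are correlated across group
# members (pairwise correlation mu), processed by a credulous updater who
# acts as if the correlation were lam*mu. Plain Python, illustrative values.
import random
import statistics

def simulate(lam, mu=0.6, I=10, P0=1.0, P1=1.0, trials=20000, seed=0):
    rng = random.Random(seed)
    s_common = (mu / P1) ** 0.5        # shared noise component
    s_idio = ((1 - mu) / P1) ** 0.5    # idiosyncratic noise component
    errors = []
    for _ in range(trials):
        D = rng.gauss(0, (1 / P0) ** 0.5)
        theta = rng.gauss(0, s_common)
        signals = [D + theta + rng.gauss(0, s_idio) for _ in range(I)]
        post = P1 * sum(signals) / (P0 * (1 + (I - 1) * lam * mu) + I * P1)
        errors.append(post - D)
    return statistics.pvariance(errors)

# Acting as if neighbors were independent (lam = 0) raises the true error
# variance relative to fully correcting for the correlation (lam = 1).
assert simulate(lam=0.0) > simulate(lam=1.0)
```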

The next proposition provides our first result on the benefits of diversity, which in this case means a reduction in the degree of correlation across signals.

Proposition 4:

(a) As the degree of correlation in the signal noise rises, the variance of the posterior belief will decline if and only if

\lambda > \frac{I P_1 + P_0}{2 I P_1 + (2 + \mu(I-1))P_0}.

(b) The variance of the error in the posterior belief will rise with μ if and only if

\frac{I P_1 + P_0}{\mu(I-1)P_0} > \lambda(1 - 2\lambda).
The first part of this proposition shows that increasing correlation of signals can either increase or decrease ex post extremism. Part (a) implies that when λ is greater than one half, increasing correlation of signals will cause posterior beliefs to become less extreme. There is nothing surprising in this fact. Lower correlation of noise means that there is more information in the signals, and more information should cause beliefs to become more extreme.

The opposite result occurs when λ is sufficiently less than one half. In that case, more correlation actually leads to more ex post extremism. High correlation of noise means that the average signal will get more extreme, and since people are putting too much weight on that average signal, their views get more extreme as well.

The second part of the proposition gives a condition under which a reduction in correlation in the noise, which can be seen as more diversity, increases the accuracy of ex post beliefs. If λ is greater than one half or sufficiently low, then this natural result always holds: more diversity improves accuracy. The opposite result can occur when people reason particularly poorly, but we suspect that this is more of a theoretical curiosity than a case to worry about.
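The threshold behavior in part (a) is easy to verify on the closed form for the posterior variance. A short sketch (parameter values are ours and purely illustrative):

```python
# Proposition 4(a), illustrated: whether more correlation (mu) makes
# posterior beliefs more or less extreme depends on how credulous people
# are. Illustrative parameter values only.

def posterior_variance(I, P0, P1, mu, lam):
    K = P0 * (1 + (I - 1) * lam * mu) + I * P1
    return I * P1 * (I * P1 + P0 * (1 + (I - 1) * mu)) / (P0 * K**2)

I, P0, P1 = 10, 1.0, 1.0

# Mostly rational (lam = 0.8): more correlation -> less extreme posteriors.
assert posterior_variance(I, P0, P1, 0.6, 0.8) < posterior_variance(I, P0, P1, 0.2, 0.8)

# Highly credulous (lam = 0.1): more correlation -> *more* extreme
# posteriors, because shared noise is mistaken for independent information.
assert posterior_variance(I, P0, P1, 0.6, 0.1) > posterior_variance(I, P0, P1, 0.2, 0.1)
```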

5.2. Intellectual Diversity

We now continue with our investigation of intellectual diversity and return to the framework with common noise terms. This simple framework is meant to help evaluate the benefit of intellectual diversity in groups, such as panels of judges, legislatures, or students in a classroom. Our specific interest is to ask whether diversity is more or less valuable as people become more rational.

We model diversity by assuming that different groups have independent draws of the common noise term θ. While such diversity always improves information quality, the value of diversity can be significantly higher when people are Credulous Bayesians. We assume that a fraction α of the I people who are sharing information come from a group that has one draw of θ, while the rest of the group has received a second, independent common signal. The true Bayesian posterior is then:

\frac{P_1\left(\sum_{i \le \alpha I}\left(1 - v + vI(1-\alpha)\right)S_i + \sum_{j > \alpha I}\left(1 - v + vI\alpha\right)S_j\right)}{\left((1-v)^2 + (1-v)vI + \alpha(1-\alpha)v^2I^2\right)P_0 + \left((1-v)I + 2\alpha(1-\alpha)vI^2\right)P_1}.

Credulous Bayesians recognize that people from the two groups have different common noise terms, but they continue to underestimate the true amount of common error by a factor λ, and this produces the modified formula:

\frac{P_1\left(\sum_{i \le \alpha I}\left(1 - \lambda v + \lambda vI(1-\alpha)\right)S_i + \sum_{j > \alpha I}\left(1 - \lambda v + \lambda vI\alpha\right)S_j\right)}{\left((1-\lambda v)^2 + (1-\lambda v)\lambda vI + \alpha(1-\alpha)\lambda^2v^2I^2\right)P_0 + \left((1-\lambda v)I + 2\alpha(1-\alpha)\lambda vI^2\right)P_1}.

This formula nests both the perfect Bayesian and a learner who completely ignores the common error components in people's signals. This ultra-naïve Bayesian would use the formula for learning with independent signals, \frac{P_1 \sum_i S_i}{P_0 + I P_1}. As long as λ > 0, Credulous Bayesians, like perfect Bayesians, put more weight on the views of the members of the minority group, since that group provides more information: its members had a different common shock. In Section 9, we will turn to a case in which individuals put less weight on the opinions of outsiders.

The next proposition discusses the impact of increasing intellectual diversity by changing the population shares of the two groups:

Proposition 5:

(a) The variance of prediction error is declining with α if and only if α < 0.5 for both perfect Bayesians and naïve Bayesians. More generally, the variance of prediction error is declining with α when α is small enough and the other parameters satisfy

0 > I P_1\left(-2 + \lambda v(4 - 2\lambda v + I(-2 + \lambda + \lambda v))\right) + P_0\left(-2 + \lambda v\left(2 + \lambda\left(-3I(1+v) + 2(2+v) - (I-1)v(-8 + I + (I-2)v)\lambda + 2(I-2)(I-1)v^2\lambda^2\right)\right)\right).

(b) An increase in α from 0 to 0.5 will cause a decline in the variance of the error of posterior beliefs for naïve Bayesians that is larger than the decline in the variance of the error for perfect Bayesians.

Part (a) of the proposition tells us that when α is small, an increase in the amount of diversity increases the accuracy of posterior beliefs when people are either perfect Bayesians or naïve Bayesians. A greater mix of people makes it easier to factor out the common noise for either of these two extreme groups. The condition given in the proposition is sufficient to ensure that diversity will be good in more intermediate cases, and it seems likely to hold for most reasonable parameter values.

The second part of the proposition tells us that the advantages associated with mixing will be larger when people are Credulous Bayesians than when people are perfect Bayesians. When there is a lot of common noise, the naïve Bayesians suffer both because of that noise and because they misattribute that noise to underlying truth. As there is more mixing, the common noise gets averaged out, and there is both more accuracy and less misattribution among the naïve Bayesians. This result suggests that intellectual diversity is particularly valuable when people incorrectly underestimate the common source of signals. Of course the benefits of diversity would fall if members of one group did not trust the statements of members of another group; in that event, those statements might be unhelpful or even counterproductive.
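A Monte Carlo sketch of the diversity effect for the ultra-naïve updater, who pools all signals as if independent (all names and parameter values are ours and purely illustrative):

```python
# Diversity sketch for Section 5.2: two subgroups have independent draws of
# the common shock theta. An ultra-naive updater pools all signals as if
# independent. Splitting the group 50/50 averages out the two shocks and
# cuts the error, in the spirit of Proposition 5. Illustrative values only.
import random
import statistics

def naive_error_variance(alpha, I=10, P0=1.0, P1=1.0, v=0.7,
                         trials=20000, seed=1):
    rng = random.Random(seed)
    s_com = (v / P1) ** 0.5
    s_idio = ((1 - v) / P1) ** 0.5
    nA = round(alpha * I)                 # members exposed to shock theta_A
    errs = []
    for _ in range(trials):
        D = rng.gauss(0, (1 / P0) ** 0.5)
        thA, thB = rng.gauss(0, s_com), rng.gauss(0, s_com)
        signals = [D + (thA if k < nA else thB) + rng.gauss(0, s_idio)
                   for k in range(I)]
        post = P1 * sum(signals) / (P0 + I * P1)  # treats signals as independent
        errs.append(post - D)
    return statistics.pvariance(errs)

# A balanced mix of the two subgroups is markedly more accurate than a
# homogeneous group exposed to a single common shock.
assert naive_error_variance(0.5) < naive_error_variance(0.0)
```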


At this point, we drop the assumption that individuals perfectly report their own signals and instead assume that individuals exhibit a tendency towards conformism when reporting their results. We also assume that all errors are idiosyncratic. This tendency towards conformism might come about for signaling reasons, as in Stephen Morris (2001), or out of a taste for conformity, as in B. Douglas Bernheim (1994). We assume that people care both about reporting the truth and about saying something that conforms to the norm in their group. The problem arises when people do not sufficiently discount people's statements, treating those statements as informative when in fact they reflect only the pressure to conform. In a political organization, for example, group members may disregard the possibility that disparagement of some environmental concern is driven not by knowledge, but by a perception that disparagement of environmental concerns is popular within the relevant group.

It is true of course that people can introspect and realize that they are themselves conformists; they might generalize from their own self-knowledge to discount the behavior of others as conformists too. But evidence of human conformity constantly produces real surprise, and in any case human beings are subject to "the fundamental attribution error" (Ross & Nisbett 1991), which means that they tend to attribute people's behavior to their dispositions, rather than to the situation (while simultaneously attributing their own behavior, at least when it is good, to the situation, rather than their dispositions). It follows from the fundamental attribution error that people are unlikely to see the extent to which other people's behavior is a product of the desire to conform. Conformity pressures are responsible for "pluralistic ignorance," understood to mean ignorance of the judgments and beliefs of other people. The group norm distorts what people say they believe.

The norm is known and denoted n. Specifically, we assume that people report a signal \hat{S}_i meant to minimize the quadratic loss function \gamma(\hat{S}_i - S_i)^2 + (1-\gamma)(\hat{S}_i - n)^2, which sums losses from lying and losses from deviating from the community norm. Optimizing behavior then generates the reporting rule \hat{S}_i = \gamma S_i + (1-\gamma)n, which means that reported signals are an average of the truth and the community norm. If people correctly transform reported signals, by subtracting (1-\gamma)n and dividing by \gamma, then use of the standard Bayesian inference formula

\frac{P_1\left(S_i + \sum_{j \ne i}\frac{\hat{S}_j - (1-\gamma)n}{\gamma}\right)}{P_0 + I P_1}

will yield the most accurate posterior.

However, Credulous Bayesianism may operate in this setting as well, and in this case we assume that it causes people to underestimate the extent to which others have skewed their opinions to conform to the community norm. Instead, Credulous Bayesians assume that other people are using a reporting rule of _ = Xy Sj + (1 - Xy)n, where XE [1, y ], which nests the two extremes of perfectly correcting for conformity and treating everyone else as being a completely honest reporter of their private signal. This changes the formula for the posterior belief to

p (C + -L2 s (i - 1)(1 - XY)n) p1(si + XY2jWsj - Xy )

Po + iPi .
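To make the reporting rule and its inversion concrete, here is a minimal sketch (the parameter values are our own illustrative assumptions, not the paper's): each reported signal averages the true signal and the norm, and subtracting (1 − γ)n and then dividing by γ recovers the true signal exactly.

```python
import random

random.seed(0)
gamma = 0.7   # weight on the truth in reported signals (illustrative)
n = 2.0       # community norm (illustrative)

true_signals = [random.gauss(0.0, 1.0) for _ in range(10)]

# Reporting rule: an average of the truth and the community norm.
reported = [gamma * s + (1.0 - gamma) * n for s in true_signals]

# Correct transformation: subtract (1 - gamma)*n, then divide by gamma.
recovered = [(r - (1.0 - gamma) * n) / gamma for r in reported]

assert all(abs(s - rec) < 1e-9 for s, rec in zip(true_signals, recovered))
```

A Credulous Bayesian who instead divides by λγ and subtracts (1 − λγ)n, with λ > 1, recovers a distorted signal that still carries part of the norm.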

In this case, there is no longer homogeneity within a group, because individuals see their own signals accurately but incorrectly treat the signals of those around them. One interpretation of this assumption is that people suffer from a lack of higher-order reasoning by failing to consider the motives of those around them. With this formula, Proposition 6 follows:

Proposition 6:

If n > 0 and λ > 1, then posterior beliefs will be biased upwards, and this bias increases with λ, n, I and ρ₁ and falls with ρ₀.

If n is randomly distributed across groups, with mean 0 and variance Var(n), then the variance of posterior beliefs is rising with λ if and only if

$$(I-1)(\lambda-1)\,\mathrm{Var}(n) > \frac{1}{\rho_0} + \frac{1}{\rho_1}.$$

The within-group (i.e., conditioning on D and n) variance in posterior beliefs is always declining with λ. The variance of the error in posterior beliefs is always rising with λ.
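The bias result can be illustrated with a short Monte Carlo sketch, under illustrative parameter values of our own choosing: a listener who fully corrects for conformism (λ = 1) is unbiased on average, while a listener who treats reports as honest (λ = 1/γ) inherits a positive bias from a norm n > 0.

```python
import random

random.seed(1)
rho0, rho1 = 1.0, 1.0   # prior and signal precisions (illustrative)
gamma, n, I = 0.7, 2.0, 5

def mean_error(lam, groups=20000):
    """Average posterior error for listener 0, who assumes the
    reporting rule lam*gamma*S_j + (1 - lam*gamma)*n."""
    total = 0.0
    for _ in range(groups):
        D = random.gauss(0.0, 1.0)                       # true value
        S = [D + random.gauss(0.0, 1.0) for _ in range(I)]
        reported = [gamma * s + (1.0 - gamma) * n for s in S]
        others = sum((r - (1.0 - lam * gamma) * n) / (lam * gamma)
                     for j, r in enumerate(reported) if j != 0)
        posterior = rho1 * (S[0] + others) / (rho0 + I * rho1)
        total += posterior - D
    return total / groups

bias_correct = mean_error(1.0)            # fully corrects for conformism
bias_credulous = mean_error(1.0 / gamma)  # treats reports as honest
assert abs(bias_correct) < 0.05
assert bias_credulous > bias_correct + 0.2
```

With these values the credulous listener's bias is close to ρ₁(I − 1)((λ − 1)/λ)n/(ρ₀ + Iρ₁), which is increasing in λ, n, I and ρ₁ and falling in ρ₀, as the proposition states.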

Proposition 6 shows how conformism in stating beliefs will affect Credulous Bayesians. A norm of stating a belief of n will cause biased posteriors if and only if λ > 1, so that people don't sufficiently correct for the fact that statements are conforming to a norm. The degree of that bias increases with the extent that people don't adequately control for conformism in people's statements (λ) and with the magnitude of the norm (n).

Larger groups will have more bias because individuals base their beliefs more on the opinions of others and less on their own private signal. Credulous Bayesianism doesn't cause any error in an individual's private signal, but it does cause people incorrectly to assess the impact of their neighbors' views. As group sizes increase, there is a larger range of outside voices for people to misinterpret.

The amount of bias is also increasing with the variance of the true value (D) and declining with the variance of the noise terms. The connection between the bias and these two variance terms occurs because people weight the signals that they hear more highly when the variance in the underlying parameter is higher and the variance in the noise term is lower. As people weight the messages that they hear more heavily, the impact of Credulous Bayesianism naturally increases, because they are generally weighting these signals incorrectly.

Proposition 6 also tells us that the variance of the error in posterior beliefs is always rising with λ. When people get worse at correcting for conformism, their beliefs become less accurate. Credulous Bayesianism causes an increase in the uniformity of beliefs within a group, as people conform in their statements to the group-wide norm, and those statements then further decrease the heterogeneity of beliefs within that group. (Recall the finding to this effect in the Colorado experiment.)

If the variance of norms across groups is sufficiently high, the combination of Credulous Bayesianism and conformism can actually increase the heterogeneity of posterior beliefs across groups and within the population as a whole. When the variance of norms is low, then Credulous Bayesians weight other people's opinions less, since people perceive less of a need to inflate the perceived difference between the actual statement and the norm. When the variance of norms is high, then Credulous Bayesianism makes these norms extremely powerful, since people believe that they receive information from their peers, while they are really just hearing statements that reflect the prevailing norm within their group. When the variance of norms is high, then Credulous Bayesianism acts mainly to cause people to inflate the importance of these norms. In this case, we should expect to see increased conformity within groups of Credulous Bayesians and increased heterogeneity across groups and across the population as a whole—what has been called "second-order diversity" (Gerken 2004).

The parameter γ does not appear in Proposition 6 because it is not conformism that matters but rather the extent to which people don't correct for that conformism, which is captured in the λ term. The degree of conformism does matter, however, if we assume that individuals completely fail to correct for the tendency to conform, and set λ = 1/γ. In that case γ matters because it affects λ, and Corollary 1 follows:

Corollary 1:

If people are naive Bayesians who completely fail to correct for the tendency of statements to conform to the norm, then as γ falls and conformism rises, the positive bias of the posterior increases if n > 0, the variance of the error in posterior beliefs rises and the variance of beliefs within groups declines. The variance of posterior beliefs falls with γ if and only if

$$(I-1)(1-\gamma)\,\mathrm{Var}(n) > \frac{1+\gamma(I-1)}{\rho_0} + \frac{\gamma}{\rho_1}.$$

If people are completely naive Bayesians, then increased conformism increases the degree of error and bias in beliefs. As before, heterogeneity within groups will decline with the amount of conformism. Heterogeneity across people and groups can rise if the variation in the norm across society as a whole is sufficiently high.

These results can also help us understand experimental results showing that if members of the group think that they have a shared identity, and a high degree of solidarity, there will be heightened polarization. When people are placed in groups that emphasize some shared attribute, polarization is increased (Abrams et al. 1990, 112). A sense of "group belongingness"—of membership in a single group with salient shared characteristics—predictably affects the extent of polarization.6

A revealing experiment, fitting closely with the account we are offering here, attempted to investigate the effects of group identification on polarization (Spears, Lea & Lee 1990). Some subjects were given instructions in which group membership was made salient (the "group immersion" condition), whereas others were not (the "individual" condition). For example, subjects in the group immersion condition were told that their group consisted solely of first-year psychology students and that they were being tested as group members rather than as individuals. The relevant issues involved affirmative action, government subsidies for the theatre, privatization of nationalized industries, and phasing out nuclear power plants. Polarization generally occurred, but it was greater when group identity was emphasized. This experiment shows that polarization is highly likely to occur, and to be most extreme, when group membership is made salient.

In the context of the model, increases in λ and γ (parameters capturing, respectively, the degree to which people accept conformist statements as truth and the degree to which they report truthfully rather than conform) can be interpreted as increases in the strength of group membership. Increasing the sense of solidarity presumably makes people more likely to trust one another, that is, to have a high value of λ, and possibly less likely to lie to each other, which would cause γ to rise. Higher values of both of these parameters will always increase the conformity within the group. As long as the variance of norms across groups is sufficiently high, increases in these parameters will also cause the

6 In the same vein, physical spacing tends to reduce polarization; a sense of common fate and intragroup similarity tend to increase it, as does the introduction of a rival "outgroup."

polarization of the groups to increase. This can explain the connection between group identity and group polarization.

In sum, conformism increases extremism and error when people are Credulous Bayesians. One way to distinguish between the conformist and rational models is to look at the impact of private and public communication. In the conformism model, public communications would look quite different from private communications, as we expect people to express the community norm. Moreover, individuals would be swayed by these conformist public statements. In a more rational model, individuals would put less weight on public statements than on private statements, when they know that people speak in ways that conform to a group norm.


In this section, we explore the impact of self-interested persuaders on Credulous Bayesians. Our goal here is to show that Credulous Bayesianism can provide a model that makes sense of persuasive behavior, which is harder to rationalize with a perfectly rational model. The core idea of Credulous Bayesianism, which is that people pay too much attention to the opinions of others, also predicts that persuasion will matter and that we will see plenty of attempts at persuasion, even in circumstances in which fully rational people would not be much moved.

This model follows along the lines of Mullainathan, Schwartzstein and Shleifer (2007), who look at persuasion and coarse, categorical thinking. We now assume that statements are motivated not by a simple desire for conformism, but by a desire to influence an outcome, such as sentence length, environmental policy, guilt or innocence, or purchasing patterns. We assume that one or more decision-makers will choose an outcome that is equal to their posterior assessment of D. In this case, we assume that the decision-makers have no independent knowledge of the true value of D, other than its mean of zero and variance of 1/ρ₀. All information beyond that comes from the statements of I other individuals who do have private signals with idiosyncratic error terms of variance 1/ρ₁. To keep things simple, we return to the assumption of Section 3 that the private signals have independent error terms.

As before, these other individuals have a taste for telling the truth, but because they are attempting to persuade, they also shade their statements in a particular direction. We capture these assumptions by assuming that person j chooses a report to maximize the expectation of

$$\rho_j \hat D - \gamma_j(\tilde S_j - S_j)^2,$$

where ρ_j differs across the population and reflects the heterogeneous objectives of different actors, γ_j captures the cost of misreporting, and D̂ denotes the chosen outcome. One interpretation of this model is that signals are being produced by two lawyers, one of whom has a value of ρ of one and the other of whom has a value of ρ of minus one. The two lawyers are both trying to persuade the judge and jury of the truth of their view of the case. They are constrained somewhat by the truth, but send signals that are biased towards their side of the case.

We constrain the decision-maker to use a linear updating rule, so that posterior beliefs equal $\sum_j \beta_j \tilde S_j - K_j$, where the β_j are endogenously determined weights that will be discussed later and K_j is a constant that may be person specific but that is independent of the reported signal. Given this assumption, the individual's choice of reported signal will satisfy $\tilde S_j = S_j + 0.5\beta_j\rho_j/\gamma_j$. In this case, the existence of incentives creates an additive error term which surrounds the signal.
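The persuader's first-order condition can be checked numerically. In this sketch (all values are illustrative assumptions), a persuader chooses a report to maximize ρβS̃ − γ(S̃ − S)², the influence on the outcome minus the quadratic lying cost, and a grid search recovers the closed form S + 0.5βρ/γ.

```python
def best_report_numeric(s, rho, gamma, beta, lo=-10.0, hi=10.0, steps=200001):
    """Grid-search the persuader's objective rho*beta*report - gamma*(report - s)**2."""
    best, best_val = lo, float("-inf")
    for k in range(steps):
        cand = lo + (hi - lo) * k / (steps - 1)
        val = rho * beta * cand - gamma * (cand - s) ** 2
        if val > best_val:
            best, best_val = cand, val
    return best

s, rho, gamma, beta = 1.0, 1.0, 2.0, 0.5
closed_form = s + 0.5 * beta * rho / gamma   # the reporting rule in the text
numeric = best_report_numeric(s, rho, gamma, beta)
assert abs(numeric - closed_form) < 1e-3
```

The shading term 0.5βρ/γ grows with the persuader's stake ρ and with the weight β the decision-maker places on the report, and shrinks with the lying cost γ.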

We assume that the decision-maker does not know the values of 0.5ρ_j/γ_j, but has an opinion about the distribution of this variable: it is normally distributed with variance 1/ρ₃. If the true mean of this variable is zero, optimal signal extraction means that the decision-maker will set his estimate of the average value of K_j equal to zero if he cannot distinguish the motives of the speakers. If we assume that the decision-maker cannot commit to the weights that will be used ex post to form the posterior and the judgment, then the weight that minimizes the variance of the error in the decision-maker's posterior belief will satisfy the equation

$$\frac{1}{\beta} = I + \frac{\rho_0}{\rho_1} + \frac{2\rho_0\beta^2}{\rho_3},$$

and we denote that value β*.

In this case, Credulous Bayesianism will cause the decision-maker to underestimate the true heterogeneity of bias in the population by thinking that the variance of 0.5ρ_j/γ_j is λ/ρ₃, with λ < 1. This assumption can be thought of as a naive trust in people's honesty or, again, as a failure to engage in higher-order reasoning that would lead to the view that people slant their statements to achieve an end. In this case, the optimal ex post value of β*(λ) satisfies

$$\frac{1}{\beta} = I + \frac{\rho_0}{\rho_1} + \frac{2\lambda\rho_0\beta^2}{\rho_3}.$$

Differentiating this condition implies that a higher value of λ causes the decision-maker to be more skeptical about the signals that are reported. We then prove in the appendix that:

Proposition 7:

(a) As λ rises, the variance of the decision-maker's posterior belief and the variance of the error associated with that belief both fall when λ < 1. As λ increases, the decision-maker believes that the variance of the error in his posterior beliefs also increases.

(b) If the decision-maker believes that the expectation of 0.5ρ_j/γ_j is zero, but in the population this is not actually the case, then the expected error term in the posterior belief will be increasing with the mean of ρ_j and with the covariance of ρ_j and 1/γ_j. The expected error will increase with the mean of 1/γ_j if and only if the mean of ρ_j is positive.

The proposition shows that increases in X, the degree of cynicism about the motives of persuaders, create less heterogeneity in posterior beliefs and more accurate beliefs. This occurs for two reasons. First, greater skepticism means that the decision-maker tends to ignore the attempts at persuasion and sticks with his ex ante beliefs. Second, since the decision-maker is less susceptible to persuasion, the persuaders put less effort into misleading statements. Both effects together mean that the decision-maker's opinions are closer to his more accurate, and less variable, priors. Since persuasion creates both error and variability, less persuasion reduces both variability and error.
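These two effects can be illustrated numerically under our reading of the model; the fixed-point form of the weighting condition and all parameter values below are illustrative assumptions, not taken from the paper. The decision-maker's weight β solves a fixed point in which the perceived precision of each report falls with the believed bias variance λ/ρ₃; persuaders shade their reports by β times their incentive term; and the true error variance of the posterior is lower at λ = 1 (correct beliefs about bias) than at λ < 1 (credulity).

```python
def solve_beta(I, rho0, rho1, rho3, lam, iters=500):
    # Fixed point: beta = q / (rho0 + I*q), where q is the perceived
    # precision of each report, 1 / (1/rho1 + lam*beta^2/rho3).
    beta = rho1 / (rho0 + I * rho1)
    for _ in range(iters):
        q = 1.0 / (1.0 / rho1 + lam * beta * beta / rho3)
        beta = q / (rho0 + I * q)
    return beta

def true_error_variance(I, rho0, rho1, rho3, lam):
    # Posterior is beta * sum of reports S_j + beta*m_j, where
    # Var(D) = 1/rho0, Var(noise) = 1/rho1, Var(m_j) = 1/rho3.
    b = solve_beta(I, rho0, rho1, rho3, lam)
    return (I * b - 1.0) ** 2 / rho0 + I * b ** 2 / rho1 + I * b ** 4 / rho3

I, rho0, rho1, rho3 = 5, 1.0, 1.0, 0.01
credulous = true_error_variance(I, rho0, rho1, rho3, lam=0.2)
correct = true_error_variance(I, rho0, rho1, rho3, lam=1.0)
assert credulous > correct
```

The credulous decision-maker uses a larger β, which both lets more noise through and invites more aggressive shading by the persuaders.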

As in the case of other errors that come from Credulous Bayesianism, incorrect beliefs have the effect of making posteriors more erroneous while increasing the confidence with which people hold those erroneous beliefs. In this case, naive decision-makers think that they are far more accurate than cynical decision-makers whose beliefs hew more closely to reality.

Part (b) of the proposition discusses the ex post bias of decision-maker beliefs given that the mean of 0.5ρ_j/γ_j is not zero. In that case, increases in the mean value of ρ_j, the average incentive to persuade the decision-maker, will cause the bias to increase. Increases in the mean value of 1/γ_j, which captures the average willingness to lie, will cause the extent of the bias to increase if the mean of ρ_j is positive or decrease if the mean of ρ_j is negative. The willingness to lie tends to exacerbate the biases that come from an uneven

distribution of incentives.

Finally, the covariance between the incentives to lie and the willingness to lie is also important. When these two things are more likely to go together, then the bias in posterior beliefs will increase. These results suggest that it isn't the presence of incentives to lie that causes biased decisions. The problem comes when those incentives are unevenly distributed or correlated with the ability to lie.

So far we have assumed that the decision-maker has no idea about the biases that may afflict particular informants. Now we take the opposite assumption by assuming that the decision-maker has an assessment of 0.5ρ_j/γ_j for each individual, equal to λ_j · 0.5ρ_j/γ_j. We assume that the decision-maker believes that there is no error in his assessment of the bias. In this case, the decision-maker sets

$$\beta_j = \frac{\rho_1}{\rho_0 + I\rho_1} \qquad \text{and} \qquad K_j = \left(\frac{\rho_1}{\rho_0 + I\rho_1}\right)^2 \lambda_j \frac{0.5\rho_j}{\gamma_j}$$

for each one of the speakers. With these formulae, it follows that:

Proposition 8:

If λ_i = λ for all i, then the variance of posterior beliefs and the variance of the error in posterior beliefs declines with λ if λ is less than one. The decision-maker's perceived variance of the error in his posterior declines with λ for λ between −3 and 1 and increases with λ for all other values. The expected level of bias is increasing in Cov(ρ_i, 1/γ_i), Cov(1 − λ_i, ρ_i/γ_i), E(ρ_i) and I. It is increasing with E(1/γ_i) and decreasing with the average of λ_i if and only if E(ρ_i) is positive.

The proposition first makes the point that if the same incorrect level of adjustment for incentives is applied to everyone, we should expect the variance of the posterior and the error around the posterior to fall as λ rises. Both effects occur because higher values of λ purge the statements of their biases, and those biases create both excess variance and error.
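A small sketch of the de-biasing logic, with hypothetical values throughout: when the decision-maker knows each speaker's incentive term m_j and applies the correct adjustment (λ_j = 1), subtracting the constant β·λ_j·β·m_j from each weighted report strips out the shading, leaving the standard estimate shrunk toward the prior mean.

```python
rho0, rho1, I = 1.0, 1.0, 4
beta = rho1 / (rho0 + I * rho1)         # weight on each (corrected) report

D = 0.5                                 # true value; no noise, to isolate bias
m = [1.0, -1.0, 2.0, 0.0]               # speakers' incentive terms (hypothetical)
reports = [D + beta * mj for mj in m]   # each speaker shades by beta * m_j

lam = 1.0                               # correct assessment of each bias
posterior = sum(beta * (r - lam * beta * mj) for r, mj in zip(reports, m))

# With the bias fully removed, the posterior equals I*beta*D,
# the usual estimate shrunk toward the prior mean of zero.
assert abs(posterior - I * beta * D) < 1e-12
```

Setting lam below 1 leaves a residual of (1 − λ)β²·Σm_j in the posterior, which is the uncorrected bias term discussed below.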

The expected level of bias in the posterior belief, and the decision, is a function of the mean level of (1 − λ_i)ρ_i/γ_i — the uncorrected bias terms in the individual statements. This mean level will be increasing in the covariance between the extent that the decision-maker fails to correct fully for bias and the degree of bias. If λ_i is constant across i, then this covariance term equals zero. If the decision-maker is more likely to be appropriately cynical towards only one side of the debate, then there will be biased decision-making.

Increases in the mean value of ρ_i will increase the bias, because this implies that incentives are stacked more strongly on one side of the debate than on the other. If the mean value of ρ_i is positive, then increasing the value of E(1/γ_i), which effectively means decreasing the cost of lying, will cause the level of bias to increase. As before, covariance, either positive or negative, between incentives and the ability to lie will increase the expected level of bias.

Increases in I will cause the bias to go up because as the number of people increases, the decision-maker puts more weight on the biased views of those people. This last result assumes that the overall mean level of bias is independent of I, which might not be true in practice. This result points again to the possibility that crowds might be more foolish than individuals.

While these results may seem unsurprising in light of the literature on persuasion, the model provides a simple framework that illustrates the similarities between persuasion and other forms of group communication. If individuals are insufficiently skeptical towards their friends, then we should not be surprised that they are also insufficiently skeptical towards advertisers. Both phenomena reflect the same tendency to put excessive faith in the stated opinions of others.

We can also use this framework to make sense of an experiment designed to see how group polarization might be dampened (Abrams et al. 1990, 112). The experiment involved the creation of four-person groups. On the basis of pretesting, these groups were known to include equal numbers of persons on two sides of political issues — whether smoking should be banned in public places, whether sex discrimination is a thing of the past, whether censorship

of material for adults infringes on human liberties, and so on. Judgments were registered on a scale running from +4 (strong agreement) to 0 (neutral) to -4 (strong disagreement). In half of the cases (the "uncategorized condition"), subjects were not made aware that the group consisted of equally divided subgroups in pretests. In the other half (the "categorized condition"), subjects were told that they would find a sharp division in their group, which consisted of equally divided subgroups. They were also informed who was in

which group and told that they should sit around the table so that one subgroup was on one side facing the other group.

In the uncategorized condition, discussion generally led to a dramatic reduction in the gap between the two sides, thus producing a convergence of opinion toward the middle of the two opposing positions (a mean of 3.40 scale points, on the scale of +4 to −4). But things were very different in the categorized condition. Here the shift toward the median was much less pronounced, and frequently there was barely any shift at all (a mean of 1.68 scale points). In short, calling attention to group membership made people far less likely to shift in directions urged by people from different groups.

In our model, the distinction between categorized and uncategorized groups can be seen as comparing a scenario in which individuals know the value of 0.5ρ_j/γ_j and a scenario in which they do not. We will pare the experiment down to its essentials by assuming that there are exactly two people in each group, each of whom acts as both a decision-maker and a persuader. We also must assume that individuals have their own private signals. In both scenarios, one member of the dyad has a value of 0.5ρ_j/γ_j equal to 1 and the other has a value of −1. We also assume that the signal of the individual for whom 0.5ρ_j/γ_j equals 1 is higher than the signal of the individual for whom 0.5ρ_j/γ_j equals −1.

In the first scenario, both individuals believe that their partner is randomly drawn from the population both in their signal and in their value of 0.5ρ_j/γ_j. They also still believe that the population mean of 0.5ρ_j/γ_j is zero. In the second scenario, the partners know the value of 0.5ρ_j/γ_j for their partner. In both scenarios, people report signals equal to $\tilde S_j = S_j + 0.5\beta\rho_j/\gamma_j$, where the value of β reflects the signal extraction formula used by the listener. Of course, the value of β differs between the two scenarios.

In the first scenario, when 0.5ρ_j/γ_j is not known, and people are Credulous Bayesians, the value of β solves

$$\frac{1}{\rho_0+\rho_1} = \frac{(\rho_0+2\rho_1)\beta}{(\rho_0+\rho_1)\rho_1} + \frac{2\lambda^2\beta^3}{\rho_3},$$

and we denote that value β*. The weight that the individuals put on their own signal equals $\frac{\rho_1(1-\beta^*)}{\rho_0+\rho_1}$. The posterior belief for individual i (for whom 0.5ρ_i/γ_i equals one) will be

$$\frac{\rho_1(1-\beta^*)}{\rho_0+\rho_1}S_i + \beta^* S_j - (\beta^*)^2.$$

The difference between the two individuals' opinions will equal

$$\frac{(S_i - S_j)\left(\rho_1(1-2\beta^*) - \rho_0\beta^*\right)}{\rho_0+\rho_1}.$$

As β* falls with λ, the gap between the two participants will rise as people become more cynical, so that more trust is associated with more homogeneity in the non-categorization treatment.
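Under our reading of the difference formula, (S_i − S_j)(ρ₁(1 − 2β*) − ρ₀β*)/(ρ₀ + ρ₁), this comparative static is easy to verify numerically; the parameter values below are illustrative. The gap between the two opinions shrinks monotonically as β*, the trust placed in the partner's report, rises.

```python
def opinion_gap(si, sj, rho0, rho1, beta_star):
    # Difference between the two dyad members' posterior beliefs.
    return (si - sj) * (rho1 * (1.0 - 2.0 * beta_star) - rho0 * beta_star) / (rho0 + rho1)

si, sj, rho0, rho1 = 1.0, -1.0, 1.0, 1.0           # illustrative values
gaps = [opinion_gap(si, sj, rho0, rho1, b) for b in (0.0, 0.1, 0.2, 0.3)]
assert all(g1 > g2 for g1, g2 in zip(gaps, gaps[1:]))  # gap falls as trust rises
```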

In the case where people know each other's values of 0.5ρ_j/γ_j, perfect Bayesians would correct completely for these incentives. Ex post, everyone would know the right answer and they would agree in their opinions. Clearly, perfect Bayesian learning cannot explain the observed non-agreement in the categorization version of the experiment.

However, a particular form of Credulous Bayesianism can better match the facts. If individuals completely discounted information from people who are known to have opposing views and who are shading their statements accordingly, then this would ensure a complete failure to reach any sort of consensus. In this setting, Credulous Bayesianism would mean that we discretely categorize people into friends and enemies and completely ignore the statements of enemies.

This type of extreme categorization might enable us to make sense of the sharp differences in beliefs that we observe across groups. The preceding models stressed that these differences could reflect Credulous Bayesianism, which results in a tendency to take neighbors' statements too seriously. However, in those models individuals were unaware of views held outside of the group. In the real world, people are often well aware that there are others who hold differing opinions. In either a standard Bayesian model or a model where people are Credulous Bayesians, those differing opinions should cause substantial convergence across groups. In such a Bayesian model, American Christians should reduce their faith in God because Indians have a different belief system. Palestinians who believe that Mossad destroyed the World Trade Center should moderate their views because they know that Americans do not share that view. If people thought that the views expressed by other groups were completely untrustworthy and shaped more by incentives to lie than by information, then we might be able to understand this failure of beliefs to converge.


If people are acting as Bayesians, they will end up both more unified and more extreme as a result of group discussions. The result may be nothing to deplore. If group members begin with the thought that people likely have one heart and two kidneys, or that it is probably negligent to drive over 80 miles per hour in a crowded area near a school, there is no problem if discussion leads them to become more firmly committed to these beliefs and more unified in holding them. The same idea expresses the ideal conception of jury deliberation, as the exchange of information is supposed to ensure unanimity on a proposition that is true; and indeed juries typically polarize in criminal cases (Brown 1986). The notion of deliberative democracy has similar foundations. If the initial distribution of information is adequate, nothing is wrong with a situation in which participants in democratic deliberation become more extreme in their commitment to a certain outcome or course of action.

If, on the other hand, people are Credulous Bayesians, and overreacting to the actual or perceived views of others, they may end up making major mistakes. As we have seen, large groups may do worse, not better, than small ones. With respect to politics, people may accept some view that is clearly inconsistent with the facts; widespread commitments to implausible conspiracy theories, or to preposterous accounts of what "really" underlies some natural or social phenomenon, can be understood in this light. Actual behavior may be adversely affected as well—as, for example, when people falsely believe that they are at risk and take unjustified precautions, or falsely believe that they are safe and fail to take protective measures.

Of course it is not possible to move directly from an understanding of how social learning occurs to any particular set of institutional reforms.

Much needs to be known about the particular context, including the particular form of Credulous Bayesianism. If people fail sufficiently to discount the self-interested incentives of speakers, as in the case of political advertising, the solution may be different from what it should be if people fail sufficiently to discount for the skewed nature of the distribution of information within their group. In some groups, people may be good Bayesians and engage in appropriate discounting. Perhaps expert institutions are able to do exactly that. In other groups, diversity is desirable, but it also has significant costs in terms of (for example) increased acrimony, social loafing, and greater difficulty in reaching any decision at all. The benefits of error reduction may be lower than those costs.

With these disclaimers, we explore some possible implications here for independent regulatory agencies and federal appellate courts—two institutions that are extremely important and that are objects of considerable public debate. Our goal is not to make any particular normative recommendation, but to obtain a better understanding of current practices requiring diversity (in the case of administrative agencies) and current debates over the issue of diversity (in the case of federal appellate courts). We also offer a brief note on media policy. In particular, we are interested in the relationship between our claims and debates, past and present, over the "fairness doctrine" long imposed and now abandoned by the Federal Communications Commission.

8.1. Independent Regulatory Commissions

A great deal of national policy is established by the so-called independent regulatory commissions, such as the Federal Communications Commission, the Federal Trade Commission, the Securities and Exchange Commission, and the National Labor Relations Board. These agencies typically consist of five members, who are appointed by the president (with the advice and consent of the Senate), serve for specified terms (usually seven years), and make decisions by majority vote. Because of the immense importance of their decisions, any Democratic president would very much like to be able to ensure that the commissions consist entirely or almost entirely of Democratic appointees; Republican presidents would certainly like to shift policy in their preferred directions by ensuring domination by Republican appointees.

Under existing law, however, presidential control of the commissions is sharply constrained, for no more than a bare majority can be from a single political party. Congress has explicitly so required (typically saying, "not more than three of the Commissioners shall be members of the same political party"),7 and indeed this has become the standard pattern for the independent agencies. Hence, for example, the National Labor Relations Board and the Federal Communications Commission must consist either

of three Republicans and two Democrats or of three Democrats and two Republicans. From the standpoint of the president, a particular problem arises in a time of transition from one administration to the next. A Democratic president, for example, is often disturbed to learn that agencies entrusted with implementing legislation will be composed of at least two Republicans (appointed by his predecessor).

It should be clear that the requirement of bipartisan composition operates as a constraint on group polarization and extreme movements. Five Democratic appointees to the NLRB, for example, might well lead labor law in dramatic new directions. As we have seen, such movements could operate through rational Bayesianism. Perhaps a president would like to choose five people with extensive experience in labor-management relations, specializing in marshalling evidence and arguments in support of labor unions. Perhaps the five Democratic appointees could learn from one another in a way that produces a consensus on some position that, while extreme in light of existing law, is sensible as a matter of policy. Or perhaps the movements could occur as a result of Credulous Bayesianism. NLRB commissioners might well discount the extent to which the information that they hold is shared by all, or the extent to which important views are missing, or the extent to which some of them are signaling a position that conforms to the perceived group norm. Such signaling can occur within expert groups as well as within groups of nonspecialists. If people are Credulous Bayesians, then the presence of two Republican appointees constrains the relevant movements and ensures that significant counterarguments will be offered. To this extent, bipartisan membership might serve to limit unwarrantedly extreme changes in regulatory policy.

7 See 47 U.S.C. §154(b)(5) (Federal Communications Commission); 15 U.S.C. §41 (Federal Trade Commission).

We can now understand how requirements of bipartisan membership might reasonably be debated. In the abstract, it is not clear that any particular Congress would want to prevent relative extremism. A Democratic-majority Congress, and the groups who support its members, might well believe that an all-Democratic NLRB would have a better understanding of national labor policy; perhaps the rulings of such an NLRB would be a more faithful agent of that particular legislature. If Democratic members are perfect Bayesians, and if an all-Democratic NLRB contained the optimal range of information, so that Republican appointees would add confusion and falsehood, bipartisan composition would be hard to justify. (No legislator believes that the NLRB should have communist or anarchist members.) But if many members of Congress believe that stability is desirable over time, and if most of them want to check unjustified movements produced by Credulous Bayesianism, legislators, and the diverse groups who pressure them, might be able to reach a consensus on bipartisan membership as the best means to their ends. Bipartisan membership might turn out to represent a stable kind of arms control agreement, in which members of both parties are willing to relinquish the possibility of extreme movements in their preferred direction in return for assurance against extreme movements the other way.8 And some legislators, and outside observers, might be willing to defend the current situation as reflecting an intuitive awareness of the consequences of Credulous Bayesianism.

8.2. Federal Appellate Courts

Do similar considerations apply to the federal judiciary? At first glance, the judiciary is quite different, because many people believe that it is not supposed to make policy at all. And indeed, judges are supposed to be specialists in assessing both (legally relevant) facts and law, and hence the idea of bipartisan membership might seem jarring. But the evidence suggests a more complicated picture. Note first that judicial panels consist of three judges, and assignment to three-judge panels is random. This means that there are many DDD panels, many RRR panels, many RDD panels, and many RRD panels. As our analysis would predict, extreme movements are shown by DDD and RRR panels, in the sense that judges, on such panels, are especially likely to vote in line with ideological stereotypes.

8 For informative discussion, involving alternating incumbency as a way of constraining extremism on the bench, see J. Mark Ramseyer & Eric B. Rasmusen (2003).

We have referred to this point in general terms; now consider a few examples (Sunstein et al. 2006). On all-Republican panels, Republican appointees vote for gay rights 14 percent of the time; on all-Democratic panels, Democratic appointees vote for gay rights 100 percent of the time. On all-Republican panels, Republican appointees vote to validate affirmative action programs 34 percent of the time; on all-Democratic panels, Democratic appointees vote to validate such programs 83 percent of the time. On all-Republican panels, Republican appointees vote in favor of women in sex discrimination cases 30 percent of the time; on all-Democratic panels, Democratic appointees vote in favor of women in sex discrimination cases 76 percent of the time. On all-Republican panels, Republican appointees vote for disabled people in cases brought under the Americans with Disabilities Act 17 percent of the time; on all-Democratic panels, Democratic appointees vote for disabled people in such cases 50 percent of the time. In cases brought under the National Environmental Policy Act, Republican appointees on all-Republican panels vote for environmental plaintiffs 20 percent of the time; in such cases, Democratic appointees on all-Democratic panels vote for environmental plaintiffs 71 percent of the time.

By contrast, both Republican and Democratic appointees show far more moderation when they sit on panels containing at least one appointee nominated by a president of the opposing political party. On mixed panels, Republican appointees are much more liberal than they are on unified Republican panels; Democratic appointees show precisely the same effects. In affirmative action cases, Republican appointees show a 69 percent liberal voting rate on RDD panels (far above the 34 percent rate on RRR panels); in such cases, Democratic appointees show a 60 percent liberal voting rate on DRR panels (far below the 83 percent rate on DDD panels). In sex discrimination cases, Republican appointees show a 44 percent liberal voting rate on RDD panels (far above the 30 percent rate on RRR panels); in such cases, Democratic appointees show a 58 percent liberal voting rate on DRR panels (far below the 76 percent rate on DDD panels). In some domains, the difference between Democratic appointees and Republican appointees is small or even nonexistent on mixed panels; it emerges only when we compare Rs on RRR panels to Ds on DDD panels. In nearly all areas, the difference between Republican and Democratic appointees is sharpest on DDD and RRR panels; sitting with like-minded judges appears to create significant polarization. We have said that the NLRB must have bipartisan membership, but of course appellate panels that review the NLRB need not; and the results of appellate review of NLRB decisions are very different depending on whether the panel is RRR or DDD (see Miles & Sunstein 2008).
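The pattern in these figures is easier to see side by side. The tabulation below uses only the rates quoted above (from Sunstein et al. 2006); the layout and labels are ours.

```python
# Liberal voting rates (percent) quoted in the text, from Sunstein et al.
# (2006). "R on RDD" means a Republican appointee sitting with two
# Democratic appointees, and so on.
liberal_vote_pct = {
    "affirmative action": {"R on RRR": 34, "D on DDD": 83,
                           "R on RDD": 69, "D on DRR": 60},
    "sex discrimination": {"R on RRR": 30, "D on DDD": 76,
                           "R on RDD": 44, "D on DRR": 58},
}

for case, r in liberal_vote_pct.items():
    unified_gap = r["D on DDD"] - r["R on RRR"]  # D-R gap on unified panels
    mixed_gap = r["D on DRR"] - r["R on RDD"]    # D-R gap on mixed panels
    print(f"{case}: unified gap {unified_gap} points, mixed gap {mixed_gap} points")
```

In both areas the party gap shrinks sharply once panels are mixed; in the affirmative action figures it even reverses sign, consistent with the observation that on mixed panels the difference between Democratic and Republican appointees can be small or nonexistent.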

These patterns are consistent with perfect Bayesianism. On a DDD panel, Democratic appointees will hear different conclusions and different arguments from what they hear on a DRR panel. And if DDD panels contain all the arguments that it is useful to hear, nothing is amiss. But it is at least possible that in some cases, such appointees are not receiving new information at all, or that they should discount the relevant arguments by taking account of the sources. In short, Credulous Bayesianism may well be at work, even on the federal bench. Some of the time, we speculate, judges may well be acting as if agreement from other judges supplies additional information, when it does not. A great deal of additional work would be needed to understand the precise mechanisms here; we could imagine experiments in which judicial deliberations were recorded and analyzed. But in support of our speculation, consider Judge Richard A. Posner's report that serious deliberation, and the careful exchange of information and reasons, are rare among three-judge panels (see Posner 2008). If this is so, it is reasonable to believe that Credulous Bayesianism helps to account for the observed patterns. And at least some of the time, it is also reasonable to believe that in ideologically contested cases, the greater moderation of DDR or RRD panels is influenced by the existence of competing conclusions and even arguments.

The purpose of full or "en banc" review, on the courts of appeals, is to correct errors on the part of three-judge panels. If that is the purpose, and if we do not believe that Ds or Rs have a monopoly on information or wisdom, a relatively uncontroversial implication is that a warning flag should be raised whenever a unified panel goes far in the ideologically predictable direction. That warning flag might justify closer consideration of en banc review. In the most important cases, the warning flag might also be relevant to the Supreme Court's decision whether to grant review, which is also designed to correct errors on the part of lower courts. It would seem quite sensible for the Supreme Court to consider, as a relevant factor, whether the decision it is being asked to review was decided by a unified or mixed panel. If a DDD panel has ruled in favor of an affirmative action program, or if an RRR panel has ruled against environmentalists challenging a federal regulation, there is particular reason to attend to an argument that the panel has erred. In fact we are willing to hypothesize that the Court's reviewing practice is implicitly responsive to this consideration, and that the Court is distinctly likely to grant review in ideologically contested cases resolved by a DDD or RRR panel. It would be valuable to test this hypothesis empirically.

A possible counterargument would be that while the political party of the appointing president is a proxy for ideology, the proxy is crude: Some Republican appointees are more liberal, in general or in particular areas, than some Democratic appointees. A more fine-grained approach, attentive to the value of diversity, would inquire directly into the established voting tendencies of various judges, not into the political party of the appointing president. But it is not simple to operationalize the more fine-grained measures, even though they exist; and during judges' early years on the bench, judicial records are too spare to permit easy characterizations. The political party of the appointing president may be the best way to combine (adequate) accuracy with ease of administration.

A much more controversial implication is that in the most difficult and ideologically charged cases, those who seek to avoid the effects of group polarization should consider efforts to create diverse judicial panels, as in the context of the NLRB and the FCC. (Of course any proposal in favor of balanced panels would be feasible only in circuits that have significant numbers of both Ds and Rs.) This implication is controversial because the judiciary is not understood as a policymaking institution, because such an approach might cement judicial self-identification in political terms, and because efforts to ensure ideological diversity might well be taken as inconsistent with the commitment to judicial neutrality. But the discussion here suggests that judges are policymakers of a distinctive kind, and that in principle, the argument for diversity, as a means of counteracting Credulous Bayesianism and hence group polarization, is not significantly different from the argument in the context of the independent regulatory commissions. Recall here that while the NLRB must be DDDRR or RRRDD, reviewing courts are not similarly constrained, and that the ultimate fate of NLRB decisions and hence national labor law, even in the most important domains, will often be radically different if the reviewing court is RRR or DDD. By contrast, appellate panels are far more moderate if they are RRD or RDD.

Of course it is also true that judicial diversity is constrained by a number of other factors, most obviously the fact that judicial panels are composed entirely of lawyers. We put to one side the interesting question whether this lack of diversity causes significant problems in terms of our central argument here.

8.3. Media Policy and Diversity

For many years, the Federal Communications Commission imposed the "fairness doctrine," which, in brief summary, required radio and television broadcasters to cover issues of public controversy and to allow presentations by competing sides (Sunstein 1993). Under the fairness doctrine, it would be unacceptable for a television station to offer the "liberal" position on all issues, without permitting alternative positions to have their say. The fairness doctrine was of course highly controversial, in part because of evident difficulties in administration, and indeed it was challenged in the Supreme Court on the ground that it abridged the free speech rights of broadcasters (Red Lion Broadcasting Co. v. FCC9). In the view of the challengers, the government should not be allowed to force them to present certain positions on the stations that they owned.

9 395 U.S. 367 (1969).

For our purposes, the Court's response was of special interest. The Court emphasized that the "rights of listeners and viewers," rather than the rights of broadcasters, should be taken as paramount. In the Court's view, listeners and viewers had something like a "right" to be exposed to competing positions, and a single set of presentations, from a single point of view, would violate that "right." This claim might seem puzzling in the abstract; whether or not it is correct, it is far easier to understand the Court's concern in light of the arguments we have offered. Credulous Bayesians might fail to discount the partiality or bias of a station that offers a particular point of view; they might treat the presentation that is offered as relevantly representative even though it is not. Without the fairness doctrine, there is a risk that people will live in echo chambers, or information cocoons, in which they end up more confident and more extreme simply because they are listening to the same point of view.

We do not mean here to explore the greatly contested question whether the fairness doctrine was a good idea, even in its time of great scarcity of broadcasters; the point is only that the doctrine, and the Court's decision to uphold it, may well be understood as reflecting an intuitive understanding of Credulous Bayesianism. Ordinary listeners and viewers, hearing a station that repeatedly offers a single perspective, might not sufficiently discount the points of view that they are hearing.

In the modern era, the FCC has mostly repealed the doctrine's requirements, largely on the theory that with so many options and outlets, people are able to have access to an exceedingly wide range of information and opinions. Nothing said here demonstrates that this conclusion was wrong. But the remaining problem, signaled by our analysis, is that if people are engaged in a degree of self-sorting, so that they select points of view with which they antecedently agree, they might well be moved in the direction of extremism precisely because of the operation of Credulous Bayesianism.

To the extent that there is a high degree of self-sorting, the communications market may reveal a fully voluntary version of the Colorado experiment. The market is likely to have some similar dynamics, underpinned by Credulous Bayesianism, producing both polarization and confidence. In fact we would predict that a fully open communications market, with ideologically identifiable sources and with voluntary sorting, would, for many people, replicate the results of that experiment.

9. Conclusion: The Wisdom and the Folly of Crowds

We have argued here that extreme movements can be a product of rational updating, as people respond to the information and the arguments offered by others. To this extent, polarized opinions need not reflect any kind of bias or irrationality on the part of those whose opinions have been rendered more extreme. To the extent that people are responding rationally to new information, more confident and more extreme groups may well be wise. The "wisdom of crowds" is plausibly explained in this way.

At the same time, polarization may often be produced by Credulous Bayesianism, in which people treat the views of others as significantly more informative than they actually are. We have explored four possibilities. (1) Sometimes people's opinions have common sources, and hence the views of others add little. (2) Sometimes group members are not a random sample of the population as a whole, and the pre-deliberation distribution of views is biased. (3) Sometimes group members frame their views so as to curry favor or to avoid social sanctions. (4) Sometimes people have incentives to mislead. We have suggested that Credulous Bayesians give insufficient weight to these possibilities, in a way that can produce significant blunders. Errors by deliberating groups are frequently a product of these four phenomena. As a result of these errors, members of deliberating groups may well be less wise as well as more extreme than individuals. The folly of crowds is often a result.

An understanding of Credulous Bayesianism does not lead to any simple prescription for institutional design, but it does have important implications, and it helps explain a number of current practices and debates. Congress' decision to require bipartisan composition on the independent regulatory commissions, and the absence of a significant controversy over that decision, might be explained as responsive to an understanding of the risks associated with the possible movements that we have explored here. Much more controversially, we have suggested that an understanding of group polarization and Credulous Bayesianism helps to explain current calls for diversity on federal appellate panels. The question of appropriate media policy raises many complexities that we have not explored, but it is plain that many past and current debates are rooted in an intuitive awareness of the phenomenon of group polarization and an understanding that Credulous Bayesianism might lead both individuals and groups in unfortunate directions.


References

Abrams, Dominic, Margaret Wetherell, Sandra Cochrane, Michael A. Hogg & John C. Turner. 1990. Knowing What To Think By Knowing Who You Are: A Social Identity Approach to Norm Formation, Conformity, and Group Polarization, 29 Brit. J. Soc. Psychol. 97-119.

Asch, Solomon. 1955. Opinions and Social Pressure, 193 Scientific American 31-35.

Baron, Robert S., Sieg I. Hoppe, Bethany Brunsman, Barbara Linneweh & Diane Rogers. 1996. Social Corroboration and Opinion Extremity, 32(6) J. Experimental Soc. Psychol. 537-560.

Bernheim, B. Douglas. 1994. A Theory of Conformity, 102(5) J. Pol. Econ. 841-877.

Brown, Roger. 1986. Social Psychology: The Second Edition. New York: The Free Press.

Camerer, Colin F., Teck-Hua Ho & Kuan Chong. 2004. A Cognitive Hierarchy Model of Behavior in Games, 119(3) Q.J. Econ. 861-898.

Cromwell, Paul, Alan Marks, James N. Olson & D'Aunn W. Avary. 1991. Group Effects on Decision-Making by Burglars, 69(2) Psychol. Rep. 579-588.

Crutchfield, R.S. 1955. Conformity and Character, 10 Am. Psychol. 191-198.

DeMarzo, Peter, Dimitri Vayanos & Jeffrey Zwiebel. 2003. Persuasion Bias, Social Influence and Uni-Dimensional Opinions, 118(3) Q.J. Econ. 909-968.

Eyster, Erik, & Matthew Rabin. 2005. Cursed Equilibrium, 73(5) Econometrica 1623-1672.

Gentzkow, Matthew, & Jesse Shapiro. 2004. Media, Education, and Anti-Americanism in the Muslim World, 18(3) J. Econ. Persp. 117-133.

Gigerenzer, Gerd. 2007. Gut Feelings: The Intelligence of the Unconscious. New York: Viking.

Glaeser, Edward L. 2004. Psychology and the Market, 94(2) Am. Econ. Rev. Papers & Proc. 408-413.

Glaeser, Edward L., & Bryce Ward. 2006. Myths and Realities of American Political Geography, 20(2) J. Econ. Persp. 119-144.

Habermas, Jurgen. 1998. Between Facts and Norms. Cambridge, Mass.: MIT Press.

Hong, Lawrence. 1978. Risky Shift and Cautious Shift: Some Direct Evidence on the Culture Value Theory, 41(4) Soc. Psychol. 342-346.

Kaplan, Martin F. 1977. Discussion Polarization Effects in a Modified Jury Decision Paradigm: Informational Influences, 40(4) Sociometry 262-271.

Kerr, Norbert, Robert MacCoun & Geoffrey P. Kramer. 1996. Bias in Judgment: Comparing Individuals and Groups, 103 Psychol. Rev. 687-705.

Krech, David, Richard S. Crutchfield & Egerton S. Ballachey. 1962. The Individual in Society. New York: McGraw-Hill.

Lord, Charles G., Lee Ross & Mark R. Lepper. 1979. Biased Assimilation and Attitude Polarization: The Effects of Prior Theories on Subsequently Considered Evidence, 37 J. Personality & Soc. Psychol. 2098-2109.

Miles, Thomas, & Cass R. Sunstein. 2008. The Real World of Arbitrariness Review, 75 U. Chi. L. Rev. 761-813.

Morris, Stephen. 2001. Political Correctness, 109(2) J. Pol. Econ. 231-265.

Moscovici, Serge, & Marisa Zavalloni. 1969. The Group as a Polarizer of Attitudes, 12 J. Personality & Soc. Psychol. 125-135.

Mullainathan, Sendhil, Joshua Schwartzstein & Andrei Shleifer. 2008. Coarse Thinking and Persuasion, 123(2) Q.J. Econ. 577-619.

Myers, David G. 1975. Discussion-Induced Attitude Polarization, 28(8) Hum. Rel. 699-714.

Myers, David G., & George D. Bishop. 1970. Discussion Effects on Racial Attitudes, 169 Science 778-789.

Myers, David G., & Martin F. Kaplan. 1976. Group-Induced Polarization in Simulated Juries, 2 Personality & Soc. Psychol. Bull. 63-66.

Page, Scott. 2006. The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools and Societies. Princeton: Princeton University Press.

Posner, Richard A. 2008. How Judges Think. Cambridge, Mass.: Harvard University Press.

Ramseyer, J. Mark, & Eric B. Rasmusen. 2003. Measuring Judicial Independence. Chicago: University of Chicago Press.

Ross, Lee, & Richard E. Nisbett. 1991. The Person and the Situation. New York: McGraw-Hill.

Sargent, Thomas. 1979. Macroeconomic Theory. New York: Academic Press.

Schkade, David, Cass R. Sunstein & Reid Hastie. 2007. What Happened on Deliberation Day?, 95 Cal. L. Rev. 915-940.

Schkade, David, Cass R. Sunstein & Daniel Kahneman. 2000. Deliberating About Dollars: The Severity Shift, 100 Colum. L. Rev. 1139-1176.

Shapiro, Jesse. 2006. A Memory-Jamming Theory of Advertising. Mimeograph, University of Chicago.

Spears, Russell, Martin Lee & Stephen Lee. 1990. De-Individuation and Group Polarization in Computer-Mediated Communication, 29 Brit. J. Soc. Psychol. 121-134.

Stoner, James A.F. 1961. A Comparison of Individual and Group Decision Involving Risk. Unpublished Master's Thesis, Massachusetts Institute of Technology.

Sunstein, Cass R. 1993. Democracy and the Problem of Free Speech. New York: The Free Press.

Sunstein, Cass R. 2003. Why Societies Need Dissent. Cambridge, Mass.: Harvard University Press.

Sunstein, Cass R., David Schkade & Lisa M. Ellman. 2004. Ideological Voting on Federal Courts of Appeals: A Preliminary Investigation, 90(1) Va. L. Rev. 301-354.

Sunstein, Cass R., David Schkade, Lisa M. Ellman & Andres Sawicki. 2006. Are Judges Political? An Empirical Investigation of the Federal Judiciary. Washington: The Brookings Institution.

Surowiecki, James. 2004. The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations. New York: Little, Brown & Co.

Turner, John C., Michael A. Hogg, Penelope J. Oakes, Steven D. Reicher & Margaret S. Wetherell. 1987. Rediscovering the Social Group: A Self-Categorization Theory. New York: Blackwell.

Appendix: Proofs of Propositions

Proof of Proposition 1:

The derivative of the variance of ex post beliefs, or

\frac{IP_1\big(P_0(1+(I-1)v)+IP_1\big)}{P_0\big(P_0(1+(I-1)\lambda v)+IP_1\big)^2},

with respect to \lambda equals

-\,\frac{2I(I-1)vP_1\big(P_0(1+(I-1)v)+IP_1\big)}{\big(P_0(1+(I-1)\lambda v)+IP_1\big)^3}.

The variance in the error of ex post beliefs is

\frac{(1-\lambda)(I-1)vP_1I}{\big(P_0(1+(I-1)\lambda v)+IP_1\big)^2} + \frac{1+(I-1)\lambda v}{P_0(1+(I-1)\lambda v)+IP_1},

and its derivative with respect to \lambda is

-\,\frac{2P_0P_1I(I-1)^2(1-\lambda)v^2}{\big(P_0(1+(I-1)\lambda v)+IP_1\big)^3}.

The difference between the actual variance of the error term and the perceived variance of the error term is

\frac{(1-\lambda)(I-1)vP_1I}{\big(P_0(1+(I-1)\lambda v)+IP_1\big)^2},

and the derivative of this with respect to \lambda equals

-\,\frac{(I-1)vP_1I\big(2(I-1)(1-\lambda)P_0v + P_0(1+(I-1)\lambda v) + IP_1\big)}{\big(P_0(1+(I-1)\lambda v)+IP_1\big)^3} < 0.

The derivative of

\frac{(1-\lambda)(I-1)vP_1I}{\big(P_0(1+(I-1)\lambda v)+IP_1\big)^2}

with respect to v equals

\frac{\big(P_0(1-(I-1)\lambda v)+IP_1\big)(1-\lambda)(I-1)P_1I}{\big(P_0(1+(I-1)\lambda v)+IP_1\big)^3},

which is positive if and only if

P_0 + IP_1 > (I-1)\lambda vP_0.
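The comparative statics in the proof of Proposition 1 can be spot-checked by simulation. The sketch below is ours, not the authors' code: it assumes the equicorrelated-signal model described in the text (outcome D with prior precision P0; I signals with noise precision P1 and pairwise correlation v; a Credulous Bayesian who corrects for only a fraction λ of the correlation), and compares the simulated error variance with the closed-form error variance from the proof.

```python
import random
import math

# Monte Carlo check of the error variance in the proof of Proposition 1.
# Model (our reading of the text): D ~ N(0, 1/P0); individual i sees
# S_i = D + e_i, where the e_i have variance 1/P1 and pairwise correlation v.
# A Credulous Bayesian corrects for only a fraction lam of the correlation
# and forms the posterior mean P1 * sum(S_i) / B, with
# B = P0*(1 + (I-1)*lam*v) + I*P1.

def formula_error_variance(P0, P1, v, lam, I):
    # Actual variance of (posterior - D) implied by the model.
    B = P0 * (1 + (I - 1) * lam * v) + I * P1
    return ((1 + (I - 1) * lam * v) ** 2 * P0 + (1 + (I - 1) * v) * I * P1) / B ** 2

def perceived_error_variance(P0, P1, v, lam, I):
    # Variance the Credulous Bayesian *believes* her error has.
    B = P0 * (1 + (I - 1) * lam * v) + I * P1
    return (1 + (I - 1) * lam * v) / B

def simulate_error_variance(P0, P1, v, lam, I, draws=100_000, seed=7):
    random.seed(seed)
    B = P0 * (1 + (I - 1) * lam * v) + I * P1
    errs = []
    for _ in range(draws):
        D = random.gauss(0, math.sqrt(1 / P0))
        shared = random.gauss(0, math.sqrt(v / P1))   # common noise component
        signals = [D + shared + random.gauss(0, math.sqrt((1 - v) / P1))
                   for _ in range(I)]
        errs.append(P1 * sum(signals) / B - D)
    mean = sum(errs) / draws
    return sum((e - mean) ** 2 for e in errs) / draws
```

With, say, P0 = 1, P1 = 2, v = 0.5, λ = 0.3 and I = 5, the simulated error variance matches the closed form to within sampling error, and it exceeds the perceived variance: the overconfidence that drives the extremism results.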

Proof of Proposition 2:

The derivative of

\frac{(1-\lambda)(I-1)vP_1I}{\big(P_0(1+(I-1)\lambda v)+IP_1\big)^2} + \frac{1+(I-1)\lambda v}{P_0(1+(I-1)\lambda v)+IP_1} = \frac{\big(1+(I-1)\lambda v\big)^2P_0 + \big(1+(I-1)v\big)IP_1}{\big(P_0(1+(I-1)\lambda v)+IP_1\big)^2}

with respect to I equals

-\,\frac{P_0P_1\big[1 + (I-1)\lambda v\big(v(1-2\lambda)+3\big) + (1-2I)v\big] + (1-v)IP_1^2}{\big(P_0(1+(I-1)\lambda v)+IP_1\big)^3}.

When \lambda = 1, this equals

-\,\frac{(1-v)P_1}{\big(P_0(1+(I-1)v)+IP_1\big)^2},

which is strictly negative, and since the derivative is continuous, it will remain negative when \lambda is high. When \lambda = 0, the derivative equals

\frac{P_0P_1(2Iv-1-v) - (1-v)IP_1^2}{(P_0+IP_1)^3},

which is strictly positive if and only if

v > \frac{P_0+IP_1}{P_0(2I-1)+IP_1}.

Again by continuity, the derivative will be positive when \lambda is sufficiently close to zero if this condition holds.

Proof of Proposition 3:

The variance of ex post beliefs equals the variance of

\frac{P_1\big(\sum_i \eta_i + Id\big)}{P_0(1+(I-1)\lambda\psi)+IP_1},

which equals

\frac{P_1\big((I+I(I-1)\psi)P_0 + I^2P_1\big)}{P_0\big(P_0(1+(I-1)\lambda\psi)+IP_1\big)^2}.

This variance is clearly declining with \lambda. The error in the beliefs is

\frac{P_1\sum_i \eta_i - P_0(1+(I-1)\lambda\psi)\,d}{P_0(1+(I-1)\lambda\psi)+IP_1},

and the variance of that is

\frac{P_1(I+I(I-1)\psi) + P_0(1+(I-1)\lambda\psi)^2}{\big(P_0(1+(I-1)\lambda\psi)+IP_1\big)^2}.

The derivative of that with respect to \lambda is

-\,\frac{2(1-\lambda)P_0P_1I(I-1)^2\psi^2}{\big(P_0(1+(I-1)\lambda\psi)+IP_1\big)^3} < 0.

The derivative of the variance of ex post beliefs with respect to I is

\frac{P_0P_1\big[(1-\lambda\psi) + I\psi(1-\lambda) + \psi(I-1)(1-\lambda\psi)\big] + IP_1^2(1+\psi-2\lambda\psi)}{\big(P_0(1+(I-1)\lambda\psi)+IP_1\big)^3}.

The derivative of the variance of the error term with respect to I is

\frac{P_0P_1\big(-1 + I\psi + (I-1)\psi(1-3\lambda+\lambda(2\lambda-1)\psi)\big) - IP_1^2(1-\psi)}{\big(P_0(1+(I-1)\lambda\psi)+IP_1\big)^3}.

If \lambda equals one, then the derivative is

-\,\frac{P_1(1-\psi)}{\big(P_0(1+(I-1)\psi)+IP_1\big)^2}.

By continuity, the derivative must be negative in a region around one.

If \lambda equals zero, then the derivative is

\frac{(2I-1)\psi P_0P_1 - I(1-\psi)P_1^2 - P_0P_1}{(P_0+IP_1)^3},

which is strictly positive whenever

\frac{(2I-1)\psi - 1}{I(1-\psi)} > \frac{P_1}{P_0}.

When this condition holds, then by continuity the derivative will be positive when \lambda is close enough to zero.

Proof of Proposition 4:

The variance of ex post beliefs is

\frac{P_1\big((I+I(I-1)\psi)P_0 + I^2P_1\big)}{P_0\big(P_0(1+(I-1)\lambda\psi)+IP_1\big)^2},

and the derivative of this with respect to \psi is

\frac{\big[IP_1(1-2\lambda) + P_0\big((1-2\lambda)-(I-1)\lambda\psi\big)\big](I-1)IP_1}{\big(P_0(1+(I-1)\lambda\psi)+IP_1\big)^3}.

This is negative if and only if

(I-1)\lambda\psi P_0 > (1-2\lambda)(IP_1+P_0).

The variance of the posterior error term is

\frac{P_1(I+I(I-1)\psi) + P_0(1+(I-1)\lambda\psi)^2}{\big(P_0(1+(I-1)\lambda\psi)+IP_1\big)^2},

and the derivative of this with respect to \psi is

\frac{I(I-1)P_1\big(IP_1 + P_0 - (I-1)P_0\lambda\psi(1-2\lambda)\big)}{\big(P_0(1+(I-1)\lambda\psi)+IP_1\big)^3}.

This is positive if and only if

IP_1 + P_0 > (I-1)P_0\lambda\psi(1-2\lambda).

Proof of Proposition 5:

(a) The variance of the error around the posterior when people are perfect Bayesians is

\frac{(1-v)^2 + (1-v)vI + \alpha(1-\alpha)v^2I^2}{\big((1-v)^2+(1-v)vI+\alpha(1-\alpha)v^2I^2\big)P_0 + \big((1-v)I+2\alpha(1-\alpha)vI^2\big)P_1}.

And the derivative of this with respect to \alpha equals

\frac{2(2\alpha-1)I^2P_1v(1-v)\big(1-v+0.5vI\big)}{\Big(\big((1-v)^2+(1-v)vI+\alpha(1-\alpha)v^2I^2\big)P_0 + \big((1-v)I+2\alpha(1-\alpha)vI^2\big)P_1\Big)^2},

which is positive if and only if \alpha > 0.5. The maximum variance is

\frac{(1-v)^2 + (1-v)vI}{\big((1-v)^2+(1-v)vI\big)P_0 + (1-v)IP_1},

and the minimum variance is

\frac{(1-v)^2 + (1-v)vI + 0.25v^2I^2}{\big((1-v)^2+(1-v)vI+0.25v^2I^2\big)P_0 + \big((1-v)I+0.5vI^2\big)P_1}.

When individuals do not think that there is any common noise (i.e., they are naive Bayesians), the variance is

\frac{P_1I\big(1-v+vI-2\alpha(1-\alpha)vI\big) + P_0}{(P_0+IP_1)^2},

and the derivative of this with respect to \alpha equals

\frac{2P_1vI^2(2\alpha-1)}{(P_0+IP_1)^2},

which is positive if and only if \alpha > 0.5. The maximum variance is

\frac{P_1I(1-v+vI) + P_0}{(P_0+IP_1)^2},

and the minimum variance is

\frac{P_1I(1-v+0.5vI) + P_0}{(P_0+IP_1)^2}.

The basic formula for the posterior can be written

\frac{P_1\Big(\sum_{i \le \alpha I}\big(a+b(1-\alpha)\big)S_i + \sum_{i > \alpha I}\big(a+b\alpha\big)S_i\Big)}{\big(a(a+b)+\alpha(1-\alpha)b^2\big)P_0 + \big(a+2b\alpha(1-\alpha)\big)IP_1},

where a = 1-\lambda v and b = \lambda vI.

With this, the variance of the difference between the posterior and D equals

a^4P_0 + 2a^3bP_0 - 2ab\big(b^2P_0+IP_1(2+(I-2)v)\big)(\alpha-1)\alpha + b^2(\alpha-1)\alpha\big(IP_1(v-1) + b^2P_0(\alpha-1)\alpha + 2I^2P_1v(\alpha-1)\alpha\big) + a^2\big(b^2P_0(1-2(\alpha-1)\alpha) + IP_1(1+v(-1+I(1+2(\alpha-1)\alpha)))\big),

divided by

\big(a^2P_0 + a(bP_0+IP_1) - b(bP_0+2IP_1)(\alpha-1)\alpha\big)^2.

The derivative of this with respect to \alpha is then

I^2P_1v(2\alpha-1)\Big(IP_1\big(2+v\lambda\big(-6+2I^2(v-1)v(\alpha-1)\alpha\lambda^2 - 2\lambda v(-3+\lambda v) - I(\lambda v-1)(2-(v+1)\lambda+4(\alpha-1)\alpha(1+(v-2)\lambda))\big)\big) + P_0\big(2+(I-2)v^4(1+I(\alpha-1)(I\alpha-1))\lambda^4(2\lambda-1) + v\lambda(-4-4\lambda+3I\lambda) + v^3\lambda^3\big(4-12\lambda+I(-6+I(1+(I-4)(\alpha-1)\alpha(2-3\lambda)-3\lambda+15\lambda))\big) + v^2\lambda^2\big(12\lambda+I(3-12\lambda+I(\lambda-2(\alpha-1)\alpha(4\lambda-3)))\big)\big)\Big),

divided by

\big(P_0(1-\lambda v)^2 + I(1-\lambda v)P_1 + P_0v\lambda - I^2v(\alpha-1)\alpha\lambda + P_0v\lambda\big)^3.

When \alpha = 0, this becomes

I^2P_1v\Big(IP_1\big(-2+v\lambda(4-2\lambda v)+I(-2+\lambda+v\lambda)\big) + P_0\big(-2+v\lambda\big(2+\lambda(-3I(1+v)+2(2+v)-(I-1)v(-8+I+(I-2)v)\lambda+2(I-2)(I-1)v^2\lambda^2)\big)\big)\Big),

divided by

(\lambda v-1)^2\big(P_0+IP_1+(I-1)P_0v\lambda\big)^3.

This is negative if and only if the other parameters satisfy

IP_1\big(-2+v\lambda(4-2\lambda v+I(-2+\lambda+\lambda v))\big) + P_0\big(-2+v\lambda(2-\lambda(-3I(1-v)+2(2+v)-(I-1)v(-8+I+(I-2)\lambda)+2(I-2)(I-1)v^2\lambda^2))\big) < 0.

And if this condition holds, then by continuity there must be some \alpha sufficiently close to zero such that this derivative is still negative.

(b) The reduction in error variance moving from \alpha = 0 to \alpha = 0.5 for the perfect Bayesian equals

\frac{0.5\,vI^2P_1\big(1+(0.5I-1)v\big)}{(P_0+IP_1-P_0v+IP_0v)\Big(IP_1\big(1+(0.5I-1)v\big) + P_0\big(1+(I-2)v+0.25(I-2)^2v^2\big)\Big)}.

The reduction in error variance moving from \alpha = 0 to \alpha = 0.5 for the naïve Bayesian equals \frac{vI^2P_1}{2(P_0+IP_1)^2}. The difference between the reduction in error variance for the naïve Bayesian and the reduction in error variance for the perfect Bayesian is

\frac{1}{2}P_1vI^2\left(\frac{1}{(P_0+IP_1)^2} - \frac{1+(0.5I-1)v}{(P_0+IP_1-P_0v+IP_0v)\Big(IP_1\big(1+(0.5I-1)v\big)+P_0\big(1+(I-2)v+0.25(I-2)^2v^2\big)\Big)}\right),

which is greater than zero if and only if

(P_0+IP_1-P_0v+IP_0v)\Big(\big(1+(I-2)v+0.25(I-2)^2v^2\big)P_0 + I\big(1+(0.5I-1)v\big)P_1\Big) > \big(1+(0.5I-1)v\big)(P_0+IP_1)^2.

This condition can be rewritten

v(I-1)P_0\Big((P_0+IP_1) + IP_1(0.5I-1)v + P_0\big((I-2)v+0.25(I-2)^2v^2\big)\Big) + (P_0+IP_1)P_0\big((0.5I-1)v+0.25(I-2)^2v^2\big) > 0.

So the naïve Bayesian always becomes more accurate.

Proof of Proposition 6:

The formula for the posterior can be written

\frac{P_1\Big(\big(1+(I-1)\chi\big)D + \epsilon_i + \chi\sum_{j\neq i}\epsilon_j + (I-1)(\lambda\chi-1)\eta\Big)}{P_0+IP_1},

and the difference between the posterior and the true value of the outcome, D, equals

\frac{\big((I-1)(\chi-1)P_1 - P_0\big)D + P_1\Big(\epsilon_i + \chi\sum_{j\neq i}\epsilon_j + (I-1)(\lambda\chi-1)\eta\Big)}{P_0+IP_1}.

The expected value of this quantity, conditioning on \eta but not on D, equals \frac{P_1(I-1)(\lambda\chi-1)\eta}{P_0+IP_1}, and this is the degree of bias. This quantity is rising with \lambda, \eta, I and P_1 and falling with P_0.

The variance of the difference between the posterior belief and the true value is

\frac{P_1^2(I-1)^2(\chi-1)^2 + 2P_0P_1(I-1)(1-\chi) + P_0^2 + P_0P_1\big(1+(I-1)\chi^2\big) + P_0P_1^2\big((I-1)(\lambda\chi-1)\big)^2\mathrm{Var}(\eta)}{P_0(P_0+IP_1)^2},

which is increasing with \lambda. The derivative of this with respect to \chi equals

\frac{-2P_1^2(I-1)^2(1-\chi) - 2P_0P_1(I-1)(1-\chi) + 2P_0P_1^2(I-1)^2\lambda(\lambda\chi-1)\mathrm{Var}(\eta)}{P_0(P_0+IP_1)^2},

which is positive if and only if

(I-1)\lambda(\lambda\chi-1)\mathrm{Var}(\eta) > (1-\chi)\Big(\frac{I-1}{P_0} + \frac{1}{P_1}\Big).

Within a group, the variance of beliefs equals

\frac{P_1(1-\chi)^2\frac{I-1}{I}}{(P_0+IP_1)^2},

which is declining with \chi. The variance of the posterior equals

\frac{P_1^2\big(1+(I-1)\chi\big)^2 + P_0P_1\big(1+(I-1)\chi^2\big) + P_0P_1^2\big((I-1)(\lambda\chi-1)\big)^2\mathrm{Var}(\eta)}{P_0(P_0+IP_1)^2}.

Proof of Corollary 1:

The results in this corollary follow directly from applying Proposition 4 and the fact that \lambda = \gamma.

Proof of Proposition 7:

The posterior belief of the decision-maker equals

p^*\Big(ID + \sum_i \epsilon_i + \sum_i u_i\Big),

and the variance of this equals p^{*2}\Big(\frac{I^2}{P_0}+\frac{I}{P_1}+\frac{I}{P_u}\Big), which is increasing with p^* and hence decreasing with \lambda.

The error term in the posterior belief equals

(p^*I-1)D + p^*\Big(\sum_i \epsilon_i + \sum_i u_i\Big),

which has variance \frac{(1-p^*I)^2}{P_0} + \frac{p^{*2}I}{P_1} + \frac{p^{*2}I}{P_u}, and the derivative of this with respect to p^* is -\frac{2I(1-p^*I)}{P_0} + \frac{2p^*I}{P_1} + \frac{2p^*I}{P_u}. This quantity is positive as long as p^* > \frac{1}{I + P_0/P_1 + P_0/P_u}, which must hold when \lambda < 1, since p^*(\lambda) satisfies p^*(\lambda) = \frac{1}{I + P_0/P_1 + \lambda P_0/P_u}. Since the variance of the error is increasing in p^* when \lambda < 1, it must be decreasing with \lambda if this condition holds.

The perceived variance of the error term equals \frac{(1-p^*I)^2}{P_0} + \frac{p^{*2}I}{P_1} + \frac{\lambda p^{*2}I}{P_u}, and the derivative of this with respect to \lambda is

\frac{p^{*2}\big(4P_0P_u - 3p^{*2}P_1P_u - 3p^{*2}P_0P_1 - 2p^{*2}\lambda P_0P_1\big)}{P_u\big(IP_1P_u + P_0P_u + 6\lambda p^{*2}P_0P_1\big)}.

The expected error term in the posterior equals p^{*2}I\,\mathrm{E}(\rho/\gamma), and the value of \mathrm{E}(\rho/\gamma) can be written as 0.5 times \mathrm{Cov}(\rho, 1/\gamma) + \mathrm{E}(\rho)\mathrm{E}(1/\gamma), where \mathrm{Cov}(\rho, 1/\gamma) is the ex post covariance of \rho and 1/\gamma, and \mathrm{E}(\rho) and \mathrm{E}(1/\gamma) are the ex post means of \rho and 1/\gamma respectively. The ex post bias is clearly increasing with \mathrm{Cov}(\rho, 1/\gamma) and \mathrm{E}(\rho), and with \mathrm{E}(1/\gamma) when \mathrm{E}(\rho) > 0.

Proof of Proposition 8:

The posterior belief of the decision-maker equals

\frac{P_1\big(ID + \sum_i \epsilon_i\big)}{P_0+IP_1} + \Big(\frac{P_1}{P_0+IP_1}\Big)^2 \sum_i (1-\lambda_i)u_i.

If \lambda_i = \lambda for all i, then the variance of this term is

\frac{IP_1}{P_0(P_0+IP_1)} + \Big(\frac{P_1}{P_0+IP_1}\Big)^4 \frac{(1-\lambda)^2 I}{P_u},

which is falling with \lambda. The variance of the error term equals

\frac{1}{P_0+IP_1} + \Big(\frac{P_1}{P_0+IP_1}\Big)^4 \frac{(1-\lambda)^2 I}{P_u},

which is also falling with \lambda. The perceived variance of the error term is

\frac{1}{P_0+IP_1} + \Big(\frac{P_1}{P_0+IP_1}\Big)^4 \frac{(1-\lambda)^2 \lambda I}{P_u}.

The derivative of this with respect to \lambda is

\Big(\frac{P_1}{P_0+IP_1}\Big)^4 \frac{I}{P_u}\big[3\lambda^2 - 4\lambda + 1\big],

which is negative if and only if 3\lambda^2 - 4\lambda + 1 < 0, which holds for \frac{1}{3} < \lambda < 1.

The expected bias in decision-making equals \Big(\frac{P_1}{P_0+IP_1}\Big)^2 I\,\mathrm{E}\big((1-\lambda_i)u_i\big), which also equals

\Big(\frac{P_1}{P_0+IP_1}\Big)^2 I\Big(0.5\,\mathrm{E}(1-\lambda_i)\mathrm{E}(\rho)\mathrm{E}(1/\gamma) + 0.5\,\mathrm{E}(1-\lambda_i)\mathrm{Cov}(\rho, 1/\gamma) + \mathrm{Cov}(1-\lambda_i, u_i)\Big).

This is increasing in \mathrm{Cov}(\rho, 1/\gamma), \mathrm{Cov}(1-\lambda_i, u_i) and \mathrm{E}(\rho), and increasing with \mathrm{E}(1/\gamma) and \mathrm{E}(1-\lambda_i) if and only if \mathrm{E}(\rho) is positive.
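Proposition 8's claim that both the belief variance and the error variance fall as the discount λ rises can likewise be illustrated numerically. The setup below is our reading of the appendix (honest signal noise with precision P1, manipulation shocks u_i with precision Pu, a common discount λ); the function and parameter names are ours, not the authors'.

```python
import random
import math

# Simulation sketch for Proposition 8 (our reading of the appendix): the
# decision-maker's posterior is
#   P1*(I*D + sum(e_i)) / (P0 + I*P1) + w**2 * sum((1 - lam) * u_i),
# with w = P1/(P0 + I*P1), honest noise e_i ~ N(0, 1/P1), and manipulation
# shocks u_i ~ N(0, 1/Pu) discounted by lam. The implied error variance is
#   1/(P0 + I*P1) + w**4 * (1 - lam)**2 * I / Pu.

def predicted_error_variance(P0, P1, Pu, lam, I):
    w = P1 / (P0 + I * P1)
    return 1 / (P0 + I * P1) + w ** 4 * (1 - lam) ** 2 * I / Pu

def simulate_error_variance(P0, P1, Pu, lam, I, draws=50_000, seed=11):
    random.seed(seed)
    w = P1 / (P0 + I * P1)
    errs = []
    for _ in range(draws):
        D = random.gauss(0, math.sqrt(1 / P0))
        eps = sum(random.gauss(0, math.sqrt(1 / P1)) for _ in range(I))
        manip = sum(random.gauss(0, math.sqrt(1 / Pu)) for _ in range(I))
        post = w * (I * D + eps) + w ** 2 * (1 - lam) * manip
        errs.append(post - D)
    mean = sum(errs) / draws
    return sum((e - mean) ** 2 for e in errs) / draws
```

Raising λ toward one strips the manipulation term out of the error, so the predicted variance falls monotonically in λ, which is the sense in which fuller discounting of manipulators makes the decision-maker more accurate.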