A big Noise (review)

Noise: A flaw in human judgment

by Daniel Kahneman, Olivier Sibony and Cass R. Sunstein.

Published by William Collins, 2021.

By now, many people are familiar with Nobel laureate Daniel Kahneman’s previous book Thinking, Fast and Slow, in which he popularised the idea that rapid unconscious thought processes underlie many of our judgments and decisions. It is this manner of thought that we equate with intuition. Kahneman showed us how intuitive thinking can give rise to a range of systematic errors, referred to as biases. In this new book, he has teamed up with Olivier Sibony and Cass Sunstein to examine another source of error in judgment, referred to as noise.

The authors state that error in judgment arises from a combination of bias and noise. Noise in judgment is defined as unwanted variability, and we are told that this is a more pervasive problem than bias. The book describes various studies of noise that researchers have conducted over several decades, including the notable contribution of Marvin Frankel, a US judge who was outraged by the variability of criminal sentencing in the American legal system. The authors contend, however, that the topic of noise has tended to be overshadowed by the topic of bias. Specifically:

[I]n public conversations about human error and in organizations all over the world, noise is rarely recognized. Noise is a bit player, usually offstage. The topic of bias has been discussed in thousands of scientific articles and dozens of popular books, few of which even mention the issue of noise. This book is our attempt to redress the balance (p.6).

Noise can be observed and measured even where a ‘right’ answer may not exist or cannot be verified. There is no objective standard for assessing whether a movie is ‘good’, for example, but because most professional critics give a numerical rating we can see the extent to which they agree or disagree with each other. There may be little consequence to variability in the judgments of film critics, but in many domains we would hope for high levels of consistency. For example, if any of us were to find ourselves the defendant in a court case, we would rightfully expect that the fairness of the outcome should not depend upon which judge happens to be hearing the case that day. Regrettably, the evidence reported by the authors indicates that noise pervades the legal system and many other areas of life. They note that noisy judgments have been observed in medicine, child custody decisions, professional forecasting, asylum decisions, personnel decisions, bail decisions, forensic science, and patent decisions. 
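
The arithmetic behind these claims is simple enough to sketch. Noise is just the spread of the judgments, so it can be computed without knowing the right answer; and where a verifiable true value does exist, overall error (mean squared error) splits into a bias component and a noise component. A minimal illustration in Python, with invented critics’ ratings and an invented ‘true’ score:

```python
import numpy as np

# Invented ratings of one film by ten critics (not from the book).
ratings = np.array([6.5, 8.0, 7.0, 9.0, 5.5, 7.5, 8.5, 6.0, 7.0, 8.0])

# Noise is the spread of the judgments; no 'right' answer is needed.
noise = ratings.std()
print(f"noise (std of judgments): {noise:.2f}")

# Where a verifiable true value does exist, mean squared error
# decomposes exactly: MSE = bias^2 + noise^2.
true_value = 7.0                      # assumed, purely for illustration
errors = ratings - true_value
bias = errors.mean()
mse = (errors ** 2).mean()
print(f"bias: {bias:.2f}")
print(f"bias^2 + noise^2 = {bias**2 + noise**2:.2f} = MSE = {mse:.2f}")
```

Note that the first calculation never touches the true value, which is why noise can be measured even for unverifiable judgments like movie ratings.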

In Chapter 6, the authors describe different types of noise. The example of a legal defendant obtaining a different outcome depending on which judge handles the case is an illustration of system noise. Observations of courtroom sentencing have long suggested that judges vary in the way they treat similar cases, a conclusion supported by controlled research. A study published in 1981 presented 208 US federal judges with the same 16 cases and asked them to set a sentence for each. Sure enough, wide variation was observed in sentencing. There is of course no way of knowing what the ‘right’ sentence is, and while it is tempting to treat the average sentence for a case as the ‘right’ one, the average may itself reflect the existence of bias (e.g. racial discrimination in sentencing).

System noise is itself the product of two other distinct forms of noise. One of these is level noise. In the case of courtroom judges this would represent the tendency of some judges to be more severe than others. The other contribution to system noise comes from pattern noise. This occurs when a judge treats certain types of case more severely than other types (a judge x case interaction). As the authors put it:

One judge, for instance, may be harsher than average in general but relatively more lenient toward white-collar criminals. Another may be inclined to punish lightly but more severely when the offender is a recidivist.

Another type of noise arises when the same individual makes different judgments about the same information when it is encountered at different times. Such within-person variability is referred to as occasion noise. Logically, if a person is operating as part of a system, then occasion noise must also contribute to system noise, but this is rather difficult to tease apart. Occasion noise has been widely studied and arises from numerous factors, such as variation in mood, stress, fatigue, and even changes in the weather. Contextual information can also have an impact: in the US, a judge who has just granted asylum to the previous two applicants is less likely to grant it to the next applicant.
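
In variance terms these components are additive: the square of system noise equals the squares of level noise and pattern noise added together. Here is a minimal sketch of the decomposition on a judges-by-cases matrix, using a standard two-way sum-of-squares split (the sentencing figures are invented, not taken from the 1981 study):

```python
import numpy as np

# Invented sentences in months: rows are judges, columns are cases.
sentences = np.array([
    [24.0, 36.0, 12.0, 60.0],   # judge A
    [30.0, 48.0, 18.0, 72.0],   # judge B: harsher across the board (level)
    [18.0, 60.0, 10.0, 50.0],   # judge C: harsh mainly on case 2 (pattern)
])

grand_mean = sentences.mean()
judge_means = sentences.mean(axis=1, keepdims=True)
case_means = sentences.mean(axis=0, keepdims=True)

# Level noise: variability in the judges' overall severity.
level_var = ((judge_means - grand_mean) ** 2).mean()

# Pattern noise: the judge-by-case interaction left over after removing
# each judge's overall level and each case's average severity.
residual = sentences - judge_means - case_means + grand_mean
pattern_var = (residual ** 2).mean()

# System noise: total disagreement between judges on the same cases.
system_var = ((sentences - case_means) ** 2).mean()

print(f"level noise^2:   {level_var:.1f}")
print(f"pattern noise^2: {pattern_var:.1f}")
print(f"system noise^2:  {system_var:.1f} (= level + pattern)")
```

In a one-shot audit like this, occasion noise is inevitably folded into the pattern term; separating it out would require presenting the same judges with the same cases on more than one occasion.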

The authors propose a range of remedies for the problem of noisy judgments, which they class under the umbrella heading of decision hygiene. Any organisation concerned about the issue of noise in judgment, they suggest, should conduct a noise audit in order to determine the extent to which it is affected (an appendix provides guidance on how to go about this). The first principle of decision hygiene is that “The goal of judgment is accuracy, not individual expression”. Statistical models have generally been found to outperform human judges on repeated decisions, including models created from analyses of the human judges themselves. This has been known for a long time, though the advent of machine learning has given even greater scope for the application of such models. The great advantage of statistical models is that they are free from occasion noise, although there is a danger that models based on human judgment will incorporate societal biases (e.g. racial discrimination). There is some discussion of the problem of bias in AI systems, though the authors seem largely unconcerned; I found their rather casual dismissal of these worries hand-wavy and unconvincing.
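
Returning to those statistical models: the ‘model of the judge’ result is easy to demonstrate in simulation. In the sketch below (all features and data are hypothetical), a judge follows a stable linear policy plus occasion noise; a linear model fitted to the judge’s own noisy decisions then tracks the underlying policy better than the judge does, simply because the model never has an off day:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n_cases = 300

# Hypothetical case features, e.g. offence severity, prior record, age.
X = rng.normal(size=(n_cases, 3))

# A simulated judge: a stable linear policy plus occasion noise
# (mood, fatigue, the weather...).
policy = np.array([2.0, 1.0, -0.5])
judgments = X @ policy + rng.normal(scale=1.5, size=n_cases)

# Fit a simple model to the judge's own noisy decisions.
model = LinearRegression().fit(X, judgments)

# The model is perfectly consistent: identical inputs always get
# identical outputs, so occasion noise is eliminated.
truth = X @ policy   # knowable here only because the policy is simulated
print("judge MSE:", round(float(np.mean((judgments - truth) ** 2)), 2))
print("model MSE:", round(float(np.mean((model.predict(X) - truth) ** 2)), 2))
```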

However, acknowledging the fact that people often resist challenges to their autonomy, the authors suggest that some situations – such as job interviews – may benefit most from being structured, with options rated and those ratings used as the input to a discussion among decision makers rather than to an algorithm.

A second principle is that judges should “think statistically and take the outside view of the case”. Thinking about how a current situation might be similar to other situations that have been encountered can help root thinking in statistical reality, and avoid excessive optimism.

Thirdly, judges should “structure judgments into several independent tasks”. This has long been a basic principle of decision analysis: people’s limited cognitive capacities are better able to manage a series of smaller problems than one big, complex problem. Kahneman et al. describe a specific procedure for organisational decision making, which they call the mediating assessments protocol.
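
As a rough sketch of what structuring a judgment into independent tasks might look like in code, consider the fragment below. The dimensions, scores, and weights are all invented, and the real protocol leaves the final combination to a discussion informed by the assessments; a weighted average merely stands in for that step here:

```python
from dataclasses import dataclass

@dataclass
class Assessment:
    dimension: str    # one fact-based question, assessed on its own evidence
    score: float      # 0-10, assigned before seeing the other dimensions
    weight: float

def overall_judgment(assessments: list[Assessment]) -> float:
    """Combine independently scored dimensions only at the final step,
    instead of forming one global impression up front."""
    total_weight = sum(a.weight for a in assessments)
    return sum(a.score * a.weight for a in assessments) / total_weight

# A hypothetical hiring decision broken into mediating assessments.
candidate = [
    Assessment("technical skill", 7.5, weight=2.0),
    Assessment("communication", 6.0, weight=1.0),
    Assessment("leadership", 8.0, weight=1.0),
]
print(f"overall: {overall_judgment(candidate):.2f}")  # 7.25
```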

A fourth principle is to “avoid premature intuitions”. In Chapter 20 the authors provide an alarming description of how the forensic analysis of fingerprints can be biased in the US legal system. Whenever a laboratory links the partial fingerprint from a crime scene to the complete fingerprint of a suspect, a second laboratory is asked to carry out the same analysis. Unfortunately, the second laboratory knows that it is only being asked to do the analysis because another laboratory made an identification, hence they are potentially biased at the outset.

The book finishes with a comparison of rules and standards as a means of regulating behaviour. Rules have clear-cut criteria (“Do not exceed the speed limit”), though as noted earlier they can also be biased. Standards, on the other hand, allow for the exercise of discretion (“Drive carefully”). Standards are often adopted because it can be difficult to get people to agree on the precise criteria for rules. However, the more open-ended the language used in a standard, the more judgment is needed and the more likely noise is to creep in. The authors give the example of Facebook’s Community Standards, which are meant to determine what is and isn’t acceptable online content. When these were first introduced, the thousands of Facebook reviewers working to them ended up making highly variable decisions, precisely because they were standards rather than rules. To address this problem, Facebook created a non-public document for its reviewers called the Implementation Standards, which – for example – included graphic images to depict what it meant by “glorifying violence”. In so doing, it effectively created a set of rules to underpin its public standards.

There appears to be no clear-cut way to determine whether a rule or a standard should be used, and the authors suggest that, as a first approximation, any organisation needs to consider the costs of decisions and the costs of errors. Creating a rule can be difficult, but applying a rule in a decision situation is relatively easy. Conversely, creating a standard is easier, but where a person has to make many decisions, the need to exercise judgment continually can be quite a burden. The authors suggest that the costs of errors depend on “whether agents are knowledgeable and reliable, and whether they practice decision hygiene”. Where agents can be trusted in this regard, a standard might work well; otherwise a rule may be more appropriate.

That is the book in summary, then. With three co-authors you might wonder how stylistically consistent the book would be, but I found it remarkably consistent, with no obvious clue as to who wrote what. However, there is quite a bit of repetition, and a more rigorous editing process could have cut the length substantially. Overall, though, I found the book quite engaging, much more so than Thinking, Fast and Slow, which I found rather hard going (I didn’t manage to finish that book, although I was familiar with most of the content anyway).

There has been some academic sniping over Noise, though I don’t think it’s very interesting for a review to begin reviewing the other reviewers (one highly critical review, with links to other critical reviews, can be found here). Some of the criticism, in my view, is overstated, and there is a sense of people trying to cut down one of the “tall poppies” in the field. Nonetheless, one reason that Kahneman, in particular, has become something of a target is that a number of weaknesses have been identified in his previous book, Thinking, Fast and Slow. Kahneman was perhaps unfortunate to have published his best-seller in the same year in which one well-known psychologist was revealed to have fabricated data in many studies, and in which one of the most controversial papers in psychology appeared, a paper which has prompted a great deal of soul-searching within the discipline. It transpires that, for a long time, a range of questionable research practices (QRPs) have been used in psychology (and, to be fair, in other disciplines, though not to the same degree). In the light of this introspection, it turns out that Kahneman’s book cites many studies which have either failed to be replicated by other researchers or which are severely “underpowered” (having too few participants), meaning that there is a good chance they would not replicate. The implicit priming studies featured in Chapter 4 of Thinking, Fast and Slow are particularly problematic, and a critique can be read here. A broader critique can be found here.

Kahneman has not (yet) revised Thinking, Fast and Slow to address the problems identified, and the millions of non-psychologist readers are unlikely to be aware that there are any problems. Those who are aware of the problems identified in psychological research will justifiably wonder about the validity of the studies reported in Noise. I have no doubt that noise exists, but to what extent, and are the psychological explanations correct? One widely cited study reported in Noise found that the parole decisions of experienced US judges became increasingly unfavourable the further they got into a session, with about 65% of decisions being favourable at the start of a session and none at the end. Immediately following a break for food, favourable decisions predominated once more before going into a gradual decline again. Whereas most psychological effects are no more than modest in size, this one was substantial. Not reported by Kahneman and colleagues is the fact that this finding has been the subject of some contention.

One response suggested that the results could be explained by the non-random ordering of cases: prisoners without legal representation tend to have their cases heard later in the session, although the original researchers argued that including representation in their analysis did not change the results. It has also been claimed that the “hungry judge” effect arises from the sensible planning of rational judges: judges tend to end a session when they foresee that the next case is likely to take a long time, and longer cases are more likely to result in favourable outcomes. If correct, this account would suggest that the case for noise in this instance has been overstated and the supposed cause is false. Finally, the wider concept of “ego depletion”, upon which the original hungry-judge explanation rests, has itself been called into question.

In conclusion, Noise is somewhat overlong and repetitive, but I think the breakdown of different types of noise is very interesting. There are some potentially useful suggestions for minimising noise, though the authors gloss over concerns about bias in AI-driven decisions. Also, the idea of a noise audit for organisations sounds quite bureaucratic (though potentially a money-spinner for consultants), so presumably ought to be considered only by organisations where noise is a major concern. A healthy skepticism about the psychological research is advised.

[Note: I made a slight edit for clarity to the “hungry judge” section – 12.30pm, 9th August 2021]

Review – Meltdown: Why Our Systems Fail and What We Can Do About It

In the opening chapter of Meltdown, the authors Chris Clearfield and András Tilcsik describe the series of events that led to a near-disaster at the Three Mile Island nuclear facility in the United States. The initiating event was relatively minor, and occurred during routine maintenance, but as problems began to multiply the operators were confused. They could not see first-hand what was happening and were reliant on readouts from individual instruments, which did not show the whole story and which were open to misinterpretation.

The official investigation sought to blame the plant staff, but sociology professor Charles “Chick” Perrow argued that the incident was a system problem. Perrow, the author of Normal Accidents: Living with High-Risk Technologies, states that systems can be characterised along two dimensions: complexity and coupling. Complex systems have many interacting parts, and frequently the components are invisible to the operators. Tightly coupled systems are those in which there is little or no redundancy or slack, so a perturbation in one component may have multiple knock-on effects. Perrow argues that catastrophic failures tend to be those where high complexity combines with tight coupling. His analysis forms the explanatory basis for many of the calamities described in Meltdown. Not all of these are life-threatening. Some are merely major corporate embarrassments, such as when PricewaterhouseCoopers cocked up the award for Best Picture at the 89th Academy Awards. Others nonetheless had a big impact on ordinary people, such as the problems with the UK Post Office’s Horizon software system, which led to many sub-postmasters being accused of theft, fraud and false accounting. Then there are the truly lethal events, such as the Deepwater Horizon oil rig explosion. Ironically, it is often the safety systems themselves that are the source of trouble. Perrow is quoted as saying that “safety systems are the biggest single source of catastrophic failure in complex tightly-coupled systems”.

The second half of Meltdown is devoted to describing some of the ways in which we can reduce the likelihood of things going wrong. These include Gary Klein’s idea of the premortem. When projects are being planned, people tend to be focused on how things are going to work, which can lead to excessive optimism. Only when things go wrong do the inherent problems start to appear obvious (hindsight bias). Klein suggests that planners envisage a point in time after their project has been implemented, and imagine that it has been a total disaster. Their task is to write down the reasons why it has all gone so wrong. By engaging in such an exercise, planners are forced to think about things that might not otherwise have come to mind, to find ways to address potential problems, and to develop more realistic timelines.

Clearfield and Tilcsik also discuss ways to improve operators’ mental models of the systems they are using, as well as the use of confidential reporting systems for problems and near-misses.

They devote several chapters to the important topic of allowing dissenting voices to speak openly about their concerns. There is ample evidence that lack of diversity in teams, including corporate boards, has a detrimental effect on the quality of discussion. Appointing the “best people for the job” may not be such a great idea if the best people are all the same kind of people. One study found that American community banks were more likely to fail during periods of uncertainty when they had higher proportions of banking experts on their boards. It seems that these experts were over-reliant on their previous experiences, were overconfident, and – most importantly – were over-respectful of each other’s opinions. Moreover, domination by banking experts made it harder for the non-bankers on the boards to raise challenges. Where there were more non-bankers, the bankers had to explain issues in greater detail and their opinions were challenged more often.

Other research shows that both gender and ethnic diversity matter, too. An experimental study of simulated stock trading found that ethnically homogeneous groups of traders tended to copy each other, including each other’s mistakes, resulting in poorer performance. Where groups were more diverse, people were generally more skeptical in their thinking and therefore more accurate overall. Another study found that companies were less likely to have to issue financial restatements (corrections owing to error or fraud) where there was at least one woman director on the board.

Clearfield and Tilcsik argue that the potential for catastrophe is changing as technologies develop. Systems which previously were not both complex and tightly-coupled are increasingly becoming so. This can of course result in great performance benefits, but may also increase the likelihood that any accidents that do occur will be catastrophic ones.

Meltdown has deservedly received a lot of praise since its publication last year. The examples it describes are fascinating, the explanations are clear, and the proposed solutions (although not magic bullets) deserve attention. Writing in the Financial Times, Andrew Hill cited Meltdown when talking about last year’s UK railway timetable chaos, saying that “organisations must give more of a voice to their naysayers”. The World Economic Forum’s Global Risks Report 2019 carries a short piece by Tilcsik and Clearfield, titled Managing in the Age of Meltdowns.

I highly recommend this excellent book.

Review – Lab Rats: Why Modern Work Makes People Miserable

When scientists develop new antidepressant drugs they first administer them to rats. This means initially inducing depression in the poor rodents. No physical pain is involved. Rather, for a prolonged period the animals experience unpredictable negative changes to their environment, such as wet bedding, dirty sawdust, the sounds of predators, or changes to the cycle of light and dark. Eventually, the rats slide into an apathetic state, ceasing to groom themselves or build nests, and not bothering to use their exercise wheels.

In Lab Rats, a book that is by turns funny and frightening, Dan Lyons likens the plight of modern workers to that of these experimental rats. Constant change, whether of office layout, new technologies or new methodologies, is producing a workforce that is increasingly stressed, depressed, and sometimes suicidal. Three other factors contribute to decreasing satisfaction with work. The first is money, or rather the lack of it: over the last few decades the incomes of ordinary workers have fallen, whilst those of the wealthiest have boomed. Secondly, workers are increasingly insecure. The third factor is dehumanization, whereby people are increasingly used by technology, rather than vice versa.

According to Lyons, there are two key reasons why modern work has become so much worse. The first is the shift from stakeholder capitalism to shareholder capitalism. For much of the twentieth century, company executives often recognised that they had responsibilities to employees and the wider community, as well as to investors. However, a significant change in attitude occurred when Milton Friedman, a University of Chicago economist (later awarded the Nobel prize), promoted the idea that the only responsibility companies had was to their shareholders. Aided by anti-trade-union legislation, Friedman’s ideas led to a more ruthless form of capitalism, in which jobs were cut or moved abroad, wages were slashed, and work was frequently outsourced to the lowest bidder. The gig economy developed, in which organisations assembled bodies of contract workers, people often classed as “self-employed” so that the organisation did not have to provide the kinds of benefits, such as paid holidays and sick pay, enjoyed by regular employees. The development of the Internet served to speed up these processes.

The second key factor identified by Lyons is the rise of Silicon Valley. Indeed, in the American edition of Lab Rats, Silicon Valley is explicitly identified in the subtitle. Once upon a time, Silicon Valley was full of hippies who had grown up in the counter-culture of the 1960s, and companies like Hewlett-Packard were a model of how to treat employees well. However, with the advent of shareholder capitalism the hippies were replaced by ruthless oligarchs (e.g. Jeff Bezos, Mark Zuckerberg, Travis Kalanick and Elon Musk), an army of wannabes desperate to get rich quick, and a bunch of venture capitalists holding out the hope that this can be achieved.

The lack of morality in modern-day Silicon Valley is surely best exemplified by the following. The rise of the tech oligarchs and their billion-dollar campuses, such as the Googleplex and Apple’s spaceship campus, has pushed up housing prices so far that these tech installations are now fringed by neighbourhoods where people live in camper vans, tents, or simply on the sidewalks. In 2016:

a bunch of rich techies came up with their own solution, sponsoring a ballot proposition that would let police forcibly remove homeless people from the sidewalks. Homeless people would get twenty-four hours to either move to a shelter or get a bus ticket out of town. If they didn’t comply, the cops could seize their tents and belongings. (p.36)

The proposition was passed.

But back to the venture capitalists. In Lyons’s words, the Valley has become a “casino”. The ambition of the modern ‘techie’ is to create a start-up business that attracts sufficient money from venture capitalists such that they are able to get rich by floating the business on the stock market (essentially, getting taken over by other monied interests). Lyons refers to these business start-ups as “unicorns”. Along the way, these businesses typically lose heaps of money, which is why their employees are treated so poorly. But no matter – if all goes to plan, the start-up bosses flog off their outfits, then write a best-selling book about how to run a ‘disruptive’ company. Outside of Silicon Valley, many CEOs – fearful that their organisation is at risk of stagnating in the new economy – lap up this guidance on how to do things a new way. Yet, to quote Lyons:

Silicon Valley has no fountain of youth. Unicorns do not possess any secret management wisdom. Most start-ups are terribly managed, half-assed outfits run by buffoons and bozos and frat boys, and funded by amoral investors who are only hoping to flip the company into the public markets and make a quick buck. They have no operations expertise, no special insight into organizational behavior (p.45)

Why are CEOs so ready to seek out the guidance of business gurus? The simple truth, it seems, is that no-one really knows how to run a big company. Lyons writes:

The business world has a seemingly insatiable appetite for management gurus. You probably can’t blame CEOs. It may be that no human is really smart enough to run something as vast and complex as a corporation. Yet someone has to do it. Clinging to a system, any system, at least provides the illusion of structure. The system also gives the boss something to blame when things go wrong. Managers grasp at systems the way that drowning people reach for life jackets.

Indeed, although Lyons doesn’t mention this, even the Harvard Business Review has previously noted that CEOs who run highly successful corporations frequently fail to repeat that success when they move to another company (with correspondingly vast salary). It is as though success occurs despite the CEOs’ presence rather than because of it.

Lyons traces a rough history of business systems, beginning with the work of “shameless fraud” Frederick Taylor (“He fudged his numbers. He cheated and lied”) in the 1890s. Taylor claimed to have devised a scientific method to optimise the efficiency of any process; in reality, he ramped up quotas until staff began to leave, and was subsequently fired from the company where he did his “research”. Despite his work being thoroughly debunked, Taylorism became almost a religion. Since Taylor, we have had Peter Drucker (who coined the term “knowledge worker”), Michael Porter, Jim Collins, and countless others. The business fads we’ve been sold include the ‘Five Forces Framework’, ‘Six Sigma’, ‘Lean Manufacturing’, ‘Lean Startup’ and ‘Agile’.

Some space is devoted to description and discussion of ‘Agile’, which is perhaps the most recent fad to be widely adopted. In 2001 a group of software developers authored a ‘Manifesto for Agile Software Development’, an idea that was subsequently pounced on by others and expanded far beyond its original domain of application. Lyons describes the business application of Agile as:

a management fad that has swept the corporate world and morphed into what some call a movement but is more like widespread mental illness (p.55)

Like other fads, Agile is really just another version of Taylorism. All these ideas basically boil down to trying to do more with fewer people for less money. The authors of the original Agile manifesto have sought to distance themselves from what Agile has become, saying they can no longer make sense of it.

Sadly, one particular group of workers may find themselves particularly at risk:

The pressure is extra high on older workers, who are experienced enough to realize that this is bullshit, and that Agile usually fails, but wise enough to realize that the real point of Agile may be to create an excuse to fire older workers, and so the smart thing to do is to shut up and go along. (p.59)

Lyons does hold out some hope for better things, and the final section of his book is called “The No-Shit-Sherlock School of Management”. He points out that the companies on Fortune magazine’s list of ‘Legends’ are incredibly successful and treat their employees exceptionally well. Elsewhere, a number of businesses have sprung up in Silicon Valley that are reacting against the spread of shareholder capitalism by not only treating their staff well but also doing good for the community. One venture capital firm, Kapor Capital, engages only in “gap-closing” investing, putting money into companies that are “serving low-income communities and/or communities of color to close gaps of access, opportunity, or outcome” (p.201).

It also seems that increasing numbers of students are drawn towards business courses that have more of a social emphasis. Elsewhere, workers have rediscovered the value of becoming organised. For example, Google employees succeeded in getting the company to abandon its involvement in a military drone programme, and a number of gig economy workers have successfully organised to challenge their working conditions and contracts.

Lab Rats is an eminently readable book that will both amuse and horrify. To be sure, Dan Lyons’s emphasis is on reforming capitalism, which may seem a little optimistic to some on the left. Indeed, I felt he might have been viewing the pre-Friedman economic era through slightly rose-tinted spectacles. Also, whilst he holds up Starbucks as a company that treats its employees very well, he omits to mention that it has also been criticised for using legal mechanisms to minimise the tax it pays in the countries where it operates (Lyons does criticise Apple for the same thing). However, CEOs who are looking for a business reason to treat their employees well might note the work of the psychologist Dan Ariely, who found that companies where people felt physically and emotionally safe tended to outperform the stock market.