A big Noise (review)

Noise: A flaw in human judgment

by Daniel Kahneman, Olivier Sibony and Cass R. Sunstein.

Published by William Collins, 2021.

By now, many people are familiar with Nobel prize winner Daniel Kahneman’s previous book Thinking Fast and Slow, in which he popularised the idea that rapid unconscious thought processes underlie many of our judgments and decisions. It is this manner of thought that we equate with intuition. Kahneman showed us how intuitive thinking can give rise to a range of systematic errors, referred to as biases. In this new book, he has teamed up with Olivier Sibony and Cass Sunstein to talk about another source of error in judgment, referred to as noise.

The authors state that error in judgment arises from a combination of bias and noise. Noise in judgment is defined as unwanted variability, and we are told that this is a more pervasive problem than bias. The book describes various studies of noise that researchers have conducted over several decades, including the notable contribution of Marvin Frankel, a US judge who was outraged by the variability of criminal sentencing in the American legal system. The authors contend, however, that the topic of noise has tended to be overshadowed by the topic of bias. Specifically:

[I]n public conversations about human error and in organizations all over the world, noise is rarely recognized. Noise is a bit player, usually offstage. The topic of bias has been discussed in thousands of scientific articles and dozens of popular books, few of which even mention the issue of noise. This book is our attempt to redress the balance (p.6).

Noise can be observed and measured even where a ‘right’ answer may not exist or cannot be verified. There is no objective standard for assessing whether a movie is ‘good’, for example, but because most professional critics give a numerical rating we can see the extent to which they agree or disagree with each other. There may be little consequence to variability in the judgments of film critics, but in many domains we would hope for high levels of consistency. For example, if any of us were to find ourselves the defendant in a court case, we would rightfully expect that the fairness of the outcome should not depend upon which judge happens to be hearing the case that day. Regrettably, the evidence reported by the authors indicates that noise pervades the legal system and many other areas of life. They note that noisy judgments have been observed in medicine, child custody decisions, professional forecasting, asylum decisions, personnel decisions, bail decisions, forensic science, and patent decisions. 

In Chapter 6, the authors describe different types of noise. The example of a legal defendant obtaining a different outcome depending on which judge handles the case is an illustration of system noise. Observations of courtroom sentencing have long suggested that judges vary in the way they treat similar cases, a conclusion which is supported by controlled research. A study published in 1981 presented 208 US Federal judges with the same 16 cases and asked them to set a sentence for each. Sure enough, wide variation was observed in sentencing. There is of course no way of knowing what the ‘right’ sentence is, and while it is tempting to suggest that the average sentence for a case represents the ‘right’ sentence, the average may also reflect the existence of bias (e.g. racial discrimination in sentencing).

System noise is itself the product of two other distinct forms of noise. One of these is level noise. In the case of courtroom judges this would represent the tendency of some judges to be more severe than others. The other contribution to system noise comes from pattern noise. This occurs when a judge treats certain types of case more severely than other types (a judge x case interaction). As the authors put it:

One judge, for instance, may be harsher than average in general but relatively more lenient toward white-collar criminals. Another may be inclined to punish lightly but more severely when the offender is a recidivist.

Another type of noise arises when the same individual makes different judgments about the same information when it is encountered at different times. Such within-person variability is referred to as occasion noise. Logically, if a person is operating as part of a system, then occasion noise must also contribute to system noise, but this is rather difficult to tease apart. Occasion noise has been widely studied and arises from numerous factors, such as variation in mood, stress, fatigue, and even changes in the weather. Contextual information can also have an impact: in the US, a judge who has just granted asylum to the previous two applicants is less likely to grant it to the next applicant.
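To make the decomposition concrete, here is a minimal simulation sketch of my own (not from the book, with invented numbers): simulated judges sentence the same set of cases under a simple additive model, and the variability across judges is split into a level-noise component and a pattern-noise component.

```python
import numpy as np

rng = np.random.default_rng(0)
n_judges, n_cases = 50, 16

# Hypothetical sentencing data (in months): a shared case effect,
# a per-judge severity offset (level noise) and a judge-by-case
# interaction (pattern noise).
case_effect = rng.normal(60, 20, size=n_cases)             # how severe each case "really" is
judge_level = rng.normal(0, 10, size=n_judges)             # some judges harsher overall
interaction = rng.normal(0, 8, size=(n_judges, n_cases))   # idiosyncratic judge-by-case effects

sentences = case_effect + judge_level[:, None] + interaction

# System noise: for each case, the variance across judges (averaged over cases).
system_var = sentences.var(axis=0, ddof=1).mean()

# Level noise: variance of each judge's average sentence.
level_var = sentences.mean(axis=1).var(ddof=1)

# Pattern noise: what remains after removing case means and judge means.
residual = (sentences
            - sentences.mean(axis=0, keepdims=True)
            - sentences.mean(axis=1, keepdims=True)
            + sentences.mean())
pattern_var = residual.var(ddof=1)

print(f"system noise^2          ~ {system_var:.1f}")
print(f"level + pattern noise^2 ~ {level_var + pattern_var:.1f}")
```

On this additive model the two printed figures come out approximately equal, which illustrates the book's claim that, in squared terms, system noise is the sum of level noise and pattern noise.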

The authors propose a range of remedies for the problem of noisy judgments, which they group under the umbrella heading of decision hygiene. Any organisation concerned about noise in its judgments, they suggest, should conduct a noise audit to determine the extent to which it is affected (an appendix provides guidance on how to go about this). The first principle of decision hygiene is that “The goal of judgment is accuracy, not individual expression”. Statistical models have long been found to outperform human judges on repeated decisions, including models created from analyses of the judges themselves, and the advent of machine learning has given even greater scope for their application. The great advantage of statistical models is that they are free from occasion noise, although there is a danger that models based on human judgment will incorporate societal biases (e.g. racial discrimination). There is some discussion of the problem of bias in AI systems, though the authors seem largely unconcerned. This was a real issue for me: I found their rather casual dismissal of the bias problem hand-wavy and unconvincing.
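As an aside, the idea of building a statistical model from a human judge's own past decisions (often called judgmental bootstrapping) is simple to sketch. The example below is mine rather than the authors', and the cues and data are invented, but it shows why such a model is immune to occasion noise: it applies exactly the same weights to every case.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented example: a judge has rated 200 past applications on a 1-10 scale,
# based on three numeric cues (e.g. income, debt ratio, years employed).
n = 200
cues = rng.normal(size=(n, 3))
policy = np.array([2.0, -1.5, 1.0])   # the judge's (implicit) cue weights

# The human applies roughly consistent weights, plus occasion noise.
ratings = 5 + cues @ policy + rng.normal(0, 1.5, size=n)

# Fit a linear "model of the judge" from those past ratings.
X = np.column_stack([np.ones(n), cues])
weights, *_ = np.linalg.lstsq(X, ratings, rcond=None)

# For any new case the model's output is perfectly consistent:
# same cues in, same rating out, with no occasion noise.
new_case = np.array([1.0, 0.5, -0.2, 1.1])   # intercept term plus three cue values
print("model's rating for the new case:", round(float(new_case @ weights), 2))
```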

However, acknowledging that people often resist challenges to their autonomy, the authors suggest that some situations – such as job interviews – may benefit most from being structured, with the options rated and those ratings then used as input to a discussion among decision makers rather than to an algorithm.

A second principle is that judges should “think statistically and take the outside view of the case”. Thinking about how a current situation might be similar to other situations that have been encountered can help root thinking in statistical reality, and avoid excessive optimism.

Thirdly, judges should “structure judgments into several independent tasks”. This has always been a basic principle of decision analysis. People’s limited cognitive capacities are better able to manage a series of smaller problems than one big, complex problem. Kahneman et al. describe a specific procedure for organisational decision making, which they call the mediating assessments protocol.

A fourth principle is to “avoid premature intuitions”. In Chapter 20 the authors provide an alarming description of how the forensic analysis of fingerprints can be biased in the US legal system. Whenever a laboratory links the partial fingerprint from a crime scene to the complete fingerprint of a suspect, a second laboratory is asked to carry out the same analysis. Unfortunately, the second laboratory knows that it is only being asked to do the analysis because another laboratory made an identification, hence it is potentially biased from the outset.

The book finishes with a comparison of rules and standards as a means for regulating behaviour. Rules have clear-cut criteria (“Do not exceed the speed limit”), though as noted earlier they can also be biased. Standards, on the other hand, allow for the exercise of discretion (“Drive carefully”). Standards are often adopted because it can be difficult to get people to agree on the precise criteria for rules. However, the more open-ended the language used in a standard is, the more judgment is needed and the more noise is likely to creep in. The authors give the example of Facebook’s Community Standards, which are meant to determine what is and isn’t acceptable online content. When these were first introduced, the thousands of Facebook reviewers working to them ended up making highly variable decisions. To address this problem, Facebook created a non-public document for its reviewers called the Implementation Standards, which – for example – included graphic images to depict what it meant by “glorifying violence”. In so doing, Facebook basically created a set of rules to underpin its public standards.

There appears to be no clear-cut way to determine whether a rule or a standard should be used, and the authors suggest that at a first approximation any organisation needs to consider the costs of decisions and the costs of errors. Creating a rule can be difficult, but applying a rule in a decision situation is relatively easy. Conversely, creating a standard is easier, but where a person has to make many decisions the need to be continually exercising judgment can be quite a burden. The authors suggest that the costs of errors depend on “whether agents are knowledgeable and reliable, and whether they practice decision hygiene”. Where agents can be trusted in this regard, then a standard might work well, but otherwise a rule may be more appropriate.

That is the book in summary, then. With three co-authors you might wonder how stylistically consistent the book would be, but I found it to be remarkably consistent, with no obvious clue as to who did what. However, there is also quite a bit of repetition, and a more rigorous editing process could have cut down the length substantially. Overall, though, I found the book to be quite engaging, much more so than Thinking Fast and Slow, which I found rather hard-going (I didn’t manage to finish that book, although I was familiar with most of the content anyway).

There has been some academic sniping over Noise, though I don’t think it’s very interesting for a review to begin reviewing the other reviewers (one highly critical review with links to other critical reviews can be found here). Some of the criticism, in my view, is overstated and there is a sense of people trying to cut down one of the “tall poppies” in the field. Nonetheless, one of the reasons that Kahneman, in particular, has become something of a target is that a number of weaknesses have been identified in his previous book, Thinking Fast and Slow. Kahneman was perhaps unfortunate to have published his best-seller in the same year in which one well-known psychologist was revealed to have fabricated data in many studies, and in which one of the most controversial papers in psychology appeared, a paper which has prompted a great deal of soul-searching within the discipline. It transpires that for a long time, a range of questionable research practices (QRPs) have been used in psychology (and, to be fair, in other disciplines, though not to the same degree). As a result of this introspection, it turns out that Kahneman’s book contains many studies which have either failed to be replicated by other researchers or which are severely “underpowered” (too few participants), meaning that there is a good chance they would not be replicated. The implicit priming studies featured in Chapter 4 of Thinking Fast and Slow are particularly problematic, and a critique can be read here. A broader critique can be found here.

Kahneman has not (yet) revised Thinking Fast and Slow to address the problems identified, and the millions of non-psychologist readers are unlikely to be aware that there are any problems. Those who are aware of the problems identified in psychological research will justifiably wonder about the validity of the studies reported in Noise. I have no doubt that noise exists, but to what extent, and are the psychological explanations correct? One widely-cited study reported in Noise found that the parole decisions of experienced US judges became increasingly unfavourable the further they got into a session, with about 65% of decisions being favourable at the start of a session and none at the end. Immediately following a break for food, favourable decisions predominated once more before going into a gradual decline again. Whereas most psychological effects are no more than modest in size, this one was substantial. Not reported by Kahneman and colleagues is the fact that this finding has been the subject of some contention.

One response suggested that the results could be explained by the non-random ordering of cases – prisoners without legal representation tend to have their cases heard later in the session – although the original researchers argued that including representation in their analysis did not change the results. It has also been claimed that the “hungry judge” effect arises from the sensible planning of rational judges: judges tend to end a session when they foresee that the next case is likely to take a long time, and longer cases are more likely to result in favourable outcomes. If correct, this account would suggest that the case for noise in this instance has been overstated and that the supposed explanation is false. Finally, the wider concept of “ego-depletion”, upon which the original hungry judge finding rests, has itself been called into question.

In conclusion, Noise is somewhat overlong and repetitive, but I think the breakdown of different types of noise is very interesting. There are some potentially useful suggestions for minimising noise, though the authors gloss over concerns about bias in AI-driven decisions. Also, the idea of a noise audit for organisations sounds quite bureaucratic (though potentially a money-spinner for consultants), so presumably ought to be considered only by organisations where noise is a major concern. A healthy skepticism about the psychological research is advised.

[Note: I made a slight edit for clarity to the “hungry judge” section – 12.30pm, 9th August 2021]

The Dirty War on the NHS – a new documentary by John Pilger


The desperate plight of the National Health Service, much of which has already been privatised, is demonstrated by the terrible case of Trevor Moncrieff. As described in John Pilger’s latest documentary, Mr. Moncrieff, a councillor, had a heart attack in 2018. The ambulance that arrived to attend him was one of many “NHS” vehicles that are actually part of a private fleet. It was carrying a second-hand defibrillator that had not recently been tested, and which failed to work. Mr. Moncrieff’s son Matt attempted, unsuccessfully, to provide CPR whilst the paramedics struggled with the equipment. The paramedics had not been given radios, so tried to reach their dispatcher by phone, but ended up leaving a message with a call centre. Eventually, they advised Matt Moncrieff to call the fire service, as they were likely to have a defibrillator. It was too late: Trevor Moncrieff could not be revived.

This instance belies the notion that private enterprise is always more efficient than a publicly-owned service. Indeed, the prime objective for private commercial ventures is to benefit their shareholders, which means cutting costs wherever possible. In 2010, Hinchingbrooke hospital was sold off to the Circle Health group, founded by a former Goldman Sachs executive, Ali Parsa. The hospital quickly began to accrue an increasing deficit, which only extreme savings could hope to address. Whereas hospital staff had always worked in cooperation, as would be expected for such a vocation as healthcare, now the lead nurses were expected to compete with each other to get the patients out of the hospital as fast as possible. Morale plummeted, and a concerning report by the National Audit Office was followed by a damning one from the Care Quality Commission. In 2015, Circle Health withdrew from the contract, handing back control to the NHS.

Ali Parsa went on to found Babylon Health, which offers an NHS-funded chatbot GP service. In 2018, the Labour Party complained that Health Secretary Matt Hancock had breached the ministerial code when he praised Babylon’s app.

Meanwhile, Pilger visits the United States to show us the harsh consequences of a fully-privatised healthcare system. Healthcare costs are the leading cause of bankruptcy in the US, where swingeing excess charges often make insurance useless for those who can afford it in the first place. People whose financial means run out whilst they are in hospital often then find themselves the victims of “patient dumping”. During the hours of darkness, some hospitals escort penniless patients out into the street and leave them there, perhaps outside the doors of a homelessness shelter if the patient is lucky. Pilger notes that we already have a form of patient dumping in the UK, in that people with mental health problems that are not severe enough to meet certain criteria for treatment can find themselves out on the street.

He describes the creation of the NHS as a “revolutionary moment”. At the end of the second world war, the public – and especially returning servicemen – were expecting something better than they had previously been used to. Some servicemen even went on strike. The decision to set up a National Health Service was opposed by the Conservatives, but they were out of touch with the public mood, and we see footage of Churchill being barracked by an audience at an election hustings in the north of England. Several decades later, however, influenced by a growing movement of American free marketeers, Oliver Letwin and John Redwood were developing plans to open up the NHS to market competition, plans that were then accelerated under Blair’s New Labour. However, the PFI (Private Finance Initiative) deals that were arranged burdened hospitals with huge debts. Still, Health Secretary Alan Milburn went on to join Bridgepoint Capital, a venture capital firm that specialises in financing private healthcare enterprises. Simon Stevens, health advisor under the Blair government, went on to join the American healthcare provider United Health, before returning under the Conservative/Lib-Dem coalition to run the NHS.

What is clear from Pilger’s film is that the worsening plight of the NHS is not because it is an inefficient public service, but precisely the opposite: it is being deliberately starved of public funding in the hope that the public will be deceived into believing that further marketisation is what’s required, yet it is quite clear that privatisation only works for shareholders, not for patients. But now, in the 2019 General Election, we finally have a chance to turn things around. Interviewed towards the end of the film, Shadow Health Secretary Jonathan Ashworth says that the Labour Party will reverse the trend towards privatisation – basically, re-nationalising the NHS. Following this, Pilger informs us that he has spent six months trying to arrange interviews with the politicians currently in charge of the NHS, as well as David Cameron and Alan Milburn; none of them responded.

Moby-Dick

To mark the 200th anniversary of Herman Melville’s birth (1st August 1819), the Guardian online carried an article by Philip Hoare about the author’s masterpiece, Moby-Dick. Hoare described it as “the Mount Everest” of literature, as many people apparently start but fail to finish the book. Having done this myself many years ago, I was spurred by Hoare’s article to revisit Moby-Dick with a refreshed determination to read it through to the end. It took me about three weeks to finish and I enjoyed it hugely. It is a very unusual book that raises many questions, not the least of which is ‘What is it all about?’ Presumably this is why the book was not successful during Melville’s lifetime. As with so many great works of art, though, the ambiguities, oddities and uncertainties are what give it its longevity, as people keep returning to unpick its mysteries. In this blog post, I give my own reflections on Moby-Dick (which are those of an enthusiastic reader, not of an academic expert in English literature).

I’ll begin with a couple of short, simple observations that might be of interest to people who have never so much as glanced at Moby-Dick. First, although it is a long book most of the chapters are very short, some less than a page. This makes it easy to read over a series of short intervals, such as on a rail commute or during your lunch breaks at work, without having to abandon the text in the middle of a long section. Second, if you are expecting a thrilling adventure story you are likely to be disappointed. Perhaps this is why readers often don’t make it through to the end; they may be expecting a different kind of tale. There is action, but mostly towards the end of the book. The notion of the whale-hunt really just seems to be a device – the “MacGuffin”, as Hitchcock called it – to motivate the characters, and thereby allow certain themes to be explored.

The first third of the book consists largely of an introduction to characters and locations, with the first chapter and its famous opening sentence – “Call me Ishmael” – being about our narrator himself. Ishmael’s motivation for going to sea appears to be boredom: “It is a way I have of driving off the spleen, and regulating the circulation”. He rocks up at the Spouter Inn, in New Bedford, where lack of spare accommodation means he has little alternative but to share a bed with Queequeg, a South Sea chieftain who has left his home to explore the world. Queequeg is an experienced harpooneer for whaling ships. Initially rather afraid of “The Pagan”, Ishmael begins to grow close to him in what seems a quite romantic fashion:

“I began to be sensible of strange feelings. I felt a melting in me. No more my splintered heart and maddened hand were turned against the wolfish world. This soothing savage had redeemed it […] Wild he was; a very sight of sights to see; yet I began to feel myself mysteriously drawn towards him”.

Ishmael, a Presbyterian, is invited to join Queequeg in his religious rituals, which he does. Eventually, the two of them depart New Bedford for Nantucket, where they join a whaling ship, the Pequod. Across several chapters we then get introductions to the crew of the ship, notably Captain Ahab; the mates Starbuck (chief mate), Stubb (second mate) and Flask (third mate); and the other harpooneers Tashtego and Daggoo. There is also Pip, the young black cabin boy, who is part prophet and part court jester, especially after he later begins to lose his mind following a period of time alone in the sea.

Chapter 32 is titled ‘Cetology’ and concerns the different types of whales and their classification. Many subsequent chapters are devoted to the physiology of the sperm whale (its head, brain, tail, spout, and so on) and there are even chapters devoted to pictures of whales.

As the Pequod’s journey progresses, they meet a series of other whaling ships, each of which has had an encounter with Moby-Dick – each encounter more serious than the last – though none has killed the creature. The critical thing to know about Captain Ahab is that he lost a leg, on a previous voyage, to Moby-Dick. He is now obsessed with killing the whale, no matter what the dangers, and is not at all deterred by the reports from the other ships’ captains he meets. Ahab himself does not appear before the crew until several days into the ship’s voyage, thus adding to the air of mystery that surrounds him. This is compounded, on the first occasion that a whale is sighted, by the appearance of previously unseen shipmates: “five dusky phantoms that seemed fresh formed out of air”. Four of these men are of a “tiger-yellow complexion peculiar to some of the aboriginal natives of the Manillas”, while their leader – Fedallah, also referred to as “the Parsee” (a Zoroastrian) – is a dark-skinned man in a white turban. This latter figure is the source of many rumours and is regarded by the crew with deep suspicion, not least because he is the source of some darkly prophetic comments.

What really struck me about the book is the contrast between Ishmael’s wish to understand others, whether they be people or whales, and Ahab’s obsessive pursuit of the whale which precludes any attempt to understand the creature beyond predicting its movements. A good example of Ishmael’s open-mindedness comes when his new friend Queequeg – a “wild idolator” – invites him to join his worship:

“But what is worship? – to do the will of God? – that is worship. And what is the will of God? – to do to my fellow man what I would have my fellow man to do to me – that is the will of God. Now Queequeg is my fellow man. And what do I wish this Queequeg would do to me? Why, unite with me in my particular Presbyterian form of worship. Consequently, I must then unite with him in his; ergo, I must turn idolator”.

Ishmael talks of whaling as a noble activity. Indeed, if he did not believe in whaling it would make no sense for him to be on board the Pequod. However, the numerous chapters that are devoted to understanding all aspects of the sperm whale, and other whales, creatures of no little intelligence, inevitably create an empathy for these extraordinary animals that sits uneasily with the vivid descriptions of them being hunted and harpooned until they expire, exhausted, in waters red with their own blood. In contrast to Ishmael’s empathy, Ahab puts the lives of his own men increasingly at risk, pushing them to the limit even when lives have already been lost, the dangers appear overwhelming, and Starbuck is urging him to give up the pursuit.

The narration of the story is unusual. Ishmael’s own presence as a character in the story is perhaps strongest in the first third of the book, especially where he describes the growing bond between himself and Queequeg. Ishmael himself is an actor within the story in these early chapters, which includes getting tossed into the water at one point during a whale hunt. Elsewhere, though, the narration shifts. When Starbuck is introduced, Ishmael’s narration becomes God-like, telling us about Starbuck’s thoughts. Chapter 37 is entirely Ahab’s thoughts, whilst alone in his cabin. Chapter 38 relates Starbuck’s thoughts, as he leans against the mainmast at night. Chapter 39 gives us Stubb’s thoughts, as he performs his duty as first night-watch. Chapter 40 is written in the form of a script, giving us the voices of numerous crew members who are on the forecastle at midnight. After this deviation in narrative voice, Chapter 41 returns us to the main narrator, with the opening sentence “I, Ishmael, was one of that crew…”. A little later, at the start of Chapter 45, Ishmael seems to highlight that his is not a conventionally told tale: “So far as what there may be of a narrative in this book…”.

In the last third of the book, although Ishmael continues to narrate, he himself mostly seems to disappear as a character who participates in events. One exception to this is Chapter 94, ‘A Squeeze of the Hand’, in which Ishmael describes his feelings as he bathes his hands in the sperm of the whale. To modern ears much of this chapter seems quite comical, and I wonder if this was how it was meant to read. Philip Hoare’s Guardian article referred to the “queerness” of Moby-Dick, and reference to online dictionaries indicates that the term ‘sperm’ in reference to spermatozoa, as distinct from spermaceti (the waxy substance from the sperm whale), has been in currency since the fourteenth century. What then do we make of the lyrical way in which Ishmael rhapsodizes about the act of washing his hands in sperm? –

“I felt divinely free from all ill-will… Squeeze! squeeze! squeeze! all the morning long; I squeezed that sperm till I myself almost melted into it… I found myself unwittingly squeezing my co-labourers’ hands in it… that at last I was continually squeezing their hands, and looking up into their eyes sentimentally… let us squeeze ourselves universally into the very milk and sperm of human kindness”.

Strangely, though, given the emotionality of the early chapters involving Queequeg, when his friend becomes seriously ill, Ishmael does not give any indication that he is especially troubled by this turn of events, beyond stating that all the crew were concerned. This seems a little odd, but then the main motivation of this chapter appears to be to set up certain events that happen later on. Thereafter, as the hunt for Moby-Dick begins to dominate the story and develops into outright action, the focus is on the other characters; although Ishmael is there, we do not know what he is doing. In fact, he largely disappears as an actor until, perhaps, the final page.

For me, one of the great pleasures of Moby-Dick is the quality of the writing, the beauty of the description, whether Melville is describing the anatomy of whales, the layout of the ship, or the characteristics of people. Turning at random to almost any page reveals such penmanship, as in this example:

“The starred and stately nights seemed haughty dames in jewelled velvets, nursing at home in lonely pride, the memory of their absent conquering Earls, the golden helmeted suns! For sleeping man, ’twas hard to choose between such winsome days and such seducing nights” (Chapter 29).

Melville must have been on a creative roll by the time he wrote Moby-Dick and surely had great confidence in what he was doing. It is a shame that, like many of the great artists, his best work was not appreciated in his own lifetime. This is a book that, once finished, really sticks in the mind. Like stepping ashore after a long period at sea and feeling as though the ground beneath you is swaying, so there seems to be a period of mental turbulence upon reaching the end of Moby-Dick, as the thoughts continue to splash around inside your head. It is a memorable literary voyage.

Book review: This Dreaming Isle

In recent years I have become something of an aficionado of short stories, partly thanks to those published in Interzone and Black Static magazines, the former devoted to science fiction and the latter to horror. Anthologies abound, but you would be hard pressed to find a more satisfying horror collection than the seventeen stories published in This Dreaming Isle, edited by Dan Coxon and published by Unsung Stories.

To be specific, the cover blurb describes these as “horror stories and weird fictions” that draw upon “the landscape and history of the British Isles”. Actually, as the singular term of the title suggests, all of the stories are set on the mainland rather than the surrounding isles, but such quibbling aside it is true that these tales have a very British feel to them. In fact, the stories are grouped into three sections: Country, City and Coast. Within these we have a range of familiar physical environments, including angry skies, roiling clouds, rolling hills, a treacherous reservoir, turbulent seas, traffic jams, a lethal industrial development, a smart Kensington apartment, beaches, fossils, cliffs, seagulls and old country houses. The supernatural or strange entities include a legendary nuisance ghost dog, a pre-Christian tribe in the hills near the Kent coast, a kind of human caterpillar, a mysterious Twitter account, a mythical hill-walker, a dead artist who may or may not live within his paintings, and a siren.

Frequently, these landscapes and phenomena (real or imagined) are the backdrop to ordinary human concerns (e.g. a woman escaping a difficult background manoeuvres herself into marriage with a financially successful man; a mother and daughter try to repair a difficult relationship by holidaying together; a son finds himself manipulated by a difficult, but sick, father; a right-wing internet troll rages against the world; an aging actor is made to feel young again by a beautiful woman). In the way that a walker in unfamiliar country might start out happily in the sunshine, but then misread their map whilst ignoring or overlooking the signs of the weather turning, eventually realising they have no idea where they are just as the storm breaks, so the sense of unease in many of these stories builds up gradually until the protagonists are literally or metaphorically out of their depth.

I was impressed by the consistently excellent quality of the contributions. Also, there are no “long” short stories, which I assume was due to editorial guidance (the longest, I think, was in the region of twenty pages). For myself, as someone who often likes to read a story before going to sleep, these were of an ideal length: I never once found sleep creeping over me before I reached the end of a story.

The book comes with an interesting introduction by the editor, Dan Coxon, who notes that the book was conceived before the referendum on EU membership and with no expectation that the vote would go the way it did. He is at pains to point out that none of the stories in this collection create a nationalistic fantasy of some past golden age of Britain. Rather: “The past is a dangerous, cutthroat place, filled with violence, injustice and inequality”. Quite so.

Highly recommended.

Fear is the key: A review of ‘Why Horror Seduces’ (by Mathias Clasen)

Looking back, I think the first horror movie I saw at the cinema must have been The Omega Man, the 1971 version of I Am Legend, Richard Matheson’s tale of modern-day vampires. I was only nine at the time, which was probably below the age certification for that film, though I think back then the ticket sellers at my local cinema weren’t always too bothered about checking and enforcing such matters. For a long time, The Omega Man remained my favourite film. I still love the scene where Charlton Heston sits alone in a cinema watching the documentary film of the 1969 Woodstock festival, listening to hippies talking about a world of peace and love, a wonderful juxtaposition with the world we know Heston is now living in – bereft of human beings during the daytime and besieged by malevolent vampires at night.

A little later I saw Jaws (1975). This was still a time when your ticket enabled you to enter at any point during the film and then stay for the next showing (hence the saying “This is where we came in”). Thus, my introduction to the film was seeing Quint disappearing into the mouth of the shark, without any of the dramatic build-up to this point, which is arguably more fear-inducing than the final scenes.

Both I Am Legend (the book) and Jaws (the film) are among the works included in a selective review of American horror fiction discussed in the 2017 book Why Horror Seduces, by Mathias Clasen, Associate Professor of Literature and Media at Aarhus University. Before we get to this review, though, Clasen addresses the wider questions of what horror is, how it works, and how it has been and should be studied. Horror is notoriously hard to define other than in terms of the reactions that a work of fiction elicits from the viewer or reader. Whilst some theorists date the origins of horror to the advent of Gothic fiction in the eighteenth century, Clasen agrees with one of the genre’s most celebrated practitioners, Lovecraft, who wrote that “the horror tale is as old as human thought and speech themselves”.

Survey research shows that most people enjoy horror but, like Goldilocks’ porridge, it needs to be just right – it doesn’t work if it fails to provoke unease or a fear reaction, and likewise most people don’t want horror to be too frightening. But why do we want to be frightened at all by a work of fiction? Enter a plethora of theorists who want to tell us that the stories we love are actually about something other than an entity or situation that is scaring the crap out of us. There are Freudian, feminist, queer, Lacanian, Marxist, race studies, post-colonial and post-structuralist readings of horror fiction. Sometimes, a critical interpretation is simultaneously based on several of these approaches.

Whilst acknowledging that works of fiction may include multiple themes, and also expressing pleasure that the horror genre is taken seriously by these writers, Clasen believes that their various theoretical approaches invariably miss what is at the core of horror fiction. He is especially scathing, albeit in a polite fashion, about the psychoanalytic approaches to horror, which are so divorced from empirical evidence that they enable the shark in Jaws to be interpreted as both “a greatly enlarged, marauding penis” (Peter Biskind) and a “vagina dentata” (Jane Caputi), a giant vagina with teeth.

Critical interpretations are sometimes at odds with the explicitly expressed intentions of writers or directors. Thus, one Lacanian reading of The Shining insists that the novel is really about repressed homoerotic and Oedipal desires, despite author Stephen King’s insistence that the story is based on his own battle with alcohol. The early slasher movies provoked a range of critical reactions. One critic insisted that these films were a means for young people to assuage their guilt about their own hedonistic lifestyles, a claim that had no evidential basis whatsoever. Others claimed that slasher movies were inherently misogynistic, depicting their female victims as being punished for their sexually active lifestyles. Yet content analysis of slasher films has shown that men are just as likely as women to be victims. In Halloween, the character of Laurie Strode (Jamie Lee Curtis) is supposedly spared death because she adheres to socially conservative norms – an interpretation that overlooks the fact that Michael Myers is actually trying to kill her, which is why she is terrified. Furthermore, writer/director John Carpenter explicitly rejects the moralistic interpretation: Laurie Strode survives because she is the only character to detect and adequately respond to the danger.

Carpenter’s explanation is aligned with Clasen’s own interpretation of how horror works and why we are drawn to it. The human capacity for emotion, he points out, is a product of evolution. Fear and anxiety are the most primal of emotions, as these help shape our responses to immediate and anticipated threats, respectively (more on the topic of evolution and emotions can be found in Randolph Nesse’s new book Good Reasons for Bad Feelings). Potentially threatening stimuli, such as strange noises in the house at night, tend to grab our attention, even if in fact there is no danger. It is better to be anxious about something that turns out to be harmless than to be unconcerned about something truly dangerous. Organisms that do not respond to potential threats are fairly quickly removed from the gene pool. For our hunter-gatherer ancestors, those threats included environmental hazards, non-human predators, other people (in the form of physical danger, loss of status, and potentially lethal social ostracization), and disease in the form of invisible pathogens such as bacteria and viruses (hence certain stimuli, such as excrement and rotting meat, universally lead to feelings of disgust).

Our hunter-gatherer ancestors faced regular challenges to their survival on a scale that most of us will never experience. Even contemporary hunter-gatherers mostly live shorter lives than the rest of us. Horror fiction enables us to experience emotional reactions to potentially threatening stimuli within a safe environment. As Clasen puts it (p.147):

The best works of horror have the capacity to change us for life – to sensitize us to danger, to let us develop crucial coping skills, to enhance our capacity for empathy, to qualify our understanding of evil, to enrich our emotional repertoire, to calibrate our moral sense, and to expand our imaginations into realms of the dark and disturbing.

In his review of several works of American horror fiction, Clasen not only skewers the inadequacy of many previous critical approaches to horror, but spells out precisely the behavioural challenges that are posed to the characters in these works. For example, a staple ingredient of many zombie films is the tension between acting self-interestedly versus cooperating with others to fight the encroaching threat (zombies themselves arouse feelings of disgust associated with contagion). Often goodness and selfishness are embodied in different characters, yet in some works of fiction they may represent a conflict within a single character. One such example is Jack Torrance in The Shining. His failing literary career represents a loss of status, which he hopes to address by focusing on his writing whilst at the Overlook Hotel. When it becomes clear that there is some kind of threat to his son, Danny, feelings of parental concern are aroused. However, the hotel itself – once the home to various gangsters and corrupt politicians – exerts an evil influence on Jack, a recovering alcoholic, poisoning him against his own son.

Elsewhere, in Rosemary’s Baby author Ira Levin “successfully targeted evolved fears of intimate betrayal, contamination of the body, and persecution by metaphysical forces of evil” (p.91), whilst The Blair Witch Project plays upon our tendency to attribute negative value to a place where something bad has happened, a tendency which is adaptive because it makes people avoid dangerous places. As Clasen notes (p.143):

The same psychological phenomenon is at work when people shun houses in which murders or other particularly violent or grisly forms of crime have taken place.

Although Clasen himself says that there is far more we don’t know about how horror fiction works than what we do know, the evolutionary psychology approach would appear to offer a far more promising prospect for our understanding than any other approach that has so far been proposed. It is also an approach which should be far more satisfying to those of us who enjoy horror fiction, because it is in line with our intuitive understanding that we like horror because we simply enjoy the thrill of being scared whilst knowing that we are not really in danger.

Review – Meltdown: Why Our Systems Fail and What We Can Do About It

In the opening chapter of Meltdown, the authors Chris Clearfield and András Tilcsik describe the series of events that led to a near-disaster at the Three Mile Island nuclear facility in the United States. The initiating event was relatively minor, and occurred during routine maintenance, but as problems began to multiply the operators were confused. They could not see first-hand what was happening and were reliant on readouts from individual instruments, which did not show the whole story and which were open to misinterpretation.

The official investigation sought to blame the plant staff, but sociology professor Charles “Chick” Perrow argued that the incident was a system problem. In Normal Accidents: Living with High-Risk Technologies, Perrow argues that systems can be characterised along two dimensions: complexity and coupling. Complex systems have many interacting parts, and frequently the components are invisible to the operators. Tightly-coupled systems are those in which there is little or no redundancy or slack, so a perturbation in one component may have multiple knock-on effects. Perrow argues that catastrophic failures tend to be those where there is a combination of high complexity and tight coupling. His analysis forms the explanatory basis for many of the calamities described in Meltdown. Not all of these are life-threatening. Some are merely major corporate embarrassments, such as when PricewaterhouseCoopers cocked up the award for Best Picture at the 89th Academy Awards. Others nonetheless had a big impact on ordinary people, such as the problems with the UK Post Office’s Horizon software system, which led to many sub-postmasters being accused of theft, fraud and false accounting. Then there are the truly lethal events, such as the Deepwater Horizon oil rig explosion. Ironically, it is often the safety systems themselves that are the source of trouble. Perrow is quoted as saying that “safety systems are the biggest single source of catastrophic failure in complex tightly-coupled systems”.

The second half of Meltdown is devoted to describing some of the ways in which we can reduce the likelihood of things going wrong. These include Gary Klein’s idea of the premortem. When projects are being planned, people tend to be focused on how things are going to work, which can lead to excessive optimism. Only when things go wrong do the inherent problems start to appear obvious (hindsight bias). Klein suggests that planners envisage a point in time after their project has been implemented, and imagine that it has been a total disaster. Their task is to write down the reasons why it has all gone so wrong. By engaging in such an exercise, planners are forced to think about things that might not otherwise have come to mind, to find ways to address potential problems, and to develop more realistic timelines.

Clearfield and Tilcsik also discuss ways to improve operators’ mental models of the systems they are using, as well as the use of confidential reporting systems for problems and near-misses.

They devote several chapters to the important topic of allowing dissenting voices to speak openly about their concerns. There is ample evidence that lack of diversity in teams, including corporate boards, has a detrimental effect on the quality of discussion. Appointing the “best people for the job” may not be such a great idea if the best people are all the same kind of people. One study found that American community banks were more likely to fail during periods of uncertainty when they had higher proportions of banking experts on their boards. It seems that these experts were overreliant on their previous experiences, were overconfident, and – most importantly – were over-respectful of each other’s opinions. Moreover, domination by banking experts made it harder for challenges to be raised by the non-bankers on the boards. Where there were higher numbers of non-bankers, however, the bankers had to explain issues in more detail and their opinions were challenged more often.

Other research shows that both gender and ethnic diversity are important, too. An experimental study of stock trading, involving simulations, found that ethnically homogeneous groups of traders tended to copy each other, including each other’s mistakes, resulting in poorer performance. Where groups were more diverse, people were generally more skeptical in their thinking and therefore more accurate overall. Another study found that companies were less likely to have to issue financial restatements (corrections owing to error or fraud) where there was at least one woman director on the board.

Clearfield and Tilcsik argue that the potential for catastrophe is changing as technologies develop. Systems which previously were not both complex and tightly-coupled are increasingly becoming so. This can of course result in great performance benefits, but may also increase the likelihood that any accidents that do occur will be catastrophic ones.

Meltdown has deservedly received a lot of praise since its publication last year. The examples it describes are fascinating, the explanations are clear, and the proposed solutions (although not magic bullets) deserve attention. Writing in the Financial Times, Andrew Hill cited Meltdown when talking about last year’s UK railway timetable chaos, saying that “organisations must give more of a voice to their naysayers”. The World Economic Forum’s Global Risks Report 2019 carries a short piece by Tilcsik and Clearfield, titled Managing in the Age of Meltdowns.

I highly recommend this excellent book.

Review – Lab Rats: Why Modern Work Makes People Miserable

When scientists develop new antidepressant drugs they first administer them to rats. This means initially inducing depression in the poor rodents. No physical pain is involved. Rather, for a prolonged period the animals experience unpredictable negative changes to their environment, such as wet bedding, dirty sawdust, the sounds of predators, or changes to the cycle of light and dark. Eventually, the rats slide into an apathetic state, ceasing to groom themselves or build nests, and not bothering to use their exercise wheels.

In Lab Rats, a book that is by turns funny and frightening, Dan Lyons likens the plight of modern workers to that of these experimental rats. Constant change, whether it be office layout, new technologies or new methodologies, is producing a workforce that is increasingly stressed, depressed, and sometimes suicidal. Three other factors contribute to decreasing satisfaction with work. The first factor is money; or, specifically, the lack of it. Over the last few decades the incomes of ordinary workers have fallen, whilst those of the wealthiest have boomed. Secondly, workers are increasingly insecure. The third factor is dehumanization, whereby people are increasingly used by technology, rather than vice versa.

According to Lyons, there are two key reasons why modern work has become so much worse. The first is the shift from stakeholder capitalism to shareholder capitalism. For much of the twentieth century, company executives often recognised that they had responsibilities to employees and the wider community, as well as to investors. However, a significant change in attitude occurred when Milton Friedman, a University of Chicago economist (later to be awarded the Nobel prize), promoted the idea that the only responsibility that companies had was towards their shareholders. Aided by anti-trades union legislation, Friedman’s ideas led to a more ruthless form of capitalism, in which jobs were cut or moved abroad, wages slashed, and work frequently outsourced to the lowest bidder. The gig economy developed, in which organisations assembled a body of contract employees, people who were in fact often classed as “self-employed” so that they didn’t have to be awarded the kinds of benefits, such as paid holidays and sick pay, enjoyed by regular employees. The development of the Internet served to speed up these processes.

The second key factor identified by Lyons is the rise of Silicon Valley. Indeed, in the American edition of Lab Rats, Silicon Valley is explicitly identified in the subtitle. Once upon a time, Silicon Valley was full of hippies who grew up in the counter-culture of the 1960s. Companies like Hewlett Packard were a model of how to treat employees well. However, with the advent of shareholder capitalism the hippies were replaced by ruthless oligarchs (e.g. Jeff Bezos, Mark Zuckerberg, Travis Kalanick and Elon Musk), an army of wannabes desperate to get rich quick, and a bunch of venture capitalists who hold out the hope that this can be achieved.

The lack of morality in modern-day Silicon Valley is surely best exemplified by the following example. The rise of the tech oligarchs and their billion-dollar campuses, such as the Googleplex and Apple’s spaceship campus, has pushed up housing prices so far that these tech installations are now fringed by neighbourhoods where people live in camper vans, tents, or simply on the sidewalks. In 2016:

a bunch of rich techies came up with their own solution, sponsoring a ballot proposition that would let police forcibly remove homeless people from the sidewalks. Homeless people would get twenty-four hours to either move to a shelter or get a bus ticket out of town. If they didn’t comply, the cops could seize their tents and belongings. (p.36)

The proposition was passed.

But back to the venture capitalists. In Lyons’s words, the Valley has become a “casino”. The ambition of the modern ‘techie’ is to create a start-up business that attracts sufficient money from venture capitalists such that they are able to get rich by floating the business on the stock market (essentially, getting taken over by other monied interests). Lyons refers to these business start-ups as “unicorns”. Along the way, these businesses typically lose heaps of money, which is why their employees are treated so poorly. But no matter – if all goes to plan, the start-up bosses flog off their outfits, then write a best-selling book about how to run a ‘disruptive’ company. Outside of Silicon Valley, many CEOs – fearful that their organisation is at risk of stagnating in the new economy – lap up this guidance on how to do things a new way. Yet, to quote Lyons:

Silicon Valley has no fountain of youth. Unicorns do not possess any secret management wisdom. Most start-ups are terribly managed, half-assed outfits run by buffoons and bozos and frat boys, and funded by amoral investors who are only hoping to flip the company into the public markets and make a quick buck. They have no operations expertise, no special insight into organizational behavior (p.45)

Why is it that CEOs are so ready to seek out the guidance of business gurus? It seems that the simple truth is that no-one really knows how to run a big company. Lyons writes:

The business world has a seemingly insatiable appetite for management gurus. You probably can’t blame CEOs. It may be that no human is really smart enough to run something as vast and complex as a corporation. Yet someone has to do it. Clinging to a system, any system, at least provides the illusion of structure. The system also gives the boss something to blame when things go wrong. Managers grasp at systems the way that drowning people reach for life jackets.

Indeed, although Lyons doesn’t mention this, even the Harvard Business Review has previously noted that CEOs who run highly successful corporations frequently fail to repeat that success when they move to another company (with correspondingly vast salary). It is as though success occurs despite the CEOs’ presence rather than because of it.

Lyons traces a rough history of business systems, beginning with the work of “shameless fraud” Frederick Taylor (“He fudged his numbers. He cheated and lied”) in the 1890s. Taylor claimed to have devised a scientific method to optimise the efficiency of any process. In reality, he ramped up the quotas until staff began to leave. Taylor was subsequently fired from the company where he did his “research”. Despite his work being thoroughly debunked, Taylorism became almost a religion. Since Taylor, we have had Peter Drucker (who coined the term “knowledge worker”), Michael Porter, Jim Collins, and countless others. The business fads that we’ve been sold include the ‘Five Forces Framework’, ‘Six Sigma’, ‘Lean Manufacturing’, ‘Lean Startup’ and ‘Agile’.

Some space is devoted to description and discussion of ‘Agile’, which is perhaps the most recent fad to be widely adopted. In 2001 a group of software developers authored a ‘Manifesto for Agile Software Development’, an idea that was subsequently pounced on by others and expanded far beyond its original domain of application. Lyons describes the business application of Agile as:

a management fad that has swept the corporate world and morphed into what some call a movement but is more like widespread mental illness (p.55)

Like other fads, Agile is really just another version of Taylorism. All these ideas basically boil down to trying to do more with fewer people for less money. The authors of the original Agile manifesto have sought to distance themselves from what Agile has become, saying they can no longer make sense of it.

Sadly, one particular group of workers may find themselves particularly at risk:

The pressure is extra high on older workers, who are experienced enough to realize that this is bullshit, and that Agile usually fails, but wise enough to realize that the real point of Agile may be to create an excuse to fire older workers, and so the smart thing to do is to shut up and go along. (p.59)

Lyons does hold out some hope for better things, and the final section of his book is called “The No-Shit-Sherlock School of Management”. He points out that the companies on Fortune magazine’s list of ‘Legends’ are incredibly successful and treat their employees exceptionally well. Elsewhere, a number of businesses have sprung up in Silicon Valley that are reacting against the spread of shareholder capitalism by not only treating their staff well, but doing good for the community. One venture capital firm, Kapor Capital, engages only in “gap-closing” investing, putting money into companies that are “serving low-income communities and/or communities of color to close gaps of access, opportunity, or outcome” (p.201).

It also seems that increasing numbers of students are drawn towards business courses that have more of a social emphasis. Elsewhere, workers have rediscovered the value of becoming organised. For example, Google employees succeeded in getting the company to abandon its involvement in a military drone programme, and a number of gig economy workers have successfully organised to challenge their working conditions and contracts.

Lab Rats is an eminently readable book that will both amuse and horrify. To be sure, Dan Lyons’s emphasis is on reforming capitalism, which may seem a little optimistic to some on the left. Indeed, I felt he might have been viewing the pre-Friedman economic era through slightly rose-tinted spectacles. Also, whilst he holds up Starbucks as a company that treats its employees very well, he omits to mention that it has also been criticised for using legal mechanisms to minimise the tax it pays in the countries where it operates (Lyons does criticise Apple for the same thing). However, CEOs who are looking for a business reason to treat their employees well might note the work of the psychologist Dan Ariely, who found that companies where people felt physically and emotionally safe tended to outperform the stock market.

Double book review: Margaret Boden and Gary Smith on Artificial Intelligence

AI – Its nature and future, by Margaret A. Boden. Oxford University Press. 2016.

The AI Delusion, by Gary Smith. Oxford University Press. 2018.

AI, machine learning, algorithms, robots, automation, chatbots, sexbots, androids – in recent years all these terms have appeared regularly in the media, either to tell us about the latest achievements in technology and exciting future possibilities, or to warn us about threats to our jobs and freedoms.

Two recent books, from Margaret Boden and Gary Smith, respectively, are useful guides to the perplexed in explaining the issues. Each is clearly written and highly readable. Margaret Boden, Research Professor of Cognitive Science at the University of Sussex, begins with a basic definition:

Artificial intelligence (AI) seeks to make computers do the sorts of things that minds can do.

People who work in AI tend to work in one of two different camps (though occasionally both). They either take a technological approach, whereby they attempt to create systems that can perform certain tasks, regardless of how they do it; or they take a scientific approach, whereby they are interested in answering questions about human beings or other living things.


Boden’s book is essentially a potted history of the field, guiding the reader through the different approaches and philosophical arguments. Alan Turing, of Bletchley Park fame, seems to have envisaged all the current developments in the field, though during his lifetime the technology wasn’t available to implement these ideas. The first approach to hit the big time is now known as ‘Good Old-Fashioned AI (GOFAI)’. This assumes that intelligence arises from physical entities that can process symbols in the right kind of way, whether these entities are living organisms, arrangements of tin cans, silicon chips or whatever else. The other approaches are not reliant on sequential symbol processing. These are: 1. Artificial Neural Networks (ANNs), or connectionism, 2. Evolutionary programming, 3. Cellular automata (CA), and 4. Dynamical systems. Some researchers argue in favour of hybrid systems that combine elements of symbolic and non-symbolic processing.

For much of the 1950s, researchers of different theoretical persuasions all attended the same conferences and exchanged ideas, but in the late ’50s and 1960s a schism developed. In 1956 John McCarthy coined the term ‘Artificial Intelligence’ to refer to the symbol processing approach. This was seized upon by journalists, particularly as this approach began to have successes with the Logic Theory Machine (Newell & Simon) and General Problem Solver (Newell, Shaw, and Simon). By contrast, Frank Rosenblatt’s connectionist Perceptron model was found to have serious limitations and was contemptuously dismissed by many symbolists. Professional jealousies were aroused and communication between the symbolists and the others broke down. Worse, funding for the connectionist approach largely dried up.

Work within the symbol processing, or ‘classical’, approach has taught us some important lessons. These include the need to make problems tractable by directing attention to only part of the ‘search space’, by making simplifying assumptions and by ordering the search efficiently. However, the symbolic approaches also faced the issue of ‘combinatorial explosion’, meaning that logical processes would draw conclusions that were true but irrelevant. Likewise, in classical – or ‘monotonic’ – logic, once something is proved to be true it stays true, but in everyday life that is often not the case. Boden writes:

AI has taught us that human minds are hugely richer, and more subtle, than psychologists previously imagined. Indeed, that is the main lesson to be learned from AI.

Throughout the lean years for connectionist AI a number of researchers had plugged away regardless, and in the late 1980s there was a sudden explosion of research under the name of ‘Parallel Distributed Processing’ (PDP). These models consist of many interconnected units, each one capable of computing only one thing. There are multiple layers of units, including an input layer, an output layer, and a ‘hidden layer’ or layers in between. Some connections feed forward, others backwards, and others connect laterally. Concepts are represented within the state of the entire network rather than within individual units.

PDP models have had a number of successes, including their ability to deal with messy input. Perhaps the most notable finding occurred when a network produced over-generalisations during past-tense learning (e.g. saying ‘go-ed’ rather than ‘went’), indicating – contrary to Chomsky – that this aspect of language learning may not depend on an inborn linguistic rule. Consequently, the research funding tap was turned back on, especially from the US Department of Defense. Nonetheless, PDP models have their own weaknesses too, such as not being able to represent precision as well as classical models:

Q: What’s 2 + 2?

A: Very probably 4.

Learning within ANNs usually involves changing the strength (the ‘weights’) of the links between units, as expressed in the saying “fire together, wire together”. It involves the application of ‘backprop’ (backwards propagation) algorithms, which trace responsibility for performance back from the output layer into the hidden layers, identifying the units that need to be adjusted, and thence to the input layer. The algorithm needs to know the precise state of the output layer when the network is giving the correct answer.
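
Boden’s verbal description of backprop can be made concrete with a toy example. The sketch below is purely illustrative (a minimal two-layer network learning the XOR function, with arbitrary layer sizes and learning rate); it is not a model from the book, but it shows the forward pass, the backwards assignment of error to the hidden units, and the adjustment of connection weights.

```python
import numpy as np

# A minimal feedforward network (2 inputs -> 4 hidden units -> 1 output)
# trained by backwards propagation of error on the XOR problem.
# All sizes and the learning rate are arbitrary, illustrative choices.

rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # target outputs

W1 = rng.normal(size=(2, 4))   # input -> hidden connection weights
W2 = rng.normal(size=(4, 1))   # hidden -> output connection weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(20000):
    # Forward pass: activation flows from the input layer to the output layer.
    hidden = sigmoid(X @ W1)
    output = sigmoid(hidden @ W2)

    # Backward pass: the discrepancy between output and target is traced
    # back to apportion 'responsibility' to the hidden units...
    output_delta = (output - y) * output * (1 - output)
    hidden_delta = (output_delta @ W2.T) * hidden * (1 - hidden)

    # ...and the connection weights are adjusted accordingly.
    W2 -= 0.5 * hidden.T @ output_delta
    W1 -= 0.5 * X.T @ hidden_delta

print(np.round(output, 2))  # should end up close to [0, 1, 1, 0]
```

Note that, exactly as Boden says, the algorithm has to be told the correct output for each input in order to compute the error it propagates backwards.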

Although PDP propaganda plays up the similarity between network models and the brain’s neuronal connections, in fact there is no backwards propagation in the brain. Synapses feed forwards only. Also, brains aren’t strict hierarchies. Boden also notes (p.91):

a single neuron is as computationally complex as an entire PDP system, or even a small computer.

Subsequent to the 1980s PDP work it has been discovered that connections aren’t everything:

Biological circuits can sometimes alter their computational function (not merely make it more or less probable), due to chemicals diffusing through the brain.

One example of this is nitric oxide. Researchers have now developed new types of ANNs, including GasNets, used to evolve “brains for autonomous robots”.

Boden also discusses other approaches within the umbrella of AI, including robots and artificial life (‘A-life’), and evolutionary AI. These take in concepts such as distributed cognition (minds are not within individual heads), swarm intelligence (simple rules can lead to complex behaviours), and genetic algorithms (programs are allowed to change themselves, using random variation and non-random selection).

But are any of these systems intelligent? Many AI models have been very successful within specific domains and have outperformed human experts. However, the essence of human intelligence – even though the word itself does not have a standard definition among psychologists – is that it involves the ability to perform in many different domains, including perception, language, memory, creativity, decision making, social behaviour, morality, and so on. Emotions appear to be an important part of human thought and behaviour, too. Boden notes that there have been advances in the modelling of emotion, and there are programs that have demonstrated a certain degree of creativity. There are also some programs that operate in more than one domain, but are still nowhere near matching human abilities. However, unlike some people who have warned about the ‘singularity’ – the moment when machine intelligence exceeds that of humans – Boden does not envisage this happening. Indeed, whilst she holds the view that, in principle, truly intelligent behaviour could arise in non-biological systems, in practice this might not be the case.

Likewise, the title of Gary Smith’s book is not intended to decry all research within the field of AI. He also agrees that many achievements have occurred and will continue to do so. However, the ‘delusion’ of the title occurs when people assign to computers an ability that they do not in fact possess. Excessive trust can be dangerous. For Smith:

True intelligence is the ability to recognize and assess the essence of a situation.

This is precisely what he argues AI systems cannot do. He gives the example of a drawing of a box cart. Computer systems can’t identify this object, he says, whereas almost any human being could not only identify it, but suggest who might use it, what it might be used for, what the name on the side means, and so on.


Smith refers to the Winograd Schema Challenge, named after the Stanford computer science professor Terry Winograd, which offers a $25,000 prize to anyone who can design a system that is at least 90% accurate in interpreting sentences like this one:

I can’t cut that tree down with that axe; it is too [thick/small]

Most people realise that if the bracketed word is ‘thick’ it refers to the tree, whereas if it is ‘small’ it refers to the axe. Computers are typically – ahem – stumped by this kind of sentence, because they lack the real-world experience to put words in context.

Much of Smith’s concern is about the data-driven (rather than theory-driven) way that machine learning approaches use statistics. In essence, when a machine learning program processes data it does not stop to ask ‘Where did the data come from?’ or ‘Why these data?’ These are important questions to ask and Smith takes us through various problems that can arise with data (his previous book was called Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics).

One important limitation associated with data is the ‘survivor bias’. A study of Allied warplanes returning to Britain after bombing runs over Germany found that most of the bullet and shrapnel holes were on the wings and rear of the plane, but very few on the cockpit, engines, or fuel tanks. The top brass therefore planned to attach protective metal plates to the wings and rear of their aircraft. However, the statistician Abraham Wald pointed out that the planes that returned were, by definition, the ones that had survived the bullets and shrapnel. The planes that had not returned had most likely been struck in the areas that the returning planes had been spared. These were the areas that should be reinforced.

Another problem is the one discussed in my previous blog, that of fake or bad data, arising from the perverse incentives of academia and the publishing world. The ‘publish-or-perish’ climate, together with the wish of journals to publish ‘novel’ or ‘exciting’ results, has led to an exacerbation of ‘Questionable Research Practices’ or outright fakery, with the consequence that an unfortunately high proportion of published papers contain false findings.

Smith is particularly scathing about the practice of data mining, something that for decades was regarded as a major ‘no-no’ in academia. This is especially problematic with the advent of big data, when machine learning algorithms can scour thousands upon thousands of variables looking for patterns and relationships. However, even among sequences that are randomly generated, correlations between variables will occur by chance. Smith shows this to be the case with randomly generated sequences of his own. He laments that

The harsh truth is that data-mining algorithms are created by mathematicians who often are more interested in mathematical theory than practical reality.

and

The fundamental problem with data mining is that it is very good at finding models that fit the data, but totally useless in gauging whether the models are ludicrous.
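
Smith’s demonstration is easy to reproduce. The following sketch is my own illustration (not code from the book, and the numbers of variables and observations are arbitrary): it generates purely random data and then ‘mines’ it for the strongest pairwise correlation, which routinely looks impressive despite meaning nothing.

```python
import numpy as np

# Generate 200 purely random 'variables', each with 50 observations,
# then mine the data for the strongest pairwise correlation.
rng = np.random.default_rng(42)
n_vars, n_obs = 200, 50
data = rng.normal(size=(n_vars, n_obs))

corr = np.corrcoef(data)        # 200 x 200 matrix of pairwise correlations
np.fill_diagonal(corr, 0.0)     # ignore each variable's correlation with itself

i, j = np.unravel_index(np.abs(corr).argmax(), corr.shape)
print(f"Best 'discovery': variables {i} and {j}, r = {corr[i, j]:.2f}")
# With 200 * 199 / 2 = 19,900 pairs to choose from, correlations of around
# 0.5 turn up even though every variable is nothing but noise.
```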

When it comes to the choice of linear or non-linear models, Smith says that expert opinion is necessary to decide which is more realistic (though one recent systematic comparison of methods, involving a training set of data and a validation set, found that the non-linear methods associated with machine learning were outperformed by the traditional linear methods). Other problems arise with particular forms of regression analysis, such as stepwise regression and ridge regression. Data reduction methods, such as factor analysis or principal components analysis, can also cause problems because the transformed data are hard to interpret and, especially if mined from thousands of variables, will often contain nonsense. Smith looks at some dismal attempts to beat the stock market using data mining techniques.

But as if the statistical absurdities weren’t bad enough, Smith’s penultimate chapter – the one that everything else has been leading up to, he says – concerns the application of these techniques to our personal affairs in ways which impinge upon our privacy. For example, software exists that examines the online behaviour of job applicants. Executives who ought to know better may draw inappropriate causal inferences from the data. One of the major examples discussed earlier in the book is Hillary Clinton’s presidential campaign. Although it is not widely known, her campaign made use of a powerful computer program called Ada (named after Ada Lovelace, the nineteenth-century computing pioneer). This crunched masses of data about potential voters across the country, running 400,000 simulations per day. No-one knows exactly how Ada worked, but it was used to guide decisions about where to target campaigning resources. The opinions of seasoned campaigners were entirely sidelined, including perhaps the greatest campaigner of all – Bill Clinton (who was reportedly furious about this). We all know what happened next.


Review: The 7 Deadly Sins of Psychology


I remember being taught as an undergraduate psychology student that replication, along with the principle of falsification, was a vital ingredient in the scientific method. But when I flipped through the pages of the journals (back in those pre-digital days), the question that frequently popped into my head was ‘Where are all these replications?’ It was a question I never dared actually ask in class, because I was sure I must simply have been missing something obvious. Now, about 30 years later, it turns out I was right to wonder.

In Chris Chambers’ magisterial new book The 7 Deadly Sins of Psychology, he reports that it wasn’t until 2012 that the first systematic study was conducted into the rate of replication within the field of psychology. Makel, Plucker and Hegarty searched for the term “replicat*” among the 321,411 articles published in the top 100 psychology journals between 1900 and 2012. Just 1.57 per cent of the articles contained this term, and among a randomly selected subsample of 500 papers from that 1.57 per cent,

only 342 reported some form of replication – and of these, just 62 articles reported a direct replication of a previous experiment. On top of that, only 47 per cent of replications within the subsample were produced by independent researchers (p.50).

Why does this matter? It seems that researchers, over a long period, have engaged in a variety of ‘Questionable Research Practices’ (QRPs), motivated by ambitions that are often shaped by the perverse incentives of the publishing industry.

A turning point occurred in 2011 when the Journal of Personality and Social Psychology published Daryl Bem’s now-notorious paper ‘Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect’. Taking a classic paradigm in which an emotional manipulation influences the speed of people’s responses on a subsequent task, Bem conducted a series of studies in which the experimental manipulation happened after participants made their responses. His results seemed to indicate that people’s responses were being influenced by a manipulation that hadn’t yet happened. There was general disbelief among the scientific community and Bem himself said that it was important for other researchers to attempt to replicate his findings. However, when the first – failed – replication was submitted to the same journal, they rejected it on the basis that their policy was to not publish replication studies, whether or not they were successful.

In fact, many top journals – e.g. Nature, Cortex, Brain, Psychological Science – explicitly state, in various ways, that they only publish findings that are novel. A December 2015 study in the British Medical Journal, which perhaps appeared too late for inclusion in Chambers’ book, found that over a forty-year period scientific abstracts had shown a steep increase in the use of words relating to novelty or importance (e.g. “novel”, “robust”, “innovative” and “unprecedented”). Clearly, then, researchers know what matters when it comes to getting published.

A minimum requirement, though not the only one, for a result being interesting is that it is statistically significant. In the logic of null hypothesis significance testing (NHST), this means that if chance alone were at work, the probability of obtaining a result at least as extreme as the one observed would be less than 5 per cent (less than 1 in 20). Thus, researchers hope that any of their key tests will lead to a p-value of less than .05, as – agreed by convention – this allows them to reject the null hypothesis in favour of their experimental hypothesis (the explanation that they are actually proposing, and in which they may be invested).

It is fairly easy to see how the academic journals could be – and almost certainly are – overpopulated with papers that claim evidential support for hypotheses that are false. For instance, suppose many different researchers test a hypothesis that is, unknown to them, incorrect. Perhaps just one researcher finds a significant result, which is a fluke arising by chance. That one person is likely to get published, whereas the others will not. In reality, many researchers will not bother to submit their null findings. But here lies another problem. A single researcher may conduct several studies of the same hypothesis, but only attempt to publish the one (or ones) that turn out significant. He or she may feel a little guilty about this, but hey! – they have careers to progress and this is the system that the publishers have forced upon them.
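
That arithmetic is easy to simulate. The sketch below is my own illustration with made-up numbers (twenty labs, thirty participants per group): every lab tests a hypothesis that is in fact false, yet in the long run roughly one lab in twenty obtains p < .05, and that is the study most likely to reach print.

```python
import numpy as np
from scipy import stats

# Twenty labs each run the same two-group experiment on a hypothesis that is
# actually false: both groups are drawn from exactly the same population.
rng = np.random.default_rng(1)
n_labs, n_per_group = 20, 30

significant_labs = 0
for lab in range(n_labs):
    control = rng.normal(0, 1, n_per_group)
    treatment = rng.normal(0, 1, n_per_group)   # no true effect exists
    _, p = stats.ttest_ind(control, treatment)
    if p < .05:
        significant_labs += 1   # the 'exciting' result that gets written up

print(f"{significant_labs} of {n_labs} labs found a 'significant' effect that isn't there")
# On average about 1 lab in 20 will; the other labs' null results are far
# less likely ever to be submitted, let alone published.
```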

Replication is supposed to help discover which hypotheses are false and which are likely to be true. As we have seen, though, failed replications may never see the light of day. More problematic is the use of ‘conceptual replications’, in which a researcher tries to replicate a previous finding using a methodology that is, to a greater or lesser degree, novel. The researcher can claim to be “extending” the earlier research by testing the generality of its findings. Indeed, having this element of originality may increase the chances of publication. However, as Chambers notes, there are three problems with conceptual replications.

First, how similar must the methods be for the new study to count as a replication, and who decides this? Second, there is a risk of certain findings becoming unreplicated: if a successful conceptual replication later turns out to have produced its result through an entirely different causal mechanism, then the original study has just been unreplicated. Third, attempts at conceptual replication can fuel confirmation bias: if a conceptual replication produces a different result to the initial study, the authors of the first study will inevitably claim that their own results were not reproduced precisely because the attempted replication didn’t follow exactly the same methodology.

Chambers sums up the replication situation as follows:

To fit with the demands of journals, psychologists have thus replaced direct replication with conceptual replication, maintaining the comfortable but futile delusion that our science values replication while still satisfying the demands of novelty and originality (p.20).

Because psychologists frequently run studies with more than one independent variable, they typically use statistical tests that provide various main effects and interactions. Unfortunately, this can tempt researchers to operate with a degree of flexibility that isn’t warranted by the original hypothesis. They may engage in HARK-ing – Hypothesizing After the Results are Known. Suppose a researcher predicts a couple of main effects, but that these turn out to be non-significant once the analysis has been performed. Nonetheless, there are some unpredicted significant interactions within the results. The researcher now goes through a process of trying to rationalise why the results turned out this way. Having come up with an explanation, he or she now rewrites the hypotheses as though these results were what had been expected all along. Recent surveys show that psychologists believe the prevalence of HARKing to be somewhere between 40% and 90%, though the prevalence of those who admit to doing it themselves is, of course, much lower.

Another form of QRP is p-hacking. This refers to a cluster of practices whereby a researcher can illegitimately transform a non-significant result into a significant one. Suppose an experimental result has a p-value of .08, quite close to the magical threshold of .05 but also likely to be a barrier to publication. At this point, the researcher may try recruiting some new participants to the study in the hope that this will push the p-value below .05. However, bearing in mind that there will always be some variation in the way that participants respond, regardless of whether or not a hypothesis is true, “peeking” at the results and recruiting new participants until p falls below .05 simply inflates the likelihood of obtaining a false positive result.
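
Chambers’ point about peeking can be demonstrated directly. The simulation below is my own illustrative sketch (batch sizes, sample sizes and the number of simulated studies are arbitrary): a researcher tests after every new batch of participants and stops as soon as p drops below .05, and although no effect exists the ‘significance’ rate ends up far above 5 per cent.

```python
import numpy as np
from scipy import stats

# Optional stopping: start with 20 participants per group, test, and keep
# adding batches of 10 (up to 100 per group) until p < .05 or we give up.
# There is no real effect, so only 5% of studies 'should' come out significant.
rng = np.random.default_rng(7)

def one_peeking_study(start_n=20, batch=10, max_n=100):
    a = list(rng.normal(0, 1, start_n))
    b = list(rng.normal(0, 1, start_n))
    while True:
        _, p = stats.ttest_ind(a, b)
        if p < .05:
            return True                  # stop early and declare 'significance'
        if len(a) >= max_n:
            return False                 # give up: a (correct) null result
        a.extend(rng.normal(0, 1, batch))
        b.extend(rng.normal(0, 1, batch))

n_sims = 2000
false_positives = sum(one_peeking_study() for _ in range(n_sims))
print(f"False positive rate with peeking: {false_positives / n_sims:.1%}")
# Typically well above the nominal 5%: peeking inflates the error rate.
```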

A second form of p-hacking is to analyse the data in different ways until you get the result you want. There is no single agreed method for the exclusion of ‘outliers’ in the data, so a researcher may run several analyses in which differing numbers of outliers are excluded, until a significant result is returned. Alternatively, there may be different forms of statistical test that can be applied. All tests are essentially estimates, and while equivalent-but-different tests will produce broadly similar results, the difference of a couple of decimal places or so may be all that is needed to transform a non-significant result into a significant one.

A third form of p-hacking is to change your dependent variables. For example, if three different measures of an effect are all just slightly non-significant, then a researcher might try integrating these into one measure to see if this brings the p-value below .05.

Several recent studies have examined the distributions of p-values in similar kinds of studies and have found that there is often a spike in p-values just below .05, which would appear to be indicative of p-hacking. The conclusion that follows from this is that many of the results in the psychological literature are likely to be false.

Chris Chambers also examines a number of other ways in which the scientific literature can be distorted by incorrect hypotheses. One such way is the hoarding of data. Many journals do not require, or even ask, that authors deposit their data with them. Authors themselves often refuse to provide data when a request is received, or will only provide it under certain restrictive conditions (almost certainly not legally enforceable). Yet one recent study found that statistical errors were more frequent in papers where the authors had failed to provide their data. Refusal to share may, of course, be one way of hiding misconduct. Chambers argues that data sharing should be the norm, not least because even the most scrupulous and honest authors may, over time, lose their own data, whether because of the updating of computer equipment or in the process of changing institutions. And, of course, everyone dies sooner or later. So why not ensure that all research data is held in accessible repositories?

Chapter 7 – The Sin of Bean Counting – covers some ground that I discussed in an earlier blog, when I reviewed Jerry Muller’s book The Tyranny of Metrics. Academic journals now have a ‘Journal Impact Factor’ (JIF), which uses the citation counts of their papers to index the overall quality of the work they publish. Yet a journal’s JIF is driven by only a very small proportion of the papers it carries; most papers attract only a small number of citations. Worse, the supposedly high impact journals are in fact the ones with the highest rates of retractions owing to fraud or suspected fraud. Chambers argues that it would be more accurate to call them “high retraction” journals rather than “high impact” journals. The JIF is also easily massaged by editors and publishers, and, rather than being objectively calculated, is a matter of negotiation between the journals and the company that determines the JIF (Thomson Reuters).

Yet:

Despite all the evidence that JIF is more-or-less worthless, the psychological community has become ensnared in a groupthink that lends it value.

It is used within academic institutions to help determine hiring and promotions, and even redundancies. Many would argue that JIF and other metrics have damaged the collegial atmosphere that one associates with universities, which in many instances have become arenas of overwork, stress and bullying.

Indeed, recent years have seen a number of instances of fraudulent behaviour by psychologists, most notably Diederik Stapel, who invented data for over 50 publications before eventually being exposed by a group of junior researchers and one of his own PhD students. By his own account, he began by engaging in “softer” acts of misrepresentation before graduating to more serious behaviours. Sadly, his PhD students, who had unwittingly incorporated his fraudulent results into their own PhDs (which they were allowed to retain), had their peer-reviewed papers withdrawn from the journals in which they had been published. Equally sad is ‘Kate’s Story’ (also recounted in Chapter 5), which describes the unjust treatment meted out to a young scientist after she was caught up in a fraud investigation against the Principal Investigator of the project she was working on, even though she was not the one who had reported him. Kate is reported as saying that if you suspect someone of making up data, but lack definitive proof, then do not expect any sympathy or support for speaking out.

Fortunately, Chris Chambers has given considerable thought as to how psychology’s replication crisis might be addressed. Indeed, he and a number of other psychologists have been instrumental in effecting some positive changes in academic publishing. His view is that it would be hopeless to try to address the biases (many likely unconscious) that researchers possess. Rather, it is the entire framework of the scientific and publishing enterprise which must be changed. His suggestions include:

  • The pre-registration of studies. Researchers submit their research idea to a journal in advance of carrying out the work. This includes details of hypotheses to be tested, the methodology and the statistical analyses that will be used. If the peer reviewers are happy with the idea, then the journal commits to publication of the findings – however they turn out – if the researchers have indeed carried out the work in a satisfactory manner.
  • The use of p-curve analyses to determine which fields in psychology are suffering from p-hacking.
  • The use of disclosure statements. Joe Simmons and colleagues have pioneered a 21-word statement:

We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.

  • Data sharing.
  • Solutions to allow “optional stopping” during data collection. One method is to reduce the alpha-level every time a researcher “peeks” at the data. A second method is to use Bayesian hypothesis testing instead of NHST. Whereas NHST only actually tests the null hypothesis (and doesn’t provide an estimate of the likelihood of the null hypothesis), the Bayesian approach allows researchers to directly estimate the probability of the null hypothesis relative to the experimental hypothesis (a minimal sketch of this contrast follows the list).
  • Standardization of research practices. This may not always be possible, but where researchers conduct more than one type of analysis then the details of each should be reported and the robustness of the outcomes summarised.
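
To make the contrast in the Bayesian bullet point above concrete, here is a minimal sketch of my own (a toy coin-flipping example, not taken from the book): it computes how probable the observed data are under the null hypothesis of a fair coin and under an alternative that spreads its prior across all possible biases, and reports the ratio (a Bayes factor) that directly compares the two hypotheses.

```python
from math import comb
from scipy import integrate

# Toy Bayesian hypothesis test: we observe 60 heads in 100 coin flips.
# H0: the coin is fair (theta = 0.5).
# H1: theta is unknown, with a uniform prior over [0, 1].
k, n = 60, 100

p_data_given_h0 = comb(n, k) * 0.5 ** n

# Marginal likelihood under H1: average the likelihood over the prior.
p_data_given_h1, _ = integrate.quad(
    lambda theta: comb(n, k) * theta ** k * (1 - theta) ** (n - k), 0, 1
)

bayes_factor_01 = p_data_given_h0 / p_data_given_h1
print(f"P(data | H0) = {p_data_given_h0:.4f}")
print(f"P(data | H1) = {p_data_given_h1:.4f}")
print(f"Bayes factor in favour of the null: {bayes_factor_01:.2f}")
# Unlike a p-value, this number directly weighs the null hypothesis against
# the alternative, which is what the Bayesian approach described above allows.
```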

Chambers devotes most space to the discussion of pre-registration. Many objections have been raised against this idea, and Chambers tackles these objections (convincingly, I think) in his Chapter 8: Redemption.

Although the issue of replication and QRPs is not unique to psychology, evidence indicates that it may be a bigger problem in psychology than in other disciplines. Therefore, if psychologists wish to be taken seriously then it is incumbent upon them to clean up their act. Fortunately, a number of psychologists – Chambers included – have been at the forefront of both uncovering poor practice and proposing ways to improve matters. A good starting point for anyone wanting to appreciate the scale of the problem and how to deal with it would be to read this book. Indeed, I think every university library should have at least one copy of this book on its shelves, and it should be on the reading list for classes in research methods and statistics. Despite being a book on methodology, I didn’t find it a dry read. On the contrary, it is something of a detective story – like Sherlock Holmes explaining how he worked out whodunnit – and, as such, I found it rather gripping.


Review: ‘The Mind is Flat’ by Nick Chater

The nature of consciousness is a topic over which psychologists and philosophers have spilt much ink and many pixels. Outside of psychoanalytic circles, what has been less discussed is the nature of the ‘unconscious mind’. Claims made by some psychologists about the power of the unconscious mind to influence behaviour have proven controversial.

Now, in a book that will have psychoanalysts and many others protesting loudly, cognitive scientist Nick Chater has plunged a stake through the very concept of an unconscious mind. In The Mind Is Flat Chater argues that our minds have no depths, let alone hidden ones. His primary claim is that the brain exists to make sense of the world by creating a stable perception of it and ourselves; but the brain does not provide us with an account of its own workings. These perceptions are created from our interpretations of a limited number of sensory inputs, with the assistance of various memory traces (themselves based on our interpretations of past events).

Chater’s opening chapter, The Power of Invention, describes how we can create an apparently rich internal picture of a fictional person or location based on a limited description that may have gaps or inconsistencies (Chater discusses Anna Karenina and Gormenghast). So it is with our perceptions of the actual world and, indeed, ourselves. Most of our visual receptors are incapable of colour detection, yet we perceive the world in glorious colour. Our eyes are continually darting about all over the place, yet our perception of the world is smooth, not jerky. In short, much or most of what we perceive is an illusion foisted upon us by our brains.


For centuries, philosophers consulted their ‘inner oracle’ in order to determine how the world works. Yet, Chater points out, the inner oracle has consistently misled us about concepts such as heat, weight, force and energy. Early researchers in artificial intelligence (AI) tried to do the same thing. They tried to excavate the mental depths of experts, recover ‘common sense theory’ and then devise methods to reason over this database. However, by the 1980s it had become clear that this program was going nowhere, and so was quietly abandoned.

As Chater puts it:

The mind is flat: our mental ‘surface’, the momentary thoughts, explanations and sensory experiences that make up our stream of consciousness is all there is to mental life. (p.31)

One reason why we are unaware of the fictional nature of our perceptions is precisely because our eyes are constantly moving about and picking up new sensory fragments. I may be unaware of the type of flower on the mantelpiece, but if you mention it my eyes go there automatically. In gaze-contingent eye tracking studies, the text on a screen changes according to where a person is looking. In fact, most of the text on the screen consists of Xs. As a participant’s eyes move across the screen the Xs that would have been in their fixation point change to become real words, and the area where they had been looking reverts to Xs. The participant, however, perceives that the entire page consists of meaningful text.

Likewise, when we construct a mental image it is never truly a ‘picture in the mind’. If we are asked to describe some details from the image, we simply ‘create’ those in our imagination in response to the question. Nothing is being retrieved from a complete image.

We often talk about a battle between ‘the heart and the head’, but Chater argues that we are in fact simply posing one reason against another reason. Citing the Kuleshov Effect, and the work of Schachter & Singer (1962) and Dutton & Aron (1974) on the labelling of emotional states, Chater concludes that “our feelings do not burst unbidden from within – they do not pre-exist at all” (p.98). Indeed:

The meaning of pretty much anything comes from its place in a wider network of relationships, causes and effects – not from within. (p.107)

Despite, or perhaps because of, our lack of inner depth, we are extremely good at dreaming up explanations for all kinds of things, including our inner motives. Perhaps my favourite example is from the work on choice blindness, in which participants were asked to choose the most attractive of two faces, each of which was presented on a card. After a participant made their choice, the researcher supposedly passed them the card they had chosen and asked them to explain why they had preferred that face. In fact, the researcher used sleight-of-hand to pass them the face they hadn’t chosen. Most people didn’t spot the discrepancy and readily provided an explanation as to why they preferred the face that they had not in fact chosen.

This research links to a wider body of work in decision making research, which shows that people’s preferences are constructed during the process of choice, depending on various contextual factors, as opposed to the conventional economic account that assumes people to have stable preferences that are revealed by the choices they make.

Chater also goes on to talk about people’s attentional limitations, arguing that – in almost all circumstances – our brains are only able to work on one problem at a time (where a problem is something which requires an act of interpretation on our part, rather than an habitual action such as putting one foot in front of the other when walking). This also fits with decades of work on human judgment, which has repeatedly found that people are unable to reliably integrate multiple items of information when trying to make a judgment.

Finally, Chater isn’t arguing that there are no unconscious processes. However, these unconscious processes aren’t ‘thoughts’. The mind isn’t like an iceberg, with a few thoughts appearing in consciousness and many others below the level of consciousness. Rather, the real nature of the unconscious is “the vastly complex patterns of nervous activity that create and support our slow, conscious experience” (p.175). Thus:

There is just one type of thought, and each thought has two aspects: a conscious read-out, and unconscious processes operating the read-out. And we can have no more conscious access to these brain processes than we can have conscious awareness of the chemistry of digestion or the biophysics of our muscles.

 The Mind is Flat is a book that I wish I’d written, in that it expresses, with evidence, a viewpoint that I have held for some time. The writing is clear and entertaining, and I devoured the book in just a few days. Recommended.