Double book review: Margaret Boden and Gary Smith on Artificial Intelligence

AI – Its nature and future, by Margaret A. Boden. Oxford University Press. 2016.

The AI Delusion, by Gary Smith. Oxford University Press. 2018.

AI, machine learning, algorithms, robots, automation, chatbots, sexbots, androids – in recent years all these terms have appeared regularly in the media, whether to report the latest technological achievements, to speculate about exciting future possibilities, or to warn about threats to our jobs and freedoms.

Two recent books, from Margaret Boden and Gary Smith respectively, are useful guides for the perplexed in explaining the issues. Each is clearly written and highly readable. Margaret Boden, Research Professor of Cognitive Science at the University of Sussex, begins with a basic definition:

Artificial intelligence (AI) seeks to make computers do the sorts of things that minds can do.

People who work in AI tend to work in one of two different camps (though occasionally both). They either take a technological approach, whereby they attempt to create systems that can perform certain tasks, regardless of how they do it; or they take a scientific approach, whereby they are interested in answering questions about human beings or other living things.


Boden’s book is essentially a potted history of the field, guiding the reader through the different approaches and philosophical arguments. Alan Turing, of Bletchley Park fame, seems to have envisaged all the current developments in the field, though during his lifetime the technology wasn’t available to implement these ideas. The first approach to hit the big time is now known as ‘Good Old-Fashioned AI (GOFAI)’. This assumes that intelligence arises from physical entities that can process symbols in the right kind of way, whether these entities are living organisms, arrangements of tin cans, silicon chips or whatever else. The other approaches are not reliant on sequential symbol processing. These are: 1. Artificial Neural Networks (ANNs), or connectionism, 2. Evolutionary programming, 3. Cellular automata (CA), and 4. Dynamical systems. Some researchers argue in favour of hybrid systems that combine elements of symbolic and non-symbolic processing.

For much of the 1950s, researchers of different theoretical persuasions all attended the same conferences and exchanged ideas, but in the late ’50s and 1960s a schism developed. In 1956 John McCarthy coined the term ‘Artificial Intelligence’ to refer to the symbol processing approach. This was seized upon by journalists, particularly as this approach began to have successes with the Logic Theory Machine (Newell & Simon) and General Problem Solver (Newell, Shaw, and Simon). By contrast, Frank Rosenblatt’s connectionist Perceptron model was found to have serious limitations and was contemptuously dismissed by many symbolists. Professional jealousies were aroused and communication between the symbolists and the others broke down. Worse, funding for the connectionist approach largely dried up.

Work within the symbol processing, or ‘classical’, approach has taught us some important lessons. These include the need to make problems tractable by directing attention to only part of the ‘search space’, by making simplifying assumptions and by ordering the search efficiently. However, the symbolic approaches also faced the issue of ‘combinatorial explosion’, meaning that logical processes would draw conclusions that were true but irrelevant. Likewise, in classical – or ‘monotonic’ – logic, once something is proved to be true it stays true, but in everyday life that is often not the case. Boden writes:

AI has taught us that human minds are hugely richer, and more subtle, than psychologists previously imagined. Indeed, that is the main lesson to be learned from AI.

Throughout the lean years for connectionist AI a number of researchers had plugged away regardless, and in the late 1980s there was a sudden explosion of research under the name of ‘Parallel Distributed Processing’ (PDP). These models consist of many interconnected units, each one capable of computing only one thing. There are multiple layers of units, including an input layer, an output layer, and a ‘hidden layer’ or layers in between. Some connections feed forward, others backwards, and others connect laterally. Concepts are represented within the state of the entire network rather than within individual units.

PDP models have had a number of successes, including their ability to deal with messy input. Perhaps the most notable finding occurred when a network produced over-generalisation of past-tense learning (e.g. saying ‘go-ed’ rather than ‘went’), indicating – contrary to Chomsky – that this aspect of language learning may not depend on an inborn linguistic rule. Consequently, the research funding tap was turned back on, especially from the US Department of Defense. Nonetheless, PDP models have their own weaknesses, such as not being able to represent precision as well as classical models:

Q: What’s 2 + 2?

A: Very probably 4.

Learning within ANNs usually involves changing the strength (the ‘weights’) of the links between units, in the spirit of the saying “fire together, wire together”. It involves the application of ‘backprop’ (backwards propagation) algorithms, which trace responsibility for performance back from the output layer into the hidden layers – identifying the units whose weights need to be adjusted – and thence to the input layer. The algorithm needs to know the precise state of the output layer when the network is giving the correct answer.
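To make this concrete, here is a minimal sketch (my own toy example, not one from Boden’s book) of a tiny three-layer network learning the XOR problem by backprop. The architecture, learning rate and number of training passes are all illustrative assumptions.

```python
import numpy as np

# Toy example (not from Boden): a 2-4-1 feed-forward network trained by
# backwards propagation of error to learn XOR, a task famously beyond
# Rosenblatt's single-layer Perceptron.
rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # input patterns
y = np.array([[0], [1], [1], [0]], dtype=float)              # desired outputs

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))           # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))           # hidden -> output weights
lr = 0.5                                                     # learning rate (assumed)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):
    # Forward pass: activity flows from the input layer through the hidden layer.
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Backward pass: responsibility for the output error is traced back
    # into the hidden layer.
    delta_out = (output - y) * output * (1 - output)
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)

    # Learning: adjust each connection weight in proportion to its share of the blame.
    W2 -= lr * hidden.T @ delta_out
    b2 -= lr * delta_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ delta_hidden
    b1 -= lr * delta_hidden.sum(axis=0, keepdims=True)

print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))  # typically close to [0, 1, 1, 0]
```

Note that the update step relies on knowing the correct output for every training pattern – exactly the supervised requirement described above.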

Although PDP propaganda plays up the similarity between network models and the brain’s neuronal connections, in fact there is no backwards propagation in the brain: synapses feed forwards only. Nor are brains strict hierarchies. Boden also notes (p.91):

a single neuron is as computationally complex as an entire PDP system, or even a small computer.

Subsequent to the 1980s PDP work it has been discovered that connections aren’t everything:

Biological circuits can sometimes alter their computational function (not merely make it more or less probable), due to chemicals diffusing through the brain.

One example of such a chemical is nitric oxide. Researchers have now developed new types of ANNs, including GasNets, used to evolve “brains for autonomous robots”.

Boden also discusses other approaches within the umbrella of AI, including robots and artificial life (‘A-life’), and evolutionary AI. These take in concepts such as distributed cognition (minds are not within individual heads), swarm intelligence (simple rules can lead to complex behaviours), and genetic algorithms (programs are allowed to change themselves, using random variation and non-random selection).

But are any of these systems intelligent? Many AI models have been very successful within specific domains and have outperformed human experts. However, the essence of human intelligence – even though the word itself does not have a standard definition among psychologists – is that it involves the ability to perform in many different domains, including perception, language, memory, creativity, decision making, social behaviour, morality, and so on. Emotions appear to be an important part of human thought and behaviour, too. Boden notes that there have been advances in the modelling of emotion, and there are programs that have demonstrated a certain degree of creativity. There are also some programs that operate in more than one domain, but are still nowhere near matching human abilities. However, unlike some people who have warned about the ‘singularity’ – the moment when machine intelligence exceeds that of humans – Boden does not envisage this happening. Indeed, whilst she holds the view that, in principle, truly intelligent behaviour could arise in non-biological systems, in practice this might not be the case.

Likewise, the title of Gary Smith’s book is not intended to decry all research within the field of AI. He also agrees that many achievements have occurred and will continue to do so. However, the ‘delusion’ of the title occurs when people assign to computers an ability that they do not in fact possess. Excessive trust can be dangerous. For Smith:

True intelligence is the ability to recognize and assess the essence of a situation.

This is precisely what he argues AI systems cannot do. He gives the example of a drawing of a box cart. Computer systems can’t identify this object, he says, whereas almost any human being could not only identify it, but suggest who might use it, what it might be used for, what the name on the side means, and so on.


Smith refers to the Winograd Schema Challenge, named after the Stanford computer science professor Terry Winograd. A $25,000 prize has been offered to anyone who can design a system that is at least 90% accurate in interpreting sentences like this one:

I can’t cut that tree down with that axe; it is too [thick/small]

Most people realise that if the bracketed word is ‘thick’ it refers to the tree, whereas if it is ‘small’ it refers to the axe. Computers are typically – ahem – stumped by this kind of sentence, because they lack the real-world experience to put words in context.

Much of Smith’s concern is about the data-driven (rather than theory-driven) way that machine learning approaches use statistics. In essence, when a machine learning program processes data it does not stop to ask ‘Where did the data come from?’ or ‘Why these data?’ These are important questions to ask and Smith takes us through various problems that can arise with data (his previous book was called Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics).

One important limitation associated with data is the ‘survivor bias’. A study of Allied warplanes returning to Britain after bombing runs over Germany found that most of the bullet and shrapnel holes were on the wings and rear of the plane, but very few on the cockpit, engines, or fuel tanks. The top brass therefore planned to attach protective metal plates to the wings and rear of their aircraft. However, the statistician Abraham Wald pointed out that the planes that returned were, by definition, the ones that had survived the bullets and shrapnel. The planes that had not returned had most likely been struck in the areas that the returning planes had been spared. These were the areas that should be reinforced.

Another problem is the one discussed in my previous blog, that of fake or bad data, arising from the perverse incentives of academia and the publishing world. The ‘publish-or-perish’ climate, together with the wish of journals to publish ‘novel’ or ‘exciting’ results, has led to an exacerbation of ‘Questionable Research Practices’ or outright fakery, with the consequence that an unfortunately high proportion of published papers contain false findings.

Smith is particularly scathing about the practice of data mining, something that for decades was regarded as a major ‘no-no’ in academia. It is particularly problematic with the advent of big data, when machine learning algorithms can scour thousands upon thousands of variables looking for patterns and relationships. However, even among sequences that are randomly generated, correlations between variables will occur by chance. Smith shows this to be the case with randomly generated sequences of his own. He laments that

The harsh truth is that data-mining algorithms are created by mathematicians who often are more interested in mathematical theory than practical reality.

and

The fundamental problem with data mining is that it is very good at finding models that fit the data, but totally useless in gauging whether the models are ludicrous.
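Smith’s point about chance correlations is easy to demonstrate for oneself. The sketch below is my own illustration (with an arbitrary choice of 20 observations and 200 variables, not Smith’s data): it generates purely random numbers and then counts how many pairs of ‘variables’ look impressively correlated.

```python
import numpy as np

# Purely random data: 20 observations of 200 meaningless variables.
rng = np.random.default_rng(42)
n_obs, n_vars = 20, 200
data = rng.normal(size=(n_obs, n_vars))

# Correlate every variable with every other variable, much as a data-mining
# algorithm scouring a large dataset effectively does.
corr = np.corrcoef(data, rowvar=False)
pairs = corr[np.triu_indices(n_vars, k=1)]   # each unique pair of variables once

print(f"{pairs.size} pairs of random variables examined")
print(f"{np.sum(np.abs(pairs) > 0.5)} pairs correlate with |r| > 0.5 - all of them meaningless")
```

Hundreds of apparently strong relationships emerge from pure noise; with thousands of variables, a mining algorithm is guaranteed to find ‘patterns’, exactly as Smith warns.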

When it comes to the choice of linear or non-linear models, Smith says that expert opinion is necessary to decide which is more realistic (though one recent systematic comparison of methods, involving a training set of data and a validation set, found that the non-linear methods associated with machine learning were outperformed by the traditional linear methods). Other problems arise with particular forms of regression analysis, such as stepwise regression and ridge regression. Data reduction methods, such as factor analysis or principal components analysis, can also cause problems because the transformed data are hard to interpret; especially if mined from thousands of variables, they are likely to contain nonsense. Smith looks at some dismal attempts to beat the stock market using data mining techniques.
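The train-and-validate comparison mentioned above can be sketched as follows. This is an illustrative simulation of my own, not the systematic comparison itself: when the underlying relationship really is linear plus noise, a flexible non-linear learner can fit the training data more closely and yet do worse on fresh data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Simulated data: a genuinely linear relationship plus noise.
rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(300, 1))
y = 2.0 * X[:, 0] + rng.normal(0, 2, size=300)

X_train, X_val = X[:200], X[200:]     # training set and held-out validation set
y_train, y_val = y[:200], y[200:]

for name, model in [("linear regression", LinearRegression()),
                    ("5-nearest-neighbour regression", KNeighborsRegressor(n_neighbors=5))]:
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    print(f"{name}: training MSE {train_mse:.2f}, validation MSE {val_mse:.2f}")
```

Typically the more flexible method looks better on the data it was fitted to, while the plain linear model predicts the held-out data better – the general pattern reported in the comparison mentioned above.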

But as if the statistical absurdities weren’t bad enough, Smith’s penultimate chapter – the one that everything else has been leading up to, he says – concerns the application of these techniques to our personal affairs in ways which impinge upon our privacy. For example, software exists that examines the online behaviour of job applicants. Executives who ought to know better may draw inappropriate causal inferences from the data. One of the major examples discussed earlier in the book is Hillary Clinton’s presidential campaign. Although not widely known, her campaign made use of a powerful computer program called Ada (after Ada Lovelace, the nineteenth-century computing pioneer). This crunched masses of data about potential voters across the country, running 400,000 simulations per day. No-one knows exactly how Ada worked, but it was used to guide decisions about where to target campaigning resources. The opinions of seasoned campaigners were entirely sidelined – including perhaps the greatest campaigner of all, Bill Clinton, who was reportedly furious about it. We all know what happened next.


Review: The 7 Deadly Sins of Psychology


I remember being taught as an undergraduate psychology student that replication, along with the principle of falsification, was a vital ingredient in the scientific method. But when I flipped through the pages of the journals (back in those pre-digital days), the question that frequently popped into my head was ‘Where are all these replications?’ It was a question I never dared actually ask in class, because I was sure I must simply have been missing something obvious. Now, about 30 years later, it turns out I was right to wonder.

In Chris Chambers’ magisterial new book The 7 Deadly Sins of Psychology, he reports that it wasn’t until 2012 that the first systematic study was conducted into the rate of replication within the field of psychology. Makel, Plucker and Hegarty searched for the term “replicat*” among the 321,411 articles published in the top 100 psychology journals between 1900 and 2012. Just 1.57 per cent of the articles contained this term, and among a randomly selected subsample of 500 papers from that 1.57 per cent,

only 342 reported some form of replication – and of these, just 62 articles reported a direct replication of a previous experiment. On top of that, only 47 per cent of replications within the subsample were produced by independent researchers (p.50).

Why does this matter? It seems that researchers, over a long period, have engaged in a variety of ‘Questionable Research Practices’ (QRPs), motivated by ambitions that are often shaped by the perverse incentives of the publishing industry.

A turning point occurred in 2011 when the Journal of Personality and Social Psychology published Daryl Bem’s now-notorious paper ‘Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect’. Taking a classic paradigm in which an emotional manipulation influences the speed of people’s responses on a subsequent task, Bem conducted a series of studies in which the experimental manipulation happened after participants made their responses. His results seemed to indicate that people’s responses were being influenced by a manipulation that hadn’t yet happened. There was general disbelief among the scientific community and Bem himself said that it was important for other researchers to attempt to replicate his findings. However, when the first – failed – replication was submitted to the same journal, they rejected it on the basis that their policy was to not publish replication studies, whether or not they were successful.

In fact, many top journals – e.g. Nature, Cortex, Brain, Psychological Science – explicitly state, in various ways, that they only publish findings that are novel. A December 2015 study in the British Medical Journal, which perhaps appeared too late for inclusion in Chambers’ book, found that over a forty-year period scientific abstracts had shown a steep increase in the use of words relating to novelty or importance (e.g. “novel”, “robust”, “innovative” and “unprecedented”). Clearly, then, researchers know what matters when it comes to getting published.

A minimum requirement, though not the only one, for a result to count as interesting is that it is statistically significant. In the logic of null hypothesis significance testing (NHST), this means that if chance alone were producing the results, the probability of obtaining a result at least as extreme as the one observed would be less than 5 per cent (less than 1 in 20). Thus, researchers hope that their key tests will yield a p-value of less than .05, as – by agreed convention – this allows them to reject the null hypothesis in favour of their experimental hypothesis (the explanation that they are actually proposing, and in which they may be invested).

It is fairly easy to see how the academic journals could be – and almost certainly are – overpopulated with papers that claim evidential support for hypotheses that are false. For instance, suppose many different researchers test a hypothesis that is, unknown to them, incorrect. Perhaps just one researcher finds a significant result – a fluke arising purely by chance. That one person is likely to get published, whereas the others will not. In reality, many researchers will not bother to submit their null findings. But here lies another problem. A single researcher may conduct several studies of the same hypothesis, but only attempt to publish the one (or ones) that turn out significant. He or she may feel a little guilty about this, but hey! – they have careers to progress and this is the system that the publishers have forced upon them.
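A small simulation makes the arithmetic of this scenario vivid. The numbers below are my own assumptions, not Chambers’: 100 labs independently test the same false hypothesis, and only the significant results are likely to be written up.

```python
import numpy as np
from scipy import stats

# 100 labs test a hypothesis that is in fact false: there is no real
# difference between the two groups being compared.
rng = np.random.default_rng(1)
significant = 0
for _ in range(100):
    treatment = rng.normal(0, 1, 30)      # 30 participants per group (assumed)
    control = rng.normal(0, 1, 30)
    _, p = stats.ttest_ind(treatment, control)
    if p < 0.05:
        significant += 1                  # a publishable fluke

print(f"{significant} of 100 labs obtained p < .05 for a false hypothesis")
# Roughly 5 labs get a 'positive' result to submit; the other ~95 null results
# tend to stay in the file drawer, so the literature fills up with flukes.
```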

Replication is supposed to help discover which hypotheses are false and which are likely to be true. As we have seen, though, failed replications may never see the light of day. More problematic is the use of ‘conceptual replications’, in which a researcher tries to replicate a previous finding using a methodology that is, to a greater or lesser degree, novel. The researcher can claim to be “extending” the earlier research by testing the generality of its findings. Indeed, having this element of originality may increase the chances of publication. However, as Chambers notes, there are three problems with conceptual replications.

First, how similar must the methods be in order for the new study to count as a replication, and who decides this? Second, there is a risk of certain findings becoming ‘unreplicated’: if a successful conceptual replication later turns out to have produced its result through an entirely different causal mechanism, then the original study has, in effect, just been unreplicated. Third, attempts at conceptual replication can fuel confirmation bias: if a conceptual replication produces a different result from the initial study, the authors of the first study will inevitably claim that their results failed to reproduce precisely because the attempted replication didn’t follow exactly the same methodology.

Chambers sums up the replication situation as follows:

To fit with the demands of journals, psychologists have thus replaced direct replication with conceptual replication, maintaining the comfortable but futile delusion that our science values replication while still satisfying the demands of novelty and originality (p.20).

Because psychologists frequently run studies with more than one independent variable, they typically use statistical tests that provide various main effects and interactions. Unfortunately, this can tempt researchers to operate with a degree of flexibility that isn’t warranted by the original hypothesis. They may engage in HARKing – Hypothesizing After the Results are Known. Suppose a researcher predicts a couple of main effects, but these turn out to be non-significant once the analysis has been performed. Nonetheless, there are some unpredicted significant interactions within the results. The researcher now goes through a process of trying to rationalise why the results turned out this way. Having come up with an explanation, he or she rewrites the hypotheses as though these results were what had been expected all along. Recent surveys show that psychologists believe the prevalence of HARKing to be somewhere between 40% and 90%, though the proportion who admit to doing it themselves is, of course, much lower.

Another form of QRP is p-hacking. This refers to a cluster of practices whereby a researcher can illegitimately transform a non-significant result into a significant one. Suppose an experimental result has a p-value of .08, quite close to the magical threshold of .05 but still likely to be a barrier to publication. At this point, the researcher may try recruiting some new participants to the study in the hope that this will bring the p-value below .05. However, bearing in mind that there will always be some variation in the way that participants respond, regardless of whether or not a hypothesis is true, “peeking” at the results and recruiting new participants until p falls below .05 simply inflates the likelihood of obtaining a false positive result.
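How much does such peeking matter? The following simulation is my own sketch with arbitrary but plausible numbers: there is no real effect, yet the ‘researcher’ adds ten participants per group at a time and stops as soon as p dips below .05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def study_with_peeking(max_peeks=10, batch=10):
    """Add participants in batches, test after each batch, stop if p < .05."""
    a, b = [], []
    for _ in range(max_peeks):
        a.extend(rng.normal(0, 1, batch))   # no true difference between the groups
        b.extend(rng.normal(0, 1, batch))
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:
            return True                     # declare a 'significant' finding and stop
    return False

n_sims = 2000
false_positives = sum(study_with_peeking() for _ in range(n_sims))
print(f"False-positive rate with peeking: {false_positives / n_sims:.1%}")
# Far above the nominal 5% - optional stopping quietly inflates the error rate.
```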

A second form of p-hacking is to analyse the data in different ways until you get the result you want. There is no single agreed method for the exclusion of ‘outliers’ in the data, so a researcher may run several analyses in which differing numbers of outliers are excluded, until a significant result is returned. Alternatively, there may be different forms of statistical test that can be applied. All tests are essentially estimates, and while equivalent-but-different tests will produce broadly similar results, a difference of a few hundredths in the p-value may be all that is needed to transform a non-significant result into a significant one.

A third form of p-hacking is to change your dependent variables. For example, if three different measures of an effect are all just slightly non-significant, then a researcher might try integrating them into one measure to see if this brings the p-value below .05.

Several recent studies have examined the distributions of p-values in similar kinds of studies and have found that there is often a spike in p-values just below .05, which would appear to be indicative of p-hacking. The conclusion that follows from this is that many of the results in the psychological literature are likely to be false.

Chris Chambers also examines a number of other ways in which the scientific literature can be distorted by incorrect hypotheses. One such way is the hoarding of data. Many journals do not require, or even ask, that authors deposit their data with them. Authors themselves often refuse to provide data when a request is received, or will only provide it under certain restrictive conditions (almost certainly not legally enforceable). Yet one recent study found that statistical errors were more frequent in papers where the authors had failed to provide their data. Refusal to share may, of course, be one way of hiding misconduct. Chambers argues that data sharing should be the norm, not least because even the most scrupulous and honest authors may, over time, lose their own data, whether because of the updating of computer equipment or in the process of changing institutions. And, of course, everyone dies sooner or later. So why not ensure that all research data is held in accessible repositories?

Chapter 7 – The Sin of Bean Counting – covers some ground that I discussed in an earlier blog, when I reviewed Jerry Muller’s book The Tyranny of Metrics. Academic journals now have a ‘Journal Impact Factor’ (JIF), which uses the citation counts of their papers to index the overall quality of the work published in the journals. Yet a journal’s JIF is driven by only a very small proportion of the papers it carries; most papers attract only a small number of citations. Worse, the supposedly high-impact journals are in fact the ones with the highest rates of retractions owing to fraud or suspected fraud. Chambers argues that it would be more accurate to refer to them as “high retraction” journals rather than “high impact” journals. The JIF is also easily massaged by editors and publishers and, rather than being objectively calculated, is a matter of negotiation between the journals and the company that determines the JIF (Thomson Reuters).

Yet:

Despite all the evidence that JIF is more-or-less worthless, the psychological community has become ensnared in a groupthink that lends it value.

It is used within academic institutions to help determine hiring and promotions, and even redundancies. Many would argue that JIF and other metrics have damaged the collegial atmosphere that one associates with universities, which in many instances have become arenas of overwork, stress and bullying.

Indeed, recent years have seen a number of instances of fraudulent behaviour by psychologists, most notably Diederik Stapel, who invented data for over 50 publications before eventually being exposed by a group of junior researchers and one of his own PhD students. By his own account, he began by engaging in “softer” acts of misrepresentation before graduating to more serious behaviours. Sadly, his PhD students, who had unwittingly incorporated his fraudulent results into their own PhDs (which they were allowed to retain), had their peer-reviewed papers withdrawn from the journals in which they had been published. Equally sad is ‘Kate’s Story’ (also recounted in Chapter 5), which describes the unjust treatment meted out to a young scientist after she was caught up in a fraud investigation against the Principal Investigator of the project she was working on, even though she was not the one who had reported him. Kate is reported as saying that if you suspect someone of making up data, but lack definitive proof, then do not expect any sympathy or support for speaking out.

Fortunately, Chris Chambers has given considerable thought as to how psychology’s replication crisis might be addressed. Indeed, he and a number of other psychologists have been instrumental in effecting some positive changes in academic publishing. His view is that it would be hopeless to try to address the biases (many likely unconscious) that researchers possess. Rather, it is the entire framework of the scientific and publishing enterprise which must be changed. His suggestions include:

  • The pre-registration of studies. Researchers submit their research idea to a journal in advance of carrying out the work. This includes details of hypotheses to be tested, the methodology and the statistical analyses that will be used. If the peer reviewers are happy with the idea, then the journal commits to publication of the findings – however they turn out – if the researchers have indeed carried out the work in a satisfactory manner.
  • The use of p-curve analyses to determine which fields in psychology are suffering from p-hacking.
  • The use of disclosure statements. Joe Simmons and colleagues have pioneered a 21-word statement:

We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.

  • Data sharing.
  • Solutions to allow “optional stopping” during data collection. One method is to reduce the alpha-level every time a researcher “peeks” at the data. A second method is to use Bayesian hypothesis testing instead of NHST. Whereas NHST only actually tests the null hypothesis (and doesn’t provide an estimate of the likelihood of the null hypothesis), the Bayesian approach allows researchers to directly estimate the probability of the null hypothesis relative to the experimental hypothesis (a toy illustration follows this list).
  • Standardization of research practices. This may not always be possible, but where researchers conduct more than one type of analysis then the details of each should be reported and the robustness of the outcomes summarised.
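As promised above, here is a toy illustration of the Bayesian alternative (my own example, not one of Chambers’): testing whether a coin is fair. A Bayes factor compares how well the null hypothesis and a rival hypothesis predict the data, so it can express positive evidence for the null – something a p-value cannot do.

```python
from math import comb, log, exp
from scipy.special import betaln

# Assumed data for illustration: 52 heads in 100 flips.
k, n = 52, 100

# H0: the coin is fair (p = 0.5).
log_m0 = log(comb(n, k)) + n * log(0.5)

# H1: the coin's bias is unknown, with a uniform prior on [0, 1].
log_m1 = log(comb(n, k)) + betaln(k + 1, n - k + 1)

bf01 = exp(log_m0 - log_m1)   # Bayes factor in favour of the null
print(f"BF01 = {bf01:.1f} in favour of the fair coin over the uniform-prior alternative")
```

Here the evidence moderately favours the null hypothesis – a verdict NHST can never deliver, since a non-significant p-value merely fails to reject the null.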

Chambers devotes most space to the discussion of pre-registration. Many objections have been raised against this idea, and Chambers tackles these objections (convincingly, I think) in his Chapter 8: Redemption.

Although the issue of replication and QRPs is not unique to psychology, evidence indicates that it may be a bigger problem there than in other disciplines. Therefore, if psychologists wish to be taken seriously then it is incumbent upon them to clean up their act. Fortunately, a number of psychologists – Chambers included – have been at the forefront of both uncovering poor practice and proposing ways to improve matters. A good starting point for anyone wanting to appreciate the scale of the problem and how to deal with it would be to read this book. Indeed, I think every university library should have at least one copy on its shelves, and it should be on the reading list for classes in research methods and statistics. Despite being a book on methodology, I didn’t find it a dry read. On the contrary, it is something of a detective story – like Sherlock Holmes explaining how he worked out whodunnit – and, as such, I found it rather gripping.


Review: ‘The Mind is Flat’ by Nick Chater

The nature of consciousness is a topic that psychologists and philosophers have spilt much ink and many pixels over. Outside of psychoanalytic circles, what has been less discussed is the nature of the ‘unconscious mind’. Claims made by some psychologists about the power of the unconscious mind to influence behaviour have proven controversial.

Now, in a book that will have psychoanalysts and many others protesting loudly, cognitive scientist Nick Chater has plunged a stake through the very concept of an unconscious mind. In The Mind Is Flat Chater argues that our minds have no depths, let alone hidden ones. His primary claim is that the brain exists to make sense of the world by creating a stable perception of it and ourselves; but the brain does not provide us with an account of its own workings. These perceptions are created from our interpretations of a limited number of sensory inputs, with the assistance of various memory traces (themselves based on our interpretations of past events).

Chater’s opening chapter, The Power of Invention, describes how we can create an apparently rich internal picture of a fictional person or location based on a limited description that may have gaps or inconsistencies (Chater discusses Anna Karenina and Gormenghast). So it is with our perceptions of the actual world and, indeed, ourselves. Most of our visual receptors are incapable of colour detection, yet we perceive the world in glorious colour. Our eyes are continually darting about all over the place, yet our perception of the world is smooth, not jerky. In short, much or most of what we perceive is an illusion foisted upon us by our brains.


For centuries, philosophers consulted their ‘inner oracle’ in order to determine how the world works. Yet, Chater points out, the inner oracle has consistently misled us about concepts such as heat, weight, force and energy. Early researchers in artificial intelligence (AI) tried to do the same thing. They tried to excavate the mental depths of experts, recover ‘common sense theory’ and then devise methods to reason over this database. However, by the 1980s it had become clear that this program was going nowhere, and so was quietly abandoned.

As Chater puts it:

The mind is flat: our mental ‘surface’, the momentary thoughts, explanations and sensory experiences that make up our stream of consciousness is all there is to mental life. (p.31)

One reason why we are unaware of the fictional nature of our perceptions is precisely because our eyes are constantly moving about and picking up new sensory fragments. I may be unaware of the type of flower on the mantelpiece, but if you mention it my eyes go there automatically. In gaze-contingent eye tracking studies, the text on a screen changes according to where a person is looking. In fact, most of the text on the screen consists of Xs. As a participant’s eyes move across the screen the Xs that would have been in their fixation point change to become real words, and the area where they had been looking reverts to Xs. The participant, however, perceives that the entire page consists of meaningful text.

Likewise, when we construct a mental image it is never truly a ‘picture in the mind’. If we are asked to describe some details from the image, we simply ‘create’ those in our imagination in response to the question. Nothing is being retrieved from a complete image.

We often talk about a battle between ‘the heart and the head’, but Chater argues that we are in fact simply pitting one reason against another. Citing the Kuleshov Effect, and the work of Schachter & Singer (1962) and Dutton & Aron (1974) on the labelling of emotional states, Chater concludes that “our feelings do not burst unbidden from within – they do not pre-exist at all” (p.98). Indeed:

The meaning of pretty much anything comes from its place in a wider network of relationships, causes and effects – not from within. (p.107)

Despite, or perhaps because of, our lack of inner depth, we are extremely good at dreaming up explanations for all kinds of things, including our inner motives. Perhaps my favourite example is from the work on choice blindness, in which participants were asked to choose the most attractive of two faces, each of which was presented on a card. After a participant made their choice, the researcher supposedly passed them the card they had chosen and asked them to explain why they had preferred that face. In fact, the researcher used sleight-of-hand to pass them the face they hadn’t chosen. Most people didn’t spot the discrepancy and readily provided an explanation as to why they preferred the face that they had not in fact chosen.

This links to a wider body of work on decision making, which shows that people’s preferences are constructed during the process of choice, depending on various contextual factors – as opposed to the conventional economic account, which assumes that people have stable preferences that are revealed by the choices they make.

Chater also goes on to talk about people’s attentional limitations, arguing that – in almost all circumstances – our brains are only able to work on one problem at a time (where a problem is something which requires an act of interpretation on our part, rather than an habitual action such as putting one foot in front of the other when walking). This also fits with decades of work on human judgment, which has repeatedly found that people are unable to reliably integrate multiple items of information when trying to make a judgment.

Finally, Chater isn’t arguing that there are no unconscious processes. However, these unconscious processes aren’t ‘thoughts’. The mind isn’t like an iceberg, with a few thoughts appearing in consciousness and many others below the level of consciousness. Rather, the real nature of the unconscious is “the vastly complex patterns of nervous activity that create and support our slow, conscious experience” (p.175). Thus:

There is just one type of thought, and each thought has two aspects: a conscious read-out, and unconscious processes operating the read-out. And we can have no more conscious access to these brain processes than we can have conscious awareness of the chemistry of digestion or the biophysics of our muscles.

 The Mind is Flat is a book that I wish I’d written, in that it expresses, with evidence, a viewpoint that I have held for some time. The writing is clear and entertaining, and I devoured the book in just a few days. Recommended.


British Library exhibition – James Cook: The Voyages

Captain James Cook, by William Hodges

For anyone looking for something to do in London before the end of August 2018, I thoroughly recommend a trip to the British Library (nearest rail/tube station: Kings Cross/St. Pancras) to see this new exhibition about the voyages of Captain James Cook.

Born in North Yorkshire in 1728, Cook joined the Royal Navy in 1755, served in Canada during the Seven Years’ War against France, and made a name for himself by charting the coast of Newfoundland. In 1768, the Admiralty engaged Cook to lead an expedition to Tahiti in order to observe the Transit of Venus. This was the first of Cook’s three voyages to the Pacific, and the exhibition is organised around these three voyages.

Cook’s first voyage didn’t stop at Tahiti. He and his crew also spent six months circumnavigating New Zealand (given that name by the Dutch), where some of the encounters with the native population were violent. From there they went on to Australia, where they charted most of the eastern coast (the rest of the coast having already been charted by the Dutch). And from Australia they sailed to Batavia, the centre of the Dutch empire in the East Indies. On his return to Britain, Cook was promoted to Commander.

The second voyage, from 1772 to 1775, was in search of the Great South Continent. This turned out not to exist, but during the voyage Cook and his men became the first explorers to cross the Antarctic Circle, which they eventually did three times. The journey also took in Easter Island, Dusky Sound (in New Zealand), a sighting of South Georgia, and the New Hebrides (now Vanuatu).

The third voyage, from 1776-1780, was to the North Pacific, taking in Alaska and the Hawai’ian islands.

The various naturalists and artists that accompanied Cook on his expeditions amassed a valuable collection of plants and artistic renderings of people, animals and landscapes. Some wonderful examples of these are on display in the exhibition galleries. Especially noteworthy are the artistic works of William Hodges, described by Sir David Attenborough as the first academically-trained artist to go on such an expedition (“and it shows”). Hodges accompanied Cook on the second voyage and one of his pictures shows the expedition’s ships dwarfed by the vast icebergs of the Antarctic, something which the general public would never have seen before.

There is no doubt that the voyages must have been exceedingly arduous and fatalities were numerous. Deaths through illness on the first voyage included Sydney Parkinson, famous for his drawings of Maori people and the first European to draw a kangaroo. Another artist, Alexander Buchan, died on Tahiti from an epileptic seizure. The surgeon William Monkhouse died during the stopover at Batavia, as did Tupaia – the High Priest of Tahiti – who had helped Cook chart the Tahitian islands (one current native of Tahiti describes Tupaia as a ‘traitor’, though others speak of him more admiringly).

Men were also lost in violent encounters. In 1773, ten men were killed in a dispute at Queen Charlotte Sound. Cook himself was killed by an angry crowd at Kealakekua Bay, on the island of Hawai’i.

What seems clear from the exhibition is that the scientific work carried out by the expeditions was secondary to the unstated goal of colonisation. Whilst the first voyage had the publicly stated goal of observing the Transit of Venus, Cook in fact had secret orders to search for “convenient” land. The exhibition includes testimony from the native peoples of the territories visited by Cook, one of whom notes that the expeditions described Australia as “terra nullius” – “nobody’s land” – indicating that the non-white inhabitants weren’t counted as people. In 1934, Cook’s house was transported from North Yorkshire to Melbourne, yet increasingly the indigenous people and their supporters are questioning the traditional view of Cook, and Australia Day has now become a day of protest for many.

It has been suggested that Cook was relatively egalitarian for a man of his time, yet the exhibition makes clear that he frequently took native chiefs hostage whenever some piece of naval property went missing. Such an incident led to his own killing on Hawai’i, though the exact events of that day are unclear due to contradictory accounts. It was in fact a wealthy naturalist, Joseph Banks – who had paid for his own place on the first voyage – who first proposed that Australia, specifically Botany Bay, be used as the location for a penal colony. It is hard to visit this exhibition and not feel a great sadness at what befell the native peoples of the places Cook visited (though much of the worst, of course, came in the wake of Cook’s voyages).

Eventually, some voices of the Enlightenment began to question the wisdom of such ventures and Adam Smith, in his 1776 book The Wealth of Nations, argued in favour of free trade rather than territorial expansion.

In a video display Sir David Attenborough describes Cook as the greatest naval explorer that has ever lived, which seems like a fair assessment in terms of distance travelled, lands explored, and hardships endured. However, his legacy is increasingly under the spotlight. This is an exhibition well worth visiting.

Destroying the soul, by numbers

I think the first time I became aware of metrics in the workplace was between 1990 and 1993, when I was studying for a PhD at the University of Wales, College of Cardiff (now simply ‘Cardiff University’). One day, A4 sheets of paper had appeared on walls and doors in the Psychology Department proclaiming “We are a five star department!” A friend explained to me that this related to our performance in the ‘Research Assessment Exercise’ (RAE), about which I knew nothing. He scoffed at this proclamation in a rather scathing manner, clearly thinking that this kind of rating exercise had little to do with what really mattered in science. I didn’t realise then how right he was. But the RAE was used as a determinant of how much research income institutions could expect from government (via the funding councils).

A few years later, in my first full-time lecturing post, at London Guildhall University, I was put in charge of organising our entry to the next RAE. Part of this pre-exercise exercise was to determine which members of staff would be included and which excluded. Immediately this raised the question in my mind: “If the RAE is supposed to assess a department’s strengths in research, then shouldn’t all staff members be included?” Such was my introduction to the “gaming” of metrics. Every institution was, of course, gaming the system in this and various other ways. Those that could afford it would buy in star performers just before the RAE (often to depart not long afterwards), leading to new rules to prevent such behaviour.

At some point, universities also got landed with the National Student Survey (NSS), which consisted of numerous questions relating to the “student experience”, but with most of the impact falling on lecturing staff who, either explicitly or implicitly, were informed that they needed to improve. With the introduction of – and subsequent increase in – tuition fees, students were now seen as consumers, for whom league tables in research and the NSS were sources of information that could be used to distinguish between institutions when applying. The NSS has also led to gaming, sometimes not so subtle – as when lecturers or managers have warned students that low ratings would cost their institution income and so worsen the students’ own educational experience.

These changes within universities have been accompanied by another change: an expansion in the number of administrative staff employed and a shift in power away from academics. And academic staff themselves now spend considerably more time on paperwork than was the case in the past.

A new book by Jerry Z. Muller, The Tyranny of Metrics, shows that the experience of higher education is typical of many areas of working life. He traces the history of workplace metrics, the controversies surrounding them and the evidence for their effectiveness (or lack thereof). As far back as 1862, the Liberal MP Robert Lowe was proposing that the funding of schools should be determined on a payment-by-results basis, a view challenged by Matthew Arnold (himself a schools inspector) for the narrow and mechanical conception of education that it promoted.

In the early twentieth century, Frederick Winslow Taylor promoted the idea of “scientific management”, based on his time-and-motion studies of pig iron production in factories. He advocated that people should be paid according to output in a system that required enforced standardisation of methods, enforced adoption of the best implements and working conditions, and enforced cooperation. Note that the use of metrics and pay-for-performance are distinct things, but often go together in practice.

Later in the century, the doctrine of managerialism became more prominent. This is the idea that the differences among organisations are less important than their similarities. Thus, traditional domain-specific expertise is downplayed and senior managers can move from one organisation to another where the same kinds of management techniques are deployed. In the US, Defence Secretary Robert McNamara took metrics to the army, where “body counts” were championed as an index of American progress in Vietnam. Officers increasingly took on a managerial outlook.

The use of metrics found supporters on both the political left and the right. Particularly in the 1960s, the left were suspicious of established elites and demanded greater accountability, whilst the right were suspicious that public sector institutions were run for the benefit of their employees rather than the public. For both sides, numbers seemed to give the appearance of transparency and objectivity.

Other developments included the rising ideology of consumer choice (especially in healthcare), whereby empowerment of the consumer in a competitive market environment would supposedly help to bring down costs. ‘Principal-Agent Theory’ highlighted that there was a gap between the purposes of institutions and the interests of the people who run them and are employed by them. Shareholders’ interests are not necessarily the same as the interests of corporate executives, and the interests of executives are not necessarily the same as those of their subordinates (and so on). Principals (those with an interest) were needed to monitor agents (those charged with carrying out their interests), which meant motivating them with pecuniary rewards and punishments.

In the 1980s, the ‘New Public Management’ developed. This advocated that not-for-profit organisations needed to function more like businesses, such that students, patients, or clients all became “customers”. Three strategies helped determine value for money:

  1. The development of performance indicators (to replace price).
  2. The use of performance-related rewards and punishments.
  3. The development of competition among providers and the transparency of performance indicators.

Critics of this approach have noted that not-for-profit organisations often have multiple purposes that are difficult to isolate and measure, and that their employees tend to be more motivated by the mission rather than the money. Of course, money does matter, but that recognition should come through the basic salary rather than performance-related rewards.

Indeed, evidence indicates that extrinsic (i.e. external to the person) rewards are most effective in commercial organisations. Where a job attracts people for whom intrinsic rewards (e.g. personal satisfaction, verbal praise) are more important, the application of pay-for-performance can undermine intrinsic motivation. Moreover, the people doing the monitoring tend to adopt measures for those things that are most visible or most easily measured, neglecting many other things that are important but which are less visible or not easily measured. This can lead to a distortion of organisational goals.

Many conservative and classical liberal thinkers have criticised such ideas, including Hayek, who drew a comparison with the failed attempts of socialist governments (notably the Soviet Union) at large-scale economic planning. Nonetheless, from Thatcher to Blair, and from Clinton to Bush and Obama, politicians of different hues have continued to extend metrics further into the public domain.

Muller is not entirely a naysayer on metrics, noting that they can sometimes genuinely highlight areas of poor performance. In particular, he notes that in the US there have been some success stories associated with the application of metrics in healthcare. However, closer examination shows that these successes owe more to their being embedded within particular organisational cultures than to measurement per se. Indeed, they seem to be the exception rather than the rule, with other research showing no lasting effect on outcomes and no change in consumer behaviour. Research by the RAND Corporation found that the stronger a study’s methodological design, the less likely it was to identify significant improvements associated with pay-for-performance.

What is clear – and Muller looks at universities, schools, medicine, policing, the military, business, charities and foreign aid – is that metrics have a range of unintended consequences. These include various ways in which managers and employees try to game the system: teaching to the test (education), treating to the test (medicine), risk aversion (e.g. in medicine, not operating on the most severely ill patients), and short-termism (e.g. police arresting easy targets rather than chasing down the crime bosses). There is also outright cheating (e.g. teachers changing the test results of their pupils).

Incidentally, another recent book, The Seven Deadly Sins of Psychology (by Chris Chambers) documents how institutional pressures and the publishing system have incentivized a range of behaviours that have led to ‘bad science’. For instance, ‘Journal Impact Factors’ (JIFs) supposedly provide information about the overall quality of the research that appears in different journals. Researchers can cite this information when applying for tenure, promotion, or for their inclusion in the UK’s Research Excellence Framework (formerly the RAE). However, only a small number of publications in any given journal account for most of the citations that feed into the JIF. Another issue with JIFs concerns statistical power – the likelihood that a study will identify a genuine effect (statistical power depends on sample size and several other factors). It turns out that there is no relationship between the JIF and the average level of statistical power within a journal’s publications. Worse, high impact journals have a higher rate of retractions due to errors or outright fraud.

But one of the impacts of metrics is the expansion of resources (people, time, money, equipment) in order to do the necessary monitoring. Even the people being monitored must give up time and effort in order to produce the necessary documentation to satisfy the system. And as new rules are introduced to crack down on attempts to game the system, so the administrative resources are expanded even further. This diversion of resources obviously works against the productivity gains that are supposed to be produced by the application of metrics.

I was less convinced by the penultimate chapter in Muller’s book, in which he addresses transparency in politics and diplomacy. He speaks scornfully of the actions of Chelsea Manning and Edward Snowden in disclosing secret documents, which he says have had detrimental effects on American intelligence. Undoubtedly, transparency can sometimes be a hazard – compromise between different parties is made harder under the full glare of transparency – and there is a balance to be struck, but I would argue that the scale of wrongdoing revealed by these individuals justifies the actions they took and for which they have both paid a price. In the UK, as I write, there is an ongoing scandal over the related issues of illegal blacklisting of trade union activists in the construction industry and spying on political and campaigning groups (including undercover police officers having sexual relationships with campaigners). A current TV programme (A Very English Scandal) concerns the leader of a British political party who – in living memory – arranged the attempted murder of his former lover and was exonerated following an outrageously biased summing-up in court by the judge. And of course the Chilcot report into the Iraq war found that Prime Minister Blair deliberately exaggerated the threat posed by the Iraqi regime, and was damning about the way the final decision was made (of which no formal record was kept).

However, as far as the ordinary workplace is concerned, especially in not-for-profit organisations, the message is clear – beware of metrics!

Book review: Behind the Shock Machine (author: Gina Perry)

One evening in 1974, at a home in New Haven, the family of the late Jim McDonough gathered around their television to watch The Phil Donahue Show. To their horror, a piece of 1960s black and white footage was being shown in which Jim was having electrodes attached to his body. Jim was apparently the learner in an experiment whereby he would receive increasingly strong electric shocks whenever he failed to deliver a correct response to a question.

Bearing in mind that Jim had died of a heart attack in the mid-60s, his widow Kathryn must have been concerned that there might be a connection with this extraordinary piece of research. She wrote to the show’s producer, asking to be put in touch with the man who’d run the experiment, Dr Stanley Milgram. Shortly afterwards, she received a phone call from Milgram, who provided reassurance that her late husband had not in reality received any electric shocks at all. He also sent her an inscribed copy of the book that had caused the media interest: Obedience to Authority.

The Milgram shock experiments are the subject of an enthralling book by psychologist Gina Perry, published in 2012: Behind the Shock Machine: The Untold Story of the Notorious Milgram Psychology Experiments. By sifting through Milgram’s archive material, as well as interviewing some of his experimental subjects and assistants (or their surviving relatives), Perry shows that the popular account of the shock experiments, as promoted by Milgram himself, is but a pale and dubious version of what really happened and what the research means.

The popular account goes as follows. Milgram wanted to know whether the behaviour of the Nazis during the Holocaust was due to something specific about German culture, or whether it reflected a deeper aspect of humanity. In other words, could the same thing happen anywhere? In order to investigate this question, Milgram created an experimental scenario in which people would be pressured to commit a potentially lethal act. His subjects were recruited through newspaper advertisements in which they were promised payment for taking part in a study of learning and memory. As a subject arrived at Milgram’s laboratory at Yale University, a second subject (actually a paid staff member) would also appear. The experimenter (also a paid confederate of Milgram’s) explained that they were to take part in a study of the effects of punishment on learning. One of them would be the teacher and the other the learner. The two men drew slips of paper to determine which would be which, but this was of course rigged: the subject was always the teacher. The teacher was told that any shocks received by the learner would be painful but not dangerous. He would then receive a small shock himself as an illustration of what he would potentially be delivering to the learner. During the experiment, the teacher and learner would be in separate rooms, unseen to each other but connected by audio.

At the beginning of the experiment, the teacher would read out a list of word pairs to the learner. After this, he would read out each target word followed by four words, only one of which was paired with the target. The learner would supposedly press a button corresponding to the word he thought was correct. If the learner picked the wrong word, then the teacher had to flick a switch on a machine in order to deliver an electric shock to the learner. The level of shock increased with each wrong answer, ranging from 15 volts to 450 volts. The two highest settings on the shock machine were labelled ‘XXX – dangerous, severe shock’. The experimenter was always present to oversee the teacher and, if the teacher began to show concern or balk at giving further shocks, would deliver an increasingly stern series of commands (according to a script) requiring the teacher to carry on.

In the first version of the experiment the teacher did not hear from the learner, but in other experiments the learner would begin to call out in increasing levels of distress once the 150V level was reached. There were additional variations, too, such as having the learner and teacher in the same room, having the teacher place the learner’s hand on the shock plate, changing the actors, changing the location to a downtown building, having the learner mention heart trouble, and using female subjects. The experiments began in August 1961 and concluded in May 1962. During the last three days of the experiments, Milgram shot the documentary footage that would form the basis of his film Obedience.

Obedient subjects were defined as those who delivered the highest possible supposed shock of 450V. In most scenarios about 65% of subjects were classed as obedient, though some of the variations (such as teacher and learner in the same room) did lead to lower levels of obedience. By the time Milgram came to write up his research, the Nazi Adolf Eichmann had been tried and hanged in Israel and Hannah Arendt had coined the phrase “the banality of evil”. The observation that dull administrative processes could lie behind the most atrocious war crimes was an ideal peg on which Milgram could hang his research. In an era when the Korean war had given rise to concerns about brainwashing, the concept of ‘American Eichmanns’ took hold.

Milgram’s first account of his work was published in October 1963 in the Journal of Abnormal and Social Psychology, but his famous book – still in print – did not appear until 1974. The original publication of Milgram’s work, and the later publication of his book, met with a mixed response from academics. Critics raised ethical concerns about the treatment of his subjects, pointed to the lack of any underlying theory, and wondered whether it all really meant anything. Wasn’t Milgram just showing what we all knew already, that people can be pushed to commit extreme acts? In response, Milgram pointed to a survey of psychiatrists, most of whom had predicted that his subjects would not be willing to cause extreme harm to the learners. He also cited follow-up interviews with subjects by a psychiatrist, Dr Paul Errera, which concluded that they had not been harmed and that most had endorsed Milgram’s research.

In his 1974 book, Milgram provided the theory to explain the behaviour of his obedient subjects. This was the notion of the ‘agentic shift’, according to which the presence of an authority figure leads people to view themselves as the agents of another person and therefore not responsible for their own actions. I can recall reading Obedience to Authority as a student in the late ’80s and being confused. To me, the agentic shift theory didn’t seem to explain anything. It simply raised the further question of why people might give up their sense of responsibility in the presence of an authority figure. Gina Perry points out that the theory also fails to explain the substantial proportion of people who didn’t obey, not to mention the discomfort, questions and objections of those people who nonetheless ended up delivering the maximum supposed shock (these objections figured in Milgram’s earlier publications but less so in his book). In suggesting that ordinary Americans could behave like Nazis, Milgram was also ignoring the entire counterculture movement, and especially the widespread protest and civil disobedience over America’s involvement in the Vietnam war.

But Perry goes deeper than merely questioning Milgram’s theory, which many other academics have also done. Her research into the archives led her to realise that, over time, Milgram’s paid actors began to depart from their script. The experimenter was provided with a series of four increasingly strict commands that he was expected to give when faced with a subject who was reluctant to continue. If the subject still refused to continue, the experimenter was expected to call a halt. But John Williams, Milgram’s usual paid experimenter, began to extemporise his commands and to cycle back through the list of four. In other words, some subjects were classed as obedient when in fact they should have been classed as disobedient.

It also turns out that many or most of Milgram’s subjects were not told straight away that the study they had taken part in was a hoax: in a relatively small community, Milgram didn’t want word to get about. Despite this, in his published reports he referred to “dehoaxing” the subjects at the end of the study. Subjects were sent a report revealing that the procedure had been a hoax only some time after the entire series of studies had been completed. However, for whatever reason, some of the people that Gina Perry tracked down said they had never received such a report. They had gone most of their lives not knowing the truth.

Worse than this, contrary to what Milgram claimed, it is clear that some subjects were not happy about the nature of his research, either at the time (the usual experimenter, John Williams, appears to have been assaulted on more than one occasion) or later on. Some appear to have been adversely affected by their participation. In some cases, Milgram did manage to mollify people by taking them into his confidence. He then cited them as evidence that subjects were happy to endorse his studies. Some of Milgram’s subjects were Jewish, an ironic fact given Milgram’s linkage of his research to the Holocaust (Milgram himself was Jewish, but this was not something he disclosed in his earlier writings).

It also turns out that the clean bill of health given to Milgram’s research by the Yale psychiatrist Paul Errera was not quite what it seemed. In fact, Errera’s interviews with some of Milgram’s subjects had taken place at the insistence of Yale University after complaints had been made. Only a small proportion of subjects were contacted and an even smaller number agreed to be interviewed, but in his book Milgram referred to these – against Errera’s wishes – as the “worst cases”, who had nonetheless endorsed his work. Milgram actually watched the interviews from behind a one-way mirror and, in some instances, revealed himself to the subjects and interacted with them. Perry suggests that Errera’s endorsement of Milgram’s work may have been influenced by a reluctance to derail the career of a young psychologist who clearly had so much riding on his controversial research. In any case, Milgram’s presence at the interviews was hardly ideal.

Milgram moved to Harvard University in July 1963. Perhaps mindful of the controversy surrounding his work, he avoided research there that involved personal contact with subjects. In 1967, having been denied tenure at Harvard, he left for a job at the City University of New York. Perry notes that with both staff and students Milgram could alternate between graciousness and rudeness. She wonders if his mood swings might have been influenced by his drug use. This doesn’t feature prominently in the book, but Milgram had been using drugs since his student days, including marijuana, cocaine and methamphetamine. When writing Obedience to Authority he used drugs to help overcome his writer’s block, and occasionally kept notes on the influence of his intake on the creative process.

Did his research ultimately tell us much at all? It seems unlikely that it really sheds light on the Holocaust, an event involving the actions of people working in groups and in the grip of a specific ideology. By contrast, Milgram’s subjects were acting as individuals in a highly ambiguous context. On the one hand, they believed they were being instructed by a scientist, a highly trusted figure whom they would have been reluctant to let down. On the other hand, the setup didn’t make sense. Why was it necessary for a member of the public to play the role of the teacher? Why didn’t the experimenter do this himself? Also, some of Milgram’s own subjects were aware that punishment is not an effective method for making people learn, something that was well established by the time he ran his studies. One of Milgram’s research assistants, Taketo Murata, conducted an analysis showing that the subjects who delivered the maximum shock were more often the ones who expressed disbelief in the veracity of the setup. Whilst Milgram argued that subjects’ post-study responses couldn’t be trusted, he was nonetheless happy to use them when it suited him.

Gina Perry shows that in private Milgram often shared many of the doubts that critics voiced about his work, including their ethical concerns. Publicly, though, he defended it strongly, and all the more so with the passage of time. He wanted to be seen among the greats of social psychology, alongside his own mentor Solomon Asch, whose work on conformity was an obvious precursor to the shock experiments. It seems, though, that Asch eventually stopped responding to Milgram’s letters, presumably because he was increasingly uncomfortable with the ethical issues surrounding the research. Another famous psychologist, Lawrence Kohlberg, had watched some of the experimental trials with Milgram from behind the one-way mirror, and subsequently regretted his own passivity in the face of unethical research. In a letter to the New York Times he described Milgram as “another victim, another banal perpetrator of evil”.

What about Milgram’s paid actors, Williams and McDonough? Were they also culpable in perpetrating evil? Perry is sympathetic to these men. Like the subjects, they had been duped: they needed the money and had responded to an advertisement for assistants in a study of learning and memory. Possibly, as the trials proceeded, they themselves became desensitised to what was happening. In any case, they received two pay rises from Milgram in recognition of the efforts they were making on his behalf. Another actor, Bob Tracy, took part in some trials but quit after an army buddy arrived at the lab and he found he couldn’t go through with the deception. But what kind of pressure were Williams and McDonough under? We know that Williams was assaulted more than once in the lab. And both men were dead of heart attacks within five years of the research ending. This is ironic, as many of the experiments featured the learner stating at the outset that he had a heart problem. There is also evidence that McDonough did experience a heart ‘flutter’ during one of the trials. Did Milgram know about his heart problem and deliberately incorporate it into the experimental scenario?

In conclusion, it is undeniably true that human beings, under certain circumstances, can do terrible things. But Gina Perry has done us a great service by showing that the commands of authority figures do not automatically turn us into unthinking automata who will commit atrocities. Through an exemplary piece of detective work she has shown that the people who served as Milgram’s subjects were, by turns, concerned, questioning, rebellious and even disbelieving. Some, though, were affected by the experiments for years afterwards. After all, if you had been pressured into delivering very painful, possibly lethal, shocks in the name of science, only to be told that you were the person being studied – and perhaps never told that no real shocks had been delivered – how would you feel about yourself later on?

Note: Gina Perry is also the author of a new book, The Lost Boys, which I hope to write about in due course.