Double book review: Margaret Boden and Gary Smith on Artificial Intelligence

AI – Its nature and future, by Margaret A. Boden. Oxford University Press. 2016.

The AI Delusion, by Gary Smith. Oxford University Press. 2018.

AI, machine learning, algorithms, robots, automation, chatbots, sexbots, androids – in recent years all these terms have appeared regularly in the media, whether to tell us about the latest achievements in technology, to describe exciting future possibilities, or to warn of threats to our jobs and freedoms.

Two recent books, from Margaret Boden and Gary Smith, respectively, are useful guides to the perplexed in explaining the issues. Each is clearly written and highly readable. Margaret Boden, Research Professor of Cognitive Science at the University of Sussex, begins with a basic definition:

Artificial intelligence (AI) seeks to make computers do the sorts of things that minds can do.

People who work in AI tend to work in one of two different camps (though occasionally both). They either take a technological approach, whereby they attempt to create systems that can perform certain tasks, regardless of how they do it; or they take a scientific approach, whereby they are interested in answering questions about human beings or other living things.


Boden’s book is essentially a potted history of the field, guiding the reader through the different approaches and philosophical arguments. Alan Turing, of Bletchley Park fame, seems to have envisaged all the current developments in the field, though during his lifetime the technology wasn’t available to implement these ideas. The first approach to hit the big time is now known as ‘Good Old-Fashioned AI (GOFAI)’. This assumes that intelligence arises from physical entities that can process symbols in the right kind of way, whether these entities are living organisms, arrangements of tin cans, silicon chips or whatever else. The other approaches are not reliant on sequential symbol processing. These are: 1. Artificial Neural Networks (ANNs), or connectionism, 2. Evolutionary programming, 3. Cellular automata (CA), and 4. Dynamical systems. Some researchers argue in favour of hybrid systems that combine elements of symbolic and non-symbolic processing.

For much of the 1950s, researchers of different theoretical persuasions all attended the same conferences and exchanged ideas, but in the late ’50s and 1960s a schism developed. In 1956 John McCarthy coined the term ‘Artificial Intelligence’ to refer to the symbol processing approach. This was seized upon by journalists, particularly as this approach began to have successes with the Logic Theory Machine (Newell & Simon) and General Problem Solver (Newell, Shaw, and Simon). By contrast, Frank Rosenblatt’s connectionist Perceptron model was found to have serious limitations and was contemptuously dismissed by many symbolists. Professional jealousies were aroused and communication between the symbolists and the others broke down. Worse, funding for the connectionist approach largely dried up.

Work within the symbol processing, or ‘classical’, approach has taught us some important lessons. These include the need to make problems tractable by directing attention to only part of the ‘search space’, by making simplifying assumptions and by ordering the search efficiently. However, the symbolic approaches also faced the issue of ‘combinatorial explosion’, meaning that logical processes would draw conclusions that were true but irrelevant. Likewise, in classical – or ‘monotonic’ – logic, once something is proved to be true it stays true, but in everyday life that is often not the case. Boden writes:

AI has taught us that human minds are hugely richer, and more subtle, than psychologists previously imagined. Indeed, that is the main lesson to be learned from AI.

Throughout the lean years for connectionist AI a number of researchers had plugged away regardless, and in the late 1980s there was a sudden explosion of research under the name of ‘Parallel Distributed Processing’ (PDP). These models consist of many interconnected units, each one capable of computing only one thing. There are multiple layers of units, including an input layer, an output layer, and a ‘hidden layer’ or layers in between. Some connections feed forward, others backwards, and others connect laterally. Concepts are represented within the state of the entire network rather than within individual units.

PDP models have had a number of successes, including their ability to deal with messy input. Perhaps the most notable finding occurred when a network produced over-generalisation of past tense learning (e.g. saying ‘go-ed’ rather than ‘went’), indicating – contrary to Chomsky – that this aspect of language learning may not be an inborn linguistic rule. Consequently, the research funding tap was turned back on, especially from the US Department of Defense. Nonetheless, PDP models have their own weaknesses too, such as not being able to represent precision as well as classical models:

Q: What’s 2 + 2?

A: Very probably 4.

Learning within ANNs usually involves changing the strength (the ‘weights’) of the links between units, as expressed in the saying “fire together, wire together”. It involves the application of ‘backprop’ (backwards propagation) algorithms which trace responsibility for performance back from the output layer into the hidden layers, identifying the units that need to be adapted, and thence to the input layer. The algorithm needs to know the precise state of the output layer when the network is giving the correct answer.

Although PDP propaganda plays up the similarity between network models and the brain’s neuronal connections, in fact there is no backwards propagation in the brain. Synapses feed forwards only. Also, brains aren’t strict hierarchies. Boden also notes (p.91):

a single neuron is as computationally complex as an entire PDP system, or even a small computer.

Subsequent to the 1980s PDP work it has been discovered that connections aren’t everything:

Biological circuits can sometimes alter their computational function (not merely make it more or less probable), due to chemicals diffusing through the brain.

One example of this is nitric oxide. Researchers have now developed new types of ANNs, including GasNets, used to evolve “brains” for autonomous robots.

Boden also discusses other approaches within the umbrella of AI, including robots and artificial life (‘A-life’), and evolutionary AI. These take in concepts such as distributed cognition (minds are not within individual heads), swarm intelligence (simple rules can lead to complex behaviours), and genetic algorithms (programs are allowed to change themselves, using random variation and non-random selection).

But are any of these systems intelligent? Many AI models have been very successful within specific domains and have outperformed human experts. However, the essence of human intelligence – even though the word itself does not have a standard definition among psychologists – is that it involves the ability to perform in many different domains, including perception, language, memory, creativity, decision making, social behaviour, morality, and so on. Emotions appear to be an important part of human thought and behaviour, too. Boden notes that there have been advances in the modelling of emotion, and there are programs that have demonstrated a certain degree of creativity. There are also some programs that operate in more than one domain, but are still nowhere near matching human abilities. However, unlike some people who have warned about the ‘singularity’ – the moment when machine intelligence exceeds that of humans – Boden does not envisage this happening. Indeed, whilst she holds the view that, in principle, truly intelligent behaviour could arise in non-biological systems, in practice this might not be the case.

Likewise, the title of Gary Smith’s book is not intended to decry all research within the field of AI. He also agrees that many achievements have occurred and will continue to do so. However, the ‘delusion’ of the title occurs when people assign to computers an ability that they do not in fact possess. Excessive trust can be dangerous. For Smith:

True intelligence is the ability to recognize and assess the essence of a situation.

This is precisely what he argues AI systems cannot do. He gives the example of a drawing of a box cart. Computer systems can’t identify this object, he says, whereas almost any human being could not only identify it, but suggest who might use it, what it might be used for, what the name on the side means, and so on.


Smith refers to the Winograd Schema Challenge. The Stanford Computer Science Professor, Terry Winograd, has put up a $25,000 prize for anyone who can design a system that is at least 90% accurate in interpreting sentences like this one:

I can’t cut that tree down with that axe; it is too [thick/small]

Most people realise that if the bracketed word is ‘thick’ it refers to the tree, whereas if it is ‘small’ it refers to the axe. Computers are typically – ahem – stumped by this kind of sentence, because they lack the real-world experience to put words in context.

Much of Smith’s concern is about the data-driven (rather than theory-driven) way that machine learning approaches use statistics. In essence, when a machine learning program processes data it does not stop to ask ‘Where did the data come from?’ or ‘Why these data?’ These are important questions to ask and Smith takes us through various problems that can arise with data (his previous book was called Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics).

One important limitation associated with data is the ‘survivor bias’. A study of Allied warplanes returning to Britain after bombing runs over Germany found that most of the bullet and shrapnel holes were on the wings and rear of the plane, but very few on the cockpit, engines, or fuel tanks. The top brass therefore planned to attach protective metal plates to the wings and rear of their aircraft. However, the statistician Abraham Wald pointed out that the planes that returned were, by definition, the ones that had survived the bullets and shrapnel. The planes that had not returned had most likely been struck in the areas that the returning planes had been spared. These were the areas that should be reinforced.
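Wald's point can be illustrated with a toy simulation. The sketch below is not taken from Smith's book: the lethality figures, the number of hits per plane and the function name `simulate_raids` are all invented for illustration. Hits are spread evenly across the aircraft, but a plane only makes it home if it survives every hit.

```python
import random
from collections import Counter

# Hypothetical chance that a single hit in each area brings the plane down.
# These numbers are invented for illustration, not historical estimates.
LETHALITY = {
    "wings": 0.05,
    "rear": 0.05,
    "cockpit": 0.60,
    "engines": 0.70,
    "fuel tanks": 0.70,
}

def simulate_raids(n_planes=10_000, hits_per_plane=3, seed=1):
    """Return (holes on all planes, holes on returning planes) by area."""
    rng = random.Random(seed)
    areas = list(LETHALITY)
    all_holes, returning_holes = Counter(), Counter()
    for _ in range(n_planes):
        # Hits land on every area with equal probability.
        hits = [rng.choice(areas) for _ in range(hits_per_plane)]
        all_holes.update(hits)
        # The plane returns only if it survives every one of its hits.
        if all(rng.random() > LETHALITY[area] for area in hits):
            returning_holes.update(hits)
    return all_holes, returning_holes

all_holes, returning_holes = simulate_raids()
for area in LETHALITY:
    print(f"{area:>10}: {all_holes[area]:5d} holes overall, "
          f"{returning_holes[area]:5d} on planes that returned")
```

Under these assumptions the holes are spread roughly evenly over the whole fleet, yet the planes that return show far more damage on the wings and rear, simply because hits elsewhere tended to be fatal.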

Another problem is the one discussed in my previous blog, that of fake or bad data, arising from the perverse incentives of academia and the publishing world. The ‘publish-or-perish’ climate, together with the wish of journals to publish ‘novel’ or ‘exciting’ results, has led to an exacerbation of ‘Questionable Research Practices’ or outright fakery, with the consequence that an unfortunately high proportion of published papers contain false findings.

Smith is particularly scathing about the practice of data mining, something that for decades was regarded as a major ‘no-no’ in academia. This is particularly problematic in the advent of big data, when machine learning algorithms can scour thousands upon thousands of variables looking for patterns and relationships. However, even among sequences that are randomly generated, correlations between variables will occur. Smith shows this to be the case with randomly generated sequences of his own. He laments that

The harsh truth is that data-mining algorithms are created by mathematicians who often are more interested in mathematical theory than practical reality.

and

The fundamental problem with data mining is that it is very good at finding models that fit the data, but totally useless in gauging whether the models are ludicrous.
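It is easy to reproduce the flavour of this demonstration. The following is a minimal sketch rather than Smith's own procedure: the number of variables, the number of observations and the function names are my assumptions. It generates purely random sequences and then "mines" them for the most strongly correlated pair.

```python
import random
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def strongest_random_correlation(n_vars=200, n_obs=20, seed=0):
    """Search every pair of random variables for the largest |r|."""
    rng = random.Random(seed)
    # Each "variable" is independent noise, so any relationship is spurious.
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    best_pair = max(
        combinations(range(n_vars), 2),
        key=lambda pair: abs(pearson(data[pair[0]], data[pair[1]])),
    )
    return best_pair, pearson(data[best_pair[0]], data[best_pair[1]])

pair, r = strongest_random_correlation()
print(f"Variables {pair[0]} and {pair[1]} correlate at r = {r:.2f}")
```

With 200 variables there are 19,900 pairs to search, so a handful of impressive-looking correlations are all but guaranteed even though every sequence is noise. The algorithm has no way of knowing that the winning pair is meaningless.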

When it comes to the choice of linear or non-linear models, Smith says that expert opinion is necessary to decide which is more realistic (though one recent systematic comparison of methods, involving a training set of data and a validation set, found that the non-linear methods associated with machine learning were dominated by the traditional linear methods). Other problems arise with particular forms of regression analysis, such as stepwise regression and ridge regression. Data reduction methods, such as factor analysis or principal components analysis, can also cause problems because the transformed data are hard to interpret. Especially if mined from thousands of variables they will contain nonsense. Smith looks at some dismal attempts to beat the stock market using data mining techniques.

But as if the statistical absurdities weren’t bad enough, Smith’s penultimate chapter – the one that everything else has been leading up to, he says – concerns the application of these techniques to our personal affairs in ways which impinge upon our privacy. For example, software exists that examines the online behaviour of job applicants. Executives who ought to know better may draw inappropriate causal inferences from the data. One of the major examples discussed earlier in the book is Hillary Clinton’s presidential campaign. Although not widely known, her campaign made use of a powerful computer program called Ada (after Ada Lovelace, an early pioneer in AI). This crunched masses of data about potential voters across the country, running 400,000 simulations per day. No-one knows exactly how Ada worked, but it was used to guide decisions about where to target campaigning resources. The opinions of seasoned campaigners were entirely sidelined, including perhaps the greatest campaigner of all – Bill Clinton (reportedly furious about this, too). We all know what happened next.

Destroying the soul, by numbers

I think the first time I became aware of metrics in the workplace was between 1990 and 1993, when I was studying for a PhD at the University of Wales, College of Cardiff (now simply ‘Cardiff University’). One day, A4 sheets of paper had appeared on walls and doors in the Psychology Department proclaiming “We are a five star department!” A friend explained to me that this related to our performance in the ‘Research Assessment Exercise’ (RAE), about which I knew nothing. He scoffed at this proclamation in a rather scathing manner, clearly thinking that this kind of rating exercise had little to do with what really mattered in science. I didn’t realise then how right he was. But the RAE was used as a determinant of how much research income institutions could expect from government (via the funding councils).

A few years later, in my first full-time lecturing post, at London Guildhall University, I was put in charge of organising our entry to the next RAE. Part of this pre-exercise exercise was to determine which members of staff would be included and which excluded. Immediately this raised the question in my mind: “If the RAE is supposed to assess a department’s strengths in research, then shouldn’t all staff members be included?” Such was my introduction to the “gaming” of metrics. Every institution was, of course, gaming the system in this and various other ways. Those that could afford it would buy in star performers just before the RAE (often to depart not long afterwards), leading to new rules to prevent such behaviour.

At some point, universities also got landed with the National Student Survey (NSS), which consisted of numerous questions relating to the “student experience”, but with most of the impact falling on lecturing staff who, either explicitly or implicitly, were informed that they needed to improve. With the introduction of – and subsequent increase in – tuition fees, students were now seen as consumers for whom league tables in research and the NSS were sources of information that could be used to distinguish between institutions when applying. The NSS has also led to gaming, sometimes not so subtly – as when lecturers or managers have warned students that they themselves might suffer from a worse educational experience resulting from institutional loss of income as a consequence of their own low ratings.

These changes within universities have been accompanied by another change: an expansion in the number of administrative staff employed and a shift in power away from academics. And academic staff themselves now spend considerably more time on paperwork than was the case in the past.

A new book by Jerry Z. Muller, The Tyranny of Metrics, shows that the experience of higher education is typical of many areas of working life. He traces the history of workplace metrics, the controversies surrounding them and the evidence of their effectiveness (or lack thereof). As far back as 1862, the Liberal MP Robert Lowe was proposing that the funding of schools should be determined on a payment-by-results basis, a view that was challenged by Matthew Arnold (himself a schools inspector) for the narrow and mechanical conception of education that it promoted.

In the early twentieth century, Frederick Winslow Taylor promoted the idea of “scientific management”, based on his time-and-motion studies of pig iron production in factories. He advocated that people should be paid according to output in a system that required enforced standardisation of methods, enforced adoption of the best implements and working conditions, and enforced cooperation. Note that the use of metrics and pay-for-performance are distinct things, but often go together in practice.

Later in the century, the doctrine of managerialism became more prominent. This is the idea that the differences among organisations are less important than their similarities. Thus, traditional domain-specific expertise is downplayed and senior managers can move from one organisation to another where the same kinds of management techniques are deployed. In the US, Defence Secretary Robert McNamara took metrics to the army, where “body counts” were championed as an index of American progress in Vietnam. Officers increasingly took on a managerial outlook.

The use of metrics found supporters on both the political left and the right. Particularly in the 1960s, the left were suspicious of established elites and demanded greater accountability, whilst the right were suspicious that public sector institutions were run for the benefit of their employees rather than the public. For both sides, numbers seemed to give the appearance of transparency and objectivity.

Other developments included the rising ideology of consumer choice (especially in healthcare), whereby empowerment of the consumer in a competitive market environment would supposedly help to bring down costs. ‘Principal-Agent Theory’ highlighted that there was a gap between the purposes of institutions and the interests of the people who run them and are employed by them. Shareholders’ interests are not necessarily the same as the interests of corporate executives, and the interests of executives are not necessarily the same as those of their subordinates (and so on). Principals (those with an interest) were needed to monitor agents (those charged with carrying out their interests), which meant motivating them with pecuniary rewards and punishments.

In the 1980s, the ‘New Public Management’ developed. This advocated that not-for-profit organisations needed to function more like businesses, such that students, patients, or clients all became “customers”. Three strategies helped determine value for money:

  1. The development of performance indicators (to replace price).
  2. The use of performance-related rewards and punishments.
  3. The development of competition among providers and the transparency of performance indicators.

Critics of this approach have noted that not-for-profit organisations often have multiple purposes that are difficult to isolate and measure, and that their employees tend to be motivated more by the mission than by the money. Of course, money does matter, but that recognition should come through the basic salary rather than performance-related rewards.

Indeed, evidence indicates that extrinsic (i.e. external to the person) rewards are most effective in commercial organisations. Where a job attracts people for whom intrinsic rewards (e.g. personal satisfaction, verbal praise) are more important, the application of pay-for-performance can undermine intrinsic motivation. Moreover, the people doing the monitoring tend to adopt measures for those things that are most visible or most easily measured, neglecting many other things that are important but which are less visible or not easily measured. This can lead to a distortion of organisational goals.

Many conservative and classical liberal thinkers have criticised such ideas, including Hayek, who drew a comparison with the failed attempts of socialist governments (notably the Soviet Union) at large-scale economic planning. Nonetheless, from Thatcher to Blair, from Clinton to Bush and Obama, politicians of different hues have continued to expand metrics further into the public domain.

Muller is not entirely a naysayer on metrics, noting that they can sometimes genuinely highlight areas of poor performance. In particular, he notes that in the US there have been some success stories associated with the application of metrics in healthcare. However, closer examination of these cases shows that these successes owe more to their being embedded within particular organisational cultures than to measurement per se. Indeed, these successes seem to be the exceptions rather than the rule, with other research showing no lasting effect on outcomes and no change in consumer behaviour. Research by the RAND Corporation found that stronger methodological design in studies was associated with a lower likelihood of identifying significant improvements associated with pay-for-performance.

What is clear – and Muller looks at universities, schools, medicine, policing, the military, business, charities and foreign aid – is that metrics have a range of unintended consequences. These include various ways in which managers and employees try to game the system, including: teaching to the test (education), treating to the test (medicine), risk aversion (e.g. in medicine, not operating on the most severely ill patients), and short-termism (e.g. police arresting the easy targets rather than chasing down the crime bosses). There is also outright cheating (e.g. teachers changing the test results of their pupils).

Incidentally, another recent book, The Seven Deadly Sins of Psychology (by Chris Chambers) documents how institutional pressures and the publishing system have incentivized a range of behaviours that have led to ‘bad science’. For instance, ‘Journal Impact Factors’ (JIFs) supposedly provide information about the overall quality of the research that appears in different journals. Researchers can cite this information when applying for tenure, promotion, or for their inclusion in the UK’s Research Excellence Framework (formerly the RAE). However, only a small number of publications in any given journal account for most of the citations that feed into the JIF. Another issue with JIFs concerns statistical power – the likelihood that a study will identify a genuine effect (statistical power depends on sample size and several other factors). It turns out that there is no relationship between the JIF and the average level of statistical power within a journal’s publications. Worse, high impact journals have a higher rate of retractions due to errors or outright fraud.
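For readers unfamiliar with statistical power, a rough simulation shows how strongly it depends on sample size. This is a sketch under assumptions of my own, not an analysis from Chambers's book: a true effect of half a standard deviation, two groups with a known standard deviation of 1, a simple two-sided z-test at the 5% level, and a hypothetical function name `estimated_power`.

```python
import random
from statistics import NormalDist

def estimated_power(n_per_group, effect_size=0.5, alpha=0.05,
                    n_studies=2000, seed=0):
    """Fraction of simulated studies that detect a genuine effect."""
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 when alpha is 0.05.
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two means (SD assumed to be 1).
    standard_error = (2 / n_per_group) ** 0.5
    detections = 0
    for _ in range(n_studies):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1) for _ in range(n_per_group)]
        difference = sum(treated) / n_per_group - sum(control) / n_per_group
        if abs(difference / standard_error) > critical_z:
            detections += 1
    return detections / n_studies

for n in (20, 50, 100):
    print(f"n = {n:3d} per group: power is roughly {estimated_power(n):.2f}")
```

Under these assumptions, a study with 20 people per group misses the genuine effect more often than it finds it, whereas one with 100 per group detects it in the large majority of runs.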

But one of the impacts of metrics is the expansion of resources (people, time, money, equipment) in order to do the necessary monitoring. Even the people being monitored must give up time and effort in order to produce the necessary documentation to satisfy the system. And as new rules are introduced to crack down on attempts to game the system, so the administrative resources are expanded even further. This diversion of resources obviously works against the productivity gains that are supposed to be produced by the application of metrics.

I was less convinced by the penultimate chapter in Muller’s book, in which he addresses transparency in politics and diplomacy. He speaks scornfully of the actions of Chelsea Manning and Edward Snowden in disclosing secret documents, which he says have had detrimental effects on American intelligence. Undoubtedly, transparency can sometimes be a hazard – compromise between different parties is made harder under the full glare of transparency – and there is a balance to be struck, but I would argue that the scale of wrongdoing revealed by these individuals justifies the actions they took and for which they have both paid a price. In the UK, as I write, there is an ongoing scandal over the related issues of illegal blacklisting of trade union activists in the construction industry and spying on political and campaigning groups (including undercover police officers having sexual relationships with campaigners). A current TV program (A Very English Scandal) concerns the leader of a British political party who – in living memory – arranged the attempted murder of his former lover, and was exonerated following an outrageously biased summing up in court by the judge. And of course the Chilcot report into the Iraq war found that Prime Minister Blair deliberately exaggerated the threat posed by the Iraq regime, and was damning about the way the final decision was made (of which no formal record was kept).

However, as far as the ordinary workplace is concerned, especially in not-for-profit organisations, the message is clear – beware of metrics!

Book review: Behind the Shock Machine (author: Gina Perry)

One evening in 1974, at a home in New Haven, the family of the late Jim McDonough gathered around their television to watch The Phil Donahue Show. To their horror, a piece of 1960s black and white footage was being shown in which Jim was having electrodes attached to his body. Jim was apparently the learner in an experiment whereby he would receive increasingly strong electric shocks whenever he failed to deliver a correct response to a question.

Bearing in mind that Jim had died of a heart attack in the mid-60s, his widow Kathryn must have been concerned that there might be a connection with this extraordinary piece of research. She wrote to the show’s producer, asking to be put in touch with the man who’d run the experiment, Dr Stanley Milgram. Shortly afterwards, she received a phone call from Milgram, who provided reassurance that her late husband had not in reality received any electric shocks at all. He also sent her an inscribed copy of the book that had caused the media interest: Obedience to Authority.

The Milgram shock experiments are the subject of an enthralling book by psychologist Gina Perry, published in 2012: Behind the Shock Machine: The Untold Story of the Notorious Milgram Psychology Experiments. By sifting through Milgram’s archive material, as well as interviewing some of his experimental subjects and assistants (or their surviving relatives), Perry shows that the popular account of the shock experiments, as promoted by Milgram himself, is but a pale and dubious version of what really happened and what the research means.

The popular account goes as follows. Milgram wanted to know whether the behaviour of the Nazis during the Holocaust was due to something specific about German culture, or whether it reflected a deeper aspect of humanity. In other words, could the same thing happen anywhere? In order to investigate this question, Milgram created an experimental scenario in which people would be pressured to commit a potentially lethal act. His subjects were recruited through newspaper advertisements in which they were promised payment for taking part in a study of learning and memory. As they arrived at Milgram’s laboratory in Yale University, a second subject (actually a paid staff member) would also appear. The experimenter (also a paid confederate of Milgram’s) explained that they were to take part in a study of the effects of punishment on learning. One of them would be the teacher and the other the learner. The two men drew a piece of paper to determine which would be which, but this was of course rigged: the subject was always the teacher. The teacher was told that any shocks received by the learner would be painful but not dangerous. He would then receive a small shock himself as an illustration of what he would potentially be delivering to the learner. During the experiment, the teacher and learner would be in separate rooms, unseen to each other but connected by audio.

At the beginning of the experiment, the teacher would read out a list of word pairs to the learner. After this, he would read out each target word followed by four words, only one of which was paired with the target. The learner would supposedly press a button corresponding to the word he thought was correct. If the learner picked the wrong word, then the teacher had to flick a switch on a machine in order to deliver an electric shock to the learner. The level of shock increased with each word, varying from 15 volts to 450 volts. The two highest settings on the shock machine were labelled ‘XXX – dangerous, severe shock’. The experimenter was always present to oversee the teacher and, if the teacher began to show concern or balk at giving further shocks, would deliver an increasingly stern series of commands (according to a script) requiring the teacher to carry on.

In the first version of the experiment the teacher did not hear from the learner, but in other experiments the learner would begin to call out in increasing levels of distress once the 150V level was reached. There were additional variations, too, such as having the learner and teacher in the same room, having the teacher place the learner’s hand on the shock plate, changing the actors, changing the location to a downtown building, having the learner mention heart trouble, and using female subjects. The experiments began in August 1961 and concluded in May 1962. During the last three days of the experiments, Milgram shot the documentary footage that would form the basis of his film Obedience.

Obedient subjects were defined as those who delivered the highest possible supposed shock of 450V. In most scenarios about 65% of subjects were classed as obedient, though some of the variations (such as teacher and learner in the same room) did lead to lower levels of obedience. By the time Milgram came to write up his research, the Nazi Adolf Eichmann had been tried and hanged in Israel and Hannah Arendt had coined the phrase “the banality of evil”. The observation that dull administrative processes could lie behind the most atrocious war crimes was an ideal peg on which Milgram could hang his research. In an era when the Korean war had given rise to concerns about brainwashing, the concept of ‘American Eichmanns’ took hold.

Milgram’s first account of his work was published in October 1963 in the Journal of Abnormal and Social Psychology, but his famous book – still in print – did not appear until 1974. The original publication of Milgram’s work, and the later publication of his book, met with a mixed response from academics. Critics raised ethical concerns about the treatment of his subjects, pointed to the lack of any underlying theory, and wondered whether it all really meant anything. Wasn’t Milgram just showing what we all knew already, that people can be pushed to commit extreme acts? In response, Milgram pointed to a survey of psychiatrists in which most of them believed that his subjects would not be willing to cause extreme harm to the learners. He also cited follow-up interviews with subjects by a psychiatrist, Dr Paul Errera, which concluded that they had not been harmed and that most had endorsed Milgram’s research.

In his 1974 book, Milgram provided the theory to explain the behaviour of his obedient subjects. This was the notion of the ‘agentic shift’, according to which the presence of an authority figure leads people to view themselves as the agents of another person and therefore not responsible for their own actions. I can recall reading Obedience to Authority as a student in the late ’80s and being confused. To me, the agentic shift theory didn’t seem to be explaining anything. It simply begged the question of why people might give up their sense of responsibility in the presence of an authority figure. Gina Perry points out that the theory also fails to explain the substantial proportion of people who didn’t obey, not to mention the discomfort, questions and objections of those people who nonetheless ended up delivering the maximum supposed shock (these objections figured in Milgram’s earlier publications but less so in his book). In suggesting that ordinary Americans could behave like Nazis, Milgram was also ignoring the entire counterculture movement and especially widespread protest and civil disobedience in relation to America’s involvement in the Vietnam war.

But Perry goes deeper than merely questioning Milgram’s theory, which many other academics have also done. Her research into the archives resulted in the realisation that, over time, Milgram’s paid actors began to depart from their script. The experimenter was provided with a series of four increasingly strict commands that he was expected to give when faced with a subject who was reluctant to continue. If the subject still refused to continue, then the experimenter was expected to call a halt. But John Williams, Milgram’s usual paid experimenter, began to extemporise some of his commands and to cycle back through the list of four. In other words, some subjects who eventually continued were classed as obedient when, under the original protocol, they should have been classed as disobedient.

It also turns out that many or most of Milgram’s subjects were not told straight away that the study they had taken part in was a hoax. In a relatively small community, he didn’t want the word to get about that this was the case. Despite this, in the published reports Milgram referred to “dehoaxing” the subjects at the end of the study. Subjects were sent a report about the study, including that the procedure had been a hoax, a little while after the entire series of studies had been completed. However, for whatever reason, some of the people that Gina Perry tracked down said they had never received such a report. They had gone most of their lives not knowing the truth.

Worse than this, contrary to what Milgram claimed, it is clear that some subjects were not happy about the nature of his research, either at the time (the usual experimenter, John Williams, appears to have been assaulted on more than one occasion) or later on. Some appear to have been adversely affected by their participation. In some cases, Milgram did manage to mollify people by taking them into his confidence. He then cited them as evidence that subjects were happy to endorse his studies. Some of Milgram’s subjects were Jewish, an ironic fact given Milgram’s linkage of his research to the Holocaust (Milgram himself was Jewish, but this was not something he disclosed in his earlier writings).

It also turns out that the clean bill of health given to Milgram’s research by the Yale psychiatrist Paul Errera was not quite what it seems. In fact, Errera’s interviews with some of Milgram’s subjects had taken place at the insistence of Yale University after complaints had been made. Only a small proportion of subjects were contacted and an even smaller number agreed to be interviewed, but in his book Milgram referred to these – against Errera’s wishes – as the “worst cases”, who had nonetheless endorsed his work. Milgram actually watched the interviews from behind a one-way mirror and, in some instances, revealed himself to the subjects and engaged in interaction with them. Perry suggests that Errera’s endorsement of Milgram’s work may have been influenced by his reluctance to derail the career of a young psychologist who clearly had so much riding on his controversial research. In any case, the presence of Milgram at the interviews was hardly ideal.

Milgram moved to Harvard University in July 1963. Perhaps mindful of the controversy surrounding his work, his research there avoided personal contact with subjects. In 1967, having been denied tenure at Harvard, he left for a job at the City University of New York. Perry notes that with both staff and students Milgram could alternate between graciousness and rudeness. She wonders if his mood swings might have been influenced by his drug use. This doesn’t feature highly in the book, but Milgram had been using drugs since his student days, including marijuana, cocaine and methamphetamine. When writing Obedience to Authority he used drugs to help overcome his writer’s block and occasionally kept notes on the influence of his intake on the creative process.

Did his research ultimately tell us much at all? It seems unlikely that it really sheds light on the Holocaust, an event involving the actions of people working in groups and in the grip of a specific ideology. By contrast, Milgram’s subjects were acting as individuals in a highly ambiguous context. On the one hand they believed they were being instructed by a scientist, a highly trusted figure whom they would have been reluctant to let down. On the other hand, the setup didn’t make sense. Why was it necessary for a member of the public to play the role of the teacher in the experiment? Why didn’t the experimenter do this for himself? Also, some of Milgram’s own subjects were aware that punishment is not an effective method for making people learn, something that was well established by the time that he ran his studies. One of Milgram’s research assistants, Taketo Murata, conducted an analysis that showed that the subjects who delivered the maximum shock were more often the ones who expressed disbelief in the veracity of the setup. Whilst Milgram argued that subjects’ responses after the study couldn’t be trusted, he was nonetheless happy to use these when it suited him.

Gina Perry shows that in private Milgram often shared many of the doubts that critics voiced about his work, including their ethical concerns. Publicly, though, he strongly defended his work, and more so with the passage of time. He wanted to be seen among the greats of social psychology, including his own mentor Solomon Asch, whose work on conformity was an obvious precursor to Milgram’s work. It seems, though, that Asch eventually stopped responding to Milgram’s letters, presumably increasingly uncomfortable with the ethical issues surrounding the shock experiments. Another famous psychologist, Lawrence Kohlberg, had watched some of the experimental trials with Milgram behind the one-way mirror. Yet he subsequently regretted his own passivity in the face of unethical research. In a letter to the New York Times he described Milgram as “another victim, another banal perpetrator of evil”.

What about Milgram’s paid actors, Williams and McDonough? Were they also culpable in perpetrating evil? Perry is sympathetic to these men. Like the subjects, they had been duped. They needed the money and had responded to an advertisement for assistants in a study of learning and memory. Possibly, as the trials proceeded, they themselves became desensitised to what was happening. In any case, they received two pay rises from Milgram in recognition of the efforts they were making on his behalf. Another actor, Bob Tracy, took part in some trials but quit after an army buddy arrived at the lab and he couldn’t go through with the deception. But what kind of pressure were Williams and McDonough under? We know that Williams was assaulted more than once in the lab. And both men were dead of heart attacks within five years of the research ending. This is ironic, as many of the experiments featured the learner stating at the outset that he had a heart problem. There is also evidence that McDonough did experience a heart ‘flutter’ during one of the trials. Did Milgram know about his heart problem and deliberately incorporate this into the experimental scenario?

In conclusion, it is undeniably true that human beings, under certain circumstances, can do terrible things. But Gina Perry has done us a great service by showing that the behaviour of authority figures does not automatically turn us into unthinking automata who will commit atrocities. Through an exemplary piece of detective work she has shown that the people who served as Milgram’s subjects were, by turns, concerned, questioning, rebellious and even disbelieving. Some, though, were affected by the experiments for years afterwards. After all, if you had been pressured into delivering very painful shocks, and possibly a lethal shock, in the name of science, only to be told that you were the person being studied, and possibly not being told that no real shocks were delivered, how would you feel about yourself later on?

Note: Gina Perry is also the author of a new book ‘The Lost Boys’, which I hope to write about in due course.