Destroying the soul, by numbers

I think the first time I became aware of metrics in the workplace was between 1990 and 1993, when I was studying for a PhD at the University of Wales, College of Cardiff (now simply ‘Cardiff University’). One day, A4 sheets of paper appeared on walls and doors in the Psychology Department proclaiming “We are a five star department!” A friend explained to me that this related to our performance in the ‘Research Assessment Exercise’ (RAE), about which I knew nothing. He scoffed at this proclamation in a rather scathing manner, clearly thinking that this kind of rating exercise had little to do with what really mattered in science. I didn’t realise then how right he was. But the RAE was used as a determinant of how much research income institutions could expect from government (via the funding councils).

A few years later, in my first full-time lecturing post, at London Guildhall University, I was put in charge of organising our entry to the next RAE. Part of this pre-exercise exercise was to determine which members of staff would be included and which excluded. Immediately this raised the question in my mind: “If the RAE is supposed to assess a department’s strengths in research, then shouldn’t all staff members be included?” Such was my introduction to the “gaming” of metrics. Every institution was, of course, gaming the system in this and various other ways. Those that could afford it would buy in star performers just before the RAE (often to depart not long afterwards), leading to new rules to prevent such behaviour.

At some point, universities also got landed with the National Student Survey (NSS), which consisted of numerous questions relating to the “student experience”, but with most of the impact falling on lecturing staff who, either explicitly or implicitly, were informed that they needed to improve. With the introduction of – and subsequent increase in – tuition fees, students were now seen as consumers for whom league tables in research and the NSS were sources of information that could be used to distinguish between institutions when applying. The NSS has also led to gaming, sometimes not so subtly – as when lecturers or managers have warned students that they themselves might suffer from a worse educational experience resulting from institutional loss of income as a consequence of their own low ratings.

These changes within universities have been accompanied by another change: an expansion in the number of administrative staff employed and a shift in power away from academics. And academic staff themselves now spend considerably more time on paperwork than was the case in the past.

A new book by Jerry Z. Muller, The Tyranny of Metrics, shows that the experience of higher education is typical of many areas of working life. He traces the history of workplace metrics, the controversies surrounding them and the evidence of their effectiveness (or lack thereof). As far back as 1862, the Liberal MP Robert Lowe was proposing that the funding of schools should be determined on a payment-by-results basis, a view that was challenged by Matthew Arnold (himself a schools inspector) for the narrow and mechanical conception of education that it promoted.

In the early twentieth century, Frederick Winslow Taylor promoted the idea of “scientific management”, based on his time-and-motion studies of pig iron production in factories. He advocated that people should be paid according to output in a system that required enforced standardisation of methods, enforced adoption of the best implements and working conditions, and enforced cooperation. Note that the use of metrics and pay-for-performance are distinct things, but often go together in practice.

Later in the century, the doctrine of managerialism became more prominent. This is the idea that the differences among organisations are less important than their similarities. Thus, traditional domain-specific expertise is downplayed and senior managers can move from one organisation to another where the same kinds of management techniques are deployed. In the US, Defence Secretary Robert McNamara took metrics to the army, where “body counts” were championed as an index of American progress in Vietnam. Officers increasingly took on a managerial outlook.

The use of metrics found supporters on both the political left and the right. Particularly in the 1960s, the left were suspicious of established elites and demanded greater accountability, whilst the right were suspicious that public sector institutions were run for the benefit of their employees rather than the public. For both sides, numbers seemed to give the appearance of transparency and objectivity.

Other developments included the rising ideology of consumer choice (especially in healthcare), whereby empowerment of the consumer in a competitive market environment would supposedly help to bring down costs. ‘Principal-Agent Theory’ highlighted that there was a gap between the purposes of institutions and the interests of the people who run them and are employed by them. Shareholders’ interests are not necessarily the same as the interests of corporate executives, and the interests of executives are not necessarily the same as those of their subordinates (and so on). Principals (those with an interest) were needed to monitor agents (those charged with carrying out their interests), which meant motivating them with pecuniary rewards and punishments.

In the 1980s, the ‘New Public Management’ developed. This advocated that not-for-profit organisations needed to function more like businesses, such that students, patients, or clients all became “customers”. Three strategies helped determine value for money:

  1. The development of performance indicators (to replace price).
  2. The use of performance-related rewards and punishments.
  3. The development of competition among providers and the transparency of performance indicators.

Critics of this approach have noted that not-for-profit organisations often have multiple purposes that are difficult to isolate and measure, and that their employees tend to be motivated more by the mission than by the money. Of course, money does matter, but that recognition should come through the basic salary rather than performance-related rewards.

Indeed, evidence indicates that extrinsic (i.e. external to the person) rewards are most effective in commercial organisations. Where a job attracts people for whom intrinsic rewards (e.g. personal satisfaction, verbal praise) are more important, the application of pay-for-performance can undermine intrinsic motivation. Moreover, the people doing the monitoring tend to adopt measures for those things that are most visible or most easily measured, neglecting many other things that are important but which are less visible or not easily measured. This can lead to a distortion of organisational goals.

Many conservative and classical liberal thinkers have criticised such ideas, including Hayek, who drew a comparison with the failed attempts of socialist governments (notably the Soviet Union) at large-scale economic planning. Nonetheless, from Thatcher to Blair, from Clinton to Bush and Obama, politicians of different hues have continued to expand metrics further into the public domain.

Muller is not entirely a naysayer on metrics, noting that they can sometimes genuinely highlight areas of poor performance. In particular, he notes that in the US there have been some success stories associated with the application of metrics in healthcare. However, closer examination of these cases shows that these successes owe more to their being embedded within particular organisational cultures than to measurement per se. Indeed, these successes seem to be the exceptions rather than the rule, with other research showing no lasting effect on outcomes and no change in consumer behaviour. Research by the RAND Corporation found that stronger methodological design in studies was associated with a lower likelihood of identifying significant improvements associated with pay-for-performance.

What is clear – and Muller looks at universities, schools, medicine, policing, the military, business, charities and foreign aid – is that metrics have a range of unintended consequences. These include various ways in which managers and employees try to game the system: teaching to the test (education), treating to the test (medicine), risk aversion (e.g. in medicine, not operating on the most severely ill patients), and short-termism (e.g. police arresting the easy targets rather than chasing down the crime bosses). There is also outright cheating (e.g. teachers changing the test results of their pupils).

Incidentally, another recent book, The Seven Deadly Sins of Psychology (by Chris Chambers), documents how institutional pressures and the publishing system have incentivised a range of behaviours that have led to ‘bad science’. For instance, ‘Journal Impact Factors’ (JIFs) supposedly provide information about the overall quality of the research that appears in different journals. Researchers can cite this information when applying for tenure, promotion, or for their inclusion in the UK’s Research Excellence Framework (formerly the RAE). However, only a small number of publications in any given journal account for most of the citations that feed into the JIF. Another issue with JIFs concerns statistical power – the likelihood that a study will identify a genuine effect (statistical power depends on sample size and several other factors). It turns out that there is no relationship between the JIF and the average level of statistical power within a journal’s publications. Worse, high impact journals have a higher rate of retractions due to errors or outright fraud.

But one of the impacts of metrics is the expansion of resources (people, time, money, equipment) in order to do the necessary monitoring. Even the people being monitored must give up time and effort in order to produce the necessary documentation to satisfy the system. And as new rules are introduced to crack down on attempts to game the system, so the administrative resources are expanded even further. This diversion of resources obviously works against the productivity gains that are supposed to be produced by the application of metrics.

I was less convinced by the penultimate chapter in Muller’s book, in which he addresses transparency in politics and diplomacy. He speaks scornfully of the actions of Chelsea Manning and Edward Snowden in disclosing secret documents, which he says have had detrimental effects on American intelligence. Undoubtedly, transparency can sometimes be a hazard – compromise between different parties is made harder under the full glare of transparency – and there is a balance to be struck, but I would argue that the scale of wrongdoing revealed by these individuals justifies the actions they took and for which they have both paid a price. In the UK, as I write, there is an ongoing scandal over the related issues of illegal blacklisting of trade union activists in the construction industry and spying on political and campaigning groups (including undercover police officers having sexual relationships with campaigners). A current TV programme (A Very English Scandal) concerns the leader of a British political party who – in living memory – arranged the attempted murder of his former lover, and was exonerated following an outrageously biased summing up in court by the judge. And of course the Chilcot report into the Iraq war found that Prime Minister Blair deliberately exaggerated the threat posed by the Iraqi regime, and was damning about the way the final decision was made (of which no formal record was kept).

However, as far as the ordinary workplace is concerned, especially in not-for-profit organisations, the message is clear – beware of metrics!