Scholarly article on topic 'Developing a feeling for error: Practices of monitoring and modelling air pollution data'

Developing a feeling for error: Practices of monitoring and modelling air pollution data Academic research paper on "Social and economic geography"

Academic journal
Big Data & Society




Original Research Article

Developing a feeling for error: Practices of monitoring and modelling air pollution data

Big Data & Society July-December 2016: 1-12 © The Author(s) 2016 DOI: 10.1177/2053951716658061

Emma Garnett


This paper is based on ethnographic research of data practices in a public health project called Weather, Health and Air Pollution. (All names are pseudonyms.) I examine two different kinds of practices that make air pollution data, focusing on how they relate to particular modes of sensing and articulating air pollution. I begin by describing the interstitial spaces involved in making measurements of air pollution at monitoring sites and in the running of a computer simulation. Specifically, I attend to a shared dimension of these practices, the checking of a numerical reading for error. Checking a measurement for error is routine practice and a fundamental component of making data, yet these are also moments of interpretation, where the form and meaning of numbers are ambiguous. Through two case studies of modelling and monitoring data practices, I show that making a 'good' (error free) measurement requires developing a feeling for the instrument-air pollution interaction in terms of the intended functionality of the measurements made. These affective dimensions of practice are useful analytically, making explicit the interaction of standardised ways of knowing and embodied skill in stabilising data. I suggest that environmental data practices can be studied through researchers' materialisation of error, which complicates normative accounts of Big Data and highlights the non-linear and entangled relations that are at work in the making of stable, accurate data.


Environmental data, air pollution, error, modelling, monitoring, data practices


Air is not a one, it does not offer fixity or community, but it is no less substantial. The question is whether we can feel it. (Choy, 2012: 121)

Choy's description of air encompasses its materiality and immateriality, its multiplicity and fluidity through which he inquires: how do we feel this amorphous yet substantial thing? In his chapter entitled 'Air's Substantiations', Choy uses air as a heuristic to capture the many atmospheric experiences air provides, among them dust, oxygen, dioxin, smell, particulate matter, visibility, humidity, heat, and various gases (2012: 127). His subsequent abstraction of air into 'atmospheric experiences' involves an interweaving of the multiple encounters air makes possible, producing what he calls a 'poetics of air'. This conceptualisation of air enables him to trace the particular and everyday experiences of 'honghei' (ambient air) in Hong Kong, alongside the scientific and technical practices which seek to measure and scale air as a universal category. These experiences, he shows, are different ways of feeling air.

I begin with Choy's descriptions of 'airy matters' because he captures both human-material entanglements with air, and the different sensory scales and registers at work within these. The story of air is increasingly described in relation to human activities, as the result of life-styles in modern industrial societies (Sloterdijk, 2009: 88). Air pollution is very much a hybrid thing (Haraway, 1991; Latour, 1999b), understood as the product of human and non-human relations which have actively changed the material constitution of air. Data generation has also actively shaped what constitutes air, and how air is experienced and engaged with. Occasions of measuring air are proliferating: from government-led monitoring devices to global scale computer models, and from participatory modes of citizen science (Gabrys, 2014) to the collecting of data through mobile phone apps (e.g., the London Air iPhone app by the London Air Quality Network), which make new kinds of relations of air sensible. As research on urban spaces and 'smart cities' has emphasised, digital data's ability to record different kinds of sensations and states (Gabrys, 2014; Thrift, 2014; Tironi and Sanchez Criado, 2015) multiplies the ways in which data are now made and used.

Faculty of Public Health and Policy, London School of Hygiene and Tropical Medicine, UK

Corresponding author:

Emma Garnett, Faculty of Public Health and Policy, London School of Hygiene and Tropical Medicine, 15-17 Tavistock Place, London WC1H 9SH, UK. Email:

Creative Commons CC-BY: This article is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages.

Technical devices that sense air and make it measurable are also prescriptive, configured by metrics and methods of measurement and compartmentalised into different species of particles and gases. Yet, as social studies of informational practices have shown, standards and classification have socio-material and political effects because they affect what will, or will not, be made visible (Bowker and Star, 1999). The notion of the 'making up' of data (Boellstorff, 2013; cf. Hacking, 2006) has been used to delineate the material and political dimensions of Big Data. Particular focus has been on transactional and surveillance data (Beer and Burrows, 2013), personal health data and the quantified-self movement (Lupton, 2014a, 2014b; Nafus and Sherman, 2014) and what this means for empirical sociology (Savage and Burrows, 2009). These studies pose new questions about the ethics and politics of Big Data, particularly in terms of how data are rendered meaningful and functional through everyday practices of data production, use and analysis.

There have been fewer studies, however, critically exploring scientific data practices and how these different kinds of Big Data are 'made up' in order to carry meaning and have political effects. Environmental data, in the form of air pollution data, are my specific focus in this paper. Scientific data of air pollution as part of other environmental Big Data raise particular kinds of issues. The proliferation of methods to measure and make data of air pollution may even shift the object of study and therefore the relationship between human bodies and their environments. As Mei et al.'s (2014) research on 'sniffing social media' shows, by using text content from social media posts with spatiotemporal correlations among cities and days it is possible to measure and predict air quality in very different ways. This expansion of what should or can be monitored is also raised by Ottinger and Zurer (2011), who point out that introducing monitoring technologies for communities to measure their exposure (rather than relying on data generated by local industries) assumes a benchmark for good/bad air can be set. The authors suggest that expanding who can measure air quality will not resolve the politics of air and data. Indeed, focusing only on the endpoint of data closes down alternative probings into how standards are achieved, concerns about air pollution stabilised, and decisions about what should and can be measured made. Acknowledging the role of air pollution data as part of these Big Data practices is vital if we are to understand the intricate ways in which environmental data gain scientific and political affordance.

Based on ethnographic fieldwork as a member of a multi-disciplinary public health project called Weather, Health and Air Pollution (WHAP), in this paper I examine a key component of making environmental data: sensing error. I will examine the checking for error in two different data practices of air pollution - modelling and monitoring - focusing on the devices involved in these practices, and the material processes of collecting, capturing and 'making present' air pollution. I focus on modelling and monitoring practices due to their very different ways of managing 'error' (within WHAP these data were often opposed and contrasted). The different properties and meanings of modelled and monitored data meant they also caused tensions for researchers trying to share and combine these data across different situated practices. By focusing on how error was managed, I was able to go on and explore internal differences within data rather than only external differences between data (which researchers already acknowledged). Starting out with the performative dimensions and potential of data practices when meaning and value remain uncertain, I pay attention to the craftwork of articulating error, and therefore the stabilising of numerical measurements as mobile forms.

Embodiment and performance: Distinguishing data from error

As Gitelman and Jackson write, data are 'evolving assemblages rather than discrete entities', which need to be understood as 'framed and framing' (2013: 5). Drawing on ethnographic research of environmental data practices, I found air pollution data were similarly made through particular framings, both in terms of their considered geographic and environmental context and their role in national and global modes of governance. Error was a way to come to understand how data were framed because what counted as error was shaped by the anticipated material form and discursive use. Indeed, to publish data on public platforms, in official documents or in academic journals requires accounting for 'omissions' - how error was managed in data production.

Bowker et al. suggest that the invisible work and 'quiet politics' of knowledge infrastructure are how values, policies and modes of practice become embedded in larger informational systems. If we consider air pollution data as part of a wider system of environmental health governance, then the work of making and maintaining this system needs to be explored. Accordingly, data infrastructures can also be performative in enabling, or not, how data are classified and therefore how air pollution is made visible. In practice, classification is articulated through visual platforms, which enable modellers, for example, to manipulate and 'play with' atmospheric structures and processes on the computer screen (Alac, 2008). Seeing and responding to changes in a measuring instrument also result from the cultivating of an intuition for the phenomena being studied. It is this invisible craftwork of scientific practices, tied up with the generation of environmental data, which often gets negated through a focus on what data do and how to analyse Big Data.

Error, I found, was a key part of crafting data and classifying what counts as data of air pollution. This is not an issue of determining right from wrong, because, as Bowker et al. have pointed out, part of what makes a good classification scheme is the enabling of comparability and prescription, an effective level of complexity. Thus, it is the craft of the data technician and scientist to make a judgement about how differentiated to make the classification (Bowker et al., 1995: 347). This mutual process of constructing and shaping differences through classification systems is crucial in our conceptualisation of any reality (1995: 346). Moreover, conceptualising classification as performative aligns with contemporary interests in STS and philosophy of science which foreground the material, relational and ontological dimensions of scientific practice (Barad, 2007; Coopmans et al., 2014; Mol, 2002; Myers, 2015b). Indeed, an interest in the situated and lively nature of data - their socio-material lives (Helgesson, 2010; Leonelli, 2009, 2010; Michael, 2004) - has resulted from a particular emphasis on the embodied and imaginative work that renders data sense-able and sensible (Myers, 2015a).

In terms of error, it is this sense that the measurement is measuring the unintended that forces researchers to understand and materialise error as a form 'other' to that being studied. Further, error is interesting sociologically because it suggests a correction from normative expectations about 'the real'. As Tilly (1996) has argued, error-correction is also a counter-factual explanation crucial to understanding social relations and therefore the duration of, for example, socio-technical assemblages. By focusing on error in air pollution data practices alternative 'theories of the possible' may emerge because responses to error are neither instrumental nor random; they draw on historically accumulated understandings from culture (1996: 598). This point has also been developed by Sennett, using the coupling of resistance and ambiguity in craftwork (2009: 205). He argues that repairs and responses to unexpected outcomes are productive for material knowledge making, and it is at these moments that imaginations of and competence in coming to know an object can be expanded.

The different properties of error in modelling and monitoring problematise the assumption that data are direct measures of air pollution, and instead highlight the active ways in which the modification of data, and thereby of what constitutes air pollution, takes place in everyday environmental knowledge practices. Reflecting on the conceptual capacities of 'error', I suggest, opens up avenues for thinking about and researching the configuring of technical devices, bodily movements and materials (and their relations) that remain alternative (Sennett, 2009: 200), whilst remaining very much a part of these environmental data.

Different data practices, shifting articulations of error

The first case study I am going to describe is an air pollution monitoring station. The PI (Principal Investigator) put me in touch with a contact involved in monitoring air pollution in City 1, and who was also a member of WHAP's advisory committee. This led to me attending a routine monitoring site trip, during which I observed the process of checking monitors were functioning correctly and recording the performance of the monitors as part of wider 'quality assurance' processes.

The second case study was a very different kind of material setting, and focuses on the data practices of the atmospheric chemists on WHAP who used large-scale computer models to simulate atmospheric processes of air pollution. This involved sharing an office with two key modellers to observe and participate in model runs, and was followed up by emails and phone conversations to explicate these processes further, allowing me to ask questions and query particular motions and interpretations of model outputs. Spending time with the modellers also enabled me to experience the banality and the everyday-ness of modelling as particular kinds of data practices.

Although these case studies took place in distinct locations, both modelled and monitored data were central to weekly team discussions, during which their difference and comparability were considered and very often contested. Following traces and the formation of objects (Latour, 2005; Latour and Woolgar, 1986) is an Actor Network Theory inspired approach to the study of knowledge, which accounts for the agency of human and non-human forms and relations, and attends to how these are constituted in and articulated through socio-material practices. Indeed, it was through the differences articulated by researchers on WHAP that I came to appreciate the role error played in the making of air pollution data. By following the material practices of data making, I was able to render visible some of the ways in which particular data practices were embedded in a wider network of relations. Tracing these associations was a continual process because heterogeneous relations are always shifting, (re)producing and reshuffling all kinds of actors, including data, scientists and their institutional arrangements (Latour, 2005; Law and Hassard, 1999).

Case study 1: Monitoring instruments

We start by climbing up the outdoor stairway to the roof of the school. On top of the roof I see a grey porta-cabin [...] On entering, I am greeted by a set of four large rectangular boxes stacked on top of each other, supported by a shelving unit. Inside the boxes are two tubes, one attached to an outlet in the roof and the other connecting to four stacked boxes. To the left of the shelving unit are two gas canisters [which later I learnt host the different certified gases]. (Fieldnotes City 1, 25 October 2012)

The monitoring station I discuss here is a 'background'2 monitoring station, which, I was told, had been used to collect measurements for seventeen years. The area has four other sites, and this was known as 'number one', which relates to its relatively long history. Phil, an air monitoring expert, visits the site every two weeks to test the calibration equipment. Calibration is a process whereby the measurement made is compared with another 'true' measurement in order to test it for error. This is one part of a much more elaborate process of testing collected data for error. Indeed, monitoring sites are also visited by engineers and auditors, so the site check I attended was one among many others to ensure 'quality data'.

Air pollution is monitored across the UK, initiated by central government and often carried out by local councils. Measurements are collected at different monitoring sites, and these are organised into different networks according to the location of the site. For example, London has one accredited 'air quality network' managed by government departments, local councils, university research groups and environmental agencies. The pollutants measured at the site I visited included: ozone (O3), particulate matter (PM10), oxides of nitrogen (NOx) and sulphur dioxide (SO2).3 The purpose of these measurement data is stated as twofold: to provide the public and authorities with 'real-time'4 information on current air pollution levels, and to enable short-term and long-term responses to air pollution as a public health concern (DEFRA, 2012a). Using the data produced at monitoring stations, air pollution levels are reported in 'real time', on a scale that informs the public of air quality in different areas, inciting recommendations and actions to protect health.

Zero air and the calibration test

'Error' is a term used for a numerical reading not considered as measuring air pollution accurately. However, error is not always a mistake, although this is accounted for, but also a scale, an acceptable range of variability. In this way, error seems to be a part of measuring, rather than an unanticipated outcome. There are many different reasons for an erroneous measurement and the aim of a 'calibration test' is to control and account for some of these. In order to make a non-erroneous measurement, it is essential that the instrument used does not influence the measurement being taken. The main cause of error that calibration tests for is 'the drift' of the instrument from 'zero'. Zero is a term used to denote a baseline from which a measurement can be made, to construct an 'unmediated setting'. If the baseline is not zero, then the instrument is drifting by the difference between the measurement taken and zero. Drift is, then, a measurement too, and is a technique that accounts for the margin of error in a measurement.

Zero air (also referred to as 'pure air') is understood as air with no pollutants in it. Zero air does not exist in the environment, but has to be actively made using a scrubber, a device that quite literally scrubs away the parts of the air that are not being measured. The notion of pure air was described by Phil, the technician and researcher who I attended the site check with, as a standardised external reference material, made on-site or in a laboratory, and which can be physically introduced into the measurement setting in order to test for error. Without this fabricated reference point, the measurement made cannot be stabilised as data. Phil mediated the standard and the actual measurement in his manual calibration test, where the standard became a way to gain purchase on the authenticity of the measurement. In this way, pure air was created to produce a kind of 'objective nature' through which error and measures of air could be made material and 'real'.

Each monitor takes an air sample and measures the concentration of, in our case here, ozone, in the sample with a sensor. The measurement is the concentration of ozone in the sample, expressed in millionths of a gram per cubic metre of air (µg/m3). The air sample is drawn into tubes by a pump unit connecting the outside of the station with the indoor instruments (see Figure 1). A measurement is made with a UV light beam that shines through the tube and reacts with the different chemical components in the air sample. The tubes are called single reaction cells and are fitted with pneumatic valves, which enable them to switch between the zero measure and the sampled ambient air paths. The measure of ozone is the measure of this reaction in comparison to the measurement made with zero air. Without this comparative process no measurement can be made.

The reading is the level of absorption of ozone in the UV beam, compared with the measurement made of absorbance of the pollutant in the zero air sample. Phil explained to me that it is the switching between these measurements which results in the making of data of air pollution, where the monitor is:

[...] alternately measuring the absorption of the air path with no ozone present (zero air) and the absorption in the ambient sample. Gases pass through these UV beams and absorb some of the transmitted energy, which appear in the measured absorbance data. (Fieldnotes, 25 October 2012)

Figure 1. Inside an air pollution monitor in City 1 (Photo courtesy of author).

What this explanation shows is the multiple kinds of measurements being made in the process of working out the concentration of an air pollutant in an air sample (considered as ambient air). Indeed, there is no baseline, as I've detailed, so measurements of air with the pollutant in and air with no pollutants in are made and used in order to construct a measure of air pollution. The purpose of testing this process is to check whether the monitoring device is measuring air pollution concentrations accurately, so that the inaccuracy of the measures can be accounted for in the final data. What is interesting here is the multiple measurements made in the process of configuring a final measure of air pollution.
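The comparison between the two absorption measurements can be made concrete with a short sketch. UV photometric ozone analysers of this kind are generally described using the Beer-Lambert law; this is my gloss rather than something Phil stated. The function names, the cell length of 38 cm and the absorption cross-section of 1.15e-17 cm2 per molecule (an approximate literature value for ozone near 254 nm) are illustrative assumptions, not specifications of the monitor in City 1.

```python
import math

AVOGADRO = 6.02214076e23   # molecules per mole
OZONE_MOLAR_MASS = 48.0    # grams per mole (approximate)


def ozone_number_density(i_zero, i_sample, cross_section_cm2=1.15e-17, path_cm=38.0):
    """Estimate ozone molecules per cm^3 from two UV intensity readings.

    i_zero   -- intensity measured through zero air (no ozone present)
    i_sample -- intensity measured through the ambient air sample
    The default cross-section and path length are assumed, illustrative values.
    """
    if i_zero <= 0 or i_sample <= 0:
        raise ValueError("intensities must be positive")
    # Beer-Lambert: i_sample = i_zero * exp(-cross_section * path * density)
    absorbance = math.log(i_zero / i_sample)
    return absorbance / (cross_section_cm2 * path_cm)


def to_micrograms_per_m3(molecules_per_cm3):
    """Convert a number density to a mass concentration in micrograms per m^3."""
    molecules_per_m3 = molecules_per_cm3 * 1e6              # cm^3 -> m^3
    grams_per_m3 = molecules_per_m3 / AVOGADRO * OZONE_MOLAR_MASS
    return grams_per_m3 * 1e6                                # g -> micrograms
```

The point of the sketch is that neither intensity reading is a measurement of ozone on its own: the concentration only exists as a ratio between the zero-air path and the ambient path.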

Seeing error, sensing data

The flux that results from the manual calibration test is visible. As I sat next to Phil, a series of numbers appeared very quickly on the small screen at the front of the monitoring boxes. Indeed, numbers were continuously shown on the front of the box as air was constantly pumped through the tube and measured. (Fieldnotes, 25 October 2012)

Zero air and the calibration gases are measured and compared and, ideally, the readings on the front of the monitor should be the same as the measure in the gas canisters. Looking for this 'span and drift' in a calibration test means waiting for the reaction to take place and a stable measurement to be made, both of zero air and the calibration gases.5 Phil has to wait at least ten minutes in order for the analyser (the name used to refer to the piece of equipment that makes the measurement) to stabilise:

As the numbers on the front of the monitor start to slow down, Phil tells me that the display on the front of the monitor box is the 'zero reading'. Phil types this into the spreadsheet under the table 'data acquisition response'. (Fieldnotes, 25 October 2012)

Once stabilised, the readings are formally recorded on a shared spreadsheet, which also operationalises the numerical readings in a series of further transformations, 'because Excel is also a calculating tool' (Phil, 25 October 2012). The readings of zero and that of the calibration gas are compared by a mathematical equation, which then provides a measure of 'error'.
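I did not record the equation used in Phil's spreadsheet, so the following is only a minimal sketch of one common form such a comparison can take: a two-point (zero and span) check. The class name, field names and the percentage-error formula are my own assumptions for illustration, not the network's documented procedure.

```python
from dataclasses import dataclass


@dataclass
class CalibrationCheck:
    """One site visit's stabilised readings (hypothetical structure)."""
    zero_reading: float     # display value while sampling zero air
    span_reading: float     # display value while sampling the calibration gas
    span_certified: float   # certified concentration of the calibration gas

    @property
    def zero_drift(self) -> float:
        # Zero air should read 0, so the reading itself is the drift.
        return self.zero_reading

    @property
    def span_factor(self) -> float:
        # Scale factor that maps the zero-corrected span reading onto
        # the certified concentration.
        response = self.span_reading - self.zero_reading
        if response == 0:
            raise ValueError("analyser shows no response to calibration gas")
        return self.span_certified / response

    @property
    def span_error_percent(self) -> float:
        # Relative difference between zero-corrected response and certified value.
        response = self.span_reading - self.zero_reading
        return 100.0 * (response - self.span_certified) / self.span_certified

    def correct(self, raw: float) -> float:
        # Apply zero offset and span factor to an ambient reading.
        return (raw - self.zero_reading) * self.span_factor
```

For example, a zero reading of 2.0 and a span reading of 98.0 against a certified 100.0 would give a drift of 2.0 and a span error of -4 per cent under these assumptions.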

The process of reading and interpreting the sequence of numbers that stabilise as measurements is, as Phil explained, contingent on the pollutant being measured. So, for example, the calibration of ozone requires checking another nearby monitoring station to see if it is similar, as ozone is stable over a regional area. However, particulate matter (PM2.5 and PM10) would not generate a stable reading, since particulates are unstable in space and time:

I asked Phil how he knew what the numbers meant. He responded by explaining that meaning comes from 'experience [...] you need an eye to know what to look for'. Expanding on this notion of experience, Phil suggests there is an embodied aspect of doing this kind of work, through which someone can develop 'a good eye'. (Fieldnotes, 25 October 2012)

The experience of carrying out calibration tests, then, enables one to 'know what to look for', drawing upon the age-old distinction between seeing and knowing (Lynch and Woolgar, 1990), and exemplifies the symbiotic relationship and circulatory nature of seeing and knowing in practice (see also Latour, 1999a). As Myers (2014) develops in her account of protein modellers, the work of seeing and knowing extends beyond vision to the embodied, kinaesthetic and performative processes of coming to materialise scientific phenomena. Indeed, Phil went on to compare his own experience of seeing and thereby getting 'good data' with 'non-data analysers' (specifically individuals employed by local authorities and inexperienced technicians) as people who 'don't know what to look for', and who have not developed the necessary craft skills to re-present phenomena in ways that make data of a high enough quality.

Cleaning and mobilising air pollution data

Whilst running the calibration test, Phil balanced his laptop on his knees and opened up the spreadsheet ready to input the recordings he made. The maintenance of the record of measurements taken and calibration results was the second major task of visiting the site, which Phil explicitly referred to as 'a record keeping exercise'. The spreadsheet is a table which structures the measurements, with a list of variables including the name of the monitoring site, the date, time, temperature in the cabin, and the calibration results. These records go straight into a database:

Constant data is the aim and records of calibration results are kept and put into the database for the time period [...] and you scale it [the data from the monitor] until the next time someone comes to the site [according to the calibration results of this visit]. (Fieldnotes, 25 October 2012; technicalities confirmed via email, 30 October 2012)

The calibration results then become attached to the measurements made by the monitor, so that future data analysis can draw upon the results to check and explain the measurements made and make any adjustments required.
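The attachment of calibration results to subsequent measurements can be sketched as a lookup: each reading is scaled using the results of the most recent site visit, until the next visit supersedes them. This is a simplified illustration of Phil's description; the function name, the tuple layout and the example values are hypothetical, and the actual database and ratification procedures may work differently.

```python
from bisect import bisect_right
from datetime import datetime


def scale_reading(timestamp, raw, visits):
    """Scale a raw reading using the latest calibration visit before it.

    visits -- list of (visit_time, zero_offset, span_factor) tuples,
              sorted by visit_time (a hypothetical record layout).
    """
    visit_times = [visit[0] for visit in visits]
    # Index of the last visit at or before the reading's timestamp.
    index = bisect_right(visit_times, timestamp) - 1
    if index < 0:
        raise ValueError("reading predates the first calibration visit")
    _, zero_offset, span_factor = visits[index]
    return (raw - zero_offset) * span_factor


# Two fortnightly visits with made-up calibration results.
example_visits = [
    (datetime(2012, 10, 11), 1.0, 1.0),
    (datetime(2012, 10, 25), 2.0, 1.25),
]
scaled = scale_reading(datetime(2012, 10, 26), 42.0, example_visits)
```

Read this way, a number in the database is never just the monitor's output: it carries with it the visit that licensed it.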

The cleaning procedures for air pollution data are governed by standardised protocols and related thresholds of validity according to UK and EU legislation, so that the data become further shaped and formatted by influences from outside the situation of initial capture (Helgesson, 2010: 61). The cleaning practices are recorded, maintained and sustained and become part of the history of air pollution monitoring. Indeed, during the site visit, Phil emphasised the importance of maintaining the records for 'data capture' and the subsequent journey of these captures to their stabilisation as 'ratified data'.

Continuity is also a useful metaphor to think analytically about this process of getting 'good data'. Air pollution was conceptualised by Phil as something that is always in emergence and therefore continuous. However, continuity is difficult to measure in practice and one of the ways in which continuity was constructed was through checking for errors and maintaining the material context of measurement, which remained identifiable and attached to numerical readings in their journey to becoming data.6 Constructing continuity by making lots of different kinds of measurements - of humidity, of instrument performance, of date and time - enabled the data to become 'more real'. This is a logic which resonates with claims made in contemporary discourse about Big Data and reality (Grant, 2012; Shaw, 2014). In the case of air pollution data, in order to make data continuous and 'big', a series of interferences were required to construct this sense of 'the real'. The way in which data were made real, however, was specific to the particular pollutant monitored.

The standardised procedures of reducing and accounting for error, through which monitored data became more stable, simultaneously mobilised data into different practices. For example, the London Air Quality Network publishes data in 'real time' on their website and these are then classified according to low, medium or high air pollution levels. In this movement from measurement practices to the practices that work with and re-use these data, data become an objectified form, from which claims can be made and further epistemic inquiries initiated. By providing discrete measurements every 15 seconds, monitoring enables the potential extension of scientific relations and analytical patterns of air pollution. This process is a component of making Big Data and, although not defined as such by Phil, the checking of error and stabilising of accurate data is informed by its functionality, as making up larger data sets for controlling and responding to air quality and urban environments.
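The movement from frequent discrete readings to a published band can be illustrated with a small sketch. The threshold values and function names below are placeholders of my own and should not be read as the bands actually used by the London Air Quality Network; averaging is likewise only one assumed way of summarising a run of readings.

```python
from statistics import mean


def band(concentration, medium_from=100.0, high_from=160.0):
    """Classify a concentration as 'low', 'medium' or 'high'.

    The two thresholds are hypothetical placeholders, not official values.
    """
    if concentration < 0:
        raise ValueError("concentration cannot be negative")
    if concentration >= high_from:
        return "high"
    if concentration >= medium_from:
        return "medium"
    return "low"


def band_for_readings(readings, **thresholds):
    """Average a run of discrete readings, then classify the average."""
    return band(mean(readings), **thresholds)
```

Even in this toy form, the classification depends on choices (which thresholds, which averaging window) that sit outside the measurement itself.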

Case study 2: Running a simulation and checking for error

The modeller states 'see, there is an error' pointing to the computer screen where, after several seconds, a series of numbers appear. However, I can't see the error. Following this apparent visualisation of error the modeller describes how he is going to now seek to understand this error, explaining that the compilation of code is tricky because if it is compiled on one computer then it won't necessarily work on another, so by re-running the model you start to work out where the error lies and therefore what counts as data. (Fieldnotes, 28 September 2012)

Modelling involves a different kind of measurement setting and data practice to monitoring. At the same time, I found there were strong resonances between these practices. In modelling, the measurement setting was built with a computer, and the complexities that make up environments, such as temperature, weather conditions and time, were constructed within the model structure. This approach contrasts with monitoring, where the complexities in taking a measurement influence the setting in which a monitor is initially located. In order, for example, to measure traffic pollution, monitors are placed on 'the roadside'; for 'ambient air'7, monitors are located away from traffic to pick up 'background air'. Furthermore, the process of deciding which interactions to study was the subject of continuing debate among the modellers and other researchers on WHAP. So, the process of producing air pollution data was worked out contingently in modelling - as a result of particular research interests and in relation to the aims of the wider project - rather than as the result of a standardised system of data collection.

A simulation model is often considered a theoretical representation of the atmosphere. The assumptions that underpin the model are described through mathematical equations. The combined model, of the weather and atmosphere, simulates atmospheric relations in process by reducing chemical processes to a number of physical laws and by inputting other data for specific variables that function as parameters for the running of a simulation. These equations represent an 'exact determination of how the [environmental] system will evolve through time' (Winsburg, 1999: 5), so that the actual simulation process is internal to the computer model.

The modellers on WHAP talked about the model as three dimensional, simulating the fluxes, flows and transport of air pollution rather than measuring air pollution at one point in time and space like monitors.8 So, even though the model can simulate a number of different pollutants, including ozone and particulate matter, these were considered to be relational and in process. The different pollutants were referred to by the modellers as relations in the atmosphere, for example, as 'nitrogen and sulphur deposition' or 'surface ozone'. This is significant if we are to understand how modelling transforms a measurement into data, because how air pollution is configured and imagined in time and space shapes the stabilisation process.

The policy value of modelling is its ability to produce data on past, present and future air pollution, which can be used in environmental governance and policy making. At the same time, this feedback between data and use plays out in the making of data. As one senior modeller explained, which simulations to run is dependent on the pollutants considered as a health risk and therefore of interest to the policy maker (Elizabeth, November 2011).

A simulation run

I am sharing an office space with Craig and Tom and observing their running a simulation. I am surprised to find a simple and rather ordinary setting, an office very much like my own, considering the global remit of the CM-MW model. It seems to be time consuming. I learn that modelling relies on access to external expertise and technical resources, specifically, the model interface of the PC is connected to a super computer, which Craig can communicate with at his office computer.9 On the screen are lines of code, and below a box that Craig begins to type commands for the model into. (Fieldnotes, 8 November 2012)

The process of working out error in a simulation run was a key component of the practice of running a simulation, where there was an alternation between the computer screen as interface and the mathematical model accessed through the computer code. The computer screen became the material way in which the researcher engaged with air pollution as a digital abstraction. By typing out particular commands in the box on the screen, Craig manipulated the modelled atmosphere to produce a measurement of an air pollutant. Communicating with the model through computer code was an engagement that made the model do things:

The core model code provided by the model developers is modified for the specific needs of the project [WHAP] by manually editing via the keyboard. This human readable 'source code' is then 'compiled' via a standard software tool (compiler) into a set of binary instructions which can be understood and executed by the computer. (Craig, personal correspondence, 19 December 2013)

The next step, compiling, translated the line of code into a series of actions. Compiling is considered a potential source of error because how the code comes together and performs in practice is uncertain. The line of code was translated by the model into a series of actions, which then performed a simulation according to the specific parameters delineated: 'to execute the sequence of instructions created by the compile stage'. The instructions in the code order the variables of interest (e.g., which pollutant, meteorological conditions, location) into output files that are structured and stored under the details of the simulation run (see Figure 2). The arrangement of the output data into files results from a successful simulation run. This process of arranging output and input files, and its effect on materialising some output over others, resonates with Bowker et al.'s (1995) arguments around knowledge infrastructures. It is this work of structuring and storing data files, of encoding and classifying modelled outputs, which is made visible in practice-based accounts of data. These organising techniques ultimately influence the informational content and material form air pollution data take.

Checking for error, configuring air pollution

The arrangement of the output data into files results from a successful simulation run. However, the majority of runs involve error. Craig's demonstration of a simulation run showed me what error means in practice, and how it is visualised on the computer interface:

Having pressed 'run' on the computer interface, we wait about ten seconds for a series of lines of code, similar visually to my untrained eye, as the code that was input into the model. This is because the model output is also in code. The model presents a result in the response box above the command box on the interface that Craig sits in front of. (Fieldnotes, 28 September 2012)

There is error if the code does not produce a 'legible output', appearing as a line of script within which the error lies and needs to be worked out. The line of script becomes the object of interest. 'De-bugging' is an exercise in trying to understand the sources of error by re-examining the different elements of a model run. Primarily, this involves going back to the typed-in code commands. Indeed, there are a number of recognised sources of error in the data. There may be error in the performance of the code in a compilation; there may be error in the assumptions within the code, for example, the approximated measurement by those who wrote the code not co-ordinating with the approximation of measurement being produced in the simulation; or 'human error' in the process of re-writing the code for the particular compilation. These different kinds of sources of error were found, understood and controlled for through a series of interactions at the model interface.

Figure 2. Organising the input and output data in a simulation run (Table from CM-MW User-manual).

In Craig's account, checking for error seemed to be a process of getting a sense for the balance of the model as a good representation of the atmosphere, the code as the means by which the computer and modeller can communicate, and the assumptions behind the data used as inputs into the model. Craig described this act of generating data as an 'art' of balancing the different elements comprising the modelled atmosphere, alongside an understanding of what kinds of air pollution relations are of interest to those using the data.10 This empirical anecdote extends what counts as error because it is through achieving balance, and the feeling for work that is required to do this, that 'good data' is made. Here, error-free data is neither a classification nor a scale but an enactment, as Sennett suggests, a making of new possibilities (2009: 205). In this way, modelling was not simply a process of getting a good representation of the atmosphere, but an engagement that plays with atmospheric relations in ways that shape how air pollution in the atmosphere came to be known and performed.

At a different scale, then, the modelled generation of big, national scale data of air pollution is not unproblematic either. Modelled data of air pollution was also subject to the effects and affects of mundane, everyday data practices. Once the modelled outputs were made legible (but not stabilised data), they were read using open source software, which is an aid for analysing data through visualisation as mapped concentrations (see Figure 3). The mapped outputs were then used to hone and develop the simulation run in ways which would make data more accurate. Like in the calibration test, this shift back to the technical arrangements of the measurement setting by Craig was where the 'tinkering' (Knorr-Cetina, 1981) took place, so that Craig's engagement with the atmosphere in the building, running and re-running of model simulations intervened in the articulation of air pollution by making particular atmospheric relations 'more visible'. Error was a fundamental part of making 'good data', a process through which what counted as good/bad data shaped how air pollution was ultimately stabilised and made real.

Discussion: The role of error in the making of air pollution data

The proliferation of environmental data poses a changing set of inquiries for those studying scientific practices and knowledge making. If we are going to study data practices as a particular way of doing science, then the ways in which these practices articulate phenomena, and how researchers sense and enact data in different ways, need to be explored at the multiple sites where these transformations take place. The ethnographic account of air pollution data offered in this paper is an attempt to consider how data emerge as a result of one kind of transformation, that of a purification of data through the working out of error. In doing so, the affordances embedded in different kinds of data and how these relate to the scientific and policy worlds in which they are made and used were explored.

Figure 3. The visualisation of error: A map of SO2 concentrations (Craig, personal correspondence, 15 January 2015).

For Phil and Craig, making stable data was achieved through carefully balancing the context of measurement, the phenomena under study and their ability to effect and affect air pollution as a research object. Making data was not unmediated or discrete (Bowker, 2010), but rather a process that unfolded temporally through interaction between the phenomena in question, technical objects and other scientific values. Checking a measurement for error was routine practice, yet it was also an inherently uncertain and ambiguous process, which involved practices of feeling for error in order to both get a sense for, and make sense of, air pollution as data; a practice that didn't simply represent, but materialised air pollution as a tangible form. Model and monitor devices were engaged with in ways which made visible particular and contingent relations of air pollution as data.

This process of stabilising accurate and useable data of air pollution involved sensing for error, where error became the focus of investigation and the materialised relation from which air pollution could emerge and take form. I have shown that in monitoring and modelling, what counted as error was shaped by contingent logics of functionality and framing. In monitoring, omissions were used to further fabricate and make accurate the final data. In modelling, error was used to re-configure the very measurement process, therefore operating differently to monitoring in its entanglement within data making rather than the final data. Developing a feeling for the relational inter-dependencies of making data was both a honing of professional vision (Goodwin, 1994) and the craftwork of articulating and managing error. Performance of error was also influenced by the social and political networks of modelling and monitoring air pollution. For example, monitored data is used in everyday public health management, and error is constantly made and taken away to make data instantly useable and verifiable. In modelling, error is checked for and then taken away by restructuring the measurement process (a simulation). Rather than a scale of an acceptable range of error in data (how far the measurement errs from 'the truth'), in modelling error is an enactment that, as Sennett notes, involves working with error and thereby reconfiguring the problem into other terms (2009: 222), generating new articulations of atmospheric relations and thereby allowing new kinds of questions to be posed.

Exploring these particular roles and artefacts of error extends our understanding of the social and political lives of data. It is through measuring and accounting for error that data's validity was made and mobilised. I have demonstrated the ways in which data are always shifting, demanding flexibility and local ordering (Bowker, 2010; Edwards et al., 2011). In doing so, I have also pointed out that an important part of this alignment process is the performance, management and taking away of error. Thus, starting out with the premise that data are always situated, material enactments is productive in coming to understand the social and political dimensions of data, and the ways in which data gain social and political validity as Big Data (Gabrys et al., 2016).

I've shown that the experienced multiplicity and heterogeneity involved in stabilising what comes to count as 'real data' are constantly negotiated by those scientists and technicians who craft and 'make up' data. Indeed, it was researchers' articulation of their embodied work that also enabled me, as an ethnographer, to get a feeling for the multiple agencies mobilised in measurement practices. Like Myers, I have emphasised the sensory dimensions of feeling for error and the sensibilities which configure these so that they become sensible and useable data. Error practices mobilise heterogeneous elements which give air relations both their sensitivity - ability to respond to intervening practices - and their sensibility - to endow them with a kind of responsivity that can be used to make sense of their worlds (Myers, 2015a). It is these different kinds of sensing practices and the attentiveness demonstrated by scientists in my research that are made active in the accounts of error provided in this paper, and which highlight how valid data are made accountable to their relations, thereby becoming valid and 'sensible'. It is through error, then, that we can better understand the multiple agencies which configure different versions of air pollution in practice.

As Tilly (1996) and Sennett (2009) both point out, accounting for errors and ambiguity in everyday practices is fundamental to the maintenance of social (and socio-technical) relations and to the extension of knowledge (embodied, material and informational). This relationship between error and social and political networks is of particular significance if we are to understand and trace the expanse and extent of relations which form the social lives of scientific data, and indeed Big Data. Starting out with case studies of data practices of air pollution, I've also problematised the focus on 'data' in discourses of Big Data by foregrounding the other kinds of relations which configure and contain data. I've demonstrated the ways in which performative work of data making always involves materialising and accounting for error, a practice vital for data to carry meaning, circulate freely and mobilise as informational forms.


Acknowledgements

Special thanks to researchers on the WHAP project whose patience and support made this research possible. Thanks also to Judy Green, Catherine Montgomery and Simon Cohn for guidance during the course of the PhD on which this work is based, and to Jennifer Gabrys, Nerea Calvillo and the three anonymous reviewers for their insightful comments on earlier versions of this paper. Ethical approval was granted by the London School of Hygiene & Tropical Medicine Ethics Committee.

Declaration of conflicting interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.


Funding

The author received financial support from The Natural Environment Research Council, UK.

Notes

1. WHAP involved researchers from five different UK universities, and these were largely split across two large cities in the UK - City 1 and 2.

2. Where monitors are placed relates to the character of the surrounding environment. Monitor locations are then classified according to these surroundings. The classifications 'urban background' and 'rural background' signify that the monitor is measuring the lowest levels of air pollution in that surrounding area. In contrast, 'roadside' denotes a point which is considered as having high levels of air pollution, but which is not considered as 'representative' of the wider area.

3. The Automatic Urban Rural Network monitoring stations can also measure carbon monoxide and smaller particulate matter, PM2.5.

4. 'Real-time' is not actually real time in the sense that there is a delay in the capture of air and the stabilisation of a measurement.

5. In a manual calibration test this means the certified gases in the canisters, rather than a sample of ambient air.

6. Sabina Leonelli's (2009) analysis of the journeys of data and the ways in which data get enrolled in order to make scientific claims suggests that data remain very much attached to their site of locution.

7. Ambient air quality refers to the quality of outdoor air in our surrounding environment, usually at ground level and away from direct sources of pollution.

8. The simulation model in WHAP was a combined chemistry transport model (CM) and meteorological model (WM). This CM-WM was used to simulate the concentration and movement of air pollutants in the atmosphere, generating a three-hourly description (by mathematical equation) of the evolution of the dependent variables (the parameters and boundary values) of the model (Project Protocol).

9. Super computers are able to carry out a very high amount of computation, and are used for working with very large data sets. The super computer used by the modellers at the university was based in a research institute close by, but every year it moves between prestigious scientific institutions.

10. This kind of balance between the scientific error and the requirements of particular kinds of data as a result of the research project points to another potential error. Although I have not detailed those tensions between scientific requirements of simulation modelling and the demands of, specifically, epidemiologists on WHAP, there were continual discussions about how to match theoretical correctness with the need for a certain scale of empirical data.


References

Alac M (2008) Working with brain scans: Digital images and gestural interactions in fMRI laboratory. Social Studies of Science 38: 483-508.

Barad K (2007) Meeting the Universe Halfway: Quantum Physics and the Entanglement of Matter and Meaning. Durham: Duke University Press.

Beer D and Burrows R (2013) Popular culture, digital archives and the new social life of data. Theory, Culture & Society 30: 47-71.

Boellstorff T (2013) Making Big Data, in theory. First Monday 18.

Bowker G (2010) Biodiversity datadiversity. Social Studies of Science 30: 643-683.

Bowker G, Timmermans CS and Star SL (1995) Infrastructure and organizational transformation: Classifying nurses' work. In: Orlikowski W, Walsham G, Jones MR, et al. (eds) Information Technology and Changes in Organizational Work. London: Chapman and Hall, pp. 344-370.

Bowker GC and Star SL (1999) Sorting Things Out: Classification and Its Consequences. Cambridge, MA: MIT Press.

Choy T (2012) Air's substantiations. In: Rajan SK (ed.) Lively Capital. London: Duke University Press, pp. 121-155.

Coopmans C, Vertesi J, Lynch M, et al. (2014) Representation in Scientific Practice Revisited. London: MIT Press.

Economist (2010) The data deluge: Businesses, governments and society are only starting to tap its vast potential.

Edwards P, Mayernik SM, Batcheller LA, et al. (2011) Science friction: Data, metadata, and collaboration. Social Studies of Science 41: 667-690.

Gabrys J (2014) Programming environments: Environmentality and citizen sensing in the smart city. Environment and Planning D: Society and Space 32: 30-48.

Gabrys J, Pritchard H and Barratt B (2016) Big Data & Society.

Gitelman L and Jackson V (2013) Introduction. In: Gitelman L (ed) Raw Data Is an Oxymoron. Cambridge, MA: MIT Press, pp. 1-15.

Goodwin C (1994) Professional vision. American Anthropologist 96: 606-633.

Grant E (2012) The Promise of Big Data. Boston, MA: Harvard T.H. Chan School of Public Health.

Haraway D (1991) Simians, Cyborgs, and Women: The Reinvention of Nature. New York, NY: Routledge.

Helgesson C-F (2010) From dirty data to credible scientific evidence: Some practices used to clean data in large randomised clinical trials. In: Will C and Moreira T (eds) Medical Proofs, Social Experiments: Clinical Trials in Shifting Contexts. Surrey: Ashgate, pp. 49-67.

Knorr-Cetina K (1981) The Manufacture of Knowledge. An Essay on the Constructivist and Contextual Nature of Science. Oxford: Pergamon Press.

Latour B (1999a) Circulating reference: Sampling the soil in the Amazon forest. In: Latour B (ed.) Pandora's Hope. London: Harvard University Press, pp. 24-80.

Latour B (1999b) Pandora's Hope: An Essay on the Reality of Science Studies. Cambridge, MA: Harvard University Press.

Latour B (2005) Reassembling the Social: An Introduction to Actor-Network-Theory. Oxford: Oxford University Press.

Latour B and Woolgar S (1986) Laboratory Life: The Construction of Scientific Facts. Princeton, NJ: Princeton University Press.

Law J and Hassard J (1999) Actor Network Theory and After. Oxford: Blackwell Publishers/The Sociological Review.

Leonelli S (2009) On the locality of data and claims about phenomena. Philosophy of Science 76: 737-749.

Leonelli S (2010) Circulating evidence across research contexts: The locality of data and claims in model organism research. In: Working Papers on the Nature of Evidence: How Well Do 'Facts' Travel? No. 25/08, London School of Economics, pp. 1-44.

Lupton D (2014a) Apps as artefacts: Towards a critical perspective on mobile health and medical app. Societies 4: 606-622.

Lupton D (2014b) Beyond techno-utopia: Critical approaches to digital health technologies. Societies 2: 706-711.

Lynch M and Woolgar S (1990) Representation in Scientific Practice. London, UK: MIT Press.

Mei S, Li H, Fan J, et al. (2014) Inferring air pollution by sniffing social media. In: IEEE/ACM International Conference for Advanced Social Network Analysis.

Michael M (2004) On making data social: Heterogeneity in sociological practice. Qualitative Research 4: 5-23.

Mol A (2002) The Body Multiple: Ontology in Medical Practice. Durham: Duke University Press.

Myers N (2014) Rendering machinic life. In: Coopmans C, Vertesi J, Lynch M, et al. (eds) Representation in Scientific Practice Revisited. Cambridge, MA: The MIT Press, pp. 153-177.

Myers N (2015a) Conversations on plant sensing: Notes from the field. NatureCulture 1: 35-66.

Myers N (2015b) Rendering Life Molecular: Models, Modelers, and Excitable Matter. Durham: Duke University Press.

Nafus D and Sherman J (2014) This one does not go up to 11: The Quantified Self movement as an alternative Big Data practice. International Journal of Communication 8: 1785-1794.

Ottinger G and Zurer R (2011) New voices, new approaches: Drowning in data. Issues in Science and Technology XXVII: 71-82.

Savage M and Burrows R (2009) Some further reflections on the coming crisis of empirical sociology. Sociology 43: 762-772.

Sennett R (2009) The Craftsman. London: Penguin Books.

Shaw J (2014) Why ''Big Data'' is a big deal: Information science promises to change the world. Harvard Magazine 3: 30-75.

Sloterdijk P (2009) Terror from the Air. Los Angeles, CA: Semiotext(e).

Stapleton LK (2011) Taming Big Data. IBM Data Magazine I: 1-6.

Thrift N (2014) The 'sentient' city and what it may portend. Big Data & Society 1: 1-21.

Tilly C (1996) Invisible elbow. Sociological Forum 11: 589-601.

Tironi M and Sanchez Criado T (2015) Of sensors and sensitivities: Towards a cosmopolitics of ''Smart Cities''? Italian Journal of Science and Technology Studies 6: 89-108.

This article is a part of Special theme on Practicing, Materializing and Contesting Environmental Data.