Values and Evidence in Model-Based Climate Forecasting
No contemporary volume on the experimental side of modeling in science would be complete without a discussion of the “model experiments” being run in contemporary climate science. I am referring here, of course, to the enormous, complex, and elaborate global climate simulations being run to try to predict the pace and tempo of climate change, at both the global and regional levels, over the course of the next century. And no section on evidential reasoning in science would be complete without some discussion of the appropriate role, if there is any, of social and ethical values in reasoning about scientific models, hypotheses, and predictions.
Fortunately, climate science and the computer simulation of climate in particular provide an excellent background against which to discuss the role of values in science. One reason for this is obvious: climate science is one of the most policy-driven disciplines in the scientific world. Outside of the health sciences, no area of science is more frequently called on to inform public policy and public action, and is thus more frequently wedded to aspects of public decision making that themselves draw on social and ethical value judgments. But there is a second, less obvious reason that climate science is an excellent place to study the role of values in the physical sciences. This has to do with a now-famous and widely accepted defense of the value-free nature of science that was first put forward in the 1950s by Richard Jeffrey. We will see more details in the sequel, but his basic idea was that the key to keeping science free of social and ethical values—and hence in preserving its objectivity—was for scientists to embrace probabilistic reasoning and to attach estimations of uncertainties to all their claims to knowledge.
But it is precisely this endeavor—the attempt to attach estimations of the degree of uncertainty to the predictions of global climate models—that has raised in the domain of climate science some of the most persistent and troubling conceptual difficulties. There has been over the last several years an explosion of interest and attention devoted to this problem—the problem of uncertainty quantification (UQ) in climate science. The technical challenges associated with this project are formidable: the real data sets against which model runs are evaluated are large, patchy, and involve a healthy mixture of direct and proxy data; the computational models themselves are enormous, and hence the number of model instances that can be run is minuscule and sparsely distributed in the solution space that needs to be explored; the parameter space that we would like to sample is vast and multidimensional; and the structural variation that exists among the existing set of models is substantial but poorly understood. Understandably, therefore, the statistical community that has engaged itself with this project has devoted itself primarily to overcoming some of these technical challenges. But as these technical challenges are being met, a number of persistent conceptual difficulties remain. What lessons can we draw from these conceptual difficulties concerning the proper role of social and ethical values in science?
Science and Social Values
What do we mean, first of all, by social values? Social values, I take it, are the estimations of any agent or group of agents of what is important and valuable—in the typical social and ethical senses—and what is to be avoided, and to what degree. What value does one assign to economic growth compared with the degree to which we would like to avoid various environmental risks? In the language of decision theory, by social values we mean the various marginal utilities one assigns to events and outcomes. The point of the word “social” in “social values” is primarily to flag the difference between these values and what Ernan McMullin (1983) once called “epistemic values,” such as simplicity and fruitfulness. But I do not want to beg any questions about whether values that are paradigmatically ethical or social can or cannot, or should or should not, play important epistemic roles. So I prefer not to use that vocabulary. I talk instead about social and ethical values when I am referring to things that are valued for paradigmatically social or ethical reasons. I do not carefully distinguish in this discussion between the social and the ethical.
The philosophically controversial question about social and ethical values is about the degree to which they are involved (or better put: the degree to which they are necessarily involved, or inevitably involved, and perhaps most importantly uncorrectably involved) in the appraisal of hypotheses, theories, models, and predictions. This is the question, after all, of the degree to which the epistemic and the normative can be kept apart.
This is a question of some importance because we would like to believe that only experts should have a say in what we ought to believe about the natural world. But we also think that it is not experts, or at least not experts qua experts, who should get to say what is important to us, or what is valuable, or has utility. Such a division of labor, however, is only possible to the extent that the appraisal of scientific hypotheses, and other matters that require scientific expertise, can be carried out in a manner that is free of the influence of social and ethical values.
Philosophers of science of various stripes have mounted a variety of arguments to the effect that the epistemic matter of appraising scientific claims of various kinds cannot be kept free of social and ethical values. Here, we will be concerned only with one such line of argument—one that is closely connected to the issue of UQ—that goes back to the midcentury work of the statistician C. West Churchman (1948, 1956) and a philosopher of science, Richard Rudner (1953). This line of argument is now frequently referred to as the argument from inductive risk. It was first articulated by Rudner in the following schematic form:
- 1. The scientist qua scientist accepts or rejects hypotheses.
- 2. No scientific hypothesis is ever completely (with 100% certainty) verified.
- 3. The decision to either accept or reject a hypothesis depends upon whether the evidence is sufficiently strong.
- 4. Whether the evidence is sufficiently strong is “a function of the importance, in a typically ethical sense, of making a mistake in accepting or rejecting the hypothesis.”
- 5. Therefore, the scientist qua scientist makes value judgments.
Rudner’s oft-repeated example was this: How sure do you have to be about a hypothesis if it says (1) a toxic ingredient of a drug is not present in lethal quantity, versus (2) a certain lot of machine-stamped belt buckles is not defective? “How sure we need to be before we accept a hypothesis will depend upon how serious a mistake would be” (Rudner 1953, 2). We can easily translate Rudner’s lesson into an example from climate science. Should we accept or reject the hypothesis, for example, that, given future emissions trends, a certain regional climate outcome will occur? Should we accept the hypothesis, let us say, that a particular glacial lake dam will burst in the next fifty years? Suppose that, if we accept the hypothesis, we will replace the moraine with a concrete dam. But whether we want to build the dam will depend not only on our degree of evidence for the hypothesis, but also on how we would measure the severity of the consequences of building the dam and having the moraine hold after all, versus not building the dam and having it burst. Thus, Rudner would have us conclude that so long as the evidence is not 100 percent conclusive we cannot justifiably accept or reject the hypothesis without making reference to our social and ethical values.
The best known reply to Rudner’s argument came from the philosopher, logician, and decision theorist Richard Jeffrey (1956). Jeffrey argued that the first premise of Rudner’s argument, that it is the proper role of the scientist qua scientist to accept and reject hypotheses, is false. Their proper role, he urged, is to assign probabilities to hypotheses with respect to the currently available evidence. Others, for example policy makers, can attach values or utilities to various possible outcomes or states of affairs and, in conjunction with the probabilities provided by scientists, decide how to act.
It is clear that Jeffrey did not anticipate the difficulties that modern climate science would have with the task that he expected to be straightforward and value free, the assignment of probability with respect to the available evidence. There are perhaps many differences between the kinds of examples that Rudner and Jeffrey had in mind and the kinds of situations faced by climate scientists. For one, Rudner and Jeffrey discuss cases in which we need the probability of the truth or falsity of a single hypothesis, but climate scientists generally are faced with having to assign probability distributions over a space of possible outcomes.
I believe that the most significant difference between the classic kind of inductive reasoning Jeffrey had in mind (in which the probabilities scientists are meant to offer are their subjective degrees of belief based on the available evidence) and the contemporary situation in climate science is the extent to which epistemic agency in climate science is distributed across a wide range of scientists and tools. This is a point that will receive more attention later in this discussion. For now, we should turn to what I would claim are typical efforts in climate science to deliver probabilistic forecasts, and we will see how they fare in terms of Jeffrey’s goal to use probabilities to divide labor between the epistemic and the normative.
Uncertainty in Climate Science
Where do probabilistic forecasts in climate science come from? We should begin with a discussion of the sources of uncertainty in climate models. There are two main sources that concern us here: structural model uncertainty and parameter uncertainty. Although the construction of climate models is guided by basic science—science in which we have a great deal of confidence—these models also incorporate a barrage of auxiliary assumptions, approximations, and parameterizations, all of which contribute to a degree of uncertainty about the predictions of these models. Different climate models (with different basic structures) produce substantially different predictions. This source of uncertainty is often called structural model uncertainty.
Next, complex models involve large sets of parameters or aspects of the model that have to be quantified before the model can be used to run a simulation of a climate system. We are often highly uncertain about what the best value for many of these parameters is, and hence, even if we had at our disposal a model with ideal (or perfect) structure, we would still be uncertain about the behavior of the real system we are modeling because the same model structure will make different predictions for different values of the parameters. Uncertainty from this source is called parameter uncertainty.
Most efforts in contemporary climate science to measure these two sources of uncertainty focus on what one might call sampling methods. In practice, in large part because of the high computational cost of each model run, these methods are extremely technically sophisticated, but in principle they are rather straightforward.
I can best illustrate the idea of sampling methods with an example regarding parameter uncertainty. Consider a simulation model with one parameter and several variables. If one has a data set against which to benchmark the model, one could assign a weighted score to each value of the parameter based on how well it retrodicts values of the variables in the available data set. Based on this score, one could then assign a probability to each value of the parameter. Crudely speaking, what we are doing in an example like this is observing the frequency with which each value of the parameter is successful in replicating known data. How many of the variables does it get right? With how much accuracy? Over what portion of the time history of the data set? And then we weight the probability of the parameter taking this value in our distribution in proportion to how well it had fared in those tests.
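The scoring-and-weighting procedure just described can be sketched in a few lines of code. The model, the data set, and every number here are invented purely for illustration; a real benchmarking exercise would involve vastly more expensive simulations and far richer direct and proxy data.

```python
import numpy as np

# Toy stand-in for a one-parameter simulation model (hypothetical):
# it predicts a single variable over time.
def toy_model(param, times):
    return param * np.sin(times)

# Invented "observed" data: generated from parameter 1.3 plus noise.
observed_times = np.linspace(0, 10, 50)
observations = 1.3 * np.sin(observed_times) + \
    np.random.default_rng(0).normal(0, 0.1, 50)

# Candidate parameter values we would like to assign probabilities to.
candidate_params = np.linspace(0.5, 2.0, 16)

# Score each candidate by how well it retrodicts the data
# (here: inverse mean squared error).
scores = np.array([
    1.0 / np.mean((toy_model(p, observed_times) - observations) ** 2)
    for p in candidate_params
])

# Normalize the scores into a probability distribution over the parameter.
probs = scores / scores.sum()

best = candidate_params[np.argmax(probs)]
print(f"most probable parameter value: {best:.2f}")
```

The choice of scoring rule (inverse MSE here) is itself a methodological choice of exactly the kind discussed later in this chapter: a different metric of success would yield a different distribution.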
The case of structural model uncertainty is similar. The most common method of estimating the degree of structural uncertainties in the predictions of climate models is a set of sampling methods called “ensemble methods.” The core idea is to examine the degree of variation in the predictions of the existing set of climate models that happen to be on the market. By looking at the average prediction of the set of models and calculating their standard deviation, one can produce a probability distribution for every value that the models calculate.
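A minimal sketch of the core ensemble idea, with invented numbers standing in for the projections of the models “on the market”: treat the ensemble mean and standard deviation as defining a (here, normal) probability distribution over the predicted quantity.

```python
import numpy as np
from math import erf, sqrt

# Hypothetical ensemble: projected warming (deg C by 2100) from eight
# different model structures. All values are invented for illustration.
projections = np.array([2.1, 2.8, 3.4, 2.5, 3.0, 2.7, 3.9, 2.2])

mean = projections.mean()
std = projections.std(ddof=1)  # sample standard deviation

def normal_cdf(x, mu, sigma):
    # CDF of a normal distribution via the error function
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Probability of exceeding 3 deg C under a normal fit to the ensemble spread
p_exceed_3 = 1 - normal_cdf(3.0, mean, std)
print(f"ensemble mean: {mean:.2f}, spread: {std:.2f}")
print(f"P(warming > 3 deg C) under the normal fit: {p_exceed_3:.2f}")
```

Note how much is smuggled in by this recipe: the models are weighted equally, and their scatter is read as if it were the scatter of independent measurements around the truth. Both assumptions are challenged below.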
Some Worries about the Standard Methods
There are reasons to doubt that these kinds of straightforward methods for estimating both structural model uncertainty and parameter uncertainty are conceptually coherent. Signs of this are visible in the results that have been produced. These signs have been particularly well noted by the climate scientists Claudia Tebaldi and Reto Knutti (2007), who note, in the first instance, that many studies founded on the same basic principles produce radically different probability distributions.
Indeed, I would argue that there are four reasons to suspect that ensemble methods are not a conceptually coherent set of methods.
- 1. Ensemble methods either assume that all models are equally good, or they assume that the set of available models can be relatively weighted.
- 2. Ensemble methods assume that, in some relevant respect, the set of available models represents something like a sample of independent draws from the space of possible model structures.
- 3. Climate models have shared histories that are very hard to sort out.
- 4. Climate modelers have a herd mentality about success.
I will discuss each of these four reasons in what follows. But first, consider a simple example that mirrors all four elements on the list. Suppose that you would like to know the length of a barn. You have one tape measure and many carpenters. You decide that the best way to estimate the length of the barn is to send each carpenter out to measure the length, and you take the average. There are four problems with this strategy. First, it assumes that each carpenter is equally good at measuring. But what if some of the carpenters have been drinking on the job? Perhaps you could weight the degree to which their measurements play a role in the average in inverse proportion to how much they have had to drink. But what if, in addition to drinking, some have also been sniffing from the fuel tank? How do you weight these relative influences? Second, you are assuming that each carpenter’s measurement is independently scattered around the real value. But why think this? What if there is a systematic error in their measurements? Perhaps there is something wrong with the tape measure that systematically distorts them. Third (and relatedly), what if all the carpenters went to the same carpentry school, and they were all taught the same faulty method for what to do when the barn is longer than the tape measure? And fourth, what if each carpenter, before they record their value, looks at the running average of the previous measurements, and if theirs deviates too much they tweak it to keep from getting the reputation as a poor measurer?
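A small simulation makes vivid why the independence assumption matters in the carpenter story. The numbers are invented; the point is that when the carpenters share a faulty tape measure, the spread of their measurements looks just as reassuringly tight while the average is systematically off, so the spread no longer tracks distance from the truth.

```python
import numpy as np

rng = np.random.default_rng(42)
true_length = 12.0  # metres; the barn's actual length (hypothetical)

# Case 1: independent carpenters whose errors scatter around the truth.
independent = true_length + rng.normal(0, 0.05, 20)

# Case 2: carpenters sharing a faulty tape measure, i.e. a common
# systematic offset on top of the same individual scatter.
shared_bias = 0.30
correlated = true_length + shared_bias + rng.normal(0, 0.05, 20)

for label, sample in [("independent", independent),
                      ("shared tape", correlated)]:
    err = abs(sample.mean() - true_length)
    spread = sample.std(ddof=1)
    print(f"{label}: error of the mean {err:.3f}, spread {spread:.3f}")
```

In the second case the sample spread is as small as in the first, yet the average misses the true length by roughly the shared bias. An ensemble statistic computed from the carpenters' reports would radically understate the real uncertainty.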
All these sorts of problems play a significant role—both individually and, especially, jointly—in making ensemble statistical methods in climate science conceptually troubled. I will now discuss the role of each of them in climate science in detail.
- 1. Ensemble methods either assume that all models are equally good, or they assume that the set of available models can be relatively weighted.
If you are going to use an ensemble of climate models to produce a probability distribution, you ought to have some grounds for believing that all of them ought to be given equal weight in the ensemble. Failing that, you ought to have some principled way to weight them. But no such thing seems to exist. Although there is widespread agreement among climate scientists that some models are better than others, quantifying this intuition seems to be particularly difficult. It is not difficult to see why.
As Gleckler, Taylor, and Doutriaux (2008) point out, no single metric of success is likely to be useful for all applications. They carefully studied the success of various models at various prediction tasks. They showed that there are some unambiguous flops on the list and no unambiguous winner—and no clear way to rank them.
- 2. Ensemble methods assume that, in some relevant respect, the set of available models represents something like a sample of independent draws from the space of possible model structures.
This is surely the greatest problem with ensemble statistical methods. The average and standard deviation of a set of trials is only meaningful if those trials represent a random sample of independent draws from the relevant space—in this case, the space of possible model structures. Many commentators have noted that this assumption is not met by the set of climate models on the market. In fact, I would argue it is not exactly clear what this would even mean in this case. What, after all, is the space of possible model structures? And why would we want to sample randomly from this? After all, we want our models to be as physically realistic as possible, not random. Perhaps we are meant to assume instead that the existing models are randomly distributed around the ideal model, in some kind of normal distribution. This would be an analogy to measurement theory. But modeling is not measurement, and there is very little reason to think this assumption holds.
- 3. Climate models have shared histories that are very hard to sort out.
One obvious reason to doubt that the last assumption is valid is that large clusters of the climate models on the market have shared histories. Some models share code, many scientists move from one laboratory to another and bring ideas with them, some parts of climate models (though not physically principled) come from a common toolbox of techniques, and so on. Worse still, we do not even have a systematic understanding of these interrelations. So the problem is not just that most current statistical ensemble methods are naïve with respect to these effects; it is far from obvious that we have the background knowledge we would need to eliminate this “naïveté,” that is, to account for them statistically.
- 4. Climate modelers have a herd mentality about success.
Herd mentality is a frequently noted feature of climate modeling. Most climate models are highly tunable with respect to some of their variables, and to the extent that no climate laboratory wants to be the oddball on the block there is significant pressure to tune one’s model to the crowd. This kind of phenomenon has historical precedent. In 1939 Walter Shewhart published a chart of the history of measurement of the speed of light. The chart showed a steady convergence of measured values that was not well explained by their actual success. Myles Allen (2008) put the point like this: “If modelling groups, either consciously or by ‘natural selection,’ are tuning their flagship models to fit the same observations, spread of predictions becomes meaningless: eventually they will all converge to a delta-function.”
The Inevitability of Values: Douglas contra Jeffrey
What should we make of all these problems from the point of view of the Rudner–Jeffrey debate? This much should be clear: from the point of view of Jeffrey’s goal, to separate the epistemic from the normative, UQ based on statistical ensemble methods will not do. But this much should have been clear from Heather Douglas’s (2000) discussion of the debate about science and values.
Douglas noted a flaw in Jeffrey’s response to Rudner. She remarked that scientists often have to make methodological choices that do not lie on a continuum. Douglas points out that which choice I make will depend on my inductive risk profile. To the extent that I weigh more heavily the consequences of saying that the hypothesis is false if it is in fact true, I will choose a method with a higher likelihood of false positives. And vice versa. But that, she points out, depends on my social and ethical values. Social and ethical values, she concludes, play an inevitable role in science.
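Douglas's point about inductive risk can be illustrated with a toy detection problem (all numbers invented): two “methodological choices,” modeled here as two acceptance thresholds on a noisy measurement, strike different balances between false positives and false negatives, and nothing purely epistemic dictates which balance to prefer.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical setup: the hypothesis is true when an underlying signal
# exceeds 1.0, but we only observe a noisy measurement of it.
n = 10_000
signal = rng.normal(1.0, 0.5, n)          # underlying quantity
measured = signal + rng.normal(0, 0.3, n) # what we actually observe
truth = signal > 1.0                      # when the hypothesis holds

rates = {}
for threshold in (0.8, 1.2):  # "method A" vs. "method B"
    accept = measured > threshold
    false_pos = np.mean(accept & ~truth)  # accept when hypothesis is false
    false_neg = np.mean(~accept & truth)  # reject when hypothesis is true
    rates[threshold] = (false_pos, false_neg)
    print(f"threshold {threshold}: FP rate {false_pos:.3f}, "
          f"FN rate {false_neg:.3f}")
```

The lower threshold buys fewer false negatives at the price of more false positives, and vice versa. Which error matters more is a question about consequences, not evidence.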
There are at least two ways in which methodological choices in the construction of climate models will often ineliminably reflect value judgments in the typically social or ethical sense.
- 1. Model choices have reflected balances of inductive risk.
- 2. Models have been optimized, over their history, to particular purposes and to particular metrics of success.
The first point should be obvious from our discussion of Rudner. When a climate modeler is confronted with a choice between two ways of solving a modeling problem, she may be aware that each choice strikes a different balance of inductive risks with respect to a problem that concerns her at the time. Choosing which way to go in such a circumstance will have to reflect a value judgment. This will always be true so long as the choice between methods A and B is not epistemically forced in the following sense: while option A can be justified on the grounds that it is less likely than B to predict, say, outcome O when O will not in fact occur, option B could also be preferred on the grounds that it is more likely to predict O when O will in fact occur.
The second point is that when a modeler is confronted with a methodological choice she will have to decide which metric of success to use when evaluating the success of the various possibilities. And it is hard to see how choosing a metric of success will not reflect a social or ethical value judgment or possibly even a response to a political pressure about which prediction task is more “important” (in a not-purely-epistemic sense). Suppose choice A makes a model that looks better at matching existing precipitation data, but choice B is better at matching temperature data. She will need to decide which prediction task is more important in order to decide which method of evaluation to use, and that will influence which methodological choice is pursued.
I think this discussion should make two things clear. First, ensemble sampling approaches to UQ are founded on conceptually shaky ground. Second, and perhaps more importantly, they do not enable UQ to fulfill its primary function—to divide the epistemic from the normative in the way that Jeffrey expected probabilistic forecasts to do. And they fail for just the reasons that Douglas made famous: because they ossify past methodological choices (which themselves can reflect balances of inductive risk and other social and ethical values) into “objective” probabilistic facts.
This raises, of course, the possibility that climate UQ could respond to these challenges by avoiding the use of “objective” statistical ensemble methods and by adopting more self-consciously Bayesian methods that attempt to elicit the expert judgment of climate modelers about their subjective degrees of belief concerning future climate outcomes. Call this the Bayesian response to the Douglas challenge (BRDC).
Indeed, this approach to UQ has been endorsed by several commentators on the problem. Unfortunately, genuinely subjective Bayesian approaches to climate UQ have figured primarily in theoretical discussions of what to do, rather than in the actual estimates that one sees published and delivered to policy makers. Here, I would like to point to some of the difficulties that might explain this scarcity. Genuinely Bayesian approaches to UQ in climate science, in which the probabilities delivered reflect the expert judgment of climate scientists rather than observed frequencies of model outputs, face several difficulties. In particular, these difficulties arise as a consequence of three features of climate models: their massive size and complexity; the extent to which epistemic agency in climate modeling is distributed, in both time and space, across a wide range of individuals; and the degree to which methodological choices in climate models are generatively entrenched. I will try to say a bit about what I mean by each of these features in the next section.
Three Features of Climate Models
Size and Complexity
Climate models are enormous and complex. Take one of the state-of-the-art American climate models, the U.S. National Oceanic and Atmospheric Administration (NOAA) Geophysical Fluid Dynamics Laboratory (GFDL) CM2.x. The computational model itself contains over a million lines of code. There are over a thousand different parameter options. It is said to involve modules that are “constantly changing” and hundreds of initialization files that contain “incomplete documentation.” The CM2.x is said to contain novel component modules written by over a hundred different people. Just loading the input data into a simulation run takes over two hours. Using over a hundred processors running in parallel, it takes weeks to produce one model run out to the year 2100, and months to reproduce thousands of years of paleoclimate. A state-of-the-art general circulation model (GCM) that stores its data every five minutes can produce tens of terabytes per model year.
Another aspect of the models’ complexity is their extreme “fuzzy modularity” (Lenhard and Winsberg, 2010). In general, a modern state-of-the-art climate model has a theoretical core that is surrounded and supplemented by various submodels that themselves have grown into complex entities. The interaction of all of them determines the dynamics. And these interactions are themselves quite complex. The coupling of atmospheric and oceanic circulation models, for example, is recognized as one of the milestones of climate modeling (leading to so-called coupled GCMs). Both components have had their independent modeling history, including an independent calibration of their respective model performance. Putting them together was a difficult task because the two submodels now interfered dynamically with each other.
Today, atmospheric GCMs have lost their central place and given way to a deliberately modular architecture of coupled models that comprise a number of highly interactive submodels, such as atmosphere, oceans, or ice cover. In this architecture the single models act (ideally!) as interchangeable modules. This marks a turn from one physical core—the fundamental equations of atmospheric circulation dynamics—to a more networked picture of interacting models from different disciplines (Küppers and Lenhard 2006).
In sum, climate models are made up of a variety of modules and submodels. There is a module for the general circulation of the atmosphere, a module for cloud formation, for the dynamics of sea and land ice, for effects of vegetation, and many more. Each of them, in turn, includes a mixture of principled science and parameterizations. And it is the interaction of these components that brings about the overall observable dynamics in simulation runs. The results of these modules are not first gathered independently and only after that synthesized; rather, data are continuously exchanged between all modules during the runtime of the simulation. The overall dynamics of one global climate model are the complex result of the interaction of the modules—not the interaction of the results of the modules. This is why I modify the word “modularity” with the warning flag “fuzzy” when I talk about the modularity of climate models: due to interactivity and the phenomenon of “balance of approximations,” modularity does not break down a complex system into separately manageable pieces.
Distributed Epistemic Agency
Climate models reflect the work of hundreds of researchers working in different physical locations and at different times. They combine incredibly diverse kinds of expertise, including climatology, meteorology, atmospheric dynamics, atmospheric physics, atmospheric chemistry, solar physics, historical climatology, geophysics, geochemistry, geology, soil science, oceanography, glaciology, paleoclimatology, ecology, biogeography, biochemistry, computer science, mathematical and numerical modeling, time series analysis, and others.
Not only is epistemic agency in climate science distributed across space (the science behind model modules comes from a variety of laboratories around the world) and domains of expertise, but also across time. No state-of-the-art, coupled atmosphere-ocean GCM (AOGCM) is literally built from the ground up in one short surveyable unit of time. They are assemblages of methods, modules, parameterization schemes, initial data packages, bits of code, and coupling schemes that have been built, tested, evaluated, and credentialed over years or even decades of work by climate scientists, mathematicians, and computer scientists of all stripes.
Methodological Choices Are Generatively Entrenched
Johannes Lenhard and I (2010) have argued that complex climate models acquire an intrinsically historical character and show path dependency. The choices that modelers and programmers make at one time about how to solve particular problems of implementation have effects on what options will be available for solving problems that arise at a later time. And they will have effects on what strategies will succeed and fail. This feature of climate models, indeed, has led climate scientists such as Smith (2002) and Palmer (2001) to articulate the worry that differences between models are concealed in code that cannot be closely investigated in practice. We called this feature of climate models generative entrenchment and argued that it leads to an analytical impenetrability of climate models; we have been unable—and are likely to continue to be unable—to attribute all or perhaps even most of the various sources of their successes and failures to their internal modeling assumptions.
This last claim should be clarified to avoid misunderstanding. As we have seen, different models perform better under certain conditions than others. But if model A performs better at making predictions on condition A*, and model B performs better under condition B*, then optimistically one might hope that a hybrid model—one that contained some features of model A and some features of model B—would perform well under both sets of conditions. But what would such a hybrid model look like?
Ideally, to answer that question one would like to attribute the success of each of the models A and B to the success of their particular submodels—or components. One might hope, for example, that a GCM that is particularly good at predicting precipitation is one that has, in some suitably generalizable sense, a particularly good rain module. We called success in such an endeavor, the process of teasing apart the sources of success and failure of a simulation, “analytic understanding” of a global model. We would say that one has such understanding precisely when one is able to identify the extent to which each of the submodels of a global model is contributing to its various successes and failures.
Unfortunately, analytic understanding is extremely hard to achieve in this context. The complexity of interaction between the modules of the simulation is so severe, as is the degree to which balances of approximation play an important role, that it becomes impossible to independently assess the merits or shortcomings of each submodel. One cannot trace back the effects of assumptions because the tracks get covered during the kludging together of complex interactions. This is what we called “analytic impenetrability” (Lenhard and Winsberg, 2010, 261). But analytic impenetrability makes epistemically inscrutable the effects on the success and failure of a global model of the past methodological assumptions that are generatively entrenched.
State-of-the-art global climate models are highly complex, they are the result of massively distributed epistemic labors, and they arise from a long chain of generatively entrenched methodological choices whose effects are epistemically inscrutable. These three features, I would now argue, make the BRDC very difficult to pull off with respect to climate science.
The Failure of the BRDC in Climate Science
Recall how the BRDC is meant to go. Rudner argues that the scientist who accepts or rejects hypotheses has to make value judgments. Jeffrey replies that she should only assign probabilities to hypotheses on the basis of the available evidence and, in so doing, avoid making value judgments. Douglas argues that scientists make methodological choices and that these choices become embedded in the mix of elements that give rise to estimates of probabilities derived from classical, as opposed to Bayesian, statistics. Because those methodological choices involve a balance of inductive risks, the scientist cannot avoid value judgments. The BRDC suggests that scientists avoid employing any deterministic algorithm that transmits methodological choices into probabilities (such as a classical statistical hypothesis test in the toxicology case, or ensemble averages in the climate case). Instead, they should rely on their expert judgment to assess the appropriate degree of belief in a hypothesis, given that a particular methodological choice has been made and the resultant evidence acquired. The probabilities such a scientist would offer should be the scientist’s subjective degrees of belief, conditionalized on the available evidence.
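The conditionalization that Jeffrey’s proposal envisions can be illustrated with a toy calculation. The numbers below are purely illustrative and not drawn from any real case: a scientist holds a subjective prior in a hypothesis H and, rather than accepting or rejecting H, reports the posterior degree of belief obtained by Bayes’ rule once evidence E comes in.

```python
# A minimal sketch (hypothetical numbers) of Bayesian conditionalization:
# the scientist reports a degree of belief in H given evidence E, rather
# than accepting or rejecting H outright.

def conditionalize(prior_h, likelihood_e_given_h, likelihood_e_given_not_h):
    """Return P(H|E) via Bayes' rule from a subjective prior and likelihoods."""
    # Total probability of the evidence under both hypotheses.
    p_e = likelihood_e_given_h * prior_h + likelihood_e_given_not_h * (1 - prior_h)
    return likelihood_e_given_h * prior_h / p_e

# Illustrative values only: an agnostic prior of 0.5, with evidence three
# times more likely under H than under not-H, yields a posterior near 0.75.
posterior = conditionalize(prior_h=0.5,
                           likelihood_e_given_h=0.6,
                           likelihood_e_given_not_h=0.2)
```

The point of the BRDC is that the prior and the likelihoods here are matters of expert judgment, not outputs of a deterministic algorithm that silently transmits methodological choices into the reported probability.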
There are, in fact, other methods for estimating probabilities in climate science that lean more heavily on the subjective judgment of experts, such as expert elicitation methods in which priors and likelihoods are obtained by asking scientists for their degrees of belief in various hypotheses. Imagining that these estimates are free from the values I have discussed overlooks the fact that the judgments of experts can be shaped significantly by their experience working with particular models and by their awareness of particular modeling results. Indeed, models are used in the study of complex systems in part because it can be very difficult to reason (accurately) about such systems without them. For sciences in which complex, nonlinear models play a prominent role, it is naïve to think that scientists can typically reach conclusions about probabilities that are both adequately well-informed and independent of the set of models that happen to be available to them at any particular time.
Values in the Nooks and Crannies
At this point in the discussion, it might be natural for a reader to ask for a specific example of a social, political, or ethical value that has influenced a methodological choice in the history of climate modeling. It is easy to give a couple of potted examples. In previous work, I have focused on what I have here labeled the second kind of role of values: that climate models have been optimized, over their history, to particular purposes and to particular metrics of success. I gave the example that in the past modelers had perhaps focused on the metric of successfully reproducing known data about global mean surface temperature, rather than other possible metrics. And I speculated that they might have done so because of a social and political climate in which public concern centered on “global warming” rather than on what is now more commonly called “anthropogenic climate change.”
But I now think it was a mistake to focus on particular historical claims about specific motives and choices. I want to focus instead on the fact that climate modeling involves literally thousands of unforced methodological choices. Many crucial processes are poorly understood, and many compromises in the name of computational exigency need to be made. All one needs to see is that, as in the case of the biopsy stain in the toxicology case, no unforced methodological choice can be defended in a value vacuum. If one asks, “Why parameterize this process rather than try to resolve it on the grid?” or “Why use this method for modeling cloud formation?” it will rarely be the case that the answer can be “because that choice is objectively better than the alternative.” As some of the examples I have provided clearly illustrate, most choices will be better in some respects and worse in others than their alternatives, and the preference for one over the other will reflect the judgment that one aspect is more important. Any given choice may increase the probability of finding a certain degree of climate variation while its alternative does the opposite—and so the choice that is made can be seen as reflecting a balance of inductive risks.
Kevin Elliott (2011b, 55) has identified three conditions under which scientists should be expected to incorporate social and ethical values in particular scientific cases: (1) the “ethics” principle, that scientists have an ethical responsibility to consider the impact of their methodological choices on society in the case under consideration; (2) the “uncertainty” principle, that the available scientific information is uncertain or incomplete; and (3) the “no-passing-the-buck” principle, that scientists cannot simply withhold their judgment or give value-free information to policy makers and let them deal with the social and ethical issues. That the second condition is in play in climate science is clear. That the third is in play follows from the failure of the BRDC.
How do we know that the first one is in play without mentioning particular historical claims about specific motives and choices? I think all we need to argue here is that many of the choices made by climate modelers had to have been unforced in the absence of a relevant set of values—that in retrospect such choices could only be defended against some set of predictive preferences and some balance of inductive risks. In other words, any rational reconstruction of the history of climate science would have to make mention of predictive preferences and inductive risks on pain of making most of these choices seem arbitrary. But what I want to be perfectly clear about here (in a way that I think I have not been in earlier work) is that I do not mean to attribute to the relevant actors these psychological motives, nor any particular specifiable or recoverable set of interests. I am not in the business of making historical, sociological, or psychological claims. I have no idea why individual agents made the choices that they made—and indeed it is part of my argument that these facts are mostly hidden from view. In fact, for many of the same reasons that these methodological choices are immune from the BRDC, they are also relatively opaque to us from a historical, philosophical, and sociological point of view. They are buried in the historical past under the complexity, epistemic distributiveness, and generative entrenchment of climate models.
Some readers may find that this makes my claim about the value-ladenness of climate models insufficiently concrete to have any genuine bite. One might ask, “Where are the actual values?” Some readers, in other words, might be craving details about how agents have been specifically motivated by genuine, concrete ethical or political considerations. They might be tempted to think that my identification of the role of values here is too abstract to be helpful. But this is to miss the dialectical structure of my point. The very features that make the BRDC implausible make this demand unsatisfiable. No help of the sort that “finds the hidden values” can be forthcoming on my account. The social, political, and ethical values that find their way into climate models cannot be recovered in bite-sized pieces.
Recall that we began this whole discussion with a desire to separate the epistemic from the normative. But we have now learned that, with respect to science that relies on models that are sufficiently complex, epistemically distributed, and generatively entrenched, it becomes increasingly difficult to tell a story that maintains that kind of distinction. And without being able to provide a history that respects that distinction, there is no way to isolate the values that have been involved in the history of climate science.
One consequence of the blurred distinction between the epistemic and the normative in our case is that the usual remarks philosophers make about the value-ladenness of science do not apply here. Those who claim that science is value-laden often follow up with the advice that scientists ought to be more self-conscious in their value choices, and that they ought to ensure that their values reflect those of the people they serve. Or they suggest implementing some system for soliciting public opinions or determining public values and making that the basis for these determinations. But in the picture I am painting, neither of these options is really possible. The bits of value-ladenness lie in all the nooks and crannies, might very well have been opaque to the actors who put them there, and are certainly opaque to those who stand at the end of the long, distributed, and path-dependent process of model construction. In the case of the biopsy stains, I can say, “Consumer protection is always more important than corporate profits! Even in the absence of epistemologically forcing considerations, the toxicologist should choose the stain on the left!” But in the climate case, the situation is quite different. We can of course ask for a climate science that does not reflect systematic biases—of the sort we would expect, say, from science cynically paid for by the oil industry. But this kind of demand for a science that reflects the “right values” cannot go all the way down into all those nooks and crannies. In those respects, it becomes terribly hard to ask for a climate science that reflects “better” values.
1. I would like to emphasize that the focus of this chapter is on that topic: attempts to predict the pace and tempo of future climate change, rather than on the question of whether climate change is happening and whether humans are its cause. The first is a matter of genuine scientific controversy with interesting epistemological issues. The second is not.
2. And when I variously use the expressions “social values,” “ethical values,” or “social and ethical values,” these differences in language should not be read as flagging important philosophical differences.
3. In addition to Churchman and Rudner, see also Frank (1954), Neurath (1913/1983), Douglas (2000), Howard (2006), Longino (1990, 1996, 2002), Kourany (2003a, 2003b), Solomon (2001), Wilholt (2009), and Elliott (2011a, 2011b).
4. Many discussions of UQ in climate science will also identify data uncertainty. In evaluating a particular climate model, including both its structure and parameters, we compare the model’s output to real data. Climate modelers, for example, often compare the outputs of their models to records of past climate. These records can come from actual meteorological observations or from proxy data—inferences about past climate drawn from such sources as tree rings and ice core samples. Both of these sources of data, however, are prone to error, so we are uncertain about the precise nature of the past climate. This, in turn, has consequences for our knowledge of the future climate. Although data uncertainty is a significant source of uncertainty in climate modeling, I will not discuss this source of uncertainty here. For the purposes of this discussion, I make the crude assumption that the data against which climate models are evaluated are known with certainty. Notice, in any case, that data uncertainty is part of parameter uncertainty and structural uncertainty because it acts by affecting our ability to judge the accuracy of our parameters and our model structures.
5. A parameter for a model is an input that is fixed for all time, whereas a variable takes a value that varies with time. A variable for a model is thus both an input for the model (the value the variable takes at some initial time) and an output (the value the variable takes at all subsequent times). A parameter is simply an input.
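The distinction in this note can be made concrete with a minimal sketch. The model below is entirely hypothetical toy dynamics, not drawn from any real climate model: a feedback-strength parameter stays fixed for the whole run, while temperature is a variable whose initial value is an input and whose trajectory is an output.

```python
# Toy illustration (hypothetical dynamics) of parameter vs. variable:
# `feedback_strength` is a parameter, fixed for the entire run;
# `initial_temperature` is a variable's input value, and the returned
# trajectory is that variable's value at every subsequent time step.

def run_model(feedback_strength, initial_temperature, n_steps):
    """Relax temperature toward a fixed baseline, scaled by the parameter."""
    trajectory = [initial_temperature]
    for _ in range(n_steps):
        # The parameter enters every step but never itself changes.
        trajectory.append(trajectory[-1] + feedback_strength * (15.0 - trajectory[-1]))
    return trajectory

# Same parameter, different initial values: two distinct trajectories
# of the same variable, both converging toward the baseline.
run_a = run_model(feedback_strength=0.1, initial_temperature=14.0, n_steps=50)
run_b = run_model(feedback_strength=0.1, initial_temperature=16.0, n_steps=50)
```

Parameter uncertainty, in the sense discussed in the main text, concerns our ignorance of the right value of something like `feedback_strength`; initial-condition uncertainty concerns the variable’s input value.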
6. Some might argue that if we look at how the models perform on past data (for, say, mean global surface temperature), they often are distributed around the observations. But, first, these distributions do not display anything like random characteristics (e.g., a normal distribution). And, second, this feature of one variable for past data (the data to which the models have been tuned) is a poor indicator that it will obtain for all variables and for future data.
7. An article by Masson and Knutti, “Climate Model Genealogy” (2011), discusses this phenomenon and its effects on multimodel sampling in detail.
8. Which, inter alia, did much to bring the issue of “inductive risk” back into focus for contemporary philosophy of science and epistemology.
9. See, for example, Goldstein and Rougier (2006).
10. All the above claims and quotations come from Dunne (2006), communicated to the author by John Dunne and V. Balaji, in personal correspondence.
11. For an account of the controversies around early coupling, see Shackley et al. (1999); for a brief history of modeling advances, see Weart (2010).
12. As, for example, in the Earth System Modeling Framework (ESMF); see, for instance, Dickinson et al. (2002).
13. One can thus accurately describe them as parallel rather than serial models, in the sense discussed in Winsberg (2006).
14. “Balance of approximations” is a term introduced by Lambert and Boer (2001) to indicate that climate models sometimes succeed precisely because the errors introduced by two different approximations cancel out.
15. There has been a move, in recent years, to eliminate “legacy code” from climate models. Even though this may have been achieved in some models (this claim is sometimes made about CM2), it is worth noting that there is a large difference between coding a model from scratch and building it from scratch (i.e., devising and sanctioning from scratch all the elements of a model).
16. See especially Biddle and Winsberg (2009) and also chapter 6 of Winsberg (2010).
17. One might complain that if the decisions do not reflect the explicit psychological motives or interests of the scientist, then they do not have a systematic effect on the content of science, and are hence no different than the uncontroversial examples of social values I mentioned in the introduction (such as attaching greater value to AIDS research than to algebraic quantum field theory). But though the effect of the values in the climate case might not have a systematic effect on the content of science, it is nonetheless an effect internal to science in a way that those other examples are not.
Allen, Myles. 2008. “What Can Be Said about Future Climate? Quantifying Uncertainty in Multi-Decade Climate Forecasting.” ClimatePrediction.net, Oxford University. https://www.climateprediction.net/wp-content/publications/allen_Harvard2008.pdf.
Biddle, Justin, and Eric Winsberg. 2009. “Value Judgments and the Estimation of Uncertainty in Climate Modeling.” In New Waves in the Philosophy of Science, edited by P. D. Magnus and J. Busch, 172–97. New York: Palgrave Macmillan.
Churchman, C. West. 1948. Theory of Experimental Inference. New York: Macmillan.
Churchman, C. West. 1956. “Science and Decision Making.” Philosophy of Science 23: 247–49.
Dickinson, Robert E., Stephen E. Zebiak, Jeffrey L. Anderson, Maurice L. Blackmon, Cecelia De Luca, Timothy F. Hogan, Mark Iredell, Ming Ji, Ricky B. Rood, Max J. Suarez, and Karl E. Taylor. 2002. “How Can We Advance Our Weather and Climate Models as a Community?” Bulletin of the American Meteorological Society 83: 431–34.
Douglas, Heather. 2000. “Inductive Risk and Values in Science.” Philosophy of Science 67: 559–79.
Dunne, J. 2006. “Towards Earth System Modelling: Bringing GFDL to Life.” Presented at the 18th Annual Australian Community Climate and Earth System Simulator (ACCESS) BMRC Modeling Workshop, November 28–December 1, 2006.
Elliott, Kevin C. 2011a. “Direct and Indirect Roles for Values in Science.” Philosophy of Science 78: 303–24.
Elliott, Kevin C. 2011b. Is a Little Pollution Good for You? Incorporating Societal Values in Environmental Research. New York: Oxford University Press.
Frank, Philipp G. 1954. “The Variety of Reasons for the Acceptance of Scientific Theories.” In The Validation of Scientific Theories, edited by P. G. Frank, 3–17. Boston: Beacon Press.
Gleckler, P. J., K. E. Taylor, and C. Doutriaux. 2008. “Performance Metrics for Climate Models.” Journal of Geophysical Research 113 (D6): D06104. doi:10.1029/2007JD008972.
Goldstein, M., and J. C. Rougier. 2006. “Bayes Linear Calibrated Prediction for Complex Systems.” Journal of the American Statistical Association 101: 1132–43.
Howard, Don A. 2006. “Lost Wanderers in the Forest of Knowledge: Some Thoughts on the Discovery-Justification Distinction.” In Revisiting Discovery and Justification: Historical and Philosophical Perspectives on the Context Distinction, edited by J. Schickore and F. Steinle, 3–22. New York: Springer.
Jeffrey, Richard C. 1956. “Valuation and Acceptance of Scientific Hypotheses.” Philosophy of Science 23: 237–46.
Kourany, Janet. 2003a. “A Philosophy of Science for the Twenty-First Century.” Philosophy of Science 70: 1–14.
Kourany, Janet. 2003b. “Reply to Giere.” Philosophy of Science 70: 22–26.
Küppers, Günter, and Johannes Lenhard. 2006. “Simulation and a Revolution in Modelling Style: From Hierarchical to Network-like Integration.” In Simulation: Pragmatic Construction of Reality, edited by J. Lenhard, G. Küppers, and T. Shinn, 89–106. Dordrecht, the Netherlands: Springer.
Lambert, Steven J., and G. J. Boer. 2001. “CMIP1 Evaluation and Intercomparison of Coupled Climate Models.” Climate Dynamics 17: 83–106.
Lenhard, Johannes, and Eric Winsberg. 2010. “Holism, Entrenchment, and the Future of Climate Model Pluralism.” Studies in History and Philosophy of Modern Physics 41: 253–62.
Longino, Helen. 1990. Science as Social Knowledge: Values and Objectivity in Scientific Inquiry. Princeton, N.J.: Princeton University Press.
Longino, Helen. 1996. “Cognitive and Non-Cognitive Values in Science: Rethinking the Dichotomy.” In Feminism, Science, and the Philosophy of Science, edited by L. H. Nelson and J. Nelson, 39–58. Dordrecht, the Netherlands: Kluwer Academic.
Longino, Helen. 2002. The Fate of Knowledge. Princeton, N.J.: Princeton University Press.
Masson, D., and R. Knutti. 2011. “Climate Model Genealogy.” Geophysical Research Letters 38: L08703. doi:10.1029/2011GL046864.
McMullin, Ernan. 1983. “Values in Science.” In PSA 1982: Proceedings of the Biennial Meeting of the Philosophy of Science Association, 1982. Vol. 2: Symposia and Invited Papers, 3–28. Chicago: University of Chicago Press.
Neurath, Otto. (1913) 1983. “The Lost Wanderers of Descartes and the Auxiliary Motive (On the Psychology of Decision).” In Philosophical Papers 1913–1946, edited and translated by R. S. Cohen and M. Neurath, 1–12. Dordrecht, the Netherlands: D. Reidel. First published as “Die Verirrten des Cartesius und das Auxiliarmotiv. Zur Psychologie des Entschlusses,” in Jahrbuch der Philosophischen Gesellschaft an der Universität Wien (Leipzig: Johann Ambrosius Barth).
Palmer, T. N. 2001. “A Nonlinear Dynamical Perspective on Model Error: A Proposal for Non-local Stochastic–Dynamic Parameterization in Weather and Climate Prediction Models.” Quarterly Journal of the Royal Meteorological Society 127: 279–304.
Rudner, Richard. 1953. “The Scientist Qua Scientist Makes Value Judgments.” Philosophy of Science 20: 1–6.
Shackley, Simon, J. Risbey, P. Stone, and Brian Wynne. 1999. “Adjusting to Policy Expectations in Climate Change Science: An Interdisciplinary Study of Flux Adjustments in Coupled Atmosphere Ocean General Circulation Models.” Climatic Change 43: 413–54.
Shewhart, Walter A. 1939. Statistical Method from the Viewpoint of Quality Control. New York: Dover.
Smith, Leonard A. 2002. “What Might We Learn from Climate Forecasts?” Proceedings of the National Academy of Sciences of the United States of America 99 (Suppl. 1): 2487–92.
Solomon, Miriam. 2001. Social Empiricism. Cambridge, Mass.: MIT Press.
Tebaldi, Claudia, and Reto Knutti. 2007. “The Use of the Multi-model Ensemble in Probabilistic Climate Projections.” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 365 (1857): 2053–75.
Weart, Spencer. 2010. “The Development of General Circulation Models of Climate.” Studies in History and Philosophy of Modern Physics 41: 208–17.
Wilholt, Torsten. 2009. “Bias and Values in Scientific Research.” Studies in History and Philosophy of Science 40: 92–101.
Winsberg, Eric. 2006. “Handshaking Your Way to the Top: Simulation at the Nanoscale.” Philosophy of Science 73: 582–94.
Winsberg, Eric. 2010. Science in the Age of Computer Simulation. Chicago: University of Chicago Press.