Tuesday, November 27, 2007

Black Boxes

There are different schools of thought on the best way to build a model. One of these, commonly known as ‘black-box’ modeling, emphasizes accurate behavior of the model with respect to input and output, stimulus and response, etc., regardless of whether the internal structure of the model bears any resemblance to the system it mimics. You treat the system as if it were contained in a black box that prevents you from seeing inside, and so choose not to care about what you can’t see, as long as what you can see works as expected. I have heard the alternative called ‘white-box’, although that name clearly misses the point, since you can’t see inside a white box any better than you can a black one. It would make more sense to call the first approach ‘opaque box’ so the second could be ‘transparent box’, or perhaps ‘closed box’ and ‘open box’. Regardless, the second approach emphasizes the correctness of the model as a reflection of the system it represents. In building such a model, you are not allowed any ‘then a miracle occurs’ steps; you must be able to justify the microscopic structure of the model as well as its macroscopic behavior.
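To make the dichotomy concrete, here is a toy sketch of my own (the names and numbers are invented, not from any real system): two ‘models’ of the same system that are indistinguishable from the outside, one a pure lookup of observed stimulus/response pairs, the other derived from a proposed mechanism (saturation kinetics with assumed parameters).

```python
# Black-box: a lookup table fit to observed stimulus/response pairs.
# It reproduces the data but says nothing about why the system behaves
# this way.
observed = {0.0: 0.0, 1.0: 3.3, 2.0: 5.0, 4.0: 6.7}

def blackbox_model(stimulus):
    return observed[stimulus]

# Open-box: the same responses derived from a hypothesized mechanism
# (here, saturation kinetics with assumed Vmax = 10, Km = 2).
def openbox_model(stimulus, vmax=10.0, km=2.0):
    return vmax * stimulus / (km + stimulus)

# At the observed points the two agree to measurement precision; only
# the open-box model makes a claim about internal structure.
for s in observed:
    assert abs(blackbox_model(s) - openbox_model(s)) < 0.05
```

Judged purely on these four observations, the two are equally ‘correct’; the difference only shows up when you ask why, or when you leave the observed points behind.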

Of course, this dichotomy is not unique to modeling biological systems. For instance, most economists make a fundamental distinction between macroeconomics and microeconomics. The former is concerned with characterizing, and ultimately predicting, the behavior of large-scale economic systems, while the latter attempts to explain the myriad small-scale decisions made by the individual agents making up such an economy. Some tenets do appear to succeed on both fronts, such as the law of supply and demand explaining macro-level price fluctuations in terms of micro-level decisions by consumers to buy, or by producers to invest in production. But there’s obviously a lot more to economics than that, or they wouldn’t keep handing out Nobel prizes, right? The macro/micro split is driven by the fact that economic systems are hideously complex, which presents a choice: do you want to try to describe the behavior of the whole system while not being able to explain any of it, or would you prefer a believable explanation of some small and artificially isolated portion of that system, and still be left flipping a coin when it comes to the system as a whole?

I’m not sure whether to claim that biological systems are more or less complex than economies. But they certainly appear to be complex enough to present the same choice when attempting to model them. Which way you decide depends on your goal: what is your motivation for building the model in the first place? Companies like Entelos excel at building large-scale models to simulate the behavior of complex biological systems. Their goal, or at least the goal of their customers, is explicitly to use these models to replace testing on live subjects, whether for reasons of morality or expense. The sheer complexity and incomplete understanding of the systems they model, coupled with the urgency of their goal, place them squarely in the macro/black-box camp. If a few gazillion euros and the lives of countless bunnies are on the line, what's the harm in a few fudge factors? Or rather, if your priority is a reasonably accurate facsimile of the behavior of a system that is not fully understood, you will inevitably be forced to invoke the black box at some point in your modeling. Note: I have no inside information on Entelos or its processes, beyond what I have seen in public presentations. I also mean no disrespect; I am simply observing the pressures involved in modeling beyond what is known.

So what then is the problem with black-box modeling? I can make two arguments, one practical and the other more aesthetic. The practical argument comes up if you plan to use the model to make predictions beyond the data used to fit it. You can fit a model to the data without being able to validate its internal correctness, but once you stray from the region of the fit, you simply cannot be certain of anything. On the other hand, if a model accurately reflects the components and interactions that make up a system, it can be expected to behave much as that system does, even in previously unexplored territory. My other argument is the naive notion that it is worth understanding how something works, even if you can't yet think of a practical justification. This is the essence of scientific pursuit. The public may support science--and governments may fund it--based on the promise of practical applications. But scientists throughout history have pursued it simply because they are driven to understand.
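The practical argument can be demonstrated numerically. In this sketch (again my own invention, assuming the same saturation-kinetics ‘system’ as above), a black-box polynomial fit matches the training data beautifully, then falls apart outside the fitted region, while the mechanistic form extrapolates sensibly.

```python
import numpy as np

# The "system": saturation kinetics, v = Vmax*s/(Km+s), with assumed
# parameters Vmax = 10, Km = 2.
def system(s, vmax=10.0, km=2.0):
    return vmax * s / (km + s)

# Training data covers stimulus values 0.1..5 only.
s_train = np.linspace(0.1, 5.0, 50)
v_train = system(s_train)

# Black-box model: a degree-6 polynomial fit. Inside the training
# region it tracks the data closely.
blackbox = np.poly1d(np.polyfit(s_train, v_train, 6))

# Extrapolate far outside the fit region: the mechanistic form
# approaches Vmax = 10, while the polynomial runs away.
s_new = 20.0
print(f"true: {system(s_new):.2f}, black-box: {blackbox(s_new):.2f}")
```

Within the fitted range the two are nearly indistinguishable, which is exactly why the black box looks trustworthy right up until it isn't.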

The astronomer Ptolemy resorted to black-box constructs such as epicycles and celestial spheres to model the motion of the planets. His system worked well and made quite accurate predictions. Eventually, centuries of detailed measurement revealed subtle discrepancies, prompting the addition of ever less justifiable black boxes to keep the model afloat. Ultimately, this ungainly system of epicycles on epicycles was replaced by a heliocentric system, better reflecting the physical reality of the solar system as we now understand it. Why did Ptolemy's system survive for some fourteen centuries, and what really prompted its replacement? Did the minor inaccuracies in prediction really matter to the average person? Even now, would it really make a practical difference in your life if Jupiter were a hundredth of a degree away from where you thought it would be? I believe in the end it was not about the accuracy of the predictions, but the aesthetics of the model itself, and what it said about the place of humans in the universe. This was of course also the basis for resistance to the change.

Black-box models may be a practical or even necessary compromise when working with otherwise unapproachably complex systems. From an application or business point of view, that makes perfect sense. But the scientist in me will always find them unsatisfying.
