The paradoxes and problems with Cox's Theorem (Part 3/4)
And, in any case, there are arguments for the claim that we must assign probabilities to hypotheses like ‘The die lands on 1’ and ‘There will exist at least \(10^{16}\) people in the future.’ If we don’t assign probabilities, we are vulnerable to getting Dutch-booked and accuracy-dominated …
… So, to summarize the above, we have to assign probabilities to empirical hypotheses, on pain of getting Dutch-booked and accuracy-dominated. And all reasonable-seeming probability assignments imply that we should pursue longtermist interventions. (last italics mine)
Anyone who questions the idea of assigning ‘credences’ to beliefs will quickly learn about Dutch books. Consider an agent who assigns probability \(0.9\) to the belief Jon Hamm will enslave us in a white egg-shaped gadget, and probability \(0.9\) to the belief Jon Hamm will not enslave us in a white egg-shaped gadget. The Dutch book argument says this is irrational, because such an agent would gladly pay \(\$0.90\) for a bet that pays \(\$1.00\) if Jon Hamm enslaves us in a white egg-shaped gadget, and simultaneously pay \(\$0.90\) for a bet that pays \(\$1.00\) if he doesn’t. The agent is guaranteed to lose money: having paid a total of \(\$1.80\), they can collect only \(\$1.00\) in return.
This is a Dutch book - a set of bets, each acceptable to the agent, that jointly guarantees a loss. In order to be considered rational according to the Dutch Book argument, having assigned a credence of \(0.9\) to the belief Jon Hamm will enslave us in a white egg-shaped gadget, an agent must assign a credence of \(0.1\) to the belief Jon Hamm will not enslave us in a white egg-shaped gadget. More generally, the credences in \(p\) and \(\neg p\) must sum to one, and obey all the other rules of the probability calculus as well. (Accuracy-domination arguments come to the same thing, because every accuracy-dominating point corresponds to a Dutch book. See the formal proof here.)
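To see the arithmetic laid out concretely, here is a minimal sketch of the book above. The stake sizes and propositions are just the ones from the example; nothing about it is specific to Jon Hamm.

```python
# A minimal sketch of the Dutch book above: credence 0.9 in E ("enslavement")
# and 0.9 in not-E, with each credence treated as a fair price for a $1.00 ticket.

credence_E, credence_not_E = 0.9, 0.9
stake = 1.00  # each ticket pays $1.00 if its proposition turns out true

cost = (credence_E + credence_not_E) * stake   # $1.80 paid up front

for enslaved in (True, False):
    payout = stake  # exactly one of the two tickets pays out, in either world
    print(f"enslaved={enslaved}: paid {cost:.2f}, won {payout:.2f}, "
          f"net {payout - cost:+.2f}")

# Both branches print net -0.80: a guaranteed loss. With coherent credences
# summing to 1.0, the up-front cost equals the guaranteed payout, and no such
# book can be constructed.
```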
In my previous post, I argued against bayesian absolutism in favor of the instrumentalist perspective, which views probability theory as a useful tool in certain circumstances. This is the perspective held by most bayesian statisticians, and is best summarized by George Box in the aphorism “All models are wrong, but some are useful”.
The instrumentalist view is a threat to the idea of ‘credences’, and so is often met with appeals to Dutch book-style arguments, which all say the probability calculus is not merely a tool we may use but one we must use, if we are to be rational - a Law of Rationality, in fact: a set of rules to which our thinking must conform. For example:
Old School statisticians thought in terms of tools, tricks to throw at particular problems. Bayesians - at least this Bayesian, though I don’t think I’m speaking only for myself - we think in terms of laws … Bayesian theorems are elegant, coherent, optimal, and provably unique because they are laws. (emphasis in original)
The laws of probability are laws, not suggestions, but often the true Law is too difficult for us humans to compute. (emphasis via hyperlink in original)
Core tenet 3: We can use the concept of probability to measure our subjective belief in something. Furthermore, we can apply the mathematical laws regarding probability to choosing between different beliefs. If we want our beliefs to be correct, we must do so. (emphasis via hyperlink in original)
(Kaj Sotala, linking to Eliezer Yudkowsky)
To Bayesians, the brain is an engine of accuracy: it processes and concentrates entangled evidence into a map that reflects the territory. The principles of rationality are laws in the same sense as the Second Law of Thermodynamics…
…Remember this, when you plead to be excused just this once. We can’t excuse you. It isn’t up to us. (emphasis mine)
Against this position I will make four arguments, and I’ll end on an optimistic note by offering an alternative framework for decision making.
(Note that none of these arguments “prove too much” because they only attack the subjectivist view of probability, but leave the instrumentalist view unharmed.)
A primary justification for the view that “the laws of probability are laws, not suggestions” is a desire to avoid paradox. Consider this quote from Eliezer Yudkowsky in My Bayesian Enlightenment, where he explains why one must strictly adhere to “The Rules” of rationality:
Here was probability theory, laid out not as a clever tool, but as The Rules, inviolable on pain of paradox. If you tried to approximate The Rules because they were too computationally expensive to use directly, then, no matter how necessary that compromise might be, you would still end up doing less than optimal. … Paradoxes could not coexist with [E. T. Jaynes’s] precision. Not an answer, but the answer. (emphasis in original)
But as we have seen with The Pasadena Game, paradoxes lurk inside the framework also. For example, contrast the previous quotation from Yudkowsky with the following from Will MacAskill:
Robert Wiblin: It sounds like, if you take the expected value approach, then things might be quite straightforward. You give different probabilities to different moral theories. You look at how good or bad different actions are given those moral theories. Is there even that much more to say about this? It seems kind of easy.
Will MacAskill: If only that were true. I think there’s not just one book’s worth of content in terms of addressing some of the problems that this raises. I think it’s probably many volumes. First, there’s the question how do you even assign credences to different moral views. That’s already an incredibly difficult question. It’s not the question I address because that’s just the question of moral philosophy, really.
Secondly, supposing you do have a set of credences across different moral views, how do you act? … A third problem is the fanaticism problem, which is the worry that perhaps under moral uncertainty, what you ought to do is determined by really small credences in infinite amounts of value or huge amounts of value.
- Our descendants will probably see us as moral monsters. What should we do about that?
So from Yudkowsky we learn that we cannot violate the laws without running into paradox, only to learn from MacAskill that using the very same laws results in even more paradox! So many, in fact, they could fill a multi-volume book. Hence, paradoxes lurking outside bayesian epistemology are the reason one can never leave it, but paradoxes lurking inside are exciting research opportunities.
Later in the interview, paradoxes are also euphemistically referred to as “problems” - a word which implies they have a solution, and should therefore be worked on:
Robert Wiblin: Is this like the 2 envelope paradox? It’s really reminding me of that.
Will MacAskill: You do get 2 envelope style problems under moral uncertainty …
Robert Wiblin: If you haven’t heard of the 2 envelope paradox it’s a real treat. I’ll stick up a link to that, and you should definitely check it out. It’s one of my favorite paradoxes.
So not only is the “paradox” reworded as a “problem”, but we learn it in fact represents an entire style of “problem” within the field of moral uncertainty.
Other paradoxes within bayesian epistemology include the Necktie paradox, the St. Petersburg paradox, Newcomb’s paradox, the Ellsberg paradox, Pascal’s Mugging, Bertrand’s paradox, the Mere Addition paradox (aka “The Repugnant Conclusion”), the Allais paradox, the Boy or Girl paradox, the Paradox of the Absent-Minded Driver, the Sleeping Beauty problem, and Siegel’s paradox.
By way of contrast, Popper recognized that when a question you were asking led to paradox, that meant you were asking the wrong question. This led to the fruitful methodological approach of replacing bad questions (i.e. ones that lead to paradox) with better ones (i.e. ones that don’t). Two examples of this approach are replacing “How do we prove induction?” with “How do we detect mistakes?”, and replacing “How do we achieve a perfectly representative democracy?” with “How do we remove bad rulers without violence?”.
One might also imagine replacing “How do we make perfect decisions?” with “How do we design systems that are resilient to bad decisions?”.
But let’s return to the subject at hand - where is the flaw in the Dutch book argument? To answer this, we must first discuss Cox’s Theorem, upon which all these claims are premised. For this we’ll enlist the assistance of three papers I consider to be essential reading for anyone interested in the subject of uncertainty quantification. They are:

- Constructing a Logic of Plausible Inference: A Guide to Cox’s Theorem, by Kevin S. Van Horn
- The Philosophical Significance of Cox’s Theorem, by Mark Colyvan
- Making Ado Without Expectations, by Mark Colyvan
Colyvan states Cox’s theorem, as well as the three assumptions underlying it, as follows:
Theorem 1 (Cox). Any measure of belief is isomorphic to a probability measure.
This holds if you are willing to make the following three assumptions:

1. The degree of belief in any proposition is represented by a single real number.
2. The degree of belief in \(\neg A\) is determined by the degree of belief in \(A\).
3. The degree of belief in \(A \wedge B\) is determined by the degree of belief in \(A\) and the degree of belief in \(B\) given \(A\).
The first assumption is so important that I’ve colored it red and will name it The Credence Assumption, in order to refer to it later. The Credence Assumption states that every belief an agent has is to be assigned a real number, and Cox’s theorem states that if you’re willing to make that assumption (plus assumptions 2 and 3), then the beliefs of a “rational agent” will obey the laws of probability (where “rational agent” is defined, circularly, to mean “any agent abiding by Cox’s assumptions”).
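To make the content of the theorem a little more concrete, here is the standard schematic shape of the result, in my own notation rather than Colyvan’s exact statement (so treat it as a sketch): assumptions 2 and 3 say that a real-valued belief function \(b\) satisfies

\[
b(\neg A \mid X) = S\big(b(A \mid X)\big), \qquad b(A \wedge B \mid X) = F\big(b(A \mid X),\, b(B \mid A \wedge X)\big)
\]

for some functions \(S\) and \(F\), and the theorem concludes that there is a monotone rescaling \(g\) such that \(p = g \circ b\) obeys the familiar rules \(p(\neg A \mid X) = 1 - p(A \mid X)\) and \(p(A \wedge B \mid X) = p(A \mid X)\, p(B \mid A \wedge X)\).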
Note that the Credence Assumption is far from settled science. Others have noticed problems with it as well, as Van Horn highlights:
We must mention, however, that [the Credence Assumption] is not without controversy. In fact, it is arguably the most fundamental distinction between the Bayesian and other approaches to plausible reasoning.
By representing plausibilities with a single real number we implicitly assume that the plausibilities of any two propositions are comparable. That is, given any two propositions A and B and a state of information X, either A and B are equally plausible, or one is more plausible than the other. Some consider this assumption of universal comparability unwarranted, and have explored weaker assumptions. Fine [13] gives one summary of approaches that do away with universal comparability, whereas Jaynes [8, Appendix A] argues for universal comparability on pragmatic grounds.
The most common objection to universal comparability is a more fundamental objection to representing one’s degree of certainty or belief in a proposition with a single value. (emphasis mine)
- Kevin S. Van Horn, Constructing a Logic of Plausible Inference: A Guide to Cox’s Theorem
Everything written about the supposed “Laws of Rationality” is a myth for a simple reason: there is no law of nature requiring you to make the Credence Assumption.
I choose not to assume it. Have I suddenly turned in my rationality card? Am I suddenly unable to make decisions? Am I any less rational for recognizing that this assumption leads inexorably to paradoxes and irrationality, and therefore wanting to try something else? What kind of definition of rationality would force someone to make such an irrational assumption?
Every single piece of writing about Core tenets or The Rules telling people how they must think should have a giant red “…provided you’re willing to grant the Credence Assumption!” bumper sticker slapped at the bottom. All of Yudkowsky’s authoritative commandments above, plus everything ever written about Dutch books and accuracy-domination, disappear once you stop granting that assumption.
This is how we know the “laws of rationality” aren’t actually laws - because they disappear once we stop believing in them. Real laws of nature are prohibitions - they exclude certain physical transformations from taking place (like traveling faster than the speed of light), and importantly, don’t disappear when one changes their assumptions.
Much of your argument seems to turn on rejecting the ways that longtermists typically think about uncertainty. But without an explicit alternative, you make it too easy for yourself to do the thing you decry: make stuff up (in this case, vague suggestions about how people should deal with uncertainty) that lets you endorse whatever conclusions you want to endorse.
- friend, private communication.
So as we have seen, there is nothing in the laws of nature forcing us to assign numbers called “credences” to our beliefs. Some acknowledge this, but claim that while it may not be perfect, it’s the best we’ve got. There is just so much uncertainty out there, we have to deal with it somehow. Despite its many imperfections, what explicit alternative is there?
Here are some alternatives.
Multi-valued representations of uncertainty: Instead of making The Credence Assumption, Glenn Shafer proposed a two-dimensional representation of uncertainty. This is motivated by the consideration that because the probability calculus requires \(P(A) = 1 - P(\neg A)\), one cannot become less certain about a proposition without becoming more certain about its negation.
So for example, let \(A\) be some far-away dystopian nightmare scenario, like the extermination of humanity by AI. A modest and humble longtermist might wish to express their lack of confidence here by using a tiny credence value - say \(0.00001\%\). But then, is this same humble longtermist willing to confidently state they are \(99.99999\%\) certain humanity won’t be exterminated by AI? Like Newton’s third law, the probability calculus requires that every amount of uncertainty \(p\) be paired with an opposite (though not equal) amount of certainty \(1 - p\). Some people find such overconfident-sounding probability assignments objectionable.
A two-valued representation of uncertainty, like Dempster-Shafer theory, lets one express uncertainty about both \(A\) and \(\neg A\), where the latter is called doubt (i.e. the degree of belief in \(\neg A\)). Had a young Yudkowsky picked up Shafer’s A Mathematical Theory of Evidence instead of E.T. Jaynes, everyone on Less Wrong would be representing their uncertainty using credence tuples - “I am \((0.0001, 0.01)\)-certain about \(A\)”.
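For the curious, here is a minimal sketch of the idea in code. It is not Shafer’s notation, and the frame, mass values, and function names are all hypothetical illustrations; the point is simply that mass can be left uncommitted on the whole frame, so low belief in doom does not force high belief in its negation.

```python
# A minimal sketch of Dempster-Shafer-style two-valued uncertainty over a
# tiny frame of discernment {"doom", "no_doom"}. Mass left on the whole frame
# is "uncommitted" belief - the part a single-number credence cannot express.

def belief(mass, hypothesis):
    # Bel(H): total mass on non-empty subsets wholly contained in H.
    return sum(m for s, m in mass.items() if s and s <= hypothesis)

def plausibility(mass, hypothesis):
    # Pl(H): total mass on subsets compatible with H.
    return sum(m for s, m in mass.items() if s & hypothesis)

frame = frozenset({"doom", "no_doom"})
mass = {
    frozenset({"doom"}): 0.0001,     # direct support for doom (hypothetical)
    frozenset({"no_doom"}): 0.01,    # direct support for no doom (hypothetical)
    frame: 0.9899,                   # "I just don't know"
}

doom = frozenset({"doom"})
print(f"Bel(doom) = {belief(mass, doom):.4f}, Pl(doom) = {plausibility(mass, doom):.4f}")
# Bel(doom) = 0.0001, Pl(doom) = 0.9900: low support for doom, without being
# forced to claim 99.99% certainty in its negation.
```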
Alternative Logics: Mark Colyvan points out that Cox’s Theorem implicitly assumes classical, two-valued (Aristotelian) logic, in which, for every proposition, either \(A\) or \(\neg A\) holds. But this fails to represent an important domain of uncertainty - non-epistemic uncertainty (I refer the reader to section 2 of The Philosophical Significance of Cox’s Theorem for an entertaining discussion of non-epistemic uncertainty). In Section 3, Colyvan highlights the problem with this implicit assumption:
The assumption of classical logic is particularly troublesome if Cox’s theorem is to be wielded as a weapon against non-classical systems of belief representation. And, I should add, that some commentators do put Cox’s theorem to such a purpose.
For example, Lindley [36] draws the following conclusion (from a similar theorem): “The message is essentially that only probabilistic descriptions of uncertainty are reasonable” (p. 1), and Jaynes [31] suggests that “the mathematical rules of probability theory […] are […] the unique consistent rules for conducting inference (i.e., plausible reasoning) of any kind” (p. xxii).
But Cox’s result is simply a representation theorem demonstrating that if belief has the structure assumed for the proof of the theorem, classic probability theory is a legitimate calculus for representing degrees of belief. But as it stands it certainly does not legitimate *only* classical probability theory as a means of representing belief, nor does it prove that such a representation is adequate for all domains. (emphasis added)
- Mark Colyvan, The Philosophical Significance of Cox’s Theorem
Alternative logics one might use to construct a belief theory include fuzzy logics, three-valued logics, and quantum logic, although none of these has been developed into a full-fledged belief theory. The point is only that we shouldn’t complacently assume bayesian probability is the only way to represent belief and uncertainty.
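As a small illustration of what “dropping two-valued logic” even means, here is a sketch of Kleene’s strong three-valued logic. It is not a worked-out belief theory, just a demonstration that once “unknown” is a first-class truth value, the bivalence Cox’s theorem quietly relies on is gone.

```python
# A sketch of Kleene's strong three-valued logic: truth values are ordered
# False < Unknown < True, negation flips the ends, AND is min, OR is max.

F, U, T = 0, 1, 2
NAMES = {F: "F", U: "U", T: "T"}

def k_not(a):
    return 2 - a          # swaps T and F, leaves U fixed

def k_and(a, b):
    return min(a, b)

def k_or(a, b):
    return max(a, b)

for a in (F, U, T):
    excluded_middle = k_or(a, k_not(a))   # "A or not-A"
    print(f"A = {NAMES[a]}:  A or not-A = {NAMES[excluded_middle]}")

# When A is Unknown, "A or not-A" is also Unknown - the law of excluded
# middle no longer holds, so a proposition need not be "either A or not-A".
```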
Making Do Without Expectations: Some people have tried to rehabilitate expected utility theory from the plague of expectation gaps. In Making Ado Without Expectations, Mark Colyvan describes two extensions of decision theory that attempt to get around the undefined-expectations problem: his own Relative Expectation Theory (RET), which can be thought of as computing the opportunity cost of taking action \(A\) over \(B\), and UBC professor Paul Bartha’s Relative Utility Theory (RUT), which operates directly on an agent’s preferences rather than on a utility function.
Unfortunately, neither RET nor RUT overcomes the problem of expectation gaps, because both theories are based on adding products of real numbers, and therefore inherit all of the problems ordinary arithmetic encounters when dealing with undefined terms. Colyvan again:
Expected utilities are sums of products of utilities and probabilities. If the utilities and probabilities are well-defined real numbers, ordinary real-number arithmetic applies. However, if any of these conditions fail, we have a recipe for problems for expected utility theory, in which utilities are numerical quantities.
- Mark Colyvan, Making Ado Without Expectations
In Section 6 of Making Ado, Colyvan provides a list of six additional expectation gaps that any decision theory based on adding products of real numbers would have to overcome, one of which is the unmeasurable-set expectation gap highlighted in my previous piece.
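To make the gap concrete, here is a minimal numerical sketch of the Pasadena game from the previous piece, under its standard payoff schedule (payoff \((-1)^{n-1} 2^n / n\) if the first head lands on toss \(n\), with probability \(2^{-n}\)). The rearranged summation order shown is just one of infinitely many; each order yields a different “expected value”, which is precisely why no expected value exists.

```python
# A numerical sketch of the Pasadena game's expectation gap. Each
# probability-weighted payoff is 2**-n * (-1)**(n-1) * 2**n / n, which
# simplifies to (-1)**(n-1) / n: a conditionally convergent series whose
# sum depends on the order of summation (Riemann's rearrangement theorem).

import math

def weighted_term(n):
    return (-1) ** (n - 1) / n   # probability * payoff for the toss-n outcome

N = 300_000
natural = sum(weighted_term(n) for n in range(1, N + 1))

# One rearrangement: two positive-payoff outcomes for every negative one.
pos = (weighted_term(n) for n in range(1, 10 ** 9, 2))   # odd n  -> +1/n
neg = (weighted_term(n) for n in range(2, 10 ** 9, 2))   # even n -> -1/n
rearranged = sum(next(pos) + next(pos) + next(neg) for _ in range(N // 3))

print(f"natural order    ~ {natural:.5f}   (ln 2     = {math.log(2):.5f})")
print(f"rearranged order ~ {rearranged:.5f}   (3/2 ln 2 = {1.5 * math.log(2):.5f})")
```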
~
I wear two hats when I think about decision theory. When I’m wearing my “machine learning researcher” hat, I think about decision theory as simply a tool (as opposed to The Rule). When viewed as a tool, none of these paradoxes arise because I can ignore The Pasadena game on the grounds that it doesn’t help me achieve my goal.
But when I’m wearing my “human being” hat, I only use decision theory in certain rare circumstances, like when I’m playing poker. And here I’m using decision theory mainly as a loose metaphor, thinking something like “this is risky, but I’m going to do it anyway, because I could win a lot”.
For important decisions, beyond simple ones made while playing games of chance, I use tools I’ve learned from critical rationalism, which are things like talking to people, brainstorming, and getting advice. This is how I make decisions in real life. Paradoxes only arise when one is seeking The Rule - a single infallible formula to consult, one that will always tell you what to do, that you can trust completely, and that will never lead you into error, if only wielded correctly. I view the plague of paradoxes awaiting anyone who searches for such a rule as evidence that no such rule exists.
So far I’ve argued against the absolutist-subjectivist view of probability on grounds that it doesn’t work in practice, that it leads to more paradoxes than it solves, and that it isn’t a fundamental law of rationality, because alternatives exist.
I think the situation is worse than this, though: even if it were attainable, a perfect decision rule wouldn’t be desirable, because it would mean the end of critical discussion. To drive this point home, I invite those seeking a rule that tells them how to make “correct decisions” to consider the following thought experiment.
~
Let’s imagine you get more than you’ve ever wished for, and that tomorrow morning an algorithm is discovered that makes perfectly correct decisions. Let’s also assume that someone has fixed the intractability problem, so the algorithm can be run on a machine that not only produces correct decisions, but produces them near-instantaneously. And let’s also imagine this machine contains a measuring device that painlessly measures both your current mental state and the exact amount of evidence you are receiving through your senses at any given moment, and is able to divine every consequence of a given action until the end of time, along with its corresponding utility value.
So now you have a device that will tell you exactly what to do, at every moment in time. And while you may not know what it’s going to say next, there’s one thing you know with absolute certainty - you won’t need to think critically about anything it says, because the machine is perfect. Its omniscience is infinitely superior to our poorly evolved mammalian ape-brains, and if our intuitions ever clash with anything it tells us to do, well, so much for our intuitions.
Now, such a machine would likely have to be instantiated in a super-intelligent computer of some sort in order to store and process the large amount of incoming sensory data. For simplicity, let’s say the Super Intelligence tells you what to do at ten second intervals, via a surgically implanted auditory interface embedded under the skin behind your ear. You have now built yourself something that gives you orders, makes decisions on your behalf, and which you are bound to obey because of its perfection.
So the question is: Would you always do what you’re told?
~
To spell out my intentions with the thought experiment a little more directly: my goal is to imagine a scenario which I suspect most people would find intuitively repulsive, and then ask how they got there, given that each step was a step towards their stated goals. I wanted to force a choice between obeying an oracular decision maker and exercising one’s critical thinking skills, because there is a fundamental conflict between the idea of criticism on the one hand, and the idea of perfection on the other. How could one possibly be critical of that which is perfect?
In other words, my thesis is this: even if it were attainable, an oracular decision rule wouldn’t be desirable, because such an infallible fount of perfect decisions would disable critical thinking, and this is the heart of authoritarianism. One could falsify this theory by finding an authoritarian regime not built around an idea of perfection (for example, perfect socialist/aryan/religious utopias).
A friend to whom I put this question answered by saying
I highly doubt someone would just say “Oh nice, tell me what to do!” I think they would maintain their critical faculties in practice.
which is of course the answer I had hoped for. It’s critical thinking skills that are important - not blindly asking an authority figure for perfect instructions. So to a hypothetical questioner who asks
What are we supposed to do without decision theory? How else do you expect us to make decisions?
the answer is:
By thinking critically, like you do in practice.
Yudkowsky says that rationality should be math, and that math is as inescapably governing as physics. But as we have seen, all decision theories based on summing products of real numbers will suffer from expectation gaps. And there is no requirement enshrined in the laws of nature that a theory of human decision making must be mathematical - to think otherwise is the fallacy of reductionism, which says that theories at lower strata of reality (like math or physics) are somehow superior to theories at higher strata (like biology or economics).
As far as I know, the only way out of the ocean of paradox described above is to abandon the search for a perfect decision rule, and adopt an evolutionary approach instead. Memetic evolution, more specifically. Decisions are just ideas at the end of the day, and rather than thinking of them as a weighted sum of products of other ideas, we can think of them as the organic products of evolutionary pressures.
And so to return to my friend’s concern, that
But without such a rule, you make it too easy for yourself to do the thing you decry: make stuff up (in this case, vague suggestions about how people should deal with uncertainty) that lets you endorse whatever conclusions you want to endorse.
Yes, it’s completely true that decisions made within this framework are made up out of thin air (as are all theories) - we can view this as roughly analogous to the mutation stage in Darwinian evolution. Then, once generated, we subject the ideas to as much criticism from ourselves and others as possible - roughly analogous to the selection stage. To paraphrase David Deutsch, it’s a mistake to conceive of decision-making as simply a process of selecting from existing options according to a fixed formula, because this omits the most important element of decision-making - the creation of new options. Explaining this in greater detail is the focus of the 13th chapter of The Beginning of Infinity, titled “Choices”.
Regarding the “endorsing whatever conclusion one wants” concern - I’ve discussed this on the blog here before (it seems I can’t get away from the subject), but this is why rational criticism is central to the project, and indeed why it’s called “critical rationalism” in the first place. It’s criticism that prevents me from endorsing whatever conclusions I want to endorse. Conversation and debate are considered paramount in critical rationalism precisely because this is how you improve your ideas - by expressing them to others, so that they can spot flaws in your reasoning.
Criticism is also non-arbitrary. This you can prove to yourself by announcing to your family you’ve decided to take up recreational drinking and driving. They are (hopefully) going to criticize this decision, and steer you, in a non-arbitrary way, towards better decisions.
Criticism is also an intrinsically creative enterprise, because there are no authoritarian dictates governing human thought - only the recognition that the products of the human mind will contain some admixture of truth and error, and therefore one must always be thinking critically about what one hears. The search for a perfect decision rule to govern human action is the search for authoritarianism, and I think it should be given up for this reason.
But does it follow that regarding climate change we should throw our hands up in the air and do nothing because it’s “impossible to predict the future”? Or that climate change policy faces some deep technical challenge?
Why can’t we be longtermists while being content to “predict that certain developments will take place under certain conditions”?
In the next piece I’ll discuss knowledge, long and short-term prediction, “complex cluelessness”, and the role of data in science.