The paradoxes and problems with Cox's Theorem (Part 3/4)
And, in any case, there are arguments for the claim that we must assign probabilities to hypotheses like "The die lands on 1" and "There will exist at least \(10^{16}\) people in the future." If we don't assign probabilities, we are vulnerable to getting Dutch-booked and accuracy-dominated …
… So, to summarize the above, we have to assign probabilities to empirical hypotheses, on pain of getting Dutch-booked and accuracy-dominated. And all reasonable-seeming probability assignments imply that we should pursue longtermist interventions. (last italics mine)
Anyone who questions the idea of assigning "credences" to beliefs will quickly learn about Dutch books. Consider an agent who assigns probability \(0.9\) to the belief that Jon Hamm will enslave us in a white egg-shaped gadget, and probability \(0.9\) to the belief that Jon Hamm will not enslave us in a white egg-shaped gadget. The Dutch book argument says this is irrational, because such a person would gladly pay \(\$0.90\) for a bet that wins \(\$1.00\) if Jon Hamm enslaves us in a white egg-shaped gadget, and simultaneously pay \(\$0.90\) for a bet that wins \(\$1.00\) if this doesn't happen. Whatever happens, the agent loses money: having paid a total of \(\$1.80\), they're guaranteed to collect only \(\$1.00\) in return.
This is a Dutch book - a set of bets that guarantees a financial loss no matter the outcome. To be considered rational according to the Dutch book argument, an agent who has assigned a credence of \(0.9\) to the belief that Jon Hamm will enslave us in a white egg-shaped gadget must assign a credence of \(0.1\) to the belief that Jon Hamm will not enslave us in a white egg-shaped gadget. More generally, the credences in \(p\) and \(\neg p\) must sum to one, and obey all the other rules of the probability calculus as well. (Accuracy-domination arguments come to the same thing as Dutch books, because every accuracy-dominating point corresponds to a Dutch book. See the formal proof here.)
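To make the arithmetic concrete, here is a minimal sketch (just the toy numbers from the example above) showing that these incoherent credences lose the same amount whatever Jon Hamm does:

```python
# Toy Dutch book: credences of 0.9 in a proposition and 0.9 in its
# negation, with the agent buying a $1.00 bet on each at a price
# equal to their credence.

credence_p = 0.9       # credence that the enslavement happens
credence_not_p = 0.9   # credence that it does not
stake = 1.00           # each bet pays $1.00 if it wins

total_paid = (credence_p + credence_not_p) * stake   # $1.80 up front

for enslaved in (True, False):
    winnings = stake   # exactly one of the two bets pays out either way
    print(f"enslaved={enslaved}: net = {winnings - total_paid:+.2f}")

# Both branches print -0.80: a guaranteed loss, i.e. a Dutch book.
# With coherent credences (0.9 and 0.1) the agent pays $1.00 in total
# and collects $1.00 whatever happens - no sure loss can be extracted.
```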
In my previous post, I argued against bayesian absolutism in favor of the instrumentalist perspective, which views probability theory as a useful tool in certain circumstances. This is the perspective held by most bayesian statisticians, and is best summarized by George Box in the aphorism "All models are wrong, but some are useful".
The instrumentalist view is a threat to the idea of "credences", and so is often met with appeals to Dutch book-style arguments, which all say the probability calculus is a tool we must use, if we are to be rational. A Law of Rationality, in fact - a set of rules by which our thinking must abide. For example:
Old School statisticians thought in terms of tools, tricks to throw at particular problems. Bayesians - at least this Bayesian, though I don't think I'm speaking only for myself - we think in terms of laws … Bayesian theorems are elegant, coherent, optimal, and provably unique because they are laws. (emphasis in original)
The laws of probability are laws, not suggestions, but often the true Law is too difficult for us humans to compute. (emphasis via hyperlink in original)
Core tenet 3: We can use the concept of probability to measure our subjective belief in something. Furthermore, we can apply the mathematical laws regarding probability to choosing between different beliefs. If we want our beliefs to be correct, we must do so. (emphasis via hyperlink in original)
(Kaj Sotala, linking to Eliezer Yudkowsky)
To Bayesians, the brain is an engine of accuracy: it processes and concentrates entangled evidence into a map that reflects the territory. The principles of rationality are laws in the same sense as the Second Law of Thermodynamics…
…Remember this, when you plead to be excused just this once. We can't excuse you. It isn't up to us. (emphasis mine)
Against this position I will make four arguments:
And I'll end on an optimistic note, by offering an alternative framework for decision making. If you want to skip ahead, here's a little table of contents.
(Note that none of these arguments "prove too much" because they only attack the subjectivist view of probability, but leave the instrumentalist view unharmed.)
A primary justification for the view that "the laws of probability are laws, not suggestions" is a desire to avoid paradox. Consider this quote from Eliezer Yudkowsky in My Bayesian Enlightenment, where he explains why one must strictly adhere to "The Rules" of rationality:
Here was probability theory, laid out not as a clever tool, but as The Rules, inviolable on pain of paradox. If you tried to approximate The Rules because they were too computationally expensive to use directly, then, no matter how necessary that compromise might be, you would still end up doing less than optimal. … Paradoxes could not coexist with [E. T. Jaynes's] precision. Not an answer, but the answer. (emphasis in original)
But as we have seen with The Pasadena Game, paradoxes lurk inside the framework as well. For example, contrast the previous quotation from Yudkowsky with the following from Will MacAskill:
Robert Wiblin: It sounds like, if you take the expected value approach, then things might be quite straightforward. You give different probabilities to different moral theories. You look at how good or bad different actions are given those moral theories. Is there even that much more to say about this? It seems kind of easy.
Will MacAskill: If only that were true. I think there's not just one book's worth of content in terms of addressing some of the problems that this raises. I think it's probably many volumes. First, there's the question how do you even assign credences to different moral views. That's already an incredibly difficult question. It's not the question I address because that's just the question of moral philosophy, really.
Secondly, supposing you do have a set of credences across different moral views, how do you act? … A third problem is the fanaticism problem, which is the worry that perhaps under moral uncertainty, what you ought to do is determined by really small credences in infinite amounts of value or huge amounts of value.
- Our descendants will probably see us as moral monsters. What should we do about that?
So from Yudkowsky we learn that we cannot violate the laws without running into paradox, only to learn from MacAskill that using the very same laws results in even more paradox! So many, in fact, they could fill a multi-volume book. Hence, paradoxes lurking outside bayesian epistemology are the reason one can never leave it, but paradoxes lurking inside are exciting research opportunities.
Later in the interview, paradoxes are also euphemistically referred to as "problems" - a word which implies they have a solution, and should therefore be worked on:
Robert Wiblin: Is this like the 2 envelope paradox? It's really reminding me of that.
Will MacAskill: You do get 2 envelope style problems under moral uncertainty …
Robert Wiblin: If you haven't heard of the 2 envelope paradox it's a real treat. I'll stick up a link to that, and you should definitely check it out. It's one of my favorite paradoxes.
So not only is the "paradox" reworded as a "problem", but we learn this in fact represents an entire style of "problem" with the field of moral uncertainty.
Other paradoxes within bayesian epistemology include the Necktie paradox, the St. Petersburg paradox, Newcomb's paradox, the Ellsberg paradox, Pascal's Mugging, Bertrand's paradox, the Mere Addition paradox (aka "The Repugnant Conclusion"), the Allais paradox, the Boy or Girl paradox, the Paradox of the Absent-Minded Driver (aka the "Sleeping Beauty problem"), and Siegel's paradox.
By way of contrast, Popper recognized that when a question you were asking led to paradox, that meant you were asking the wrong question. This led to the fruitful methodological approach of replacing bad questions (i.e. ones that lead to paradox) with better ones (i.e. ones that don't). Two examples of this approach are replacing "How do we prove induction?" with "How do we detect mistakes?", and replacing "How do we have a perfectly representative democracy?" with "How do we remove bad rulers without violence?".
One might also imagine replacing "How do we make perfect decisions?" with "How do we design systems that are resilient to bad decisions?".
But let's return to the subject at hand - where is the flaw in the Dutch book argument? To answer this, we must first discuss Cox's Theorem, upon which all these claims are premised. For this we'll enlist the assistance of three papers I consider to be essential reading for anyone interested in the subject of uncertainty quantification. They are:
Colyvan states Cox's theorem, as well as the three assumptions underlying it, as follows:
Theorem 1 (Cox). Any measure of belief is isomorphic to a probability measure.
This holds if you are willing to make the following three assumptions:
The first assumption is so important I've colored it red and will name it The Credence Assumption, in order to refer to it later. The credence assumption states that every belief an agent has is to be assigned a real number, and Cox's theorem states that if you're willing to make that assumption (plus assumptions 2 and 3), then the beliefs of a "rational agent" will obey the laws of probability (where "rational agent" is defined, circularly, to mean "any agent abiding by Cox's assumptions").
Note that the credence assumption is far from settled science. Others have noticed problems with it as well, as Van Horn highlights:
We must mention, however, that [the credence assumption] is not without controversy. In fact, it is arguably the most fundamental distinction between the Bayesian and other approaches to plausible reasoning.
By representing plausibilities with a single real number we implicitly assume that the plausibilities of any two propositions are comparable. That is, given any two propositions A and B and a state of information X, either A and B are equally plausible, or one is more plausible than the other. Some consider this assumption of universal comparability unwarranted, and have explored weaker assumptions. Fine [13] gives one summary of approaches that do away with universal comparability, whereas Jaynes [8, Appendix A] argues for universal comparability on pragmatic grounds.
The most common objection to universal comparability is a more fundamental objection to representing one's degree of certainty or belief in a proposition with a single value. (emphasis mine)
- Kevin S. Van Horn, Constructing a Logic of Plausible Inference: A Guide to Cox's Theorem
The reason everything written about the supposed "Laws of Rationality" is a myth is that there is no law of nature requiring you to make the credence assumption.
I choose not to assume it. Have I suddenly turned in my rationality card? Am I suddenly unable to make decisions? Am I any less rational for recognizing that this assumption leads inexorably to paradoxes and irrationality, and therefore wanting to try something else? What kind of definition of rationality would force someone to make such an irrational assumption?
Every single piece of writing about Core Tenets or The Rules telling people how they must think should have a giant red "…provided you're willing to grant the credence assumption!" bumper sticker slapped at the bottom. All of Yudkowsky's authoritative commandments above, plus everything ever written about Dutch books and accuracy-domination, disappear once you stop granting that assumption.
This is how we know the "laws of rationality" aren't actually laws - because they disappear once we stop believing in them. Real laws of nature are prohibitions - they exclude certain physical transformations from taking place (like traveling faster than the speed of light), and importantly, don't disappear when one changes their assumptions.
Much of your argument seems to turn on rejecting the ways that longtermists typically think about uncertainty. But without an explicit alternative, you make it too easy for yourself to do the thing you decry: make stuff up (in this case, vague suggestions about how people should deal with uncertainty) that lets you endorse whatever conclusions you want to endorse.
- friend, private communication.
So as we have seen, there is nothing in the laws of nature forcing us to assign numbers called "credences" to our beliefs. Some acknowledge this, but claim that while it may not be perfect, it's the best we've got. There is just so much uncertainty out there, we have to deal with it somehow. Despite its many imperfections, what explicit alternative is there?
Here are some alternatives.
Multi-valued representations of uncertainty: Instead of making the credence assumption, Glenn Shafer proposed a two-dimensional representation of uncertainty. This is motivated by the consideration that because the probability calculus requires \(P(A) = 1 - P(\neg A)\), one cannot become less certain about a proposition without becoming more certain about its negation.
So for example, let \(A\) be some far-away dystopian nightmare scenario, like the extermination of humanity by AI. A modest and humble longtermist might wish to express their lack of confidence here by using a tiny credence value - say \(0.00001\%\). But then, is this same humble longtermist willing to confidently state that they are \(99.99999\%\) certain humanity won't be exterminated by AI? Like Newton's third law, the probability calculus requires that every unit of uncertainty \(p\) have a (non) equal and opposite amount of certainty \(1 - p\). Some people find such overconfident-sounding probability assignments objectionable.
A two-valued representation of uncertainty, like Dempster-Shafer theory, lets one express uncertainty about both \(A\) and \(\neg A\), where the latter is defined as doubt (i.e. the degree of belief in \(\neg A\)). Had a young Yudkowsky picked up Shafer's A Mathematical Theory of Evidence instead of E. T. Jaynes, everyone on Less Wrong would be representing their uncertainty using credence tuples - "I am \((0.0001 \bigotimes 0.01)(A)\) certain about \(A\)".
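For concreteness, here is a minimal sketch of the Dempster-Shafer idea in plain Python (the mass numbers are invented, loosely echoing the tuple above, and no particular library or notation is assumed): mass can be committed to \(A\), committed to \(\neg A\), or left uncommitted on the whole frame, so belief in \(A\) and belief in \(\neg A\) need not sum to one.

```python
# Minimal Dempster-Shafer sketch over the two-element frame {A, not-A}.
# Mass assigned to the whole frame is uncommitted belief ("don't know"),
# something a single credence number cannot express.
# The numbers are invented purely for illustration.

mass = {
    frozenset({"A"}): 0.0001,           # evidence specifically for A
    frozenset({"not-A"}): 0.01,         # evidence specifically against A (doubt)
    frozenset({"A", "not-A"}): 0.9899,  # uncommitted mass
}

def belief(hypothesis):
    """Total mass of focal sets wholly contained in the hypothesis."""
    return sum(m for s, m in mass.items() if s <= hypothesis)

def plausibility(hypothesis):
    """Total mass of focal sets compatible with the hypothesis."""
    return sum(m for s, m in mass.items() if s & hypothesis)

A, not_A = frozenset({"A"}), frozenset({"not-A"})
print(f"belief(A) = {belief(A):.4f}, plausibility(A) = {plausibility(A):.4f}")
print(f"belief(A) + belief(not-A) = {belief(A) + belief(not_A):.4f}")  # 0.0101, not 1
```

Unlike a single credence number, the gap between belief and plausibility (here \(0.0001\) versus \(0.99\)) records how much the agent simply does not know.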
Alternative Logics: Mark Colyvan points out that Cox's Theorem implicitly assumes Aristotelian two-valued logic, in which every proposition is either true or false - either \(A\) or \(\neg A\) holds. But this fails to represent an important domain of uncertainty - non-epistemic uncertainty (I refer the reader to section 2 of The Philosophical Significance of Cox's Theorem for an entertaining discussion of non-epistemic uncertainty). In Section 3 Colyvan highlights the problem with this implicit assumption:
The assumption of classical logic is particularly troublesome if Cox's theorem is to be wielded as a weapon against non-classical systems of belief representation. And, I should add, that some commentators do put Cox's theorem to such a purpose.
For example, Lindley [36] draws the following conclusion (from a similar theorem): "The message is essentially that only probabilistic descriptions of uncertainty are reasonable" (p. 1) and Jaynes [31] suggests that "the mathematical rules of probability theory […] are […] the unique consistent rules for conducting inference (i.e., plausible reasoning) of any kind" (p. xxii).
But Cox's result is simply a representation theorem demonstrating that if belief has the structure assumed for the proof of the theorem, classical probability theory is a legitimate calculus for representing degrees of belief. But as it stands it certainly does not legitimate *only* classical probability theory as a means of representing belief, nor does it prove that such a representation is adequate for all domains. (emphasis added)
- Mark Colyvan, The Philosophical Significance of Cox's Theorem
Alternative logics one might use to construct a belief theory include fuzzy logics, three-valued logic, and quantum logic, although none of these have been developed into full-fledged belief theories. The point is only that we shouldn't complacently assume Bayes theorem is the only way to represent belief and uncertainty.
Making Do Without Expectations: Some people have tried to rehabilitate expected utility theory from the plague of expectation gaps. In Making Ado Without Expectations, Mark Colyvan describes two extensions of decision theory that attempt to get around the undefined-expectations problem: his own Relative Expectation Theory (RET), which can be thought of as computing the opportunity cost of taking action \(A\) over \(B\), and UBC professor Paul Bartha's Relative Utility Theory (RUT), which operates directly on an agent's preferences rather than on a utility function.
Unfortunately, neither RET nor RUT overcomes the problem of expectation gaps, because both theories are based on adding products of real numbers, and therefore inherit all of the problems ordinary arithmetic encounters when dealing with undefined terms. Colyvan again:
Expected utilities are sums of products of utilities and probabilities. If the utilities and probabilities are well-defined real numbers, ordinary real-number arithmetic applies. However, if any of these conditions fail, we have a recipe for problems for expected utility theory, in which utilities are numerical quantities.
- Mark Colyvan, Making Ado Without Expectations
In Section 6 of Making Ado, Colyvan provides a list of six additional expectation gaps that any decision theory based on adding products of real numbers would have to overcome, one of which is the unmeasurable-set expectation gap highlighted in my previous piece.
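To see how "adding products of real numbers" can break down, here is a small sketch of one such gap, using the Pasadena game from the previous post (assuming its standard payoff schedule: a payoff of \((-1)^{n+1}2^n/n\) with probability \(2^{-n}\)). The probability-times-utility terms form a conditionally convergent series, so the value of the "expected utility" sum depends on the order in which the terms are added:

```python
# Pasadena game: with probability 2**-n the payoff is (-1)**(n+1) * 2**n / n,
# so the n-th probability-times-utility term is (-1)**(n+1) / n.
# This series is only conditionally convergent: rearranging the terms
# changes the sum, so "the" expected utility is not well defined.

N = 100_000
terms = [(-1) ** (n + 1) / n for n in range(1, N + 1)]

natural_order = sum(terms)   # partial sums approach ln(2), about 0.693

# Same terms, different order: two positive terms for every negative one.
# By Riemann's rearrangement theorem the partial sums now head towards
# a different limit, about 1.5 * ln(2) = 1.040.
positives = (t for t in terms if t > 0)
negatives = (t for t in terms if t < 0)
reordered = []
try:
    while True:
        reordered.append(next(positives))
        reordered.append(next(positives))
        reordered.append(next(negatives))
except StopIteration:
    pass

print(f"natural order: {natural_order:.3f}")   # ~0.693
print(f"reordered:     {sum(reordered):.3f}")  # ~1.040
```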
~
I wear two hats when I think about decision theory. When I'm wearing my "machine learning researcher" hat, I think about decision theory as simply a tool (as opposed to The Rule). When viewed as a tool, none of these paradoxes arise because I can ignore The Pasadena game on the grounds that it doesn't help me achieve my goal.
But when I'm wearing my "human being" hat, I only use decision theory in certain rare circumstances, like when I'm playing poker. And here I'm using decision theory mainly as a loose metaphor, thinking something like "this is risky, but I'm going to do it anyway, because I could win a lot".
For important decisions, beyond simple ones made while playing games of chance, I use tools I've learned from critical rationalism - things like talking to people, brainstorming, and getting advice. This is how I make decisions in real life. Paradoxes only arise when one is seeking The Rule, a single infallible formula they can put questions to, that will always tell them what to do, that they can trust completely, that will never lead them to error, if only wielded correctly. I view the plague of paradoxes that await anyone searching for such a rule as evidence that no rule exists.
So far I've argued against the absolutist-subjectivist view of probability on grounds that it doesn't work in practice, that it leads to more paradoxes than it solves, and that it isn't a fundamental law of rationality, because alternatives exist.
I think the situation is worse than this, though, in that even if it were obtainable, a perfect decision rule wouldn't be desirable, because it would mean the end of critical discussion. To drive this point home, I invite those seeking a rule that tells you how to make "correct decisions" to consider the following thought experiment.
~
Let's imagine you get more than you've ever wished for, and that tomorrow morning an algorithm is discovered that makes perfectly correct decisions. Let's also assume that someone has fixed the intractability problem, and that the algorithm can be run on a machine that not only produces correct decisions, but produces them near-instantaneously. And let's also imagine this machine contains a measuring device that painlessly measures both your current mental state and the exact amount of evidence you are receiving through your senses at any given moment, and is able to divine, until the end of time, every consequence of a given action and its corresponding utility value.
So now you have a device that will tell you exactly what to do, at every moment in time. And while you may not know what it's going to say next, there's one thing you know with absolute certainty - you won't need to think critically about anything it says, because the machine is perfect. Its omniscience is infinitely superior to our poorly evolved mammalian ape-brains, and if our intuitions ever clash with anything it tells us to do, well, so much for our intuitions.
Now, such a machine would likely have to be instantiated in a super-intelligent computer of some sort in order to store and process the large amount of incoming sensory data. For simplicity, let's say the super-intelligence tells you what to do at ten-second intervals, via a surgically implanted auditory interface embedded under the skin behind your ear. You have now made yourself something that gives you orders, makes decisions on your behalf, and which you are bound to obey because of its perfection.
So the question is: would you always do what you're told?
~
To spell out my intentions with the thought experiment a little more directly - my goal here is to imagine a scenario which I suspect most people would find intuitively repulsive, and then to ask how they got there, given that each step was a step towards their stated goals. I wanted to force a choice between obeying an oracular decision maker and keeping one's critical thinking skills, because there is a fundamental conflict between the idea of criticism on one hand, and the idea of perfection on the other. How could one possibly be critical of that which is perfect?
In other words, my thesis is this: even if it were obtainable, an oracular decision rule wouldn't be desirable, because such an infallible fount of perfect decisions would disable critical thinking, and this is the heart of authoritarianism. One could falsify this theory by finding an authoritarian regime not built around the idea of perfection (for example, perfect socialist/aryan/religious utopias).
A friend to whom I put this question answered by saying
I highly doubt someone would just say "Oh nice, tell me what to do!" I think they would maintain their critical faculties in practice.
which is of course the answer I had hoped for. Of course it's critical thinking skills that are important - not blindly asking for perfect instructions from an authority figure. So to a hypothetical questioner who says "What are we supposed to do without decision theory? How else do you expect us to make decisions?" the answer is: "By thinking critically, like you do in practice."
Yudkowsky says that rationality should be math, and that math is as inescapably governing as physics. But as we have seen, all decision theories based on summing products of real numbers will suffer from expectation gaps. Also, there is no requirement enshrined in the laws of nature that a theory of human decision making must be mathematical - this is the fallacy of reductionism, which says that theories at lower strata of reality (like math or physics) are somehow superior to theories at higher strata (like biology or economics).
As far as I know, the only way out of the ocean of paradox described above is to abandon the search for a perfect decision rule, and adopt an evolutionary approach instead. Memetic evolution, more specifically. Decisions are just ideas at the end of the day, and rather than thinking of them as a weighted sum of products of other ideas, we can think of them as the organic products of evolutionary pressures.
And so to return to my friend's concern, that
But without such a rule, you make it too easy for yourself to do the thing you decry: make stuff up (in this case, vague suggestions about how people should deal with uncertainty) that lets you endorse whatever conclusions you want to endorse.
Yes, it's completely true that decisions made according to this framework are made up out of thin air (as is the case with all theories) - we can view this as roughly analogous to the mutation stage in Darwinian evolution. Then, once generated, we subject the ideas to as much criticism from ourselves and others as possible - we can view this as roughly analogous to the selection stage. To paraphrase David Deutsch, it's a mistake to conceive of decision-making as simply a process of selecting from existing options according to a fixed formula, because this omits the most important element of decision-making - the creation of new options. Explaining this in greater detail is the focus of the 13th chapter of The Beginning of Infinity, titled "Choices".
Regarding the "endorsing whatever conclusion one wants" concern - I've discussed this on the blog here before (it seems I can't get away from the subject), but this is why rational criticism is central to the project, and indeed why it's called "critical rationalism" in the first place. It's criticism that prevents me from endorsing whatever conclusions I want to endorse. The reason things like conversation and debate are considered paramount in critical rationalism is that this is how you improve your ideas - by expressing them to others, so that others can spot flaws in your reasoning.
Criticism is also non-arbitrary. This you can prove to yourself by announcing to your family you've decided to take up recreational drinking and driving. They are (hopefully) going to criticize this decision, and steer you, in a non-arbitrary way, towards better decisions.
Criticism is also an intrinsically creative enterprise, because there are no authoritarian dictates governing human thought - only the recognition that the products of the human mind will contain some admixture of truth and error, and therefore one must always be thinking critically about what one hears. The search for a perfect decision rule to govern human action is the search for authoritarianism, and I think it should be given up for this reason.
But does it follow that regarding climate change we should throw our hands up in the air and do nothing because it's "impossible to predict the future"? Or that climate change policy faces some deep technical challenge?
Why can't we be longtermists while being content to "predict that certain developments will take place under certain conditions"?
In the next piece I'll discuss knowledge, long and short-term prediction, "complex cluelessness", and the role of data in science.