Derivation of Probability

2017-01-08

An explanation of probability as derived through Cox's theorem
Under three assumptions, any system describing degrees of belief is equivalent to the laws of probability.

There are many approaches to handling uncertainty. Probability is one of them, and it has been extremely successful. There are at least three different ways of constructing probability. Here, we'll look at how Cox derived it.

Let's define a concept of plausibility, or belief. Our goal is to take a concept that people take for granted and dress it up in mathematical attire. If you think that a human's degree of belief cannot rest on a rational foundation, then think of this as an exercise in building an artificial intelligence system. We intend to come up with a system of belief that leads to a rational agent under uncertainty.

Let $B(x)$ denote the degree of belief in a statement $x$. It describes the plausibility of $x$. As is customary in propositional logic, we'll denote the negation of $x$ as $\bar{x}$. Finally, we'll use $B(x|y)$ to denote the plausibility of $x$, given that $y$ is true.

Now, the three axioms:

The degree of belief can be represented by a real number.
The degree of belief $B(x)$ in a statement $x$ and the degree of belief $B(\bar{x})$ in its negation are related by a monotonically decreasing function $f$, i.e., there exists a monotonically decreasing function $f$ where $B(x) = f[B(\bar{x})]$.
The degree of belief in the statements $x, y$ together (which represents $x \land y$) is determined by the degrees of belief in $x|y$ and $y$. To be more specific, there exists a continuous, strictly increasing function $g$ that satisfies $B(x, y) = g(B(x|y), B(y))$.

The first axiom seems innocuous, since we use real numbers to measure things all the time. It implies that any two degrees of belief can be compared with one another. We'll assume that a more plausible statement has a higher degree of belief.

Now, let's look at the second axiom. The statements $x$ and $\bar{x}$ ask the same question in two different ways. If $x$ is true, $\bar{x}$ is automatically false, and extending this link from truth values to degrees of belief seems natural. If we allowed $B(x|y)$ and $B(\bar{x}|y)$ to vary independently, we would need two values to describe the degree of belief in $x$, which would contradict the first axiom. Requiring $f$ to be monotonically decreasing means that if, for some reason, $x$ being true becomes more plausible than before, then $x$ being false should not become more plausible as well.

The last axiom is a bit trickier. Let's look at why $B(x, y)$ should be a function of $B(x|y)$ and $B(y)$. We can enumerate every way $B(x, y)$ could be built from beliefs about $x$ and $y$ and check whether each one makes sense. The possible ingredients of $B(x, y)$ are $B(x), B(y), B(x|y), B(y|x)$. There are fifteen ways of choosing which of these terms the function $g$ depends on. Because $B(x, y)$ and $B(y, x)$ are symmetric, this reduces to nine. Eliminating the candidates that lead to a contradiction, one by one, finally leaves the following four.

$B(x, y) = g(B(x|y), B(y))$
$B(x, y) = g(B(x|y), B(y), B(x))$
$B(x, y) = g(B(x|y), B(y), B(y|x))$
$B(x, y) = g(B(x|y), B(y), B(y|x), B(x))$

Of these, we choose the first. First, it is the intuitively plausible choice. Second, the others are extensions of the first, and without additional information there is no particular reason to pick them.
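The counts of fifteen and nine quoted above can be reproduced with a short enumeration. This is only a sketch of the counting step: the names `TERMS`, `SWAP`, `nonempty_subsets`, and `canonical` are made up for illustration, and the code does not attempt the harder step of eliminating contradictory candidates to reach the final four.

```python
from itertools import combinations

# The four candidate arguments of g, written as plain strings.
TERMS = ("B(x)", "B(y)", "B(x|y)", "B(y|x)")

# Swapping the roles of x and y maps each term to its mirror image.
SWAP = {
    "B(x)": "B(y)",
    "B(y)": "B(x)",
    "B(x|y)": "B(y|x)",
    "B(y|x)": "B(x|y)",
}


def nonempty_subsets(terms):
    """Return every non-empty subset of terms as a frozenset."""
    return [
        frozenset(combo)
        for size in range(1, len(terms) + 1)
        for combo in combinations(terms, size)
    ]


def canonical(subset):
    """Pick one representative for a subset and its x<->y mirror image."""
    mirrored = frozenset(SWAP[t] for t in subset)
    return min(subset, mirrored, key=lambda s: sorted(s))


all_candidates = nonempty_subsets(TERMS)
up_to_symmetry = {canonical(s) for s in all_candidates}

print(len(all_candidates))  # 15
print(len(up_to_symmetry))  # 9
```

Treating a candidate and its mirror image as the same choice is how the symmetry of $B(x, y)$ and $B(y, x)$ is modeled here.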

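Intuitively, for $x \land y$ to be true, $y$ must be true. In addition, $x$ must be true under the assumption that $y$ is true. So the function that computes $B(x, y)$ has to include $B(y)$ and $B(x|y)$. On the other hand, if $y$ is false, then $x \land y$ must be false as well, and this holds regardless of what we know about $x$. So once we know $B(y)$ and $B(x|y)$, there seems to be no need to know $B(x)$ in addition.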

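Let's look at the remaining requirements. That $g$ is strictly increasing means that if some piece of information increases our belief in $x$ while everything else stays the same, then our belief in $x, y$ must increase too. Assuming that $g$ is continuous means that if $x$ becomes only slightly more plausible, then $x, y$ should also become only slightly more plausible.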

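A system for measuring degrees of belief that accepts the three assumptions above obeys the sum rule and the product rule. That is, the system maps onto the system of probability. In this system, $P(F) = 0$, $P(T) = 1$, and $0 \le P(x) \le 1$, where $F$ and $T$ stand for a false and a true statement; furthermore, $P(x) = 1 - P(\bar{x})$ and $P(x, y) = P(x|y)P(y)$.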
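To see what the two rules say in numbers, here is a minimal sketch. It assumes a toy model that is not part of Cox's argument: ten equally weighted "worlds", each fixing the truth values of $x$ and $y$, with belief measured as the fraction of admitted worlds in which a statement holds. The counts and the helper `belief` are hypothetical.

```python
from fractions import Fraction

# Hypothetical toy model: ten equally weighted "worlds", each fixing the
# truth values of two propositions x and y. The counts are illustrative only.
worlds = (
    [(True, True)] * 3
    + [(True, False)] * 2
    + [(False, True)] * 1
    + [(False, False)] * 4
)


def belief(statement, given=lambda x, y: True):
    """Fraction of the worlds satisfying `given` in which `statement` holds."""
    admitted = [w for w in worlds if given(*w)]
    hits = [w for w in admitted if statement(*w)]
    return Fraction(len(hits), len(admitted))


p_x = belief(lambda x, y: x)
p_not_x = belief(lambda x, y: not x)
p_y = belief(lambda x, y: y)
p_xy = belief(lambda x, y: x and y)
p_x_given_y = belief(lambda x, y: x, given=lambda x, y: y)

# Sum rule: P(x) = 1 - P(not x)
print(p_x == 1 - p_not_x)  # True
# Product rule: P(x, y) = P(x|y) P(y)
print(p_xy == p_x_given_y * p_y)  # True
```

Here $P(x|y) = 3/4$ and $P(y) = 2/5$, so the product rule gives $P(x, y) = 3/10$, which matches the three worlds out of ten in which both statements hold.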

The proof is quite tedious, so let's simply accept the result.

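The probability system built this way differs a little from the one we usually encounter. First, it uses propositions in place of events. Because it was constructed as a mathematical system for expressing degrees of belief, there is no restriction that probabilities be assigned only to frequencies. Since the new system defines degrees of belief for the negation and the conjunction of propositions, it can be regarded as an extension of the familiar system of logic (negation and conjunction are all that is needed to build a system of logic). In other words, this construction justifies the way Bayesian probability theory assigns probabilities to arbitrary propositions or states of information.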

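One point to keep in mind is that the system we built is unique under the given assumptions. Any inference system that can be built on the axioms above is isomorphic to probability theory. That is, whether it is fuzzy logic or a neural network, a method must either follow the Bayesian probability system or violate one of the axioms above. Of course, a technique that does not follow the Bayesian probability system may still be useful in particular situations.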
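As one illustration of what "isomorphic" means here, consider an agent that records its beliefs as odds rather than probabilities. This example is our own, not from Cox or Jaynes: the functions `f_odds` and `g_odds` are hypothetical names for the negation and conjunction rules on that scale. The rules look different from the sum and product rules, yet translating back through a monotone map recovers them exactly. One caveat is that the odds scale cannot represent certainty as a finite number.

```python
from fractions import Fraction


def to_odds(p):
    """Re-express a probability 0 <= p < 1 on the odds scale."""
    return p / (1 - p)


def to_probability(o):
    """Inverse of to_odds: map odds back to a probability."""
    return o / (1 + o)


def f_odds(o):
    """Negation rule on the odds scale: decreasing in o (needs o > 0)."""
    return 1 / o


def g_odds(u, v):
    """Conjunction rule on the odds scale: increasing in both arguments."""
    return u * v / (1 + u + v)


# Illustrative probabilities only: P(x|y) = 3/4, P(y) = 2/5.
p_x_given_y = Fraction(3, 4)
p_y = Fraction(2, 5)

# Combine beliefs on the odds scale, then translate back to probability.
odds_xy = g_odds(to_odds(p_x_given_y), to_odds(p_y))
print(to_probability(odds_xy) == p_x_given_y * p_y)  # True
print(to_probability(f_odds(to_odds(p_y))) == 1 - p_y)  # True
```

The odds-based agent satisfies the three axioms with its own $f$ and $g$, and it never disagrees with a probability-based agent about which statement is more plausible.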

References

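The argument above has quite a few gaps, and even setting those aside, the justification of the axioms has been questioned repeatedly. For now, though, there does not seem to be a good reason to doubt the final conclusion, namely the universality of the Bayesian approach.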

Cox's original paper is Probability, Frequency and Reasonable Expectation (1946). For the background and proof of the probability system derived above, see chapters 1 and 2 of E.T. Jaynes's Probability Theory: The Logic of Science (2003). The implicit assumptions hidden behind the stated axioms, together with their discussion and proofs, are covered well in Kevin S. Van Horn's Constructing a Logic of Plausible Inference: A Guide to Cox's Theorem.

Another system of probability is the measure-theoretic one created by Kolmogorov. For an introduction, I recommend Probability Essentials by Jacod and Protter.

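Another argument for why an inference system ought to follow the probability system is the Dutch book argument proposed by de Finetti. A Dutch book is a set of bets that is guaranteed to win money from an individual. According to the argument, no Dutch book exists against an individual only if that individual's beliefs are coherent. A necessary and sufficient condition for this coherence is that the individual's system of beliefs follows the probability system.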
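A minimal sketch of a Dutch book, under the usual assumption that an agent with belief $b$ in a statement regards $b$ times the stake as a fair price for a ticket that pays the stake if the statement is true. The agent below is hypothetical and deliberately incoherent: its beliefs in $x$ and $\bar{x}$ sum to more than 1.

```python
from fractions import Fraction

# Hypothetical incoherent agent: believes x with 3/5 and not-x with 3/5,
# so the two beliefs sum to more than 1 (illustrative numbers only).
belief_x = Fraction(3, 5)
belief_not_x = Fraction(3, 5)

STAKE = 1  # each ticket pays 1 unit if its proposition turns out true


def agent_net_gain(x_is_true):
    """Agent's net result after buying both tickets at its own fair prices."""
    cost = (belief_x + belief_not_x) * STAKE
    # Exactly one of the two tickets pays out, whichever way x turns out.
    payout = STAKE * (1 if x_is_true else 0) + STAKE * (0 if x_is_true else 1)
    return payout - cost


outcomes = {x_is_true: agent_net_gain(x_is_true) for x_is_true in (True, False)}
print(outcomes)  # the agent loses 1/5 whether x is true or false
```

If the two beliefs summed to exactly 1, as the sum rule requires, the cost of the two tickets would equal the guaranteed payout and this particular sure loss would disappear.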