I can’t count how many times I have looked up the formal (measure-theoretic) definitions of conditional probability distribution or conditional expectation (even though it’s not that hard :weary:). Another such occasion was yesterday. This time I took some notes.

From conditional probability → to conditional distribution → to conditional expectation

Let $X$ and $Y$ be two real-valued random variables.

Conditional probability

For a fixed set $B$, (Feller, 1966, p. 157) defines the conditional probability of an event $\{Y \in B\}$ for given $X$ as follows.

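By $P\{Y \in B \mid X\}$ (in words, “a conditional probability of the event $\{Y \in B\}$ for given $X$”) is meant a function $q(X, B)$ such that for every set $A$

$$P\{X \in A, Y \in B\} = \int_A q(x, B) \, \mu\{dx\}$$

where $\mu$ is the marginal distribution of $X$.

(where $A$ and $B$ are both Borel sets on $\mathbb{R}$.)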

That is, the conditional probability can be defined as something that, when integrated with respect to the marginal distribution of $X$, results in the joint probability of $\{X \in A\}$ and $\{Y \in B\}$.

Moreover, note that if $A = \mathbb{R}$ then the above formula yields $P\{Y \in B\}$, the marginal probability of the event $\{Y \in B\}$.


For example, if the joint distribution of two random variables $X$ and $Y$ is a bivariate normal distribution, then by sitting down with a pen and paper for some amount of time, it is not hard to verify that the function $q(x, B)$ obtained by integrating over $B$ the normal density of $Y$ conditional on $X = x$ in this case satisfies the above definition of $P\{Y \in B \mid X\}$.
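The defining integral can also be checked numerically instead of with pen and paper. The sketch below assumes a particular bivariate normal that is not taken from the text: standard normal marginals with correlation `RHO = 0.6`, for which $Y$ given $X = x$ is normal with mean $\rho x$ and variance $1 - \rho^2$. The sets `A` and `B` are arbitrary intervals, and all function names are made up for this illustration.

```python
import math
import numpy as np

# Assumed illustration parameters (not from the text): standard normal
# marginals with correlation RHO, and the intervals A = (a, b), B = (c, d).
RHO = 0.6
A = (-0.5, 1.0)
B = (0.0, 2.0)


def std_normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def q(x, B, rho=RHO):
    """Candidate q(x, B): probability of the interval B under N(rho * x, 1 - rho**2)."""
    s = math.sqrt(1.0 - rho**2)
    c, d = B
    return std_normal_cdf((d - rho * x) / s) - std_normal_cdf((c - rho * x) / s)


def integral_of_q(A, B, n_grid=20_000):
    """Midpoint-rule approximation of the integral of q(x, B) over A w.r.t. mu = N(0, 1)."""
    a, b = A
    h = (b - a) / n_grid
    total = 0.0
    for i in range(n_grid):
        x = a + (i + 0.5) * h
        density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += q(x, B) * density * h
    return total


def joint_probability_mc(A, B, rho=RHO, n=400_000, seed=0):
    """Monte Carlo estimate of P{X in A, Y in B} from simulated (X, Y) pairs."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = rho * x + math.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    hit = (x > A[0]) & (x < A[1]) & (y > B[0]) & (y < B[1])
    return hit.mean()


lhs = joint_probability_mc(A, B)
rhs = integral_of_q(A, B)
print(f"P{{X in A, Y in B}} (simulated): {lhs:.4f}")
print(f"integral of q(x, B) over A:    {rhs:.4f}")
```

The two printed numbers should agree up to simulation error. Taking `A` wide enough to cover essentially the whole real line turns the right-hand side into the marginal probability of $\{Y \in B\}$, as noted above.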

Conditional distribution

Later on (Feller, 1966, p. 159) follows up with the notion of conditional probability distribution:

By a conditional probability distribution of $Y$ for given $X$ is meant a function $q$ of two variables, a point $x$ and a set $B$, such that

  1. for a fixed set $B$

    $q(X, B) = P\{Y \in B \mid X\}$

    is a conditional probability of the event $\{Y \in B\}$ for given $X$.

  2. $q$ is for each $x$ a probability distribution.

It is also pointed out that

In effect a conditional probability distribution is a family of ordinary probability distributions and so the whole theory carries over without change.

(Feller, 1966)

When I first came across this viewpoint, I found it incredibly enlightening to regard the conditional probability distribution as a family of ordinary probability distributions. :smile:


For example, assume that $X$ is an integer-valued and non-negative random variable, and that the conditional probability distribution of $Y$ for given $X = x$ is an F-distribution whose degrees of freedom depend on $x$. Then the conditional probability distribution of $Y$ can be regarded as a family of probability distributions $q(x, \cdot)$ for $x = 0, 1, 2, \dots$, whose probability density functions look like this:

Probability density functions of (Y|X=x) for different values x

In addition, as pointed out above, if we know the marginal distribution of $X$, then the conditional probability distribution of $Y$ can be used to obtain the marginal probability distribution of $Y$, or to randomly sample from the marginal distribution. Practically it means that if we randomly generate a value of $X$ according to its probability distribution, and use this value to randomly generate a value of $Y$ according to the conditional distribution of $Y$ for the given $X$, then the observations resulting from this procedure follow the marginal distribution of $Y$. Continuing the previous example, assume that $X$ follows a binomial distribution with parameters $n$ and $p$. Then the described simulation procedure estimates the following shape for the probability density function of the marginal distribution of $Y$:

Probability density function of Y
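The two-stage sampling procedure described above can be sketched in a few lines of NumPy. The concrete numbers are assumptions made only for this illustration, not the values behind the plots: $X$ is binomial with 4 trials and success probability 0.5, and $Y$ given $X = x$ follows an F-distribution with 5 and $x + 5$ degrees of freedom. The function names are hypothetical as well.

```python
import math
import numpy as np

# Hypothetical parameters, chosen only for illustration.
N_TRIALS, P_SUCCESS = 4, 0.5   # X ~ Binomial(N_TRIALS, P_SUCCESS)
DF_NUM = 5                     # numerator degrees of freedom of the F-distribution


def df_den(x):
    """Assumed denominator degrees of freedom of Y given X = x."""
    return x + 5


def sample_xy(n, seed=0):
    """Draw X from its marginal, then Y from the conditional distribution given X."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(N_TRIALS, P_SUCCESS, size=n)
    y = rng.f(DF_NUM, df_den(x))  # one F draw per simulated value of X
    return x, y


def histogram_density(y, upper=6.0, bins=60):
    """Histogram estimate of the marginal density of Y on [0, upper]."""
    counts, edges = np.histogram(y, bins=bins, range=(0.0, upper))
    width = edges[1] - edges[0]
    return counts / (len(y) * width), edges


def exact_mean_y():
    """E(Y) as the sum over x of P{X = x} * E(Y | X = x), with F mean d2 / (d2 - 2)."""
    total = 0.0
    for x in range(N_TRIALS + 1):
        pmf = math.comb(N_TRIALS, x) * P_SUCCESS**x * (1 - P_SUCCESS) ** (N_TRIALS - x)
        total += pmf * df_den(x) / (df_den(x) - 2)
    return total


x_draws, y_draws = sample_xy(200_000)
density, edges = histogram_density(y_draws)
print("sample mean of Y:", y_draws.mean())
print("exact E(Y):      ", exact_mean_y())
```

Plotting `density` against the bin midpoints gives an estimate of the marginal density of $Y$. The histogram is normalised by the total sample size, so mass beyond `upper` is simply not shown rather than redistributed.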

Conditional expectation

Finally, (Feller, 1966, p. 159) introduces the notion of conditional expectation. By the above, for a given value $x$ we have that

$$q(x, B) = P\{Y \in B \mid X = x\}, \quad \forall B \in \mathcal{B}$$

(here $\mathcal{B}$ denotes the Borel $\sigma$-algebra on $\mathbb{R}$), and therefore, a conditional probability distribution can be viewed as a family of ordinary probability distributions (represented by $q(x, \cdot)$ for different $x$s). Thus, as (Feller, 1966, p. 159) points out, if $q$ is given then the conditional expectation “introduces a new notation rather than a new concept.”

A conditional expectation $E(Y \mid X)$ is a function of $X$ assuming at $x$ the value

$$E(Y \mid X = x) = \int_{-\infty}^{\infty} y \, q(x, dy)$$

provided the integral converges.

Note that, because $E(Y \mid X)$ is a function of $X$, it is a random variable, whose value at an individual point $x$ is given by the above definition. Moreover, from the above definitions of conditional probability and conditional expectation it follows that

$$E(Y) = \int_{-\infty}^{\infty} E(Y \mid X = x) \, \mu\{dx\}$$
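The law of total expectation, $E(Y) = E(E(Y \mid X))$, follows from the two definitions above in a few steps. The sketch below assumes that $E|Y| < \infty$, so that the order of integration may be exchanged, and writes $\nu$ for the marginal distribution of $Y$ (a symbol introduced here only for this derivation).

$$
\begin{aligned}
\nu(B) &= P\{X \in \mathbb{R}, Y \in B\} = \int_{-\infty}^{\infty} q(x, B) \, \mu\{dx\}, \\
E(Y) &= \int_{-\infty}^{\infty} y \, \nu\{dy\}
      = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} y \, q(x, dy) \right) \mu\{dx\}
      = \int_{-\infty}^{\infty} E(Y \mid X = x) \, \mu\{dx\}.
\end{aligned}
$$

The first line is the definition of conditional probability with $A = \mathbb{R}$. The second line substitutes it into the expectation of $Y$ and recognises the inner integral as the conditional expectation at $x$.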

Example [cont.]

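We continue with the last example. From the properties of the F-distribution we know that under this example’s assumptions on the conditional distribution, it holds that

$$E(Y \mid X = x) = \frac{d_2(x)}{d_2(x) - 2},$$

where $d_2(x) > 2$ denotes the denominator degrees of freedom of the conditional F-distribution for $X = x$. A rather boring strictly decreasing function of $x$, converging to $1$ as $x \to \infty$.

Thus, under the example’s assumption on the distribution of $X$, the conditional expectation $E(Y \mid X)$ is a discrete random variable, which has non-zero probability mass at the values $E(Y \mid X = x)$ for those $x$ that $X$ takes with positive probability.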
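To make the "discrete random variable" view concrete, the distribution of $E(Y \mid X)$ can be tabulated exactly. The parameters below are hypothetical stand-ins rather than the values of the example: $X$ is binomial with 4 trials and success probability 1/2, and the denominator degrees of freedom given $X = x$ are taken to be $x + 5$, so that $E(Y \mid X = x) = (x + 5)/(x + 3)$ by the mean formula of the F-distribution.

```python
from fractions import Fraction
from math import comb

# Hypothetical parameters, chosen only for illustration.
N_TRIALS = 4
P_SUCCESS = Fraction(1, 2)


def df_den(x):
    """Assumed denominator degrees of freedom given X = x (must exceed 2)."""
    return x + 5


def conditional_mean(x):
    """E(Y | X = x) for an F-distribution: d2 / (d2 - 2)."""
    d2 = df_den(x)
    return Fraction(d2, d2 - 2)


def binomial_pmf(x):
    """P{X = x} for X ~ Binomial(N_TRIALS, P_SUCCESS)."""
    return comb(N_TRIALS, x) * P_SUCCESS**x * (1 - P_SUCCESS) ** (N_TRIALS - x)


# Distribution of the random variable E(Y | X): value -> probability mass.
distribution = {conditional_mean(x): binomial_pmf(x) for x in range(N_TRIALS + 1)}

# Law of total expectation: E(Y) = E(E(Y | X)).
expected_y = sum(value * mass for value, mass in distribution.items())

for value, mass in distribution.items():
    print(f"E(Y | X) = {value}  with probability {mass}")
print("E(Y) =", expected_y)
```

Exact fractions are used so that the point masses and the resulting $E(Y)$ carry no rounding error. The keys of `distribution` are distinct because the conditional mean is strictly decreasing in $x$.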

From conditional expectation → to conditional probability

An alternative approach is to define the conditional expectation first, and then to define conditional probability as the conditional expectation of the indicator function. This approach seems less intuitive to me. However, it is more flexible and more general, as we see below.

Conditional expectation

A definition in 2D

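Let $X$ and $Y$ be two real-valued random variables, and let $\mathcal{B}$ denote the Borel $\sigma$-algebra on $\mathbb{R}$. Recall that $X$ and $Y$ can be represented as mappings $X: \Omega \to \mathbb{R}$ and $Y: \Omega \to \mathbb{R}$ over some probability space $(\Omega, \mathcal{F}, P)$. We can define $E(Y \mid X)$, the conditional expectation of $Y$ given $X$, as follows.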

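A $\mathcal{B}$-measurable function $g$ is the conditional expectation of $Y$ for given $x$, i.e.,

$$E(Y \mid X = x) = g(x),$$

if for all sets $A \in \mathcal{B}$ it holds that

$$\int_{X^{-1}(A)} Y \, dP = \int_A g(x) \, \mu\{dx\},$$

where $\mu$ is the marginal probability distribution of $X$.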
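On a finite sample space the defining property can be verified exhaustively, which makes the definition less abstract. The sketch below uses a small made-up joint probability mass function (the table `JOINT` and all function names are assumptions for illustration). It computes $g(x)$ as the usual weighted average and checks that the integral of $Y$ over the event $\{X \in A\}$ equals the integral of $g$ over $A$ for every subset $A$ of the support of $X$.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical joint pmf P{X = x, Y = y}, keyed by (x, y).
JOINT = {
    (0, 1): Fraction(1, 8), (0, 3): Fraction(1, 8),
    (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4),
    (2, 0): Fraction(1, 8), (2, 4): Fraction(1, 8),
}


def marginal_x(joint):
    """Marginal distribution mu of X."""
    mu = {}
    for (x, _y), p in joint.items():
        mu[x] = mu.get(x, 0) + p
    return mu


def g(x, joint):
    """Candidate conditional expectation E(Y | X = x)."""
    mu = marginal_x(joint)
    return sum(y * p for (xx, y), p in joint.items() if xx == x) / mu[x]


def lhs(A, joint):
    """Integral of Y over the event {X in A}."""
    return sum(y * p for (x, y), p in joint.items() if x in A)


def rhs(A, joint):
    """Integral of g over A with respect to the marginal distribution of X."""
    mu = marginal_x(joint)
    return sum(g(x, joint) * mu[x] for x in A)


def all_subsets(values):
    """Every subset of a finite collection, as tuples."""
    values = list(values)
    return chain.from_iterable(combinations(values, r) for r in range(len(values) + 1))


support = sorted(marginal_x(JOINT))
checks = {A: lhs(A, JOINT) == rhs(A, JOINT) for A in all_subsets(support)}
print({x: g(x, JOINT) for x in support})
print("definition holds for every A:", all(checks.values()))
```

In the discrete case the check is almost a tautology, which is the point: the measure-theoretic definition reduces to the elementary weighted average whenever $X$ takes finitely many values.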

Interpretation in 2D

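If $X$ and $Y$ are real-valued one-dimensional, then the pair $(X, Y)$ can be viewed as a random vector in the plane. Each set $\{X \in A\}$ consists of parallels to the $y$-axis, and we can define a $\sigma$-algebra induced by $X$ as the collection of all sets $\{X \in A\}$ on the plane, where $A$ is a Borel set on the line. The collection of all such sets forms a $\sigma$-algebra $\sigma(X)$ on the plane, which is contained in the $\sigma$-algebra of all Borel sets in $\mathbb{R}^2$. $\sigma(X)$ is called the $\sigma$-algebra generated by the random variable $X$.

Then $E(Y \mid X)$ can be equivalently defined as a $\sigma(X)$-measurable random variable such that

$$E\left(Y \cdot 1_S\right) = E\left(E(Y \mid X) \cdot 1_S\right) \quad \text{for all } S \in \sigma(X),$$

where $1_S$ denotes the indicator function of the set $S$.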

A more general definition of conditional expectation

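The last paragraph illustrates that one could generalize the definition of the conditional expectation of $Y$ given $X$ to the conditional expectation of $Y$ given an arbitrary $\sigma$-algebra $\mathcal{G}$ (not necessarily the $\sigma$-algebra generated by $X$). This leads to the following general definition, which is stated in (Feller, 1966, pp. 160-161) in a slightly different notation.

Let $Y$ be a random variable, and let $\mathcal{G}$ be a $\sigma$-algebra of sets.

  1. A random variable $U$ is called a conditional expectation of $Y$ relative to $\mathcal{G}$, or $U = E(Y \mid \mathcal{G})$, if it is $\mathcal{G}$-measurable and

    $$E\left(Y \cdot 1_G\right) = E\left(U \cdot 1_G\right) \quad \text{for all } G \in \mathcal{G}.$$

  2. If $\mathcal{G}$ is the $\sigma$-algebra generated by a random variable $X$, then $E(Y \mid X) = E(Y \mid \mathcal{G})$.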
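A finite example may help with the general definition. When the $\sigma$-algebra is generated by a partition of a finite sample space, a measurable random variable is one that is constant on each block, and the conditional expectation is the probability-weighted average of $Y$ over the block. The sample space, probabilities, values of `Y`, and the partition `BLOCKS` below are all invented for this sketch.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite probability space with six outcomes 0..5.
P = {0: Fraction(1, 12), 1: Fraction(3, 12), 2: Fraction(2, 12),
     3: Fraction(2, 12), 4: Fraction(2, 12), 5: Fraction(2, 12)}
Y = {0: 1, 1: 3, 2: 2, 3: 6, 4: 0, 5: 4}

# The sigma-algebra is the one generated by this partition of the outcomes.
BLOCKS = [frozenset({0, 1}), frozenset({2, 3, 4}), frozenset({5})]


def expectation(Z, event):
    """E(Z * 1_event) for a random variable Z given as a dict outcome -> value."""
    return sum(Z[w] * P[w] for w in event)


def conditional_expectation(Z, blocks):
    """Random variable that equals the weighted block average of Z on each block."""
    U = {}
    for block in blocks:
        block_mean = expectation(Z, block) / sum(P[w] for w in block)
        for w in block:
            U[w] = block_mean
    return U


def sigma_algebra(blocks):
    """All unions of blocks, including the empty set and the whole space."""
    events = []
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            events.append(frozenset().union(*chosen))
    return events


U = conditional_expectation(Y, BLOCKS)
defining_property = all(
    expectation(Y, G) == expectation(U, G) for G in sigma_algebra(BLOCKS)
)
print(U)
print("E(Y 1_G) == E(U 1_G) for all G:", defining_property)
```

Because `U` is constant on each block by construction, it is measurable with respect to the partition-generated $\sigma$-algebra, and the loop confirms the integral condition on all eight of its events.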

Back to conditional probability and conditional distributions

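Let $1_{\{Y \in B\}}$ be a random variable that is equal to one if and only if $Y \in B$. The conditional probability of $\{Y \in B\}$ given $X = x$ can be defined in terms of a conditional expectation as

$$P\{Y \in B \mid X = x\} = E\left(1_{\{Y \in B\}} \mid X = x\right).$$

Under certain regularity conditions the above defines the conditional probability distribution $q(x, B)$ of $Y$ for given $X$.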
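In a discrete setting this reversal is easy to try out: compute the conditional expectation of an indicator and check that the result behaves like a conditional probability distribution. The joint probability mass function `JOINT` and the function names below are assumptions made for this sketch.

```python
from fractions import Fraction

# Hypothetical joint pmf P{X = x, Y = y}, keyed by (x, y).
JOINT = {
    (0, 1): Fraction(1, 8), (0, 3): Fraction(1, 8),
    (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4),
    (2, 0): Fraction(1, 8), (2, 4): Fraction(1, 8),
}


def conditional_expectation(f, x, joint=JOINT):
    """E(f(Y) | X = x) for a discrete joint distribution."""
    mass_x = sum(p for (xx, _y), p in joint.items() if xx == x)
    return sum(f(y) * p for (xx, y), p in joint.items() if xx == x) / mass_x


def q(x, B, joint=JOINT):
    """Conditional probability of {Y in B} given X = x, as E(1_{Y in B} | X = x)."""
    return conditional_expectation(lambda y: 1 if y in B else 0, x, joint)


all_y = {y for (_x, y) in JOINT}
print("q(0, {1})    =", q(0, {1}))
print("q(1, {1, 2}) =", q(1, {1, 2}))
print("q(2, all y)  =", q(2, all_y))
```

For each fixed `x` the set function `q(x, ·)` is additive over disjoint sets and assigns mass one to the full range of $Y$, so in this finite case no further regularity conditions are needed.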


  1. Feller, W. (1966). An introduction to probability theory and its applications (Vol. 2). John Wiley & Sons.