Conditional expectation of continuous random variables

Conditioning on events of positive probability

Assume that \(X\) is a uniform random variable on \([0,1]\). We want to calculate the expected value of \(X\) given that \(X\) is bigger than \(\frac12\). Intuitively the answer is \(\frac34\) because if \(X > \frac12\) and \(X\) is uniform, then \(X\) should be uniform on \(\left[\frac12,1\right]\).

To make this more precise we should identify one important event in this example. Let \(A\) be the event that the random variable \(X\) is bigger than \(\frac12\). Then we want to calculate \(\mathbb E\left[\left.X\right|A\right]\). We can now define the conditional cumulative distribution function of \(X\) given the event \(A\) in the following way: \[F_{X|A}(t)=\mathbb P\left(\left.X\leq t\right|A\right)=\frac{\mathbb P\left(\left\{X\leq t\right\}\cap A\right)}{\mathbb P\left(A\right)}\] We can now use that \(A=\left\{X\geq \frac12\right\}\). Let us first consider the case \(t < \frac12\). Then \(A\cap \{X\leq t\}=\emptyset\), hence \(F_{X|A}(t)=0\) for \(t < \frac12\). Similarly, if \(t > 1\), then \(\{X\leq t\}\cap A=A\) and \(F_{X|A}(t)=1\). Assume now that \(t\in\left[\frac12,1\right]\). We rewrite the equation for \(F_{X|A}(t)\) as \[F_{X|A}(t)=\frac{\mathbb P\left(\frac12\leq X\leq t\right)}{\mathbb P\left(A\right)} =\frac{\left(t-\frac12\right)}{\frac12}=2\left(t-\frac12\right).\] This is the cumulative distribution function of the uniform distribution on \(\left[\frac12,1\right]\), so we have formally obtained that, conditioned on the event \(\left\{X\geq \frac12\right\}\), the random variable \(X\) has the uniform distribution on \(\left[\frac12,1\right]\). It is now easy to calculate its expected value and obtain \(\frac34\).
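The value \(\frac34\) can also be checked numerically. The following is a minimal sketch, not part of the original argument: it encodes the piecewise formula for \(F_{X|A}\) derived above and approximates \(\mathbb E\left[\left.X\right|A\right]\) by a Riemann-Stieltjes sum over \(\left[\frac12,1\right]\). The function names and the number of subintervals are illustrative choices.

```python
def conditional_cdf(t):
    """F_{X|A}(t) for X uniform on [0, 1] and A = {X >= 1/2}."""
    if t < 0.5:
        return 0.0
    if t > 1.0:
        return 1.0
    return 2.0 * (t - 0.5)


def conditional_mean(n=100_000):
    """Approximate E[X | A] by summing midpoint * (increment of the CDF)."""
    total = 0.0
    for i in range(n):
        left = 0.5 + 0.5 * i / n
        right = 0.5 + 0.5 * (i + 1) / n
        midpoint = 0.5 * (left + right)
        total += midpoint * (conditional_cdf(right) - conditional_cdf(left))
    return total


print(conditional_mean())  # close to 0.75
```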

Definition. The conditional probability mass function of the random variable \(X\) given the event \(D\) is the function \(f_{X|D}:\mathbb R\to[0,1]\) defined as \[f_{X|D}(k)=\mathbb P\left(\left.X=k\right|D\right).\]

Conditioning on events of the form \(\{X=\alpha\}\) where \(X\) is a random variable with continuous distribution

We will now condition on values of random variables instead of on events of positive probability. This is one type of problem that we want to solve:

Problem 1. If \(A\) and \(B\) are independent normal random variables with \(A\sim N(3,16)\) and \(B\sim N(5,9)\), what is the conditional probability density function of \(A\) given that \(A+B=10\)?

The probability of the event \(\{A+B=10\}\) is equal to \(0\) because the random variable \(C=A+B\) is a random variable with continuous distribution.

However, we can still calculate conditional probabilities given events of the type \(\{C=c\}\), where \(c\) is a constant. The above problem will be solved later in this document.

Our goal is to develop a way to calculate conditional distributions where conditioning is performed over certain events whose probability is \(0\). We will not be able to condition over all events of probability \(0\); for example we will never be able to condition over the empty set. However, there are special events of zero probability that arise from random variables. If \(X\) is a random variable with continuous distribution, then we will look at the events of the type \(\{X=\alpha\}\) where \(\alpha\) is some constant real number. Such events have probability \(0\). However, we will approximate them by events \(\{\alpha\leq X\leq \alpha+\varepsilon\}\) that have non-zero probability whenever \(\varepsilon > 0\). Then we will let \(\varepsilon\to 0\).

Assume that \(Y\) is another random variable with continuous distribution. We want to find the conditional probability density function of \(Y\) given that \(\{X=\alpha\}\). We will first find the conditional cumulative distribution function.

Assume that \(t\) is a real number. We define the conditional cumulative distribution function as \begin{eqnarray*} F_{Y|X=\alpha}(t)=\mathbb P\left(Y\leq t|X=\alpha\right)=\lim_{\varepsilon\to0}\frac{\mathbb P\left(Y\leq t, \alpha\leq X\leq\alpha+\varepsilon\right)}{\mathbb P\left(\alpha\leq X\leq \alpha+\varepsilon\right)}. \end{eqnarray*} The denominator of the last fraction is \(F_X(\alpha+\varepsilon)-F_X(\alpha)\) where \(F_X\) is the cumulative distribution function of \(X\). We can use the joint cumulative distribution function \(F_{X,Y}\) of random variables \(X\) and \(Y\) to express the numerator of the fraction as \[\mathbb P\left(Y\leq t, \alpha\leq X\leq\alpha+\varepsilon\right)=\mathbb P\left(Y\leq t, X\leq \alpha+\varepsilon\right)-\mathbb P\left(Y\leq t, X\leq \alpha\right)=F_{X,Y}(\alpha+\varepsilon,t)-F_{X,Y}(\alpha,t).\] The conditional cumulative distribution function of \(Y\) given \(\{X=\alpha\}\) now becomes \begin{eqnarray*} F_{Y|X=\alpha}(t)&=&\lim_{\varepsilon\to0}\frac{F_{X,Y}(\alpha+\varepsilon,t)-F_{X,Y}(\alpha,t)}{F_X(\alpha+\varepsilon)-F_X(\alpha)} =\lim_{\varepsilon\to0}\frac{\frac{F_{X,Y}(\alpha+\varepsilon,t)-F_{X,Y}(\alpha,t)}{\varepsilon}}{\frac{F_X(\alpha+\varepsilon)-F_X(\alpha)}{\varepsilon}} = \frac{\frac{\partial}{\partial x}F_{X,Y}(\alpha,t)}{f_X(\alpha)}, \end{eqnarray*} where \(f_X\) is the probability density function of \(X\) and we assume \(f_X(\alpha) > 0\). The probability density function of \(Y\) given \(\{X=\alpha\}\) is \[f_{Y|X=\alpha}(t)=\frac{d}{dt}F_{Y|X=\alpha}(t)=\frac{\frac{\partial^2}{\partial y\partial x}F_{X,Y}(\alpha,t)}{f_X(\alpha)}=\frac{f_{X,Y}(\alpha,t)}{f_X(\alpha)}.\]

Remark. Observe that for discrete random variables we would have exactly the same equation as the last one, except that in the discrete case \(f_{X,Y}\) would be the joint probability mass function rather than the joint probability density function. The same would hold for \(f_X\).

Notation. It is common to use notation \(f_{Y|X}(t|\alpha)\) instead of \(f_{Y|X=\alpha}(t)\).
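The limiting procedure can be watched numerically. The sketch below uses a hypothetical joint density that does not appear in this document, \(f(x,y)=x+y\) on \([0,1]^2\), chosen only because all integrals are elementary. It compares \(\mathbb P\left(Y\leq t\,|\,\alpha\leq X\leq\alpha+\varepsilon\right)\) for shrinking \(\varepsilon\) with the value obtained by dividing the joint density at \(x=\alpha\) by the marginal density of \(X\) at \(\alpha\), which here equals \(\alpha+\frac12\), and integrating over \(0\leq y\leq t\).

```python
# Hypothetical example density (an assumption, not from the text):
# f(x, y) = x + y on the unit square [0, 1]^2.

def window_conditional_cdf(t, alpha, eps):
    """P(Y <= t | alpha <= X <= alpha + eps), from exact integrals of x + y."""
    # Integral of x over the window [alpha, alpha + eps].
    x_mass = ((alpha + eps) ** 2 - alpha ** 2) / 2.0
    numerator = t * x_mass + eps * t ** 2 / 2.0  # P(Y <= t, X in window)
    denominator = x_mass + eps / 2.0             # P(X in window)
    return numerator / denominator


def limit_conditional_cdf(t, alpha):
    """Integral over 0 <= y <= t of f(alpha, y) / (alpha + 1/2)."""
    return (alpha * t + t ** 2 / 2.0) / (alpha + 0.5)


alpha, t = 0.3, 0.6
for eps in (0.1, 0.01, 0.001):
    print(eps, window_conditional_cdf(t, alpha, eps))
print("limit", limit_conditional_cdf(t, alpha))
```

The printed ratios should approach the limiting value as \(\varepsilon\) decreases, which is what the definition of \(F_{Y|X=\alpha}\) requires.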

Problem 2. The joint density of \(X\) and \(Y\) is given by \[f(x,y)=\frac{Cx}{y^2}e^{-y^2-\frac{x^2}{y^2}},\quad x > 0, y > 0.\]

(a) Determine the constant \(C\).
(b) Calculate the conditional expectation \(\mathbb E\left[X|Y=y\right]\).

(a) The constant \(C\) can be determined from the requirement \begin{eqnarray*} 1&=&\iint_{(0,+\infty)^2}f(x,y)\,dxdy =\int_0^{+\infty}\frac{Ce^{-y^2}}{y^2}\left(\int_0^{+\infty}xe^{-\frac{x^2}{y^2}}\,dx\right)\,dy.\quad\quad\quad (1) \end{eqnarray*} The probability density function of \(Y\) is the integrand of the outer integral: \[f_Y(y)=\frac{Ce^{-y^2}}{y^2}\left(\int_0^{+\infty}xe^{-\frac{x^2}{y^2}}\,dx\right) .\] The conditional density \(f_{X|Y=y}(x)\), which we will need in part (b), satisfies \begin{eqnarray} f_{X|Y=y}(x)=\frac{f(x,y)}{f_Y(y)}.\end{eqnarray} Using the observation \(\frac{x}{y^2}e^{-\frac{x^2}{y^2}} =-\frac12\cdot\frac{\partial}{\partial x}\left(e^{-\frac{x^2}{y^2}}\right)\) we can evaluate the inner integral and obtain \begin{eqnarray*} f_Y(y)&=&\frac{C}{2} e^{-y^2}\left.\left(-e^{-\frac{x^2}{y^2}}\right)\right|_{x=0}^{x=+\infty}=\frac{C}{2}e^{-y^2}. \end{eqnarray*} Substituting this back into (1) we obtain \begin{eqnarray} 1=\frac{C}2\int_0^{+\infty}e^{-y^2}\,dy.\quad\quad\quad\quad\quad\quad(2)\end{eqnarray} We will evaluate this integral using the substitution \(y=\frac{u}{\sqrt 2}\). Then we have \(dy=\frac{du}{\sqrt 2}\) and \begin{eqnarray*} \int_0^{+\infty}e^{-y^2}\,dy&=&\frac{1}{\sqrt 2} \int_0^{+\infty}e^{-\frac{u^2}2}\,du\newline&=&\frac 1{\sqrt 2}\cdot\sqrt{2\pi}\cdot \left(\frac1{\sqrt{2\pi}}\int_0^{+\infty}e^{-\frac{u^2}2}\,du\right)\newline &=&\sqrt{\pi}\mathbb P\left(Z\geq 0\right), \end{eqnarray*} where \(Z\) is a standard normal random variable. We know that \(\mathbb P\left(Z\geq 0\right)=\frac12\), hence \begin{eqnarray} \int_0^{+\infty}e^{-y^2}\,dy&=&\frac{\sqrt{\pi}}2.\quad\quad\quad\quad\quad(3) \end{eqnarray} From (2) we get \(C=\frac{4}{\sqrt{\pi}}\).
(b) The conditional expectation \(\mathbb E\left[X|Y=y\right]\) can be evaluated using the conditional density of \(X\) given \(\{Y=y\}\). Assuming that \(x > 0\) and \(y > 0\) we obtain \begin{eqnarray*} f_{X|Y=y}(x)&=&\frac{\frac{Cx}{y^2}e^{-y^2-\frac{x^2}{y^2}}}{\frac{C}2e^{-y^2}} =\frac{2x}{y^2}e^{-\frac{x^2}{y^2}}. \end{eqnarray*} The expectation can now be calculated using the substitution \(z=\frac{x}{y}\). Then we have \(dx=y\,dz\) and the bounds are \(0 < z < +\infty\). \begin{eqnarray*} \mathbb E\left[X|Y=y\right]&=&\int_0^{+\infty}xf_{X|Y=y}(x)\,dx=\int_0^{+\infty}\frac{2x^2}{y^2}e^{-\frac{x^2}{y^2}}\,dx\newline &=&2y\int_0^{+\infty}z^2e^{-z^2}\,dz=y\int_0^{+\infty}z\cdot 2ze^{-z^2}\,dz. \end{eqnarray*} We now use integration by parts with \(f=z\) and \(dg=2ze^{-z^2}\,dz\). Then we have \(df=dz\) and \(g=-e^{-z^2}\). The integral becomes \begin{eqnarray*} \mathbb E\left[X|Y=y\right]&=& \left.-yze^{-z^2}\right|_{z=0}^{z=+\infty}+y\int_0^{+\infty}e^{-z^2}\,dz. \end{eqnarray*} The boundary term vanishes. Using the result we obtained in (3), the calculation can be finished in the following way: \begin{eqnarray*} \mathbb E\left[X|Y=y\right]&=&\frac{y\sqrt{\pi}}2. \end{eqnarray*}
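Both parts of Problem 2 can be sanity-checked by numerical integration. The following sketch is only a check under stated assumptions, not part of the solution: it uses a hand-written composite Simpson rule, truncates the infinite integration ranges at points where the integrands are negligible, and picks the test value \(y=1.7\) arbitrarily.

```python
import math


def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


C = 4.0 / math.sqrt(math.pi)  # the constant found in part (a)


def marginal_y(y):
    """f_Y(y) = (C / 2) exp(-y^2), from part (a)."""
    return 0.5 * C * math.exp(-y * y)


def conditional_density(x, y):
    """f_{X|Y=y}(x) = (2x / y^2) exp(-x^2 / y^2), from part (b)."""
    return 2.0 * x / y ** 2 * math.exp(-(x / y) ** 2)


def conditional_mean(y):
    """Numerical E[X | Y = y]; the range is truncated at 10*y (an assumption)."""
    return simpson(lambda x: x * conditional_density(x, y), 0.0, 10.0 * y)


total_mass = simpson(marginal_y, 0.0, 10.0)  # should be close to 1
y = 1.7
print(total_mass, conditional_mean(y), y * math.sqrt(math.pi) / 2.0)
```

The last two printed numbers should agree closely if the formula \(\mathbb E\left[X|Y=y\right]=\frac{y\sqrt\pi}2\) is correct.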

Problem 3. The joint density of the random variables \(X\) and \(Y\) is given by \[f(x,y)=Cx^2e^{-xy}\cdot 1_{[0,1)}(x)\cdot 1_{[0,x)}(y),\] where \(C\) is a constant.

(a) Determine the constant \(C\).
(b) Determine the probability of the event \(Y > \frac{X}2\).
(c) Determine the conditional density of \(X\) given \(Y=y\).

(a) We need to find the constant \(C\) for which \(\int_{0}^1\int_0^xCx^2e^{-xy}\,dydx=1\). We will use the substitution \(z=xy\) to evaluate the inner integral \(\int_0^x x^2e^{-xy}\,dy\). The variable \(y\) can be expressed in terms of \(z\) as \(y=\frac zx\) which gives us \(dy=\frac1x\,dz\) and the bounds of integration become \(0\leq z\leq x^2\). Therefore we obtain \(\int_0^x x^2e^{-xy}\,dy= \int_0^{x^2}xe^{-z}\,dz=x-xe^{-x^2}\). We now have \begin{eqnarray*}1&=&\int_{0}^1\int_0^xCx^2e^{-xy}\,dydx\newline&=&C\int_0^1\left(x-xe^{-x^2}\right)\,dx\newline&=&C \cdot\left.\left(\frac{x^2}2+\frac12e^{-x^2}\right)\right|_{x=0}^{x=1}=\frac{C}{2e}.\end{eqnarray*} Thus \(C=2e\).
(b) The probability of the required event can be calculated by evaluating integrals similar to those in part (a). Again, we use the substitution \(z=xy\) in the inner integral. \begin{eqnarray*} \mathbb P\left[Y> \frac{X}2\right]&=&2e\int_0^1\int_{\frac x2}^x x^2e^{-xy}\,dy\,dx\newline&=&2e\int_0^1x\int_{\frac{x^2}2}^{x^2}e^{-z}\,dz\,dx\newline&=& 2e\int_0^1x\left(e^{-\frac{x^2}2}-e^{-x^2}\right)\,dx\newline &=&2e\left.\left(-e^{-\frac{x^2}2}+\frac12e^{-x^2} \right)\right|_{x=0}^{x=1}\newline&=&2e\cdot\left(-e^{-\frac12}+1+\frac1{2e}-\frac12\right)\newline &=&e-2\sqrt e+1=\left(\sqrt e-1\right)^2. \end{eqnarray*}
(c) Once \(Y=y\) is fixed, the support of \(f_{X|Y}(x|y)\) must be \((y,1)\), because the joint density vanishes unless \(y < x < 1\). \begin{eqnarray*} f_{X|Y}(x|y)&=&\frac{f(x,y)}{\int_y^1 f(x,y)\,dx}=\frac{2ex^2e^{-xy}\cdot 1_{[0,1)}(x)\cdot 1_{[0,x)}(y)}{2e\int_y^1 x^2e^{-xy}\,dx}\newline&=&\frac{x^2e^{-xy}\cdot 1_{[0,1)}(x)\cdot 1_{[0,x)}(y)}{\int_y^1 x^2e^{-xy}\,dx}. \end{eqnarray*} We will use integration by parts to evaluate the integral in the denominator. We will consider the following two cases:
- Case 1: \(y\neq 0\). We use integration by parts with \(u=x^2\) and \(dv=e^{-xy}\,dx\) to obtain that \(du=2x\,dx\) and \(v=-\frac1ye^{-xy}\). The integral becomes \begin{eqnarray*}\int_y^1 x^2e^{-xy}\,dx&=&\left.-\frac{x^2e^{-xy}}{y}\right|_{x=y}^{x=1}+\frac2y\int_y^1xe^{-xy}\,dx \newline&=& -\frac1ye^{-y}+ye^{-y^2} \newline &&+\frac2y\left(\left.-\frac{xe^{-xy}}y\right|_{x=y}^{x=1}+\frac1y\int_y^1e^{-xy}\,dx\right).\end{eqnarray*} Furthermore, we obtain \begin{eqnarray*}\int_y^1 x^2e^{-xy}\,dx &=&-\frac1ye^{-y}+ye^{-y^2}-\frac2{y^2}e^{-y}+\frac2ye^{-y^2}-\left.\frac2{y^3}e^{-xy}\right|_{x=y}^{x=1}\newline &=&-\frac1ye^{-y}+ye^{-y^2}-\frac2{y^2}e^{-y}\newline &&+\frac2ye^{-y^2}-\frac2{y^3}e^{-y}+\frac2{y^3}e^{-y^2}.\end{eqnarray*} Connecting beginning and end we conclude \begin{eqnarray*}\int_y^1 x^2e^{-xy}\,dx &=&e^{-y^2}\left(y+\frac2y+\frac2{y^3}\right)-e^{-y}\left(\frac1y+\frac2{y^2}+\frac2{y^3}\right). \end{eqnarray*} We finally obtain \[f_{X|Y}(x|y)=\frac{x^2e^{-xy}\cdot 1_{[0,1)}(x)\cdot 1_{[0,x)}(y)}{e^{-y^2}\left(y+\frac2y+\frac2{y^3}\right)-e^{-y}\left(\frac1y+\frac2{y^2}+\frac2{y^3}\right)}.\]
- Case 2: \(y= 0\). The integral in the denominator becomes \[\int_0^1x^2\,dx=\frac13.\] Since \(e^{-xy}=1\) when \(y=0\), and \(1_{[0,x)}(0)=1\) exactly when \(x > 0\), the conditional density is \[f_{X|Y}(x|0)=3x^2\cdot 1_{(0,1)}(x).\]
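The three parts of Problem 3 can be cross-checked numerically. The sketch below is an illustrative check rather than part of the solution: it applies a hand-written composite Simpson rule, nested for the double integrals, and compares the closed-form denominator from Case 1 with direct numerical integration at the arbitrarily chosen value \(y=\frac12\). All function names are illustrative.

```python
import math


def simpson(func, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


C = 2.0 * math.e  # the constant found in part (a)


def integrate_over_y(x, lower, upper):
    """Integral of C x^2 exp(-x y) dy for lower <= y <= upper."""
    return simpson(lambda y: C * x * x * math.exp(-x * y), lower, upper)


# Part (a): the density should integrate to 1 over 0 <= y < x < 1.
total_mass = simpson(lambda x: integrate_over_y(x, 0.0, x), 0.0, 1.0)

# Part (b): P(Y > X/2), to be compared with (sqrt(e) - 1)^2.
prob = simpson(lambda x: integrate_over_y(x, x / 2.0, x), 0.0, 1.0)


def denominator_closed_form(y):
    """Closed form of the integral of x^2 exp(-x y) over y <= x <= 1 (Case 1)."""
    return (math.exp(-y * y) * (y + 2.0 / y + 2.0 / y ** 3)
            - math.exp(-y) * (1.0 / y + 2.0 / y ** 2 + 2.0 / y ** 3))


def denominator_numeric(y):
    """The same integral evaluated numerically."""
    return simpson(lambda x: x * x * math.exp(-x * y), y, 1.0)


print(total_mass)
print(prob, (math.sqrt(math.e) - 1.0) ** 2)
print(denominator_closed_form(0.5), denominator_numeric(0.5))
```

For \(y\) very close to \(0\) the closed form suffers from cancellation between large terms, so a comparison of this kind is only meaningful for moderate values of \(y\).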

Bivariate normal random variables

When dealing with bivariate normal random variables, we can often use a trick to avoid dealing with conditional probability density functions. The trick is the following: If \(X\) and \(Y\) have a bivariate normal distribution with mean \(0\), then \(X\) can be expressed as \(X=\alpha Y+\beta Z\) where \(Z\) is a normal random variable independent of \(Y\). Alternatively, \(Y\) can be expressed as \(Y=\gamma X+ \delta W\), where \(W\) is independent of \(X\). You would need to calculate the constants \(\alpha\) and \(\beta\) (or the constants \(\gamma\) and \(\delta\) if you choose to express \(Y\) in terms of \(X\) and \(W\)). These constants are found by solving a system of equations obtained from the covariance matrix.
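As a minimal sketch of how the constants come out of the covariance matrix, assume that \(X\) and \(Y\) have mean \(0\) and that \(Z\) is normalized to be a standard normal random variable (this normalization is an assumption made here for concreteness). Then \(\mbox{cov}(X,Y)=\alpha\,\mbox{var}(Y)\) and \(\mbox{var}(X)=\alpha^2\,\mbox{var}(Y)+\beta^2\), which gives the two constants directly. The function name and the sample covariance values are illustrative.

```python
import math


def decomposition_coefficients(var_x, var_y, cov_xy):
    """Return (alpha, beta) such that X = alpha*Y + beta*Z.

    Assumes zero means and Z standard normal, independent of Y. Uses
    cov(X, Y) = alpha * var(Y) and var(X) = alpha^2 * var(Y) + beta^2.
    """
    alpha = cov_xy / var_y
    beta = math.sqrt(var_x - alpha ** 2 * var_y)  # the nonnegative root
    return alpha, beta


# Illustrative covariance matrix [[1, 0.5], [0.5, 1]].
print(decomposition_coefficients(1.0, 1.0, 0.5))
```

Only \(\beta^2\) is determined by the equations; taking the nonnegative root is a convention, since \(-Z\) is also standard normal and independent of \(Y\).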

To illustrate the technique, we will solve Problem 1 from the beginning of this document.

Problem 1. If \(A\) and \(B\) are independent normal random variables with \(A\sim N(3,16)\) and \(B\sim N(5,9)\), what is the conditional probability density function of \(A\) given that \(A+B=10\)?

There exist independent standard normal random variables \(Q\) and \(R\) such that \(A=3+4Q\) and \(B=5+3R\). Then the random variable \(C=A+B=8+4Q+3R\) has a normal distribution. The random variable \(4Q+3R\) is normal with expectation \(0\) and variance \(4^2+3^2=25\). Therefore there exists a standard normal random variable \(S\) such that \(5S=4Q+3R\). With this notation we have \(A+B=8+5S\). The event \(\{A+B=10\}\) can now be written as \(5S=2\) or \(S=\frac25\). We need to calculate \[F_{A\left|\left\{S=\frac25\right\}\right.}(t)=\mathbb P\left(A\leq t\left|S=\frac25\right.\right)= \mathbb P\left(3+4Q\leq t\left|S=\frac25\right.\right)=\mathbb P\left(Q\leq \frac{t-3}4\left|S=\frac25\right.\right). \] Recall that \(Q\) is a standard normal random variable. Therefore \(\mathbb E\left[Q\right]=0\) and \(\mathbb E\left[Q^2\right]=1\). We can now calculate the covariance between \(Q\) and \(S\), using the independence of \(Q\) and \(R\): \[\mbox{cov}(Q,S)=\mathbb E\left[Q\cdot \frac{4Q+3R}5\right]= \frac45\mathbb E\left[Q^2\right]+\frac35\mathbb E\left[QR\right]=\frac45+\frac35\cdot \mathbb E\left[Q\right]\cdot\mathbb E\left[R\right]=\frac45.\]

Since \(Q\) and \(S\) have a bivariate normal distribution with expectations \(0\) and covariance matrix \(\left[\begin{array}{cc}1&\frac45\newline \frac45&1\end{array}\right]\), there exist a standard normal random variable \(T\) independent of \(S\) and two real numbers \(\alpha\) and \(\beta\) such that \[Q=\alpha S+ \beta T.\] We need to determine \(\alpha\) and \(\beta\) from the conditions \(\mathbb E\left[Q^2\right]=1\) and \(\mathbb E\left[QS\right]=\frac45\). The last condition immediately gives us the value of \(\alpha\). Namely, \[\frac45=\mathbb E\left[QS\right]=\mathbb E\left[(\alpha S+\beta T)\cdot S\right]=\alpha\mathbb E\left[S^2\right]+\beta\mathbb E\left[ST\right]=\alpha\cdot 1+\beta\mathbb E\left[S\right]\cdot\mathbb E\left[T\right]=\alpha+\beta\cdot 0\cdot 0=\alpha.\] Therefore \(\alpha=\frac45\). From \(\mathbb E\left[Q^2\right]=1\) we calculate \(\beta\). \begin{eqnarray*} \mathbb E\left[Q^2\right]&=&\mathbb E\left[\left(\frac45 S+ \beta T\right)^2\right]=\frac{16}{25}\mathbb E\left[S^2\right]+2\cdot\frac45\cdot\beta\mathbb E\left[ST\right]+\beta^2\mathbb E\left[T^2\right]\newline &=&\frac{16}{25}\cdot 1+2\cdot \frac45\cdot \beta\cdot \mathbb E\left[S\right]\cdot\mathbb E\left[T\right]+\beta^2\cdot 1. \end{eqnarray*} From \(\mathbb E\left[S\right]=\mathbb E[T]=0\) we finally obtain \(\beta^2=\frac{9}{25}\). We may take \(\beta=\frac35\).

We now have \(Q=\frac45S+\frac35T\) where \(T\) is independent of \(S\). The required conditional probability now satisfies \begin{eqnarray*}F_{A\left|\left\{S=\frac25\right\}\right.}(t)&=&\mathbb P\left(Q\leq \frac{t-3}4\left|S=\frac25\right.\right)= \mathbb P\left(\frac45S+\frac35T\leq \frac{t-3}4\left|S=\frac25\right.\right)=\mathbb P\left(\frac45\cdot \frac25+\frac35T\leq \frac{t-3}4\left|S=\frac25\right.\right). \end{eqnarray*} We now use that \(T\) is independent of \(S\) to obtain \begin{eqnarray*}F_{A\left|\left\{S=\frac25\right\}\right.}(t)&=&\mathbb P\left(\frac45\cdot \frac25+\frac35T\leq \frac{t-3}4\left|S=\frac25\right.\right)= \mathbb P\left(\frac8{25}+\frac35T\leq \frac{t-3}4 \right)=\mathbb P\left(\frac35T\leq \frac{t-3}4-\frac8{25}\right)\newline &=& \mathbb P\left(T\leq \frac{25t-75-32}{60}\right)=\mathbb P\left(T\leq \frac{25t-107}{60}\right)\newline &=&\frac1{\sqrt{2\pi}}\int_{-\infty}^{\frac{25t-107}{60}}e^{-\frac{s^2}2}\,ds. \end{eqnarray*}

The conditional probability density function satisfies \[f_{A\left|\left\{S=\frac25\right\}\right.}(t)=\frac{d}{dt}F_{A\left|\left\{S=\frac25\right\}\right.}(t)=\frac1{\sqrt{2\pi}}e^{-\frac12\cdot\left(\frac{25t-107}{60}\right)^2}\cdot\frac{5}{12}.\]
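The density just obtained is that of a normal distribution with mean \(\frac{107}{25}\) and standard deviation \(\frac{12}5\). As a cross-check, it can be compared with the ratio \(\frac{f_{A,C}(t,10)}{f_C(10)}\), where \(C=A+B\sim N(8,25)\) and, by independence of \(A\) and \(B\), \(f_{A,C}(a,c)=f_A(a)f_B(c-a)\). The sketch below assumes \(B\sim N(5,9)\) as in the restated problem; the function names and the test points are illustrative.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of N(mean, variance) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def density_from_ratio(t, total=10.0):
    """f_{A,C}(t, total) / f_C(total), using f_{A,C}(a, c) = f_A(a) * f_B(c - a)."""
    joint = normal_pdf(t, 3.0, 16.0) * normal_pdf(total - t, 5.0, 9.0)
    return joint / normal_pdf(total, 8.0, 25.0)


def density_from_decomposition(t):
    """The density derived via Q = (4/5) S + (3/5) T."""
    z = (25.0 * t - 107.0) / 60.0
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * 5.0 / 12.0


for t in (0.0, 4.28, 7.5):
    print(t, density_from_ratio(t), density_from_decomposition(t))
```

The two columns should agree up to rounding, which supports the result of the decomposition argument without relying on it.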

Problem 4. Assume that \(X\) and \(Y\) are bivariate normal random variables with mean \(0\) and covariance matrix \(\Sigma\). Assume further that \(\Sigma=\left[\begin{array}{cc} 1& \rho\newline \rho&1\end{array}\right]\). Evaluate \(\mathbb E\left[\left.e^X\right|Y=y_0\right]\).