Combinations of random variables

 
 
 
 
 

What happens when we combine random variables

Linear combinations of random variables

We just reviewed what happens when we shift or scale a data set by a constant value. But now we want to look at what happens when we combine two data sets, either by adding them or subtracting them.


Hi! I'm Krista.

I create online courses to help you rock your math class.

 

For example, let’s say I have two variables: how much time I spend each day walking and biking, ???W??? and ???B??? respectively. And let’s say that I have data on my walking and biking habits for a full year, and I’ve already found the mean and standard deviation for both variables.

 
Walking: ???\mu_W=1.1???, ???\sigma_W=0.2???

Biking: ???\mu_B=0.6???, ???\sigma_B=0.1???
 

But now I want to know the mean for the sum of my walking and biking time together. In other words, I spend some time walking and biking each day, so I’d like to get an average for my total daily activity time.

We’ll call the total activity ???A???, which means we’re looking for ???\mu_A??? (or the expected value ???E(A)???). We know that ???A=W+B???. Here’s the rule we need to remember: when we want to find the mean of the sum, we just find the sum of the means. So because ???A=W+B???,

???\mu_A=\mu_W+\mu_B???

???\mu_A=1.1+0.6???

???\mu_A=1.7???

But if we want to find the standard deviation of the sum of these two variables, we can’t simply add the standard deviations together. In other words, ???\sigma_A=\sigma_W+\sigma_B??? is not a valid equation.

Instead, to find the standard deviation of the total activity, we first square the two standard deviations. Remember that squaring a standard deviation gives us the variance, so this gives us the variance for walking and the variance for biking.

???\sigma^2_W=0.2^2=0.04???

???\sigma^2_B=0.1^2=0.01???

Then we add these together to get the sum of the variances, which gives us the variance for total activity.

???\sigma^2_A=\sigma^2_W+\sigma^2_B???

???\sigma^2_A=0.04+0.01???

???\sigma^2_A=0.05???

Then to find the standard deviation for total activity, we take the square root of both sides.

???\sqrt{\sigma^2_A}=\sqrt{0.05}???

???\sigma_A\approx0.22???
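The same arithmetic can be checked with a few lines of Python. This is only a minimal sketch: the variable names are made up for illustration, and it assumes walking and biking times are independent.

```python
import math

# Means and standard deviations from the walking/biking example
mu_w, sigma_w = 1.1, 0.2   # walking
mu_b, sigma_b = 0.6, 0.1   # biking

# Mean of the sum is the sum of the means
mu_a = mu_w + mu_b

# Variances add (assuming independence); standard deviations do not
var_a = sigma_w**2 + sigma_b**2
sigma_a = math.sqrt(var_a)

print(round(mu_a, 2), round(var_a, 2), round(sigma_a, 2))  # prints 1.7 0.05 0.22
```

Note that simply adding the standard deviations would give 0.3, which overstates the spread of the total.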

Instead of the sum, we could also find the difference in my walking and biking times. We could define a new variable for the difference and call it ???D???. Then the difference is ???D=W-B???, and the expected value of the difference would be

???E(D)=\mu_D=\mu_W-\mu_B???

???E(D)=\mu_D=1.1-0.6???

???E(D)=\mu_D=0.5???

And the standard deviation of the difference would be

???\sqrt{\sigma^2_D}=\sqrt{\sigma^2_W+\sigma^2_B}???

???\sqrt{\sigma^2_D}=\sqrt{0.04+0.01}???

???\sqrt{\sigma^2_D}=\sqrt{0.05}???

???\sigma_D\approx0.22???

One important thing to note is that, regardless of whether we’re finding the sum of the variables, or the difference of the variables, in both cases we take the sum of the variances ???\sigma^2_W+\sigma^2_B???. We don’t use the sum of the variances for the sum, and the difference of the variances for the difference; we always use the sum for both.
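A quick simulation can make the "always add the variances" rule more concrete. The sketch below is an illustration under stated assumptions, not the real walking and biking data: it draws independent, normally distributed values with the same means and standard deviations, and the names `walk`, `bike`, `sums`, and `diffs` are hypothetical.

```python
import random
import statistics

random.seed(0)   # fixed seed so the run is repeatable
n = 100_000

# Assumption: independent normal draws, used only for illustration
walk = [random.gauss(1.1, 0.2) for _ in range(n)]
bike = [random.gauss(0.6, 0.1) for _ in range(n)]

sums = [w + b for w, b in zip(walk, bike)]
diffs = [w - b for w, b in zip(walk, bike)]

# Both spreads should be close to sqrt(0.04 + 0.01), about 0.22
sd_sum = statistics.pstdev(sums)
sd_diff = statistics.pstdev(diffs)
print(round(sd_sum, 3), round(sd_diff, 3))
```

Subtracting a variable still adds its variability to the result, which is why the simulated spread of the difference should match the spread of the sum rather than shrink.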

When we find the mean of the sum or difference of variables, it doesn’t matter whether the variables are dependent or independent. Either way, the mean of the sum is the sum of the means, and the mean of the difference is the difference of the means.

But in order to find the standard deviation of the sum or difference of two variables by adding their variances, the variables must be independent. So we can summarize what we know about the formulas this way:

 
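Mean (dependent or independent variables): ???\mu_{W\pm B}=\mu_W\pm\mu_B???

Standard deviation (independent variables only): ???\sigma_{W\pm B}=\sqrt{\sigma^2_W+\sigma^2_B}???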
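To see why independence matters for the standard deviation rule but not for the mean rule, here is a small simulation sketch. Everything about the dependence is an assumption made up for illustration: the normal draws, the 0.4 factor tying biking time to walking time, and the variable names are not part of the original data.

```python
import random
import statistics

random.seed(1)   # fixed seed so the run is repeatable
n = 100_000

walk = [random.gauss(1.1, 0.2) for _ in range(n)]

# Hypothetical dependence: biking time rises with walking time.
# Noise SD of 0.06 keeps the SD of bike at 0.1 (0.4^2 * 0.04 + 0.06^2 = 0.01).
bike = [0.6 + 0.4 * (w - 1.1) + random.gauss(0, 0.06) for w in walk]

total = [w + b for w, b in zip(walk, bike)]

mean_total = statistics.fmean(total)           # mean rule still applies
sd_bike = statistics.pstdev(bike)              # close to 0.1 by construction
sd_total = statistics.pstdev(total)            # simulated SD of the sum
sd_if_independent = (0.2**2 + 0.1**2) ** 0.5   # what the rule would predict

print(round(mean_total, 2), round(sd_bike, 2))
print(round(sd_total, 2), round(sd_if_independent, 2))
```

In a run like this, the mean of the total should stay near 1.7, while the simulated standard deviation should come out near 0.29 instead of the 0.22 that the variance-addition rule predicts.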
 


Combinations of normally distributed variables

When we combine independent variables that are each normally distributed, the combination will be normally distributed as well.

So if we’re given the mean and standard deviation of two normally distributed variables, we can calculate the mean and standard deviation of the new combination.

But then, since the combination is normally distributed, we can use what we know about the probability under normal distributions to answer probability questions about the combination.
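As a rough sketch of this idea, Python's standard-library `statistics.NormalDist` can add or subtract two independent normal variables directly. The example assumes, purely for illustration, that the walking and biking times from earlier are independent and normally distributed. The threshold of 2.0 is a hypothetical value, not one taken from the data.

```python
from statistics import NormalDist

# Assumption: walking and biking times are independent and normal
walk = NormalDist(mu=1.1, sigma=0.2)
bike = NormalDist(mu=0.6, sigma=0.1)

# Adding or subtracting NormalDist objects treats them as independent:
# the means add or subtract, and the variances always add.
total = walk + bike
diff = walk - bike

print(round(total.mean, 2), round(total.stdev, 2))
print(round(diff.mean, 2), round(diff.stdev, 2))

# Probability that the combined total is below a hypothetical threshold
p_under_2 = total.cdf(2.0)
print(round(p_under_2, 2))
```

Because the result is itself a normal distribution, the usual tools such as `cdf` apply to the combination just as they would to a single normal variable.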

 
 



 

 
 

 
 

Answering probability questions with random variable combinations

Example

A popcorn company fills each of its variety popcorn tins with three flavors of popcorn: white cheddar, caramel, and chocolate covered. The amount of each flavor of popcorn that gets packed in the tin is normally distributed with a mean of ???1??? pound and a standard deviation of ???0.1??? pounds. The amount of each popcorn flavor is independent from the other flavors.

If ???W??? is the total weight of popcorn in a randomly selected tin, find the probability that the tin contains less than ???3.25??? pounds.

We have three normally distributed variables, one for each flavor. Their means are

???\mu_D=1???, ???\mu_M=1???, ???\mu_C=1???

Therefore, the mean of the combination (the mean weight of a full tin) is

???\mu_W=\mu_D+\mu_M+\mu_C???

???\mu_W=1+1+1???

???\mu_W=3???

The standard deviations of the three normally distributed variables are

???\sigma_D=0.1???, ???\sigma_M=0.1???, ???\sigma_C=0.1???

To find the standard deviation of the combination (the standard deviation of the weight of a full tin), we’ll find the variance of the combination.

???\sigma_W^2=\sigma_D^2+\sigma_M^2+\sigma_C^2???

???\sigma_W^2=0.1^2+0.1^2+0.1^2???

???\sigma_W^2=0.01+0.01+0.01???

???\sigma_W^2=0.03???

So the standard deviation is

???\sigma_W=\sqrt{0.03}???

???\sigma_W\approx0.1732???

Now that we have the mean ???\mu_W=3??? and standard deviation ???\sigma_W\approx0.1732??? of the normally distributed weight of the full tin, we can answer probability questions about the combined normal distribution. We want to find the probability that the tin contains less than ???3.25??? pounds.

The distance of ???3.25??? from the mean of ???3??? is

???3.25-3=0.25???

Expressed in standard deviations, that’s

???\frac{0.25}{0.1732}\approx1.44???

standard deviations above the mean. If we look up ???z=1.44??? in a ???z???-table, we find the value ???0.9251???, which means there’s approximately a ???93\%??? chance that the weight of the full tin is less than ???3.25??? pounds.
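The popcorn calculation can also be reproduced in Python as a check. This is a minimal sketch with illustrative variable names. It uses `statistics.NormalDist` from the standard library in place of a printed ???z???-table, so the unrounded probability may differ slightly from the table value.

```python
import math
from statistics import NormalDist

flavor_mean = 1.0   # pounds per flavor
flavor_sd = 0.1     # pounds per flavor
n_flavors = 3       # independent flavors in each tin

# Means add; variances add because the flavors are independent
mu_tin = n_flavors * flavor_mean
sigma_tin = math.sqrt(n_flavors * flavor_sd**2)

# Distance of 3.25 pounds from the mean, in standard deviations
z = (3.25 - mu_tin) / sigma_tin

# Table-style lookup at the rounded z, and the unrounded probability
p_table = NormalDist().cdf(round(z, 2))
p_exact = NormalDist(mu_tin, sigma_tin).cdf(3.25)

print(round(z, 2), round(p_table, 4), round(p_exact, 4))
```

Both probabilities should round to about 93%, in line with the table lookup at ???z=1.44???.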

 