Posts tagged statistics
Histograms and stem-and-leaf plots

A histogram, also called a frequency histogram, is just like a bar graph, except that we collect the data into buckets or bins, and then sketch a bar for each bucket. Each bucket needs to be the same width, so that every bucket covers an equal range of values. A stem-and-leaf plot (also called a stem plot) is just another way to summarize data. It’s similar to a histogram, because both types of charts group together data points, and are good ways to show how many data points fall into a certain category or range.
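
As a quick illustration, here’s a minimal Python sketch of sorting values into equal-width buckets and drawing a bar for each one. The data set and the bucket width are made up, not taken from the full post.

```python
# Sort made-up data into equal-width buckets and print a rough histogram.
data = [2, 3, 5, 7, 8, 8, 10, 12, 13, 15, 16, 18, 19, 19, 20]
bucket_width = 5  # every bucket spans the same range of values

buckets = {}
for value in data:
    start = (value // bucket_width) * bucket_width  # 0-4, 5-9, 10-14, ...
    buckets[start] = buckets.get(start, 0) + 1

for start in sorted(buckets):
    label = f"{start}-{start + bucket_width - 1}"
    print(f"{label:>7} | {'#' * buckets[start]}")
```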

Read More
Relative frequency tables

We’re familiar now with displaying data in a two-way table. But sometimes it’s helpful to express the data in a two-way table as percentages. If we want to express percentages, then we just need to change the two-way table into a relative frequency table, which is a table that shows percentages instead of actual counts.
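
Here’s a small Python sketch of that conversion, using an invented two-way table of counts: divide each cell by the grand total to get a percentage.

```python
# Turn a two-way table of counts (invented data) into relative frequencies.
counts = {
    ("Men", "Cats"): 10, ("Men", "Dogs"): 30,
    ("Women", "Cats"): 35, ("Women", "Dogs"): 25,
}
total = sum(counts.values())  # grand total of the table

for (row, col), count in counts.items():
    percent = 100 * count / total
    print(f"{row:>5} / {col:<4}: {percent:.1f}%")
```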

Read More
Measures of spread: range and IQR of the data set

We looked at measures of central tendency, which we saw were various ways of representing the “middle” of a data set. But central tendency isn’t the only thing we’re interested in when it comes to data. We also want to know about spread, which is how, and by how much, our data set is spread out around its center. We also call measures of spread measures of dispersion, or scatter.
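
As a rough sketch, here’s how we might compute both measures of spread in Python for a made-up data set: the range is the maximum minus the minimum, and the IQR is the third quartile minus the first quartile.

```python
import numpy as np

data = np.array([4, 7, 9, 11, 12, 15, 18, 21, 25])  # made-up data set

data_range = data.max() - data.min()    # range = max - min
q1, q3 = np.percentile(data, [25, 75])  # first and third quartiles
iqr = q3 - q1                           # IQR = Q3 - Q1

print("range:", data_range)
print("IQR:", iqr)
```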

Read More
All about Venn diagrams

We’ve seen how we can express data in a two-way table, and then translate that data into bar and line graphs. But we can also express data from a two-way table in a different visualization, called a Venn diagram. Venn diagrams help us show how much overlap, if any, exists between the categories in our data.
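
Here’s a tiny Python sketch of the counts behind a two-circle Venn diagram, using a hypothetical pet-ownership survey: each region of the diagram is just a set operation.

```python
# Hypothetical survey results: who owns a cat, who owns a dog.
owns_cat = {"Ana", "Ben", "Cruz", "Dee"}
owns_dog = {"Ben", "Dee", "Eli", "Fay", "Gus"}

both = owns_cat & owns_dog      # overlap of the two circles
only_cat = owns_cat - owns_dog  # cat circle, outside the overlap
only_dog = owns_dog - owns_cat  # dog circle, outside the overlap

print(len(only_cat), len(both), len(only_dog))  # 2 2 3
```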

Read More
Understanding joint, marginal, and conditional distributions

A joint distribution is a table of percentages similar to a relative frequency table. The difference is that, in a joint distribution, we show the distribution of one set of data against the distribution of another set of data. In this lesson we’ll look at joint, marginal, and conditional distributions.
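
Here’s a short sketch of all three distributions using pandas and an invented survey: the joint distribution divides every cell by the grand total, the marginal adds up a whole row of the joint table, and the conditional divides each row by its own total.

```python
import pandas as pd

# Invented survey data: gender and preferred pet for eight people.
df = pd.DataFrame({
    "gender": ["M", "M", "F", "F", "F", "M", "F", "M"],
    "pet":    ["cat", "dog", "cat", "cat", "dog", "dog", "dog", "cat"],
})

joint = pd.crosstab(df["gender"], df["pet"], normalize="all")         # joint distribution
marginal = joint.sum(axis=1)                                          # marginal distribution of gender
conditional = pd.crosstab(df["gender"], df["pet"], normalize="index") # distribution of pet, given gender

print(joint, marginal, conditional, sep="\n\n")
```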

Read More
Box-and-whisker plots

Box-and-whisker plots (also called box plots) are a great way to represent a data set when we want to show the median and spread of the data at the same time. The median is represented by the line inside the box, and the first and third quartiles are represented by the edges of the box.
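
Here’s a quick sketch of the five-number summary a box plot is built from, with a made-up data set (this version runs the whiskers all the way out to the minimum and maximum, which is one common convention).

```python
import numpy as np

data = np.array([3, 5, 7, 8, 9, 11, 12, 14, 18, 25])  # made-up data set
q1, median, q3 = np.percentile(data, [25, 50, 75])

print("min:", data.min())   # end of the lower whisker
print("Q1:", q1)            # left edge of the box
print("median:", median)    # line inside the box
print("Q3:", q3)            # right edge of the box
print("max:", data.max())   # end of the upper whisker
```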

Read More
Symmetric and skewed distributions and outliers

A density curve is technically the smooth line that encloses a distribution. We call it a distribution because the area under the curve shows us the distribution of our data. In this lesson we’ll look at distributions with different shapes, like symmetric and normal distributions, and skewed distributions.
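
One quick way to see skew in numbers rather than in a picture: in a right-skewed data set the mean gets pulled above the median, while in a symmetric data set the two are about equal. Here’s a minimal sketch with made-up data.

```python
import statistics

symmetric = [4, 5, 6, 7, 8, 9, 10]
right_skewed = [4, 5, 5, 6, 6, 7, 30]  # one large value drags the tail to the right

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(name, "mean:", round(statistics.mean(data), 2),
          "median:", statistics.median(data))
```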

Read More
Using the addition rule, and union vs. intersection

Sometimes we’ll need to find the probability that one event or another (or both) occurs within a single experiment. Remember that an event is a specific collection of outcomes from the sample space. For example, what’s the probability that we roll a pair of 6-sided dice and either get at least one 1, or an even sum when we add the dice together? This is when we use the addition rule for probability.
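
Here’s a brute-force check of the addition rule on that dice example: list all 36 equally likely rolls, let A be “at least one 1” and B be “the sum is even,” and confirm that P(A or B) = P(A) + P(B) - P(A and B).

```python
from itertools import product

space = list(product(range(1, 7), repeat=2))  # all 36 equally likely rolls

A = {roll for roll in space if 1 in roll}           # at least one 1
B = {roll for roll in space if sum(roll) % 2 == 0}  # even sum

p_union = len(A | B) / len(space)
p_by_rule = len(A) / 36 + len(B) / 36 - len(A & B) / 36

print(p_union, p_by_rule)  # both give 2/3, so the addition rule checks out
```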

Read More
"At least" and "at most," and mean, variance, and standard deviation

We can do more than just calculate the probability of pulling exactly 3 red marbles in 5 total pulls. For any binomial random variable, we can also calculate something like the probability of pulling at least 3 red marbles, or the probability of pulling no more than 3 red marbles.
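
Here’s a sketch of both calculations, assuming (hypothetically) that each pull is red with probability 0.4 and that the 5 pulls are independent, so the count of red marbles is binomial: “at least 3” adds up the probabilities for k = 3, 4, 5, and “at most 3” adds them up for k = 0 through 3.

```python
from math import comb

n, p = 5, 0.4  # hypothetical: 5 independent pulls, P(red) = 0.4 each time

def binom_pmf(k):
    """P(exactly k red marbles in n pulls)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_exactly_3 = binom_pmf(3)
p_at_least_3 = sum(binom_pmf(k) for k in range(3, n + 1))  # k = 3, 4, 5
p_at_most_3 = sum(binom_pmf(k) for k in range(0, 4))       # k = 0, 1, 2, 3

print(round(p_exactly_3, 4), round(p_at_least_3, 4), round(p_at_most_3, 4))
```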

Read More
How to solve Bayes' theorem problems

Bayes’ theorem, also known as Bayes’ law or Bayes’ rule, tells us the probability of an event, given prior knowledge of related events. To simplify Bayes’ theorem problems, it can be really helpful to create a tree diagram. If you’re ever having trouble figuring out a conditional probability problem, a tree diagram is a great tool to fall back on, because it lays out the entire sample space of the problem.
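
Here’s a worked sketch with invented numbers, following the two branches of a tree diagram: suppose 1% of parts are defective, a test flags 95% of defective parts, and it also (falsely) flags 2% of good parts. Bayes’ theorem tells us the probability that a flagged part is actually defective.

```python
# Hypothetical numbers for the two branches of the tree.
p_defective = 0.01
p_flag_given_defective = 0.95
p_flag_given_good = 0.02

# Total probability of a flag: add up both branches of the tree.
p_flag = (p_flag_given_defective * p_defective
          + p_flag_given_good * (1 - p_defective))

# Bayes' theorem: P(defective | flag) = P(flag | defective) P(defective) / P(flag)
p_defective_given_flag = p_flag_given_defective * p_defective / p_flag

print(round(p_defective_given_flag, 3))  # about 0.324
```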

Read More
Combinations of random variables

Now we want to look at what happens when we combine two random variables, either by adding them or subtracting them. When we combine random variables linearly like this, we can find the mean and standard deviation of the combination using the means and standard deviations of the individual variables.
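
Here’s a quick sketch with hypothetical means and standard deviations for two independent random variables X and Y: the means add or subtract directly, and (because the variables are independent) the variances add in both cases.

```python
from math import sqrt

mean_x, sd_x = 50, 4  # hypothetical values for X
mean_y, sd_y = 30, 3  # hypothetical values for Y

mean_sum = mean_x + mean_y    # mean of X + Y
mean_diff = mean_x - mean_y   # mean of X - Y

# For independent X and Y, Var(X + Y) = Var(X - Y) = Var(X) + Var(Y),
# so the standard deviation of the combination is the same either way.
sd_combined = sqrt(sd_x**2 + sd_y**2)

print(mean_sum, mean_diff, sd_combined)  # 80 20 5.0
```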

Read More
Probability with geometric random variables

Remember that for a binomial random variable X, we’re looking for the number of successes in a finite number of trials. For a geometric random variable, most of the conditions we put on the binomial random variable still apply: 1) each trial must be independent, 2) each trial can be called a “success” or “failure,” and 3) the probability of success on each trial is constant.
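
Here’s a minimal sketch of the geometric probability formula, with a made-up success probability: if each independent trial succeeds with probability p, then the probability that the first success happens on trial k is (1 - p)^(k - 1) * p.

```python
p = 0.25  # made-up probability of success on each trial

def geometric_pmf(k):
    """P(the first success happens on trial k)."""
    return (1 - p) ** (k - 1) * p

print(geometric_pmf(1))                            # success on the very first trial
print(geometric_pmf(4))                            # first success on trial 4
print(sum(geometric_pmf(k) for k in range(1, 4)))  # success somewhere in the first 3 trials
```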

Read More
Correlation coefficients and the residual

In the last section we talked about the regression line, and how it was the line that best represented the data in a scatterplot. In this section, we’re going to get technical about different measurements related to the regression line.
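
As a rough sketch of two of those measurements, here’s how we might compute the correlation coefficient r and the residuals (actual minus predicted) for a handful of made-up points.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)   # made-up points
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]              # correlation coefficient
slope, intercept = np.polyfit(x, y, 1)   # least-squares regression line
residuals = y - (slope * x + intercept)  # residual = actual - predicted

print("r:", round(r, 4))
print("residuals:", np.round(residuals, 3))
```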

Read More
Using line graphs and ogives to display data

Line graphs are really similar to bar graphs. In fact, to turn a bar graph into a line graph, all you have to do is connect the middle of the top of each bar to the middle of the top of the bar beside it with a straight line, and you’ll form the line graph. An ogive is a cumulative line graph: instead of plotting each bucket’s frequency, we plot the running total of the frequencies up to that point.
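
Here’s a tiny sketch of the cumulative totals an ogive is drawn from, using invented bucket counts.

```python
buckets = ["0-9", "10-19", "20-29", "30-39"]  # invented buckets
frequencies = [3, 7, 5, 2]                    # invented counts

running_total = 0
for bucket, freq in zip(buckets, frequencies):
    running_total += freq
    # each point on the ogive sits at the upper edge of its bucket
    print(f"up to {bucket.split('-')[1]}: {running_total}")
```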

Read More
Displaying data in frequency tables and dot plots

Think of a frequency table as a table that displays how frequently or infrequently something occurs. A dot plot can also be used to show the frequency of small data sets. If I want to categorize the vehicles in a parking lot by type, I could do so in a frequency table.
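
Here’s a quick sketch of that parking-lot example (with an invented list of vehicles): count each type to build the frequency table, then print a dot for each occurrence to get a crude dot plot.

```python
from collections import Counter

vehicles = ["car", "truck", "car", "SUV", "car", "truck", "van", "car"]
table = Counter(vehicles)  # the frequency table: type -> count

for vehicle, count in table.most_common():
    print(f"{vehicle:>6}: {count}  {'•' * count}")
```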

Read More
Probability with binomial random variables

Remember that “bi” means two, so a binomial trial is a trial that can result in exactly two outcomes, which we call “success” and “failure.” A coin flip is the most obvious example, because it can only result in two values: heads or tails. A binomial random variable then counts the number of successes we get across a fixed number of these trials.
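
Here’s a minimal sketch of the coin example: each flip is a two-outcome trial, and the binomial random variable X counts the heads across the flips (the number of flips here is arbitrary).

```python
import random

n = 10  # arbitrary number of flips
flips = [random.choice(["H", "T"]) for _ in range(n)]  # n two-outcome trials
X = flips.count("H")                                   # the binomial count of successes

print(flips)
print("X =", X, "heads out of", n, "flips")
```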

Read More
Transforming random variables by shifting and scaling the data set

Shifting the data set by a constant k means adding k to every value in the data set, or subtracting k from every value in the data set. On the other hand, scaling the data set by a constant k means multiplying or dividing every value in the data set by k.
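
Here’s a small sketch with a made-up data set, shifting by k = 10 and scaling by k = 2: shifting moves the mean but leaves the standard deviation alone, while scaling multiplies both by k.

```python
import statistics

data = [2, 4, 6, 8, 10]            # made-up data set
shifted = [x + 10 for x in data]   # shift by k = 10
scaled = [x * 2 for x in data]     # scale by k = 2

for name, d in [("original", data), ("shifted +10", shifted), ("scaled x2", scaled)]:
    print(name, "mean:", statistics.mean(d),
          "stdev:", round(statistics.stdev(d), 3))
```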

Read More
All about one-way data tables for statistics

The simplest kind of information we’ll work with in this course is a set of individuals with one or more properties, called variables. The individuals are the items in the data set and can be cases, things, people, etc. When we construct a table, we want to think about whether we have more individuals or more variables. We’ll usually put whichever we have more of down the side of the table, so that the table is taller rather than wider.
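
Here’s a tiny sketch of that layout with invented data: three individuals (dogs, say) down the side, and two variables across the top, so the table ends up taller than it is wide.

```python
import pandas as pd

table = pd.DataFrame(
    {"age": [3, 7, 2], "weight_lb": [40, 65, 30]},  # the variables
    index=["Rex", "Buddy", "Daisy"],                # the individuals go down the side
)
print(table)
```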

Read More
Calculating basic probabilities

Up to now we’ve been talking about statistics, which is all about data, and how to display, summarize, and analyze data. Now we’ll transition into probability, which is all about the likelihood that some event will occur.

The reason we study statistics and probability together is that when we collect data as part of a statistical study, we want to be able to use what we know about probability to say how likely it is that our results are reliable. So in that way, statistics and probability go hand in hand.

Read More