By Susan A. Peters

s.peters@louisville.edu

Did you learn probability as a set of rules to be followed? If you did, the rules may not have made much sense to you. One of the main problems with probability is its counterintuitive nature, yet approaching probability with rules does little to build intuition.

If we truly wish to have students persevere in solving problems – particularly problems in probability – then we need to provide them with the tools that position them for success. One way to do so is by approaching many probability problems with what statistician Roxy Peck calls “hypothetical 1,000” tables.

The hypothetical 1,000 tables are two-way frequency tables for which we assume a population of 1,000, use given probability information to complete some cells in the table, use arithmetic to complete the remaining cells in the table, and use the cell values in the tables to accurately estimate probabilities.

This approach can be used to address grade 7 probability standards related to compound events (7.SP) and high school probability standards related to conditional probabilities and compound events (S.CP). It also engages students with the mathematical practices of problem solving (MP1) and attending to precision (MP6). Let’s look at some problems to examine this approach.

**Typical Probability Problem and Solution**

First, consider the following scenario. In 2017, the Pew Research Center published a report by Kenneth Olmstead and Aaron Smith with results from a 2016 survey of adult internet users to investigate what the public knows about cybersecurity. The center used surveying methods known to produce representative samples and collected data about respondents, including education level. They also asked respondents to answer questions about a variety of topics related to cybersecurity.

The study’s authors found that 35 percent of respondents were college graduates, and 65 percent of the college graduates knew that email is not encrypted by default. Among all respondents, only 46 percent knew this fact.

Now answer the following question using the information from this scenario. What is the probability that an adult internet user would be both college educated and know that email is not encrypted by default?

How did you approach this problem? If you are like many people, you began by recording what you know and what you want to find, using symbols such as the following.

*Let C represent college graduate.*

*Let E represent knowing email is not encrypted by default.*

*Then P(C) = 0.35, P(E) = 0.46, and P(E|C) = 0.65.*

*Find P(E and C).*

At this point, you might consider different probability formulas that you may remember. There are two different multiplication rules: one for independent events – events in which the occurrence of one event is not affected by knowledge about the other event – and one for dependent events.

E and C are not independent. We know from the information given to us that *P(E)* is not the same as *P(E|C)*, so we could use the rule for dependent events to solve the problem: *P(E and C) = P(C) × P(E|C) = 0.35 × 0.65 = 0.2275*.

**A Problem-Solving Tool for Typical Probability Problems**

Apart from the multiplication rule, you could approach the problem by setting up a table using the given probabilities to consider results for a hypothetical 1,000 people. The scenario describes two variables related to education and email knowledge, and each variable has two possible outcomes. A 2×2 table with these variables might look something like the following. With middle school students, we likely would use only the words displayed in the table, but with high school students, we likely would use the symbols instead.

| | C: College Graduate | Not C (C^{C}): Not College Graduate | Totals |
|---|---|---|---|
| E: Know about email encryption | | | |
| Not E (E^{C}): Not know about email encryption | | | |
| Totals | | | |

We begin completing the table by considering results for a hypothetical 1,000 people. We know that 35 percent of the people are college graduates. We can complete the totals row in our table because we know there are *1,000 × 0.35 = 350* college graduates and *1,000 – 350 = 650* non-college graduates. We also know that 46 percent of the people know that email is not encrypted by default. We can complete the totals column of our table because *1,000 × 0.46 = 460* people would know this information about email and *1,000 – 460 = 540* would not.

| | C: College Graduate | Not C (C^{C}): Not College Graduate | Totals |
|---|---|---|---|
| E: Know about email encryption | | | 460 |
| Not E (E^{C}): Not know about email encryption | | | 540 |
| Totals | 350 | 650 | 1,000 |

We also know that 65 percent of the 350 college graduates, or *0.65 × 350 = 227.5*, which we round to approximately 228 people, would know that email is not encrypted by default. That means that *350 – 228 = 122* college graduates would not know this fact about email.

Notice that without even completing this table, we have enough information to determine the probability that an adult internet user would be both college educated and know that email is not encrypted by default. From the table, we can see that 228 of the 1,000 adult internet users – or 22.8 percent – meet both conditions.

| | C: College Graduate | Not C (C^{C}): Not College Graduate | Totals |
|---|---|---|---|
| E: Know about email encryption | 228 | | 460 |
| Not E (E^{C}): Not know about email encryption | 122 | | 540 |
| Totals | 350 | 650 | 1,000 |

We also have enough information to complete the table (see below) and to answer other probability questions. We can answer questions such as the probability that an adult internet user would be college educated or know that email is not encrypted by default [*P(C or E) = (122 + 228 + 232)/1000 = 582/1000 = 0.582*], the probability that an individual who is not a college graduate would know that email is not encrypted by default [*P(E given not C) = P(E|C^{C}) = 232/650 ≈ 0.357*], and the probability that an individual who does not know that email is not encrypted by default is not a college graduate [*P(not C given not E) = P(C^{C}|E^{C}) = 418/540 ≈ 0.774*], among others.

| | C: College Graduate | Not C (C^{C}): Not College Graduate | Totals |
|---|---|---|---|
| E: Know about email encryption | 228 | 232 | 460 |
| Not E (E^{C}): Not know about email encryption | 122 | 418 | 540 |
| Totals | 350 | 650 | 1,000 |
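The table-filling steps can also be sketched in a few lines of Python. This is only an illustrative sketch: the names `hypothetical_table` and `round_half_up` are assumptions made for this example, and the counts are rounded to whole people as in the hand calculation. Probabilities are passed as decimal strings so that the arithmetic is exact.

```python
from fractions import Fraction
from math import floor


def round_half_up(x):
    """Round a non-negative Fraction to the nearest whole number, halves up."""
    return floor(x + Fraction(1, 2))


def hypothetical_table(p_c, p_e, p_e_given_c, n=1000):
    """Fill a 2x2 table of counts for n hypothetical people.

    p_c is P(C), p_e is P(E), and p_e_given_c is P(E|C),
    each given as a decimal string such as "0.35".
    """
    p_c, p_e, p_e_given_c = (Fraction(p) for p in (p_c, p_e, p_e_given_c))
    c_total = round_half_up(n * p_c)                # totals row
    e_total = round_half_up(n * p_e)                # totals column
    e_and_c = round_half_up(c_total * p_e_given_c)  # 65 percent of the graduates
    # The remaining cells follow by subtraction from the totals.
    return {
        ("E", "C"): e_and_c,
        ("E", "not C"): e_total - e_and_c,
        ("not E", "C"): c_total - e_and_c,
        ("not E", "not C"): n - c_total - e_total + e_and_c,
    }


table = hypothetical_table("0.35", "0.46", "0.65")
for cell, count in table.items():
    print(cell, count)
print("P(E and C) is about", table[("E", "C")] / 1000)
```

With the values from the scenario, the four cells match the completed table: 228, 232, 122, and 418.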

**Attending to Precision**

You might have noticed that the 2×2 table gave a slightly different answer (0.228) for the probability that an adult internet user would be both college educated and know that email is not encrypted by default than the formula did (0.2275). We need greater precision in our answer from the 2×2 table.

How do we achieve greater precision? We simply increase the number of hypothetical people we consider. For this problem, the difference in our answers resulted from our approximated number of college graduates who know about email encryption, and the differences disappear when we consider a hypothetical 10,000 people. In general, we would keep increasing the number of hypothetical people we consider until we reach our desired level of precision.

| | C: College Graduate | Not C (C^{C}): Not College Graduate | Totals |
|---|---|---|---|
| E: Know about email encryption | 2,275 | 2,325 | 4,600 |
| Not E (E^{C}): Not know about email encryption | 1,225 | 4,175 | 5,400 |
| Totals | 3,500 | 6,500 | 10,000 |
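A short Python sketch can show how the rounding error shrinks as the hypothetical population grows. The function name `estimate` is an assumption made for this illustration; the probabilities are those from the scenario.

```python
from fractions import Fraction
from math import floor

p_c = Fraction("0.35")          # P(C)
p_e_given_c = Fraction("0.65")  # P(E|C)
exact = p_c * p_e_given_c       # multiplication rule: 0.2275


def estimate(n):
    """Estimate P(E and C) from a table of n hypothetical people."""
    college = n * p_c
    # Round the "both" cell to a whole number of people, halves up.
    both = floor(college * p_e_given_c + Fraction(1, 2))
    return Fraction(both, n)


for n in (1000, 10000):
    est = estimate(n)
    print(n, float(est), "matches formula:", est == exact)
```

For 1,000 people the "both" cell is rounded from 227.5 to 228, so the estimate is 0.228; for 10,000 people the cell is exactly 2,275 and the estimate agrees with the formula.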

**Middle School and High School Applications**

*Middle School*

How might the hypothetical 1,000 approach be used in middle school? One of the standards in the Statistics and Probability domain for 7th-grade students relates to finding probabilities for compound events: “Find probabilities of compound events using organized lists, tables, tree diagrams, and simulation.” We can use the approach to help students calculate probabilities for compound events.

A common problem given to middle school students in relation to this standard is to find the probability that when two coins are tossed, both coins will land on heads. We can use the hypothetical 1,000 approach to solve this problem by considering results for 1,000 tosses of two coins! We would begin with a table such as the following.

| | Coin 1: Heads | Coin 1: Tails | Totals |
|---|---|---|---|
| Coin 2: Heads | | | |
| Coin 2: Tails | | | |
| Totals | | | 1,000 |

Assuming that both coins are fair, we would expect half of each coin’s tosses to land on heads and half to land on tails. As a result, we can complete the Totals row and column in the table. We can also complete the rest of the table, because all that remains is to consider the outcomes of the second coin for the 500 tosses in which the first coin lands on heads and for the 500 tosses in which it lands on tails.

Again, if each coin is fair, we would expect half of the tosses to land on heads and half to land on tails. From the completed table, then, we can see that of the 1,000 tosses of two coins, 250 result in both coins landing on heads, or 25 percent. Thus, the probability that both coins will land on heads is 1/4.

| | Coin 1: Heads | Coin 1: Tails | Totals |
|---|---|---|---|
| Coin 2: Heads | 250 | 250 | 500 |
| Coin 2: Tails | 250 | 250 | 500 |
| Totals | 500 | 500 | 1,000 |
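The coin table can be sketched the same way. The function name `two_coin_table` and its parameters are assumptions made for this illustration, and the sketch assumes, as the reasoning above does, that the outcome of one coin does not affect the other.

```python
from fractions import Fraction


def two_coin_table(n=1000, p_heads_1=Fraction(1, 2), p_heads_2=Fraction(1, 2)):
    """Expected counts for n tosses of two coins, assuming independent tosses."""
    heads_1 = n * p_heads_1  # tosses in which coin 1 lands on heads
    tails_1 = n - heads_1    # tosses in which coin 1 lands on tails
    # Split each group of coin 1 outcomes by the outcome of coin 2.
    return {
        ("coin 2 heads", "coin 1 heads"): heads_1 * p_heads_2,
        ("coin 2 heads", "coin 1 tails"): tails_1 * p_heads_2,
        ("coin 2 tails", "coin 1 heads"): heads_1 * (1 - p_heads_2),
        ("coin 2 tails", "coin 1 tails"): tails_1 * (1 - p_heads_2),
    }


table = two_coin_table()
both_heads = table[("coin 2 heads", "coin 1 heads")]
print("both heads:", both_heads, "of 1000, probability", both_heads / 1000)
```

Changing `p_heads_1` or `p_heads_2` lets students explore what happens to the table when a coin is not fair.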

Students can use the hypothetical 1,000 approach to solve many of the compound probability problems that we ask middle-level students to solve, although we would need to expand the table if more than two outcomes are possible for either variable. The approach provides an intuitive means for students to calculate probabilities without memorizing formulas, and use of this approach positions students for success with meeting high school probability standards.

*High School*

How might the hypothetical 1,000 table be used in high school? One of the standards in the Conditional Probability and the Rules of Probability domain for high school students relates to using two-way tables to find conditional probabilities: “Construct and interpret two-way frequency tables of data when two categories are associated with each object being classified. Use the two-way table as a sample space to decide if events are independent and to approximate conditional probabilities.”

We examined several examples of using a two-way table to find conditional probabilities when we first considered the hypothetical 1,000 approach. In addition to using the approach to find conditional and compound probabilities, we can use the approach to develop the addition rule and the multiplication rule for students to gain greater understanding of the rules and to increase their probability of success with solving problems using the rules.

To consider development of the rule for calculating conditional probabilities and the general multiplication rule, we will refer to the table we created using data for the variables of college graduation and knowing about email encryption, displayed again below. Also, consider the probability that an individual who is not a college graduate would know that email is not encrypted by default *[P(E given not C) = P(E|C^{C}) = 232/650 ≈ 0.357]*.

From the table, the two values we use to calculate the probability are *P(E and C^{C})* and *P(C^{C})*, and we see that *P(E|C^{C}) = P(E and C^{C})/P(C^{C})*. If we use the table to calculate additional conditional probabilities such as *P(E|C)* and *P(C|E)*, we would see that *P(E|C) = P(E and C)/P(C)* and *P(C|E) = P(E and C)/P(E)*.

From these probabilities, we can also see that *P(E and C) = P(C)P(E|C) = P(E)P(C|E)*. Calculating additional probabilities using this table and others should lead students to generalize their observations and develop the rule for calculating conditional probabilities and the General Multiplication rule: *P(A and B) = P(A)P(B|A) = P(B)P(A|B)*.

| | C: College Graduate | Not C (C^{C}): Not College Graduate | Totals |
|---|---|---|---|
| E: Know about email encryption | 228 | 232 | 460 |
| Not E (E^{C}): Not know about email encryption | 122 | 418 | 540 |
| Totals | 350 | 650 | 1,000 |
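Students with access to Python could check the conditional probability and multiplication relationships directly from the table counts. The following is a minimal sketch; the variable names are assumptions made for this illustration, and the counts are those in the table.

```python
from fractions import Fraction

n = 1000
counts = {
    ("E", "C"): 228,
    ("E", "not C"): 232,
    ("not E", "C"): 122,
    ("not E", "not C"): 418,
}

# Marginal and joint probabilities read from the table.
p_c = Fraction(counts[("E", "C")] + counts[("not E", "C")], n)  # 350/1000
p_e = Fraction(counts[("E", "C")] + counts[("E", "not C")], n)  # 460/1000
p_e_and_c = Fraction(counts[("E", "C")], n)                     # 228/1000

# Conditional probabilities: joint probability divided by the condition.
p_e_given_c = p_e_and_c / p_c  # 228/350
p_c_given_e = p_e_and_c / p_e  # 228/460

# General multiplication rule, both ways.
print(p_c * p_e_given_c == p_e_and_c)
print(p_e * p_c_given_e == p_e_and_c)
```

Using exact fractions rather than decimals keeps the comparison free of rounding, so both checks hold exactly.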

To consider development of the addition rule using these same data, remember that the probability that an adult internet user would be college educated or know that email is not encrypted by default is *P(C or E) = (122 + 228 + 232)/1000 = 582/1000*. The values used in this calculation are in the table above.

Notice, however, that we could rewrite the probability as *P(C or E) = (122 + 228 + 232)/1000 = 122/1000 + 228/1000 + 232/1000 = P(C and E^{C}) + P(C and E) + P(C^{C} and E) = P(C) + P(C^{C} and E)*. An alternative way of writing *P(C^{C} and E)* is *P(E) – P(C and E)*. The resulting formula then becomes *P(C or E) = P(C) + P(E) – P(C and E)*. If we look at adding *P(C)* and *P(E)* from the table, we can see that the value of 228 is added twice in our calculations, which is why we need to subtract *P(C and E)*. Although the addition rule, *P(A or B) = P(A) + P(B) – P(A and B)*, might be more difficult for students to develop, the image of the table and solving probability problems using the table likely will help them to remember the rule.
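The double counting of the 228 can be checked with a short Python sketch. The variable names are assumptions made for this illustration; the counts come from the completed table for 1,000 hypothetical people.

```python
from fractions import Fraction

n = 1000
c_and_e, c_and_not_e, not_c_and_e = 228, 122, 232

# Direct count: add the three cells in which C or E (or both) occurs.
p_c_or_e = Fraction(c_and_not_e + c_and_e + not_c_and_e, n)

# Addition rule: the 228 appears in both P(C) and P(E), so subtract it once.
p_c = Fraction(c_and_e + c_and_not_e, n)  # 350/1000
p_e = Fraction(c_and_e + not_c_and_e, n)  # 460/1000
p_c_and_e = Fraction(c_and_e, n)          # 228/1000
by_rule = p_c + p_e - p_c_and_e

print(float(p_c_or_e), float(by_rule), p_c_or_e == by_rule)
```

Both routes give 582 out of 1,000.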

**Conditions for Using the Hypothetical 1,000 Approach**

When can we use the hypothetical 1,000 approach? In general, to use the approach with a 2×2 table, we need to know three of the probabilities associated with two events with two possible outcomes. With respect to events C and E from our original example, we would need to know the probabilities of each event occurring [*P(C)* and *P(E)*] and one of the compound probabilities that one or both of the events would occur [either *P(C or E)* or *P(E and C)*].

If, however, the two events are mutually exclusive or independent, we only would need to know the probability of each event occurring [*P(C)* and *P(E)*]. Technically, if the events are mutually exclusive, we already know the probability of both events occurring because the two events cannot occur at the same time [*P(E and C) = 0*].

In the case of the two independent events, we can find the probability that both events will occur by using the multiplication rule [*P(E and C) = P(E)P(C)*]. Students should have sufficient information to use the approach and find probabilities using the information provided in most traditional probability problems.
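These conditions can be summarized in a short Python sketch that completes the table from *P(C)*, *P(E)*, and one further piece of information. The function names `joint_probability` and `complete_table`, and the keyword names `p_both`, `p_either`, and `relation`, are assumptions made for this illustration. Counts are left unrounded as exact fractions.

```python
from fractions import Fraction


def joint_probability(p_c, p_e, p_both=None, p_either=None, relation=None):
    """Return P(E and C) from whichever third piece of information is given."""
    if p_both is not None:
        return Fraction(p_both)
    if p_either is not None:
        # Addition rule rearranged: P(E and C) = P(C) + P(E) - P(C or E)
        return p_c + p_e - Fraction(p_either)
    if relation == "independent":
        return p_c * p_e
    if relation == "mutually exclusive":
        return Fraction(0)
    raise ValueError("need P(E and C), P(C or E), or a stated relationship")


def complete_table(p_c, p_e, n=1000, **third):
    """Complete a 2x2 table of counts for n hypothetical people."""
    p_c, p_e = Fraction(p_c), Fraction(p_e)
    both = n * joint_probability(p_c, p_e, **third)
    c_total, e_total = n * p_c, n * p_e
    return {
        ("E", "C"): both,
        ("E", "not C"): e_total - both,
        ("not E", "C"): c_total - both,
        ("not E", "not C"): n - c_total - e_total + both,
    }


from_union = complete_table("0.35", "0.46", p_either="0.582")
fair_coins = complete_table("0.5", "0.5", relation="independent")
print(from_union)
print(fair_coins)
```

The first call recovers the email table from *P(C or E) = 0.582*, and the second recovers the two-coin table from independence alone.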

Probability does not need to be a word that evokes images of complicated formulas or that prompts nightmares. The hypothetical 1,000 approach shifts the focus from formulas to calculate probabilities to the meanings of the probabilities being calculated. The approach arms students with a tool that they can use to persist in solving complex probability problems while providing opportunities for students to attend to precision. The approach can be used to address standards at both the middle school and high school levels to allow students to naturally transition from solving relatively simple probability problems to solving complex problems.

*Susan A. Peters is an associate professor in the Department of Middle and Secondary Education at the University of Louisville. She teaches prospective middle and high school mathematics teachers and is interested in statistics education and mathematics teacher education.*