True confessions:

I thumbed through Unit 6 when the books were first delivered and saw it included scatter plots and two-way frequency tables. No problem. I’ve taught those things before. And scatter plots means line of best fit, so we’ll be reviewing equations of lines and slope shortly before testing. Perfect.

And I put the book away.

In our first year we ended up moving unit 6 to last, because unit 8 seemed more important to get in before testing. So it turned out we barely touched any of unit 6 the first year, and definitely did not take time to get the big picture of the unit.

So here I am in year 2, finally digging into Unit 6. And I can say there was a lot I didn’t get the first time around. Let’s begin with the title: Associations in Data. This title brings the two ideas I formerly thought of as separate into one overarching concept, and that understanding frames everything you are doing in the unit. (I wish this were discussed a little more explicitly in the unit overview – maybe next edition?)

In grade 6, students work with displaying and analyzing numerical data around a single attribute. They recognize that data can be described in terms of its central tendency and its spread, and use line plots, box plots, and histograms to display the data and make these important features visible. Early in the unit, students are asked to recall these terms as ways to display single-variable data sets.

What is different in 8th grade is the addition of a second set of data that may or may not be associated in a predictable way with the first set. When the data is numerical, we can use a scatter plot to see if there is a predictable pattern or association between set 1 and set 2. If the data is categorical, we can use a two-way frequency table and relative frequencies to determine whether there is an association we can use to make predictions.

**Numerical Data and Associations**

Using numerical data on a scatter plot, we can ask: does the daily high temperature have any connection to the number of snow cones sold on that day? We might say:

- As temperature increases, the number of snow cones sold increases.

We might describe the association using more precise mathematical vocabulary:

- There is a positive linear association between temperature and number of snow cones sold.

Or we might describe the association using the equation of a line of best fit:

- The relationship between temperature, *T*, and cones sold, *c*, can be modeled using the equation *c* = 2*T* - 70.

Or potentially just in terms of the slope of the line of best fit:

- Two additional cones are sold for each 1 degree increase in temperature.

In all cases we are implying that there is an association between the two sets that allows us, knowing something about one set, to make a prediction about the other set.
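For anyone who likes to check the arithmetic, here is a minimal sketch of the prediction idea. It assumes the model is read as cones = 2 × temperature - 70, which is the reading that matches "two additional cones for each 1 degree increase." The function name and the sample temperatures are made up for illustration.

```python
def predicted_cones(temperature: float) -> float:
    """Predict snow cones sold from the daily high temperature.

    Assumed model (see lead-in): cones = 2 * temperature - 70.
    """
    return 2 * temperature - 70


# Illustrative temperatures only; these are not data from the unit.
for temperature in (80, 81, 90):
    print(temperature, predicted_cones(temperature))

# The slope is the change in the prediction for a 1 degree increase.
print(predicted_cones(81) - predicted_cones(80))  # 2
```

The last line is the "slope with units" statement in numeric form: one more degree, two more cones.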

(So far, nothing dramatically different from what I expected, except perhaps the opportunity to describe the relationship in terms of just the slope of the line of best fit.)

Below are some additional practice and review problems we made to focus on this part of the unit. Questions circled in red emphasized using the slope (and units) to describe the association. Our first time working through the unit test we were worried that students might struggle with that. After working through the unit I see this idea comes up a fair amount, but I do like the chance to re-emphasize the meaning of slope of a line in terms of units.

**Graphs That Don’t Begin At (0,0)**

Sometimes it is just not convenient for a graph to begin at (0,0). A graph where the x-axis is labeled with the year is a perfect example.

Watch for situations in the unit where this comes up. The first is in the lesson 2 summary, then twice in lesson 3 and 5 times in lesson 4. The same graph and context from lesson 2’s summary comes up again in the lesson 3 and lesson 4 summaries. Because the graph is already familiar when you summarize lessons 3 and 4, students can focus on the mathematics from that day. Don’t skip the lesson 2 graph – be sure to deal with the “not (0,0)” issue when it first arises.

Part of our review problem 6 is shown below, where we took some time to focus on the fact that not all graphs begin at (0,0) (see green circle). There is a spot on the test where the line of best fit exits the left side of the graph above the x-axis, but the y-intercept is actually negative. We added this problem to create a discussion about y-intercepts that don’t show on the piece of the graph that is given.
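Here is a small sketch of why that can happen. The numbers are hypothetical (they are not the test item): imagine a graph whose x-axis starts at 40, with the line of best fit passing through (40, 10) at the left edge and through (50, 30) further along. The helper name `line_through` is made up for this example.

```python
def line_through(p1, p2):
    """Return (slope, y_intercept) of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    # Slide back from x1 to x = 0 along the line to find the intercept.
    y_intercept = y1 - slope * x1
    return slope, y_intercept


# Hypothetical points read off a graph whose x-axis begins at 40.
left_edge = (40, 10)      # the line leaves the window above the x-axis
another_point = (50, 30)

slope, y_intercept = line_through(left_edge, another_point)
print(slope, y_intercept)  # 2.0 -70.0
```

The line sits above the x-axis everywhere in the visible window, yet the y-intercept is -70, because x = 0 is forty units to the left of where the graph starts.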

**Categorical Data and Associations**

Lessons 9 and 10 focus on categorical data. My teaching in the past has focused primarily on displaying this data using bar graphs and two-way frequency tables, but in this curriculum, the question is consistent throughout the unit: Is there an association between the two sets of data? In categorical data, this can be interpreted as: If we know a subject’s answer to question A, can we predict their answer to question B? How reliable is that prediction?

Lesson 9 begins with a chance for students to notice such an association: students who do not play sports are more likely to watch a fair amount of TV.

Activity 9.2 introduces students to 3 ways that data might be displayed: two-way frequency tables, bar graphs, and segmented bar graphs. There is a short card sort included in 9.2 to let students practice matching data sets that are displayed differently. In activity 9.3, students learn to find relative frequencies from data in a two-way frequency table. They are asked to draw a conclusion about whether an association exists based on these relative frequencies.

The next day, they will actually create segmented bar graphs with these relative frequencies and most students will find the visual helpful in determining the association. If the percents are very close to the same in each segmented bar, there is no association to help make a prediction about one thing given the other.
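As a rough sketch of the relative frequency step, here is one way to compute the percents that would go into each segmented bar. The counts below are invented for illustration and are not the curriculum's data; the category labels and the function name are also assumptions.

```python
# Hypothetical two-way frequency table (counts invented for illustration).
table = {
    "plays sports": {"watches TV": 20, "little or no TV": 30},
    "no sports": {"watches TV": 40, "little or no TV": 10},
}


def row_relative_frequencies(two_way):
    """Divide each count by its row total, so every row sums to 1."""
    result = {}
    for group, counts in two_way.items():
        total = sum(counts.values())
        result[group] = {answer: count / total for answer, count in counts.items()}
    return result


relative = row_relative_frequencies(table)
for group, shares in relative.items():
    print(group, {answer: f"{share:.0%}" for answer, share in shares.items()})
```

With these made-up counts, 40% of the sports group watches TV compared with 80% of the no-sports group. The two segmented bars would look quite different, which is what suggests an association. If the two rows came out at nearly the same percents, knowing the sports answer would not help predict the TV answer.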

Here are a few practice and review problems we made for this section of the unit:

The questions circled in blue really ask the same thing in two different ways, and are meant to create a discussion around what it means for there to be an association between the data. Several of the other questions offer chances to continue this conversation.

Links to questions pictured in this post: review as pdf