# Show me the data

Recently, Council distributed a PowerPoint of some interesting data to faculty on campus and asked faculty to discuss with their departments some implications of the data. Those are reasonable requests. I’m glad the College is asking the faculty to think about data.

As soon as I started looking at the PowerPoint, I found that I had questions. For example, the data were given as percentages, rather than as counts. While there are times that percentages are useful, I think the underlying counts can also tell an important story. For one set of data for which I had partial information, the underlying counts told a very different story. In particular, while two percentages were clearly decreasing, the corresponding counts were stable.
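To show how a percentage can fall while its count holds steady, here is a minimal sketch in Python. The numbers are invented for illustration and are not the data we were given; the names `totals`, `group_counts`, and `percentages` are my own.

```python
# Made-up numbers for illustration only; these are not the College's data.
totals = [1000, 1100, 1200, 1300, 1400]   # hypothetical size of the whole population each year
group_counts = [200, 200, 200, 200, 200]  # hypothetical count for one group, held constant


def percentages(counts, totals):
    """Return each count as a percentage of the matching total, rounded to one decimal."""
    return [round(100 * c / t, 1) for c, t in zip(counts, totals)]


group_percentages = percentages(group_counts, totals)

for total, count, pct in zip(totals, group_counts, group_percentages):
    print(f"total={total}  count={count}  percentage={pct}")
```

In this toy example the count never changes, yet the percentage drops every year because the total grows. A graph of percentages alone cannot distinguish that case from one in which the count itself shrinks.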

I also had other questions. The data were given with trend lines that represent about thirty years of data. I’m glad to see thirty years of data, but I’d also like to see what the ten-year trend lines are. The data were shown with annual counts. I’d like to see the data smoothed out a little, averaging over three-year periods [1].
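With the raw annual values in hand, both of those views take only a few lines to compute. The sketch below, in Python, assumes one number per year; the series is invented, and `moving_average` and `trend_slope` are hypothetical helpers of my own, not anything the College provided.

```python
def moving_average(values, window=3):
    """Average each run of `window` consecutive values (a trailing window)."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def trend_slope(values):
    """Least-squares slope of the values against their index (change per year)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Invented series: rises for twenty years, then stays flat for ten.
annual = list(range(100, 120)) + [120] * 10

smoothed = moving_average(annual)           # three-year averages
thirty_year_slope = trend_slope(annual)     # trend over all thirty years
ten_year_slope = trend_slope(annual[-10:])  # trend over the last ten years only

print(smoothed[:3], thirty_year_slope, ten_year_slope)
```

In this made-up series the thirty-year trend is clearly upward while the ten-year trend is flat, which is the kind of difference I would want to look for.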

And so I did the logical thing: I asked for the underlying data.

Response number one: Tell us what other graphs you’d like to see. I explained and then indicated that it’s not that I want to see particular graphs; it’s that I best understand data by exploring it in multiple ways, not just by looking at a set of graphs.

Response number two: There will be a presentation on the data some time in the future. You can ask questions then. But we have to discuss the data early next week, and I haven’t heard about a presentation. And I don’t expect that those presenting will be able to create the graphs I want in the middle of a presentation.

Response number three: Nope, we won’t release the raw data.

I’m trying to figure out why they won’t release the data so that we can be responsible thinkers. Given some of my previous experiences with people on the other side of Park Street, my initial inclination was that they don’t really understand what it means to work with data. So I wrote a parable.

Let’s suppose you’ve asked me to think about themes in Moby-Dick. Instead of the book, you’ve given me a three-page summary written by a staff member.

I know that Melville is careful in his use of language. A paragraph that I found shows that he also makes interesting use of metaphor. So I ask to have access to the original.

You ask, “What else would you like the staff member to summarize?” and tell me, “Don’t worry, there will be a talk on Melville.”

But seeing what others have written or hearing what they say about Melville is not the same as reading Melville.

You wouldn’t ask me to write about Melville with only CliffsNotes. Why do you expect me to think about data without the actual data?

Liberally educated people should be able to read data, just as they should be able to read Melville. While I expect that some of the more quantitative faculty may be more enthusiastic about looking at the data, I also know that my colleagues from across campus can look at data equally well. Why does our administration have so much trouble understanding that?

The more I think about that question, the more I wonder whether there are other reasons they are unwilling to share the data. Are there perhaps some problematic issues that the data might reveal that they don’t want us to discover? I’ll assume not. I just want the data so that I can think carefully about the things that they’ve asked us to think carefully about.

I also know that the data don’t stand by themselves. They are a chance to have further conversations. Where do we seem to see differences? Why do we think they are differences? What might account for them? In the one case in which we’ve received the raw data, I’ve already had some interesting discussions about some of these issues with folks across campus.

[1] Have you figured out by now that I won’t be telling you what kinds of data we have? I think I can still discuss my general concerns without letting you know.

Version 1.0 of 2017-10-26.