A new CSC 151, revisited

I’ve been working on the new CSC 151 for much of the summer. Today I gave a report to our Data Science workshop [1] on our new Data Science model for CSC 151, our introductory course. I owe a more condensed written version of that presentation to our grants officer. This musing serves as a draft of my report. I may continue tomorrow with a longer attempt to transform my notes [2] into narrative.

In this project, we are adding Data Science topics to CSC 151, Grinnell's introductory course in Computer Science. The primary focus of CSC 151 will remain computer science, but we will apply CS principles to the problems in data science that draw most heavily on computational thinking. We will focus on three big-picture issues: a data science process, ways of thinking about data, and a cyclic "use then build" process.

Process. Data scientists apply various versions of the scientific method in their work. We will focus on an exploratory data process, one in which the researcher starts with a collection of data (e.g., found on the Internet or previously gathered) and explores the data to see what questions they reveal. After finding or gathering data, one develops a question or questions, writes programs that transform the data into usable forms, represents and stores the data, writes programs that clean the data (e.g., standardizing formats, removing entries with incomplete data), analyzes the data, visualizes the data, and presents any preliminary insights from the exploration. This process is both cyclic and back-and-forth. It is cyclic in that the data scientist will often go back to the beginning after completing the process, returning once again to find and gather more data. It is back-and-forth in that each stage may send the data scientist back to a prior stage or stages. For example, a visualization may lead to another analysis, and an analysis may lead the data scientist to revisit how they cleaned the data.
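To make the transform, clean, and analyze stages a bit more concrete, here is a small sketch in Python. The records, field names, and helper procedures (`is_complete`, `standardize`, `total_by_state`) are all hypothetical illustrations, not course materials. Note that cleaning runs before transformation here, because converting an empty string to a number would fail; that ordering is a small instance of one stage sending you back to another.

```python
# Hypothetical records, as they might arrive from a found data set.
raw = [
    {"state": " iowa ", "year": "2015", "count": "12"},
    {"state": "Iowa", "year": "2016", "count": ""},  # incomplete entry
    {"state": "MINNESOTA", "year": "2015", "count": "30"},
    {"state": "minnesota", "year": "2016", "count": "27"},
]

def is_complete(record):
    """Clean: keep only records in which every field has a value."""
    return all(str(value).strip() != "" for value in record.values())

def standardize(record):
    """Transform: trim and title-case names, convert numeric fields to ints."""
    return {
        "state": record["state"].strip().title(),
        "year": int(record["year"]),
        "count": int(record["count"]),
    }

# Remove incomplete entries first, then standardize the survivors.
cleaned = [standardize(r) for r in raw if is_complete(r)]

def total_by_state(records):
    """Analyze: sum the counts for each state."""
    totals = {}
    for r in records:
        totals[r["state"]] = totals.get(r["state"], 0) + r["count"]
    return totals

print(total_by_state(cleaned))  # {'Iowa': 12, 'Minnesota': 57}
```

A surprising total at this point (say, a state that appears twice under different spellings) is exactly the kind of result that should send a student back to revise `standardize`.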

Because this is a course in computer science and not in statistics, the mechanisms for exploring data will be relatively shallow. We will send students interested in deeper analysis to other courses in Statistics and Data Science.

Thinking about data. Throughout the course, we will emphasize two key aspects of thinking like a data scientist. First, we will encourage students to challenge the data. A good data scientist approaches results skeptically and asks questions like: What other tests might I do to confirm the preliminary result I've found? What data are missing, and might those missing data have affected my results? How might having those data change my results? How were the data gathered? Second, as noted in the section on Process, we will encourage students to explore data sets actively and to think creatively about what they might find in them.
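One small, concrete way to start asking "What data are missing?" is to count, field by field, how many records lack a usable value. The sketch below assumes records are Python dictionaries and treats an absent key, `None`, or a blank string as missing; the survey data and the name `missing_by_field` are hypothetical.

```python
def missing_by_field(records):
    """Count, for each field, how many records lack a usable value.

    A value counts as missing if the key is absent, the value is None,
    or the value is a blank string.
    """
    fields = set()
    for r in records:
        fields.update(r.keys())
    counts = {}
    for field in sorted(fields):
        counts[field] = sum(
            1 for r in records
            if r.get(field) is None or str(r.get(field)).strip() == ""
        )
    return counts

# Hypothetical survey responses.
survey = [
    {"age": "34", "town": "Grinnell"},
    {"age": "", "town": "Newton"},
    {"town": "Grinnell"},
    {"age": "51", "town": None},
]

print(missing_by_field(survey))  # {'age': 2, 'town': 1}
```

The count is only the first step; the more interesting question is whether the records with missing values differ systematically from the rest.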

Use then build. Just as statistics students gain power and understanding by implementing statistical tests that they first used as built-in functions in programs like Excel and Stata, computer science students benefit from learning how to implement the standard functions that they use to do computational data science. In the first half of the semester, students will use pre-written procedures, such as map, reduce, and filter, as they work with data. In the second half of the semester, students will implement the procedures they used in the first half of the semester.
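A sketch of what "use then build" might look like for map, filter, and reduce follows. It is written in Python for illustration only; the course itself may well use a different language. The first part uses Python's built-in `map` and `filter` and `functools.reduce`; the second part reimplements them recursively as hypothetical `my_map`, `my_filter`, and `my_reduce`, and we can check that the two agree.

```python
from functools import reduce

# Use: built-in procedures applied to some (hypothetical) temperature data.
temps = [61, 75, 58, 82, 90]
hot = list(filter(lambda t: t > 70, temps))
celsius = list(map(lambda f: round((f - 32) * 5 / 9), hot))
total = reduce(lambda a, b: a + b, temps)

# Build: recursive reimplementations of the same procedures.
def my_map(fun, lst):
    """Apply fun to every element of lst, producing a new list."""
    if not lst:
        return []
    return [fun(lst[0])] + my_map(fun, lst[1:])

def my_filter(pred, lst):
    """Keep only the elements of lst for which pred holds."""
    if not lst:
        return []
    rest = my_filter(pred, lst[1:])
    return [lst[0]] + rest if pred(lst[0]) else rest

def my_reduce(fun, lst):
    """Combine elements left to right, like functools.reduce with no initial value."""
    if not lst:
        raise ValueError("my_reduce requires a non-empty list")
    if len(lst) == 1:
        return lst[0]
    # Combine the first two elements, then recurse on the shorter list.
    return my_reduce(fun, [fun(lst[0], lst[1])] + lst[2:])

print(hot, celsius, total)  # [75, 82, 90] [24, 28, 32] 366
print(my_filter(lambda t: t > 70, temps) == hot)  # True
```

Writing `my_reduce` with a non-commutative operation such as subtraction is a quick way for students to discover that the order of combination matters.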

Finally, the course will retain a mid-size project that students complete in teams toward the end of the semester. The project will require students to identify and explore their own data sets.

We expect that this revised version of CSC 151 will not only continue to help students develop skills in computational thinking, but also expand those skills to reflect key issues and ideas in data science.

[1] Yes, there are those damn workshops again.

[2] I just wrote notes; I did not make a presentation.

[3] Audience members, in the original.

Version 1.0 of 2017-08-07.