Reading CS Principles, Take 1

For the next eight days or so, I’m participating in the reading [1] of the first annual CS Principles AP exam [2]. I’m pretty sure that my contract says that I can’t say anything in detail about the exam or the grading, but that I am allowed to speak generally about the experience [3]. I’m not sure how much I’ll have to write, but we’ll see.

Why did I sign up for AP reading when, well, grading is my least favorite part of the job? In part, because Henry Walker regularly says great things about his experience grading AP exams [4]. In part, because I wanted to learn more about CSP [5]. In part, because I thought it would be interesting to see how you achieve consistency in using a rubric.

Henry has always mentioned the community that AP grading brings; it's a nice way to bring together high school and college CS faculty who are committed to teaching. It has definitely been interesting to talk to a variety of people and hear their perspectives. This evening I got to sit with a group of CSP partners from places like Project Lead the Way, the Harvard CS 50 group, CodeHS, and CSMatters, and it was useful to hear what they had to say. I look forward to more conversations with a variety of people over the coming days.

For some reason that I cannot yet fathom, I was asked to be a table leader. That means I got to arrive early to learn more about CSP (yay!), how to manage a table of eight readers (yay?), and to get HR training (!yay). Like the readers, we start by learning how to apply the rubric [6].

I’ve never been a huge fan of rubric-based grading, as I don’t always see a direct correlation between quality of work and score from the rubric. I also don’t believe things add up to 100. For example, if someone does awesomely on two out of three categories on a large assignment, they probably deserve an A, even if they only do adequately on the other third.

Nonetheless, I do understand the advantages of rubrics, particularly for something like the AP. I especially like that the CSP rubric is available to students far in advance of the exam, which should give them a good understanding of what they should and should not do on various parts of the exam.

Henry had a story about when he used to grade the Calculus AP. If I recall correctly, when they started out in training, two experienced faculty using the same rubric could give the same exam a grade anywhere between three and nine out of twelve. Why? One reader might say "Their solution is good, except for this one, comparatively minor, mistake." Another might say "If they make that mistake, they really don't understand the concept." But after some examples followed by careful discussion of the meaning of all the entries on the rubric, people start scoring the same [7].

Learning how to apply a rubric consistently has been, well, interesting. I'm starting to see the same consensus develop among the group of sixteen or so table leaders of which I am a part. We're thinking more carefully about what is and is not appropriate for each item in the rubric and we're learning to apply the rubric in the way that the leaders want us to apply it [8]. By the time we're done with practice, we'll almost certainly be on the same page. And then we'll get to help train another one hundred twenty-eight readers. Fortunately, we'll have heard discussions about most of the questions on the training set, which means we'll have the knowledge to explain some decisions that may puzzle the readers as much as they puzzled us [9].

The AP folks have a lot of experience with this kind and scale of grading, so it's also interesting to see what structures are in place to keep readers consistent. Once they finish training, they will start each day with a set of sample exams to make sure they remember how to apply the rubric. Throughout the day, they will receive a few common exams that let us see whether people are grading consistently and, if not, show us where people vary. Each table leader will also recheck randomly selected submissions from each person at their table.
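To make those two checks a bit more concrete, here is a minimal Python sketch. It is not how the College Board actually does it; it assumes each common exam's scores are recorded per reader, and that a spread of more than one point counts as "inconsistent." The function names, the data layout, and the one-point threshold are all hypothetical.

```python
import random


def flag_inconsistent(common_scores, max_spread=1):
    """Return {exam_id: (low, high)} for common exams whose scores vary too much.

    common_scores maps a hypothetical exam id to {reader_name: score}.
    The one-point default threshold is an assumption, not an AP policy.
    """
    flagged = {}
    for exam_id, by_reader in common_scores.items():
        low, high = min(by_reader.values()), max(by_reader.values())
        if high - low > max_spread:
            flagged[exam_id] = (low, high)
    return flagged


def pick_rechecks(submissions_by_reader, per_reader=2, seed=None):
    """Randomly choose up to per_reader submissions from each reader to re-score."""
    rng = random.Random(seed)
    return {
        reader: rng.sample(subs, min(per_reader, len(subs)))
        for reader, subs in submissions_by_reader.items()
    }
```

For example, `flag_inconsistent({"A": {"r1": 3, "r2": 9}, "B": {"r1": 5, "r2": 5}})` flags only exam A, with scores ranging from 3 to 9, which is the kind of disagreement that calls for a conversation about what the rubric means.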

But what about the concept of spending a week grading? We'll see. It feels like it will be less painful [10] than my normal grading, and not just because I'm a table leader. Applying a rubric is very different from providing detailed feedback to a student. The latter is more valuable, but requires much more effort. I spend that effort when I grade my students' work. We don't provide feedback for the AP students. It's also kind of fun to be part of a large grading endeavor: We will be reading 47,000 exams over the next week [11].

We also have a few evening activities planned. We'll get a lecture on accessibility from the legendary Richard Ladner. We'll hear about cloud computing from Laurie White at Google. We might just hang out and chat some nights. They also offered to find time to train us on the other rubric one day or night. I also have some plans of my own. For example, I'm planning to head to the art museums on an evening when they are open [16]. I may also head to Half Price Books [17].

Will I critique the sample solutions for the Create section of the exam from the sample solution pages? That remains to be seen. But you can have fun on your own. You can see both sample solutions and the rubric at http://apcentral.collegeboard.com/apc/public/exam/exam_information/231726.html. If you'd like a challenge, apply the Create rubric to this solution.

[1] aka grading

[2] There was a prototype last year. And we’re calling this year year zero. So I’m using first as a compromise.

[3] If I’m wrong and they decide to fire me, I’ll get home early.

[4] CS A, the now-discontinued CS AB, and, at one time, Calculus.

[5] Shorthand for CS Principles.

[6] I say the rubric. There are actually two rubrics for CSP, one for the explore task and one for the create task. Readers and table leaders work on one or the other.

[7] More or less.

[8] Not necessarily how we would prefer to apply it, but in a way that is consistent across readers.

[9] There was an even smaller group that got here a few days before us and developed the first set of examples for us to examine.

[10] Or differently painful.

[11] 47,000 exams. There are 128 readers for the Explore task [12]. Each will be grading about 367 exams. The readers have approximately one day of training and six days of grading. That's about 61 exams per day [14].
We'll assume that readers work seven hours per day [15]. That's about nine exams per hour. I guess that doesn't sound too bad. I'll time myself tomorrow.
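The back-of-the-envelope arithmetic in this note is easy to check with a few lines of Python. The sketch divides the 47,000 exams among the 128 Explore readers, six grading days, and the assumed seven-hour day; the constant names are mine, and it assumes every exam gets exactly one Explore read (ignoring rechecks and common exams).

```python
# Figures from this note; the one-read-per-exam simplification is an assumption.
TOTAL_EXAMS = 47_000
EXPLORE_READERS = 128
GRADING_DAYS = 6
HOURS_PER_DAY = 7

per_reader = TOTAL_EXAMS / EXPLORE_READERS  # exams each reader handles
per_day = per_reader / GRADING_DAYS         # exams per grading day
per_hour = per_day / HOURS_PER_DAY          # exams per working hour

print(f"{per_reader:.1f} per reader, {per_day:.1f} per day, {per_hour:.1f} per hour")
# prints "367.2 per reader, 61.2 per day, 8.7 per hour"
```

So the per-reader load comes out to about 367 exams, and the hourly rate rounds to roughly nine.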

[12] There are more for the Create task since evidence suggests that the Create task takes longer to grade.

[14] Hmmm … that’s about the number of CSC 151 exams I’ll be grading each time this fall. And my exams have more than one problem.

[15] 8am to 5pm, one hour for lunch break, two fifteen-minute snack breaks, and a bit of administrative time.

[16] Let's see … the Nelson-Atkins Museum of Art and the Kemper Museum of Contemporary Art are both open until 9 p.m. on Thursdays and Fridays. I really enjoyed the Kemper the last time I came to town and should go back. But it's also nice to visit a new museum.

[17] Sorry Micki!

Version 1.1 of 2017-06-14. (Version 1.0 released on 2017-06-09.)

The opinions stated herein are those of Samuel A. Rebelsky and do not necessarily reflect those of Grinnell College, Grinnell's Computer Science Department, the Rebelsky family, CMD-IT, SIGCAS, SIGCSE, any other organizations I am or have been affiliated with, or even most other sentient beings.


SamR's Assorted Musings and Rants: Reading CS Principles, Take 1 by Samuel A. Rebelsky is licensed under a Creative Commons Attribution 4.0 International License.

This Web site was built using Markdown, some custom scripts, Twitter Bootstrap, and the Bootswatch Readable Theme.