Mastery grading (#1112)
Topics/tags: Teaching, somewhat rambly, unedited
This fall has brought a host of changes to how I teach. Some parts you probably expect. I now teach remote online classes as opposed to in-person classes. I’m also teaching in Grinnell’s strange new seven-week terms [2,3] rather than in our traditional semesters, which means that work is accelerated and that I’ve had to cut some topics.
I’m also making other changes. Among those changes is a significant change to the way I test and assess students. Traditionally, I’ve given long, complicated, take-home exams that force students to learn new material. Most students don’t like them at the time they take them, but many report retrospectively that they were good learning experiences.
This term, I’m adopting a form of grading one of my colleagues uses that I’ve been calling “mastery grading,” which is the term I think my colleague uses. Here’s how I think of mastery grading: I identify easily measurable course goals. Throughout the semester, I give students sets of short problems, with each problem focused on one of the course goals. We call these problems “learning assessments,” or LAs for short. I grade each LA on a simple credit/no-credit scale. Students get credit for the problem if their work on the problem shows they’ve mastered the associated topic. Students get no credit if they don’t show that mastery. There is no partial credit.
It sounds harsh, doesn’t it? But that’s because I’ve left out an important aspect: If a student does not get credit on an LA at one point in the semester, they’ll have additional chances to demonstrate mastery. For the topics on the first set of learning assessments (the first “SoLA”), students will be able to try another problem on the same topic on the second SoLA, the third, the fourth, and the optional fifth. I expect that most students won’t need five tries, but the option is there if they need it. For the second set of topics, they’ll have up to four tries. For the third set, they’ll have up to three tries, and for the last, they’ll have up to two tries.
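The retry schedule follows a simple pattern: a topic introduced on one SoLA can be re-attempted on every later SoLA through the optional fifth. A tiny sketch (hypothetical code, not part of the actual course materials) makes the arithmetic explicit:

```python
# Hypothetical sketch of the retry schedule described above. A topic that
# first appears on SoLA number k (1-indexed) can be attempted on every
# SoLA from k through the optional fifth, so it gets 5 - k + 1 tries.
def tries_for_topic(first_sola: int, last_sola: int = 5) -> int:
    """Chances to demonstrate mastery of a topic introduced on SoLA `first_sola`."""
    if not 1 <= first_sola <= last_sola:
        raise ValueError("first_sola must be between 1 and last_sola")
    return last_sola - first_sola + 1

# The four groups of topics get 5, 4, 3, and 2 tries, respectively.
print([tries_for_topic(k) for k in range(1, 5)])  # prints [5, 4, 3, 2]
```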
That is, it’s not important when in the semester a student masters a topic; it’s just important that they master it. And students need not master every topic. To earn an A in the class, students must demonstrate mastery of twenty-six of the twenty-eight topics. They must also do some other things, such as show good work on homework assignments, but mastery of topics is a key factor. To earn a B in the class, they need only demonstrate mastery of twenty-four. And to earn a C, they need only master twenty-one of the topics.
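Those cutoffs can be summarized in a short sketch (again hypothetical code; the thresholds are only the ones stated above, and a real grade also depends on homework and other work, which this ignores):

```python
# Hypothetical sketch of the mastery cutoffs described above:
# 26 of 28 topics for an A, 24 for a B, 21 for a C.
def letter_grade(topics_mastered: int, total_topics: int = 28) -> str:
    if not 0 <= topics_mastered <= total_topics:
        raise ValueError("impossible count of mastered topics")
    if topics_mastered >= 26:
        return "A"
    if topics_mastered >= 24:
        return "B"
    if topics_mastered >= 21:
        return "C"
    return "below C"  # assumption: the musing gives no cutoffs below C

print(letter_grade(26), letter_grade(24), letter_grade(20))  # prints: A B below C
```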
I’ve also extended this “try again” policy to other aspects of the course: Students can re-take quizzes (once) and redo the weekly mini-projects to improve their grades on those projects. I don’t generally permit students to redo lab writeups or reading responses, but those are fairly straightforward, and students are not required to complete all of those.
Traditionally, I’ve had bad experiences allowing students to redo work. Redos often mean that I can’t provide sample solutions. In CSC-207, where I’ve used a different form of mastery grading on exams, students used to spend a lot of time redoing problems only to find that it made no difference to their grade. So I stopped permitting redos.
Redos add to the burden of the class. I go over individual LAs with students who miss more than one or two on a SoLA. My graders and I put a lot of comments on each mini-project so that the next version can be excellent. And I have to write twice as many quizzes and a whole bunch of learning assessments.
This weekend, I was scheduled to write twenty-seven learning assessments in preparation for the third SoLA: six from the first group that some students missed on SoLA 1 and SoLA 2, seven from the second group that students missed on SoLA 2, seven from the third group, and seven sample problems to help students prepare for the third SoLA. I ended up writing thirty-one or thirty-two. Why? Because I wrote extra sample problems on some topics, including vectors and list recursion. Writing that many problems took me, well, all day. I started at about 8:00 a.m. I finished writing all but two problems at 5:30 p.m. I took a break to make lunch. I spent a bit of time responding to email. But most of the day was writing problems, checking answers, and such.
I’m not thrilled at having to spend that much time writing LAs. But looking at how the LAs work in practice, I think it’s probably worth it. Students are less upset when they have difficulty with something; they know they’ll have another option. When I talk through LAs with students, I hear a lot of “Oh, I understand it now. I’ll get it the next time through.” I don’t have to deal with arguments about partial credit. As long as I can explain why a particular answer fails to demonstrate mastery, and help them understand what would show mastery, most students are okay with trying again. When a student forgets to complete a problem, they accept that the policy is “try again next time.”
And with small, focused problems, I find that I grade comparatively quickly, particularly since I’m doing everything on Gradescope, which makes it easy for me to re-use comments I’ve made. I’ll muse more about Gradescope, including its many flaws, on another day. Of course, this is the first time I’ve had twenty-six different LAs, so I may find that I’m wrong, but many LAs have only a few students taking them.
There’s another positive, or at least I think there will be another positive. At the end of the semester, I can be relatively confident that a student who got credit for, say, twenty-six LAs has mastered (or at least understood) a lot of material. While I’ll still want to see that mastery demonstrated in other ways, I won’t need to second-guess my grading. On the other hand, if a student has missed a lot of LAs, I shouldn’t feel too much frustration about not giving them a high grade.
And even though writing LAs is taking me a long time, I expect that I will soon have a relatively large library of LAs that I can reuse from time to time. This should also be the only semester in which I write so many sample LAs, since I can reuse samples from semester to semester.
Of course, not all topics have natural LAs. I did say at the beginning that I do this only for the “easily measurable” course goals. As those who’ve read my musings for a while know, I believe that many of the most important goals in a course are not easily measurable. Has the course changed a student’s confidence, their willingness to try something different or to speak up in class, their tolerance for failure, their ability to collaborate with others, their respect for difference? The type of mastery grading I’ve described doesn’t measure these kinds of things. It certainly doesn’t measure the key goals of a Grinnell education. I can’t think of quick assessments that ensure that our students are prepared “for the honorable discharge of the duties of life” or to “use their knowledge and their abilities to serve the common good.” I haven’t heard of anyone who has a way to measure those things, either.
That worries me a bit. Not that we can’t measure those things. But rather that I’ll end up focusing more on the things that I can measure instead of the things that I consider most important. That is, while I care that my students can, say, think recursively or design careful tests, I care more that they develop as human beings. Nonetheless, the world expects me to grade on the former, rather than the latter, so perhaps it’s okay.
Have I drunk the Kool-Aid?
At times, it feels like I have. In musing about the topic, I’ve been asking myself how this form of grading would work in less technical disciplines. And I’ve found myself saying things like “Well, one of your learning goals might be ‘Can write an appropriately thoughtful and complex thesis.’ If they don’t do it on the first essay, they can try again on the second, or third, or fourth. Did they form a cohesive argument? Did they transition appropriately? Did they make use of materials in the form appropriate for a scholar of the discipline? All of those could be yes/no questions. Rather than giving grades on essays, we could use checklists, along with some free-form commentary to help them improve in general.” Perhaps I’ll try that strategy if I ever get to teach Tutorial again. I’ve already given up my long-standing tradition of having more comments on Tutorial papers than students have text. Checklist grading may be the inevitable next step.
As long as I’ve drunk the Kool-Aid, I might as well take advantage of it. The next time I’m asked whether I assess my course, and how the assessments tie to the learning goals of the course, I can point to the LAs and give data about the LAs.
On the other hand, I’m still not positive how the measurable course goals tie to the department’s goals, nor how those tie to the broader institutional goals. Perhaps I should ask the Dean’s office how they measure those more important characteristics.
My muse informs me that before I wrap up, I must return to the question of equity. Why is mastery grading equitable? From my perspective, it has to do with the chance to try again with no penalty. Students come in with different backgrounds, different experiences with computing, different experiences with math. Some find the first few weeks easy. Some find them complex. But it feels like students are much closer together by the end of the semester, at least if they keep trying. Some of my best end-of-semester students have never programmed before. This kind of strategy lets them earn the grades they deserve, and do so without too much stress (or so I hope).
That’s not to say that the approach doesn’t have its flaws. Students who take multiple assessments on the same topic have more work to do. But the assessments are short (under twenty minutes), so it’s not an insurmountable difference. I suppose I could achieve similar goals by having only end-of-semester assessments. However, I expect that strategy would fail miserably. In my experience, students learn best when they have regular encouragement to learn and to try, and when they receive some kind of feedback.
What should you take from all of this? What should I take from all of this? I like this technique of grading and assessing students because it seems more equitable, because it seems to decrease stress on students, and because it seems to give me better evidence of student learning. I don’t like this technique because it adds to my workload, because I feel like I’m failing to measure the things I care about most, and because I’m not really sure that it’s more equitable.
Further exploration is warranted.
 I have taught remote classes before, but in-person is my primary mode of teaching.
A term is a seven-week block for a class.
 The strange new term I prefer for terms is
 There’s also a rush of dopamine and a sense of accomplishment when you finish one of the harder problems.
As a lifelong Celtics fan, I dislike the “LA” acronym. Maybe I’ll start calling them “Assessments of Learning.”
SoLA, a term I like because there are many awesome associated meanings of the word “sola.” In Finnish, it means “mountain pass,” which is a good metaphor for what the SoLAs are doing: helping students navigate a complex terrain of knowledge. In Esperanto, it means “alone,” which reinforces how students do their SoLAs. Sola plants are used to make pith helmets, which protect your head. Mastery grading is a way to protect students. Stuff like that.
 Most work in CSC 151 is collaborative. LAs are not.
Or at least I hope most students won’t need five tries.
 As long as they master it some time during the term.
 And do some other work, too.
 CSC-207 exams traditionally involve four complex problems. If you get all four completely correct, you earn an A. If you get three completely correct, you earn a B. If you get two completely correct, you earn a C. If you get one completely correct, you earn a D.
 In general, students did not test enough edge cases, which meant that their code would not pass my tests. Or maybe they did test the edge cases, and found they could not get through them.
 In one case, only one student missed it on both SoLA 1 and SoLA 2. But I still have to write the problem.
Yes, that’s right. At least one student missed each problem on the second SoLA. The problems students missed varied. Surprisingly, the recursion problems were the ones that many students did the best on. Perhaps that’s because I have high standards for testing and documentation.
 Students who took CSC-151 will be surprised to learn that we’ve dropped the six P’s. The consensus was that the preconditions and postconditions added too much burden to most students.
 Now that I think of it, LA is a great acronym for them. It’s something that most people dislike. (See the Celtics comment above.) Or perhaps that’s not a good way to think of them. People should enjoy LAs; it’s a chance to demonstrate mastery.
 And, rarely, again.
 Or a whole SoLA, as happened to one person.
 Students seem to realize that I’ll go over skipped problems, too.
 Perhaps I’ll report back on the grading experience in a future musing.
Somewhere in the back of my head, my mother is saying “Different students might show mastery in different ways.” I’ll need to think about that again.
 I may change some of the course goals, which will entail new sample LAs. But I don’t usually change the goals much.
 There may be some assessments for growth mindset. But is it the student’s failure or mine if the course does not develop a growth mindset? That’s hard to tell.
I’d like to say “when I teach Tutorial again,” but as I look at the next few years, I don’t see myself getting to teach Tutorial again any time soon.
Yes, I fall into the school of “You show care for a student by putting a huge amount of comments on their work.” But I’ve learned that the level of comments I prefer intimidates students rather than helps them.
 Those goals need to be rewritten.
 Admittedly, twenty minutes here and there can be a huge burden to some and almost nothing to others.
 How much it adds is yet to be determined. That may have to wait until we’re back on a semester schedule and I’ve used the technique a few times.
 TL;DRs go at the end, right?
Version 1.0 of 2020-12-05.