Skip to main content

Coarse-grained grading

On exams in my introductory course, I traditionally do what I think of as fine-grained grading; I look at details both small and large and take off as appropriate. That approach is particularly important in an introductory class because students need to work on both small and big things. Seeing notes (and, at times, losing points) for problems with style, or efficiency, or missed edge cases, or whatever helps students remember that what they code the write looks like can be almost as important as what it does.

But there are also times that students don’t need such detailed feedback and it’s not worth my time or theirs for me to give it. Hence, a few years ago [1] I decided to add a different style of grading to some of my examinations, which I call coarse-grained grading.

In the coarse-grained format, an exam [2] will have four or five problems, sometimes with subproblems. I don’t pay attention to minor errors in grading. I may still add a note or two, but I don’t take off points. Instead, I look at the big picture. If the student gets all of the problems mostly correct (except for a few minor issues) and I think that five minutes of discussion would resolve any issues on each problem, they’ ve clearly mastered the material and they earn an A. If they get all but one mostly correct, they earn a B. If the get all but two mostly correct, they earn a C. If they get all but three mostly correct, they earn a D. If they at least attempt the exam and get fewer correct, they earn an F.

If I recall correctly, I started this approach to grading after reflecting on the two very different numeric scales used for grading. Most frequently, we use a 100-point [4] scale for grading, with 90-100 as an A, 80-90 as a B, and so on and so forth. But in computing student grade-point-averages [5], we use a four-point scale. In most cases, there’s not a difference between the two. If we average a C and an A on the percent scale, we get a B [6]. If we average a C and an A on the four-point scale, we get a B [7]. But the two scales have a very different effect when a student gets a zero. In the percent scale, the average of an A and a zero is an F [8]. But in the four-point scale, the average of an A and a zero is a C. For a whole course, I think it makes sense to average in zeroes in the percentage form. But for an exam, that seems overly harsh. So my first coarse-graded exams were, in effect, a four-point scale.

Fairly soon after starting this approach, I found that I sometimes wanted to give five problems, rather than four. The change seemed minor. The bigger change involved t giving half credit for not mostly correct, but not totally horrible. I can’t quite remember exactly why I started that policy, but it seems fairer to students and fits in better with my concept of what grades mean. A student who makes a valiant, but not mostly correct, attempt on each problem probably deserves a C rather than an F. The half-credit solution helps with that.

I find that this model also matches my discipline, or at least my approach to my discipline, fairly well. And no, it’s not just that I work in binary. Think about it. A program is either correct or incorrect. No one wants a program that works correctly 90% of the time [10]. Similarly, a proof is either correct or it is not correct. You do not usually partially prove something. A student might make a slight error in a step in the proof, or be less elegant than I would like, but those issues do not make the answer less than mostly correct [11]. An algorithm either achieves its goal or does not [12].

Most students seem comfortable with this approach, but some challenge their grades. This past semester, students in my 300-level course who submitted code with memory leaks [14] found that they got at most half-credit on a problem. Some found that harsh. I found it generous. Code that leaks memory would be completely rejected in the professional world [15]. That makes the code wrong.

The design of some problems can also lead to some challenges. Suppose, for example, that a problem has four subproblems, and a student gets three of the four mostly correct and one of the four mostly wrong. What score do they get on the problem? Some students expect that they should get a 3/4 or even full credit. But getting one-fourth of the problem wrong suggests to me that they have not completely mastered that material, and therefore do not receive full credit. Since my base scale is you’ve mastered it (mostly correct) or you have not mostly mastered it (anything else), half credit for such a solution could even be considered generous.

While I do get some challenges to my coarse-grained grading, I find that I get fewer challenges than when I do fine-grained grading. Since I no longer take off individual points, Students no longer ask why I took off 3 (or 2 or however many) points for something [16].

But that’s not why I do coarse-grained grading. Coarse-grained grading helps enforce the concept that you’ve either demonstrated that you understand the material or you haven’t [17]. And I think that’s something that is important for students to understand. You know what? I should probably share this musing with students the next time I give a coarse-grained exam [18].

Postscript: I’m trying to figure out how the coarse-grained model relates to competency-based grading. In some ways, they are similar: Students must demonstrate mastery of a concept to pass. However, as I understand it, competency-based grading looks more at levels (e.g., you’ve reached C (or B) level competency on this topic) while I focus more on you know it or you don’t. And I’d probably pass a student who showed that they knew three of the four core topics in a course while struggling with the forth. I think a competency-based grading system would focus on what they did not know. But, hey, I’m not competent in my understanding of competency-based grading.

[1] Okay, probably about a decade ago, but that’s a few years compared to the 25+ years I’ve been teaching.

[2] Most typically, a take-home exam [3] or a multi-hour in-class final exam.

[3] I should probably muse about why I primarily give take-home examinations.

[4] Or percentage.

[5] gpas, to use the TLA.

[6] An A is about 95. A C is about 75. (95+75)/2 = 85. 85 is a B.

[7] An A is 4. A C is 2. (4+2)/2 = 3. 3 is a B.

[8] (95+0)/2 = 47.5, which is less than 60. That makes it an F.

[9] (4+0)/2 = 2, which is a C.

[10] Most people don’t even want a program that works correctly 99.999% of the time.

[11] When things are going well, I address the stylistic issues in grading homework assignments.

[12] I struggle a bit with grading algorithms in this approach. It’s much easier to convince yourself that an incorrect algorithm is correct than it is to convince yourself that an incorrect program is correct. And, even in 300-level classes, most students are still developing the mental toolkit for falsifying algorithms. It may be that these issues with algorithms led to the half-right score; I can’t recall.

[14] A program that leaks memory allocates memory for a task and then does not de-allocate the memory when it is finished. Programmers working in most modern languages trust the language to handle memory management. But I often make my students program in C because I think they need to pay attention to issues like this.

[15] Or should be completely rejected.

[16] This approach also avoids the you took off 2 points on my friend’s exam and 3 points for the same answer on my exam issue. While there is some benefit to explaining why the two answers are not really the same, I rarely find it a good use of my time or the students’.

[17] I was going to say you’ve either mastered the material or you haven’t.

[18] The next coarse-grained exam I have scheduled is the final exam in CSC 151. It’s probably more important for my upper-level classes, and I’m not scheduled to teach an upper-level class with a take-home exam [19] for some time. I hope I’ll remember.

[19] Or any exam, for that matter.

Version 1.0 of 2018-01-11.