zemlan.in

On Code Reviews

The code is done, now to the hard part

Maybe, I’m the only one, but I’m annoyed by jokes like this:

10 lines of code = 10 issues.

500 lines of code = "looks fine."

Code reviews.
I Am Devloper on Twitter

I mean, the joke by itself is fine — i-know-that-feel.png, #relatable, all that. But it contains several examples of a bad behavior both of code authors and of code reviewers:

So, if code review problems are so widespread, then we should not just point at them, but also try to address them somehow. One could do that either via external requirements/limitations or via internal motivation

To require “be better” is dumb. Automatically rejecting code review because of metrics — not an option, because “understandability” can’t be measured by a script. So we’ll disregard “improve code reviews via external requirements/limitations”

One could influence another’s internal motivation by an example, but after years of attempts, this seems too slow and not that reliable. Maybe, a personal example can be better communicated in a post…


Aside: this post was originally written as a series of posts in Ukrainian in the beginning of 2021. Many things have changed since then (like I'm currently working in a smaller company and a team than in 2021), but the general message and advice still seems relevant (albeit somewhat meandering)

Also, to make writing process easier, this post pretends as if Git and Github are the only ways to collaborate while writing code. If you’re lucky enough to use Fossil or Phabricator or Gitea or whatever… Okay…


What Are Code Reviews For?

Time and Place

I’ve only worked in rather mature product companies. When a company works on a products for years, the time and effort investment into code review is justified not only because “that’s a correct thing to do”, but also because it’ll make future support easier

Really young companies, who might never get to that “future”, are taking a risk that, after all those reviews, they’ll remain with a “correct” but useless code

I don’t know how it is to work at an agency/outsource/outstaff. On one hand, if a client will return for a follow-up work, previous code reviews should help to remember the codebase faster. On the other, client might never return

So, code reviews might be completely useless (or even harmful) on your current workplace. Or they were on your new colleague’s previous workplace…

Motivation

Okay, let’s assume you’re working in a relevant team, at a relevant time, and you really want to get better at writing and commenting pull requests. To know the direction of improvements, you’ll need to understand what’s the goal of code reviews

So, what’s the goal of code reviews?

I was told that’s what we do here

“Due to historical reasons” is a bad path to improvements. It’ll lead you to where you’ve started. There even won’t be a story to tell (unless you like stories where nothing happens)

So that idiots won’t break anything in my perfect project

Those “idiots” have opened a PR because they need to fix or to improve something. If your perfect project is holding on together only thanks to your gatekeeping, you should invest some time into writing tests and setting up linters

To catch what’s difficult or impossible to cover with tests

Warmer… But “opinions leaders” will soon get tired of staring at hundreds and thousands of line of code, looking for the same problems that trip colleagues unfamiliar with the repo

At the same time, novices of the repo (the repo specifically — they might have a lot of experience outside of it) might not know that {placeholder} will create problems, while experts have encountered {placeholder} so many times that they instinctively avoid it

So, the goal of code reviews is to let authors and reviewers to share their understanding of the project with each other:

In other words, you need two to tango have a good code review

Opening a Pull Request

The Repo With A Thousand Faces

When writing a pull request (PR), it’s useful to be guided by the idea of “invest as much time and/or effort as you want others will invest into reviewing”

If the author won’t invest time into gathering everything in a nice package, then reviewers would need to do that themselves. Even if reviewers know about a project’s quirk, they might not remember about it when reading the PR, because their mind is busy with connecting the dots that author decided not to connect

Github Stories

If the goal of code reviews is to share and get others’ thoughts and opinions, first your colleagues need to understand what’s happening. Knowing that, you might arrive at a bad and a good conclusion:

To make it easier for colleagues to understand a PR and to share their relevant experience, they need to know “how it was”, “what’s changing”, “why it has to change”, “how it changes”, and “what it becomes”. Changes in the code itself tells about some of these

For example, the left/red part of a diff tells us “how it was”, while the right/green one — “what it becomes”. The rest of the questions we’ll need to address in prose

If that prose is a collection of dry facts, then reviewers would have to connect them themselves. To make it easier, PR’s author can unite these facts into a transformation story in which PR’s repo is the main character

We understand stories and can “extract” their gist, because we’re listening/reading/watching them our whole lives. Plus, our brain perceives code as a puzzle, so we might accidentally ignore consequences of a change — a puzzle is solved, the end, nothing will happen to it… Because of that, it makes sense to tell stories while writing a PR. You’ll have ample opportunity to so do1

Words Mean Things

You can start writing the tale long before pressing the “New pull request” button — in the code itself. Even without comments, you can describe “the main cast” in the names of files, functions, and variables. A good name of a thing will describe what and why it is, will show its connection with other parts of the code, and will help with discussing it

That’s why it is important to avoid generic names (like utils/data.js), same names for unrelated things, and to pay attention to existing names — if feature is already called wunderwaffle, don’t begin suddenly calling it terrifictiramisu

The name of the branch will help you to focus the PR and to avoid distracting both you and reviewers with “extras”. For example, if you’re in the branch named JD-48/speed-up-uploads, it’ll be easier to stop yourself from updating all dependencies of the JD project or from changing uploads in the ER project

The Medium is the (Commit) Message

If you’ve messed up and the branch already includes too many unrelated changes (which would still be helpful in this or another project), you can play with the content of the commits. Instead of including all of the file’s changes in the same commit, nothing stops you from adding changes line-by-line. Fans of CLI can do it with git add --patch or git add --interactive; personally I got used to Tower’s GUI for that

How to understand which lines belong to one commit, and which — to another? Think about where these lines happen in the PR’s “story” and describe that in the title of the commit, while limiting yourself to git-recommended 50 symbols2. Maybe, that story is a short anecdote and fix A when B will be enough. Maybe, it has multiple “characters” and you should first introduce them to the reader (add X to do Y), and then to each other (use X in Z)

Describe your changes in imperative mood, e.g. «make xyzzy do frotz» instead of «[This patch] makes xyzzy do frotz» or «[I] changed xyzzy to do frotz», as if you are giving orders to the codebase to change its behavior. Try to make sure your explanation can be understood without external resources. Instead of giving a URL to a mailing list archive, summarize the relevant points of the discussion.

SubmittingPatches - The core git plumbing

Additionally, many projects have set up an integration between Git and their ticket manager, so that commits/branches linked to tickets and vice versa. Because of that, it usually makes sense to mention the ticket number in the title of the commit, for example, #JD-48 cache responses from X. Format (XX-00, $XX-00, #XX-00) depends on your project, but any prefix before numbers is enough for Github to display ticket links in its UI

Many projects also follow conventional commit and use tags like feat and chore. I personally think these tags are quite useless for a product without public CHANGELOG, but the habit of categorizing change might help with keeping PRs and commits focused

Retelling the Story

If you’ve confused the order of the commits, you can change it with git rebase --interactive which will make it easy to reorder/rewrite/combine/delete altogether commits by editing a text file with a list of operations and commits

For example, you’ve been working in the JD-48 branch, where you’ve made five commits. Local commit history currently looks like this (git log --graph --oneline --all, newer commits at the top) :

* f4593f9 (HEAD -> JD-48) add tests for X-Y integration
* 04d0fda optimize Y
* ba46169 integrate X with Y
* fa1afe1 tests for X
* 5928aea setup X
* 300a500 (master, origin/master, origin/HEAD) …

git rebase --interactive "5928aea^" will open these lines in your text editor3

pick 5928aea setup X
pick fa1afe1 tests for X
pick ba46169 integrate X with Y
pick 04d0fda optimize Y
pick f4593f9 add tests for X-Y integration

Each line is an operation to do when creating a new version of the branch. By default, rebase takes and applies every commit (pick). Additionally, you can skip a commit (drop), combine it with the earlier one (squash), apply it with an edit (fixup); you can reorder commits (by reordering lines in the file) and execute arbitrary shell-commands (exec)

For example, if you want to:

you’ll need to edit the “script” generated by git rebase --interactive to be that:

pick 5928aea setup X
pick ba46169 integrate X with Y
drop 04d0fda optimize Y
pick fa1afe1 tests for X
squash f4593f9 add tests for X-Y integration

Or you can stop boasting and just use your mouse to rewrite the history, for example in aforementioned Tower

After the branch and the commits are polished, it’s time to work on the title and the description of the PR. They are written, more or less, similarly to commits, but with more of a bird view

What to Write in Pull Request’s Description?

Usually, the most difficult part of the description for me is to begin. Blank page <textarea> blocks a lot of people

Some repos have PULL_REQUEST_TEMPLATE.md, which can help with writer’s block, but private projects frequently either lack it or it is rather useless as a prompt:

# Summary
...

---

- [ ] <!-- some-continuous-integration-checkbox --> ci-bot, do something

Titles and descriptions of commits also might help with the blank <textarea>. If they are not enough, begin with “So, there’s noun and it is verb” and extend this intro with context

That context might include:

Speaking of links… Ten minutes learning Markdown4 will help you with readability/writability of the description. For example, compare how different link styles read:

There’s [`<input type="color">`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/color),
but we [can’t use it](https://caniuse.com/#feat=input-color) yet in all of
[our supported browsers](https://kb.example.com/supported-browsers)
There’s `<input type="color">`, but we can’t use it yet in all of our supported browsers

Links:
- https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/color
- https://caniuse.com/#feat=input-color
- https://kb.example.com/supported-browsers
There’s [`<input type="color">`][mdn], but we [can’t use it][caniuse] yet in all of [our supported browsers][kb]

[mdn]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/color
[caniuse]: https://caniuse.com/#feat=input-color
[kb]: https://kb.example.com/supported-browsers

Pull Request Assignees

When all the prose is written, it needs an audience. And you should know your audience

You are Reviewer Zero

Author of the PR is the reviewer zero. Of course, nobody stops you from switching to another task immediately after you’ve passed the baton to your colleagues, but reading familiar code in another environment (in code review UI instead of the code editor one) or at another time (when all the “obviously” are forgotten) will help you notice an mistake or some weird thing on your own

I sometimes do this as well – after publishing my PR, I review it myself (on GitHub, you can comment on your own PR, but without approving it). This allows me to add comments/context specifically to the code that it's related to

— a good comment to the previous section

It’s especially comfortable to do in a draft pull request, which already kinda exists and has CI checks ran against, but hasn’t announced itself to code owners

Who to Assign to

In small projects with a tight team, the question of “who to assign review to” answers itself — you either assign it to everyone (because there are not many of you), or you know the person responsible for this or that part of the codebase and assign it to them

In more… dynamic projects and teams, due to scale and worse communication, answering that question is more difficult. Often, it’s solved by including either whole teams or “opinion leaders” (team/tech leads, senior engineers, etc.) in the code owners list. But either option creates its own problems

Teams

Assigning a team risks making every team member think that reviewing your pull request is someone else’s task. Having a rotating “on call” person doesn’t save you — for example, if the pull request was opened Friday evening, when I was on call, Monday morning I’ll ignore it thinking that it’s now Julia’s problem because she’s on call today. At the same time, Julia might just as justifiably ignore it herself — pull request is from Friday, and on Friday she wasn’t on call

Plus, new team members might be too shy to comment — what if they won’t notice something or will ask a “stupid” question. If the PR is assigned to the whole team, they can just wait one out

Assigning to a team might also have the reverse effect, when everyone is trying to quickly write a comment without going into details5. In theory, pull request’s author could spend time and effort to answer every single passerby, but it’s hard, exhausting, and might lead to conversations with management if it thinks that “your task is to solve problems with the code, not to mentor junior developers in other teams”

Opinion leaders

Having a dedicated person who reads through every pull request has its positives, of course, stuff like more consistent style and higher chance to notice a conflict with project’s implicit assumptions

But it’s absurdly to expect (or require) one person to thoughtfully go through everything at the same speed, with which the rest of the team writes code. It’s impossible to get all three in the “quantity/speed/quality” triangle. You’re lucky even if you’ll manage to get two…

I don’t know who to decidedly solve issues of either/both approaches. But I’m sure that:

Code review & response

So, your colleague has written and described the code, you’ve gotten a notification about her pull request. What’s next?

To Listen

Every pull request has reasons it exists. Aside from obvious external “fix a bug”, “add a feature”6, and “update dependencies”, the reason might be internal, for example:

Reviewers might disagree with pull request’s reasons or how they were handled, but for a good code review, they need to know them and to align them with project’s goals if necessary. So you should think about what reasons guided the author. For external reasons, you can look at the ticket description or background conversations. Internal ones can be deciphered from commit history or pull request’s description

If author didn’t leave you with much breadcrumbs, you can help yourself by filtering files, looking author’s past changes, doing a local checkout of the branch, attempting to rewrite “a strange solution” with all your “this shouldn’t be like that”…

Or, as the laaaaaast resort, you can just ask them

To Speak

Just like every change in the pull request, its every comment has its reasons. For example, comments can be written…

To Learn

The closest to the main goal of code review “genre” of comments is “why is it written precisely like that?”, but it isn’t used often enough. Even if all participants already know the reason, but it wasn’t mentioned explicitly anywhere, the need to formulate the answer to the question will spotlight accidental excessive complexity

If they think they know the reason, then explicitly writing your point of view might highlight when that “know” was only an assumption. Assumptions lead to bugs, so sometimes it’s better to clarify even something “obvious”. If that “obvious” was already written down somewhere, it’s completely okay to get a link to a comment that has the answer or to a commit with an explanation

Sadly, this genre is also the most dangerous. After reading “why is it written precisely like that?” in a bad mood, I might wind myself up thinking “how can they doubt my work?” and/or “seem like here, I can’t do this precisely like that, so I’ll rewrite everything ASAP”. Because of that, it’s important to let pull request’s author know that the question doesn’t have any hidden meaning

It’s really difficult to transmit the tone/intonation in the text alone, so, to make sure the text reads closer to how it sounds in your head, you’ll need to somehow compensate that lack of tone/intonation. For example, by rephrasing the question, emphasising with italics, just writing “I’m sorry, I’m not implying anything here”, or by adding a drop of emotions with emoji

Sarah's
Scribbles
Sarah's Scribbles
Tumblr Blog

To Correct

A much more popular genre of pull request comments is “that’s wrong”. Many people think that these comments are the point of reviewing the code. Agreed, an additional pair of eyeballs will help discovering problems sooner, but before critiquing the changes it’s important to first understand the reasons behind them. Without that, both sides of code review will make each other’s mood worse — either with bad advice or with surface-level changes to resolve that annoying comment

Chesterton's Fence:

"If you don't see the use of it, I certainly won't let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it."
Lars "Sweet Leaf" Doucet on Twitter

Even if you know, what was the reason behind this change, you don’t have to ask to fix every single line in which you’ve noticed a problem

Firstly, it might not be an actual problem, so asking “why?” would work better. Secondly, one remembers their own experience better than someone else’s, so if a problem isn’t critical and/or will quickly show itself in runtime and/or simple to hide from users, ten minutes of panic would teach code author a lot. For example, that’s how young me learned to hide new features behind feature toggles and to invalidate caches

When problem is actually worth a comment, it’s better to leave similar comments all at once, to lessen the time until the whole pull request is merged/closed. In other words, prefer “comment-comment-fix-fix” over “comment-fix-comment-fix”

In addition to pointing at a problem, reviewer can also suggest a solution, be it just text, pseudocode, ```suggestion```, or something else

If you have a good relationship with pull request’s author and you are secure about each other’s skills, you could checkout pull request’s branch and make suggestions in a form of commits. Where to push those commits — to the new branch (with a PR to a PR7) or to the same branch — depends on original branch’s author. This path, just like “let them make mistakes”, is risky, but it does exist

Returning to the lack of tone in text… If a person might misread an innocent “why?”, then “I’ve written code for your pull request” or bright red “@username requested changes” will be taken personally sooner or later. So you need to intentionally soften them, or, at least, remind everyone that this is critique of the code, not of the person

But even “@username requested changes” is better than silence

To Format

What you shouldn’t comment are semicolons, indents, and other formatting. If formatting is really important, then, instead of writing a “we put spaces inside the curly braces”, spend your time on setting up simple automatic formatter8 and on running linter on CI9

Without those, reviewer has to work to keep their focus on the pull request’s gist, instead of getting districted by surface-level and usually useless for a user things like “sort imports alphabetically”

To Bear in Mind

Reviewers’ comments don’t have to address exclusively pull request’s author, but other colleagues too

For example, to bring attention to difficulties when integrating with some other system’s API. Or to more-than-appropriate amount of manual/boilerplate changes where refactoring/types/automation would simplify and shorten future pull requests. Or to similarities between the new code and code in another project (with a possibility of code share)

Immediately addressing those comments, in the same pull request, would probably over-blow its scope, but, after a few of such reminders, it’ll be easier to remember about them during a future spring planning

To Encourage

If opening a pull request only ever leads to neutral and relatively negative comments, then of course people would avoid code review — “I’ve tried so hard, coded, worked around roadblocks, just to get so close to the finish line and get drowned with ‘that’s strange’, ‘that’s wrong’, ‘that’s bad’…”

To counterbalance all that, it’s useful to, at least occasionally, praise good changes, whether it’s a particularly clean solution, a good idea, or a banal “thank you, it’s been annoying me too, but I couldn’t get myself to address it”


Without comments, code review is just a rubberstamping. Only with “that’s wrong, fix that” comments — harmful gatekeeping. I hope that this post showed that with some care, both side of code review can be a little bit nicer and useful

Thank you for your attention


  1. I want to emphasis that everything here are opportunities to make pull requests better, not requirements for them. If personal communication and a few words in title work for your PR, nobody insists on you spending hours to write an epic about a three lines change 🙏 ↩︎

  2. When 50 symbols is not enough, you could and should extend the title with additional description (by adding a line break in git CLI or by using a dedicated text input in a GUI), whether it was written manually or generated from a template (thanks to Mark for the advice) ↩︎

  3. If vi is not your preferred text editor, you can change $EDITOR in your ~/.bashrc/~/.zshrc/~/.config/fish/config.fish to launch, for example, Sublime Text (subl -w) or any other text editor ↩︎

  4. Whether it’s Gruber’s “original”, more detailed GitHub Flavored Markdown/CommonMark, or another language altogether (as long as it’s the language your code review tool’s using) ↩︎

  5. Thanks to Katya for reminding about this case ↩︎

  6. Performance and documentation are features too ↩︎

  7. BRAAAAM ↩︎

  8. Preferably with minimum configurations, to not get tempted into hours long discussion of vital questions like “double quotes or single one?”. For example, check out eslint:recommended/prettier/standard in Javascript’s ecosystem, black in Python’s, gofmt in Go’s, and .editorconfig almost everywhere ↩︎

  9. And, please, don’t let git hooks justify the lack of linter on CI. Hooks might not be ran because of one reason or another, for example if the commit was created in Github’s web UI. So, without linter on CI, your main branch will have broken formatting more often than you’d like (and you’d probably like it to be “never”) ↩︎