Using Code Reviews for Technical Interview Screens

True Story Follows

Recently my manager gave me some leeway to spend time on an experiment that I thought would help the interview process. A valuable step in the hiring funnel is to accurately screen incoming candidates in order to minimize the time and effort everyone spends on interviews. This includes both the candidate and the company.

In order to save time and costs, it’s best to get a reasonably good signal before the on-site interview. Companies typically do this by having recruiters conduct technical screens, scheduling multiple phone interviews with engineers, or having the candidate complete homework assignments.

In talking to Chase “T-Bone” Seibert, who works at a company that, to my understanding, helps clients generate really good PIN numbers for their debit cards, I learned about one thing he’s tried recently during interviews: he prints out a GitHub pull request that adds some bad code to a code base, and the candidate has to conduct a code review.

I wanted to take this a step further and make it part of the initial screen (before or after a phone interview). This way, you get most of the benefit of a homework problem: candidates have a chance to express broad ideas without a time constraint that might cause nervousness and obstruct the hiring signal. The added bonus, though, is that completing a code review is generally less time-consuming and less stressful than a homework problem. And if you can have multiple candidates conduct the same code review, you might be able to begin establishing a bar for what separates a good candidate from a bad one.

One problem, though, and the one that I’m trying to solve, is that in order to make this process scale and really save time, it needs to be automated: a unique repository that is visible only to the candidate has to be created, and the same pull request has to be generated for every interview. Fortunately, GitHub has its own API that’s very robust and easy to use…if you can understand the low level operations of git in the first place.

The Script

The script itself would be fairly boring if I just threw it up here. So to describe what’s happening at a high level, we just log into GitHub, generate an API key, and use that to make a few API calls:

  • Create a new, unique repository (preferably private, which required a paid account at the time of writing)
  • Add files to the repository and set HEAD on master to that commit
  • Create a new branch from master
  • Add a differential to the new branch that modifies the files that were just added (this part was harder)
  • Create a pull request
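
To make the list above a little more concrete, here is a rough sketch of those calls in Python. This is not my actual script. The helper names (make_api, create_review_repo, open_pull_request), the "code-review-" prefix, and the "review-me" branch name are made up for illustration, and the endpoint paths are the GitHub REST API routes as I understand them. It also assumes the default branch is named master, and it cheats by adding files through the contents endpoint, which creates one commit per file.

```python
import base64
import json
import urllib.request
import uuid


def make_api(token):
    """Return api(method, path, payload) that sends one authenticated request."""
    def api(method, path, payload=None):
        data = None if payload is None else json.dumps(payload).encode("utf-8")
        request = urllib.request.Request(
            "https://api.github.com" + path,
            data=data,
            method=method,
            headers={
                "Authorization": "token " + token,
                "Accept": "application/vnd.github+json",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(request) as response:
            body = response.read()
        return json.loads(body) if body else {}
    return api


def create_review_repo(api, owner, files, branch="review-me"):
    """Create a private repo, seed it with files, and branch off master."""
    repo = "code-review-" + uuid.uuid4().hex  # unique name per candidate
    api("POST", "/user/repos", {"name": repo, "private": True})
    base = "/repos/{}/{}".format(owner, repo)
    for path, text in files.items():
        encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
        # Simplification: the contents endpoint makes one commit per file.
        api("PUT", base + "/contents/" + path,
            {"message": "Add " + path, "content": encoded})
    # Branch from whatever commit master currently points at.
    master = api("GET", base + "/git/ref/heads/master")
    api("POST", base + "/git/refs",
        {"ref": "refs/heads/" + branch, "sha": master["object"]["sha"]})
    return repo


def open_pull_request(api, owner, repo, branch, title):
    """Open a pull request from branch into master.

    Call this only after the branch has a commit that master lacks.
    """
    return api("POST", "/repos/{}/{}/pulls".format(owner, repo),
               {"title": title, "head": branch, "base": "master"})
```

Passing the api callable in, rather than calling the network directly, means the same functions can be exercised against a fake that just records requests before pointing them at a real account.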

So to paint the scenario in words, the repo starts out like this:

step1empty

No repositories are present. Now we just run a quick script I put together that executes the scenario described above.

step2runscript

After the script is run, we can see that a new private repo has been created with a unique UUID. At this point we would get the candidate’s GitHub username and add him or her as a collaborator.

step4_everything_exists
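
Adding the candidate can be scripted as well. The snippet below is only a sketch: add_collaborator is a hypothetical helper, api stands in for any function that sends an authenticated request to GitHub given a method, a path, and a JSON payload, and the path is the collaborators endpoint as I understand it.

```python
def add_collaborator(api, owner, repo, username):
    """Invite the candidate to the private repo so they can see the pull request."""
    path = "/repos/{}/{}/collaborators/{}".format(owner, repo, username)
    # An empty JSON body leaves the permission level at GitHub's default.
    return api("PUT", path, {})
```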

Now we can just navigate to the pull requests and see that two separate files were modified in the latest pull request.

step3repo_populated

Modifying files in a Commit with the GitHub API

Everything was really straightforward until it came to modifying files. At this point I actually had to get a reference to a recent commit, set up a new tree of files, and reference the old tree. In case anyone hits the same problem and stumbles onto this post, I’d be happy to share code, but otherwise my code was pretty nasty here.

I followed a tutorial on the internets about how to go about modifying a file, but in short:

  • For every file that you want to modify, POST the file contents to GitHub’s create blob endpoint and save the sha hash of the blob that comes back.
  • For every sha that was returned, append an entry to a JSON tree that you create, specifying the path of the old file that you’re overwriting.
  • In order to post this new data as a differential, you’ll need a reference to the current tree. Get the reference to master to find the URL of the latest commit.
  • Make a GET request to the latest commit and get the latest tree structure of the repo. All you need is the tree’s sha hash.
  • POST to the create tree endpoint with your newly created tree as the tree and the recently fetched tree sha as the base tree.
  • The new tree sha is returned from the last response. The master commit sha is known from the initial request to the reference for master. Use the GitHub API to create a new commit with your new tree sha as the tree and the master commit sha as the parent.
  • After you POST the commit, you have the sha of the commit. Now point the head reference of the branch you’re working on at it, with the new commit sha as the sha parameter. (Not mentioned so far in this confusing process: I made all of these changes in a different branch, not on master.)
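
For anyone who would rather read code than bullets, the steps above look roughly like this. It is a cleaned-up sketch rather than my nasty original: commit_changes is a made-up name, api is assumed to be any callable that takes a method, a path, and a JSON payload and returns the decoded JSON response, and the paths are the Git data endpoints as I understand them. It also assumes the branch already exists, so the last step is a PATCH on its reference; if the branch didn’t exist yet, a POST to git/refs would create it instead.

```python
def commit_changes(api, owner, repo, branch, changes, message):
    """Commit modified files on top of master and point branch at the result.

    changes maps a file path to its new text content.
    """
    base = "/repos/{}/{}".format(owner, repo)

    # Create one blob per modified file and collect the tree entries.
    tree = []
    for path, text in changes.items():
        blob = api("POST", base + "/git/blobs",
                   {"content": text, "encoding": "utf-8"})
        tree.append({"path": path, "mode": "100644",
                     "type": "blob", "sha": blob["sha"]})

    # Look up the latest commit on master, then the tree it points to.
    master = api("GET", base + "/git/ref/heads/master")
    master_sha = master["object"]["sha"]
    commit = api("GET", base + "/git/commits/" + master_sha)
    base_tree_sha = commit["tree"]["sha"]

    # Create the new tree on top of the old one.
    new_tree = api("POST", base + "/git/trees",
                   {"base_tree": base_tree_sha, "tree": tree})

    # Create the commit with master's latest commit as its parent.
    new_commit = api("POST", base + "/git/commits",
                     {"message": message, "tree": new_tree["sha"],
                      "parents": [master_sha]})

    # Move the branch head to the new commit.
    api("PATCH", base + "/git/refs/heads/" + branch,
        {"sha": new_commit["sha"]})
    return new_commit["sha"]
```

Using the old tree as the base tree is what lets the new tree list only the files that changed; everything else carries over from the previous commit.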

What’s Left: Make a Good Interview Problem

Now all we need to do is make some really terrible code for candidates to leave feedback on. However, I found that this isn’t as easy as you’d think. The goal is to create something that lets you glean enough information about the candidate to reasonably be able to say no. The worst interview questions are those in which the answer becomes trivial once you spot the “riddle”. You don’t want the code review to be an Easter egg hunt full of gotchas. You want a problem in which the design is poor, and the candidate is able to spot that and explain why. You want enough items to leave feedback on that there’s a full range of easy, medium, and hard things to catch.

This is still an experiment, so we’ll see if it becomes something worthwhile.

The End