Benefits of Git Rebase

In the first article in this series (Git: Rebase vs Merge) I covered the tactics of rebasing.  I discussed what merge commits are, and how to avoid them with rebasing.  In this post I'll cover the benefits of rebasing, including how its use speeds up finding hard to track down bugs via git blame and git bisect.

Is Rebase Really Worth It?

I worked on one large project that discouraged rebasing a while ago.  In short, this was the result:

The repository was insanely complicated to look at with gitk (or SourceTree in this case).  Trying to understand what was going on across the program was virtually impossible.

The recalcitrant developer (hey, someone forced me to learn these stupid SAT words, might as well inflict them on others), at this point, might simply respond: "So what?  As long as my stuff works."  And perhaps it's true, just like the old joke:

Patient: Doctor, it hurts whenever I do this.
Doctor: Don't do that.

Merge Commits: Ugly, Painful, or Both?

But even if someone never looked at the repo with gitk, or a fancy git GUI, in the project above roughly 50% of all commits were merge commits!

That meant 50% of the history of the project was useless noise distributed like a shotgun blast into the true history.  Furthermore, some of those merge commits hid sneaky little bugs introduced when merge conflicts were poorly resolved.

Granted, 50 percent may be high for most projects.  The actual number depends on how many developers, how often they commit, and how often they merge.  For example if developers pull from develop daily, a given project will get one merge commit per developer per day.  If developers commit 3 times per day (seems kind of average from my observations), then 25% of commits will be merge commits.

The recalcitrant developer (stupid SAT) might again at this point respond: "25% of commits are merges, so what?  Just don't look at the commit history!  And if forced to, just ignore the merge commits!"

However, there are two specific cases where a messy history may yet affect a fast and loose, merge happy team.

Don't Blame Git!

Have you ever looked at a file and wondered who wrote a particular line of crap?  In my case 99% of the time that person is me.  But never mind that.  The tool for this job is clearly git blame.

The command git blame (or git annotate for the more more politically correct) will annotate every line of a particular file with its last commit.  Apart from finding who wrote something, it's also an essential tool for discovering why something was done the way it was when spelunking through larger, or especially older, codebases.

However, merge commits obfuscate git blame.  If an associated commit message is simply "merged develop into feature.awesome" and the developer is no longer around to ask, then we have to go through additional effort to track down history.

For instance, in the example above line 3 "c" was actually created by commit C (b49c7b1), but git blame incorrectly shows the merge commit (d9400d4) as the author.

Git Bisect For The Win!

The second scenario in which merge commits complicate history is in tracking a bug that someone introduced, typically, within the last few days.  One could manually checkout every commit between the good commit and the bad commit, or simply use git bisect.

Git bisect is wonderful for automating the process of finding a commit.  It allows you to specify the last known good commit, and the last known bad commit, and then it performs a binary search of all the commits in between to discover the bad commit as quickly as possible.

Regardless of whether you search manually, or use git bisect, life gets hard as soon as you try to juggle many branches with lots of merge commits.  The automated approach makes navigating many merged branches easier, but either way if you have a fully merge-based project, you are now required to take an additional 25-50% steps.   If each step takes time to build or deploy, these extra steps can quickly add up (trust me, I've had to do this a lot).

For instance consider the following project with three feature branches, seven real commits, and five merge commits.

Now pretend that you've come in on Monday morning to discover that after committing D on Friday, some developers over the weekend committed E and F, and suddenly there's a hard to track down bug.  Git bisect will solve it for you like this:

Lee@lee-xps MINGW64 /c/Temp/deletemegit (feature.awesomeness)
$ git bisect start
 
Lee@lee-xps MINGW64 /c/Temp/deletemegit (feature.awesomeness|BISECTING)
$ git bisect good 880e84a
 
Lee@lee-xps MINGW64 /c/Temp/deletemegit (feature.awesomeness|BISECTING)
$ git bisect bad be49c0d
Bisecting: 5 revisions left to test after this (roughly 3 steps)
[d9400d4c62807046f8ea235170e681b3e8952200] Merge branch 'develop' into feature.awesomeness
 
Lee@lee-xps MINGW64 /c/Temp/deletemegit ((d9400d4...)|BISECTING)
$ git bisect good
Bisecting: 2 revisions left to test after this (roughly 2 steps)
[96cf2877d16183cccce1f822d72626d331b582ef] Merge branch 'develop' into feature.awesomeness
 
Lee@lee-xps MINGW64 /c/Temp/deletemegit ((96cf287...)|BISECTING)
$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 1 step)
[190dadb1046822fc169193e86484a19e3543b783] Merge branch 'feature.2' into develop
 
Lee@lee-xps MINGW64 /c/Temp/deletemegit ((190dadb...)|BISECTING)
$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[ef4a5532ace6075aead0850f088322f98e7afbf1] E
 
Lee@lee-xps MINGW64 /c/Temp/deletemegit ((ef4a553...)|BISECTING)
$ git bisect bad
ef4a5532ace6075aead0850f088322f98e7afbf1 is the first bad commit
commit ef4a5532ace6075aead0850f088322f98e7afbf1
Author: Lee Richardson
Date:   Mon Sep 5 21:56:07 2016 -0400
    E
:000000 100644 0000000000000000000000000000000000000000 d8263ee9860594d2806b0dfd1bfd17528b0ba2a4 A      2.txt
 
Lee@lee-xps MINGW64 /c/Temp/deletemegit ((ef4a553...)|BISECTING)
$ git bisect reset
Previous HEAD position was ef4a553... E
Switched to branch 'feature.awesomeness'
Your branch is based on 'origin/feature.awesomeness', but the upstream is gone.
  (use "git branch --unset-upstream" to fixup)
 
Lee@lee-xps MINGW64 /c/Temp/deletemegit (feature.awesomeness)

Beautiful!  Commit E was the culprit.  The only problem: three of the four steps wouldn't have been needed if the team had been rebasing.

Summary

If your project is still small, and you haven't had to use git blame or git bisect yet, you may not find these arguments compelling.  However, if this is the case I suspect you may also  not find much value in unit testing either.  Unit testing and rebasing both require extra up-front work in order to build quality into your work and set your future self up for success.

More worried about immediate deadlines than your future self?  Consider the developer who will replace you when you leave.  Not worried about her?  Consider your customer at some future point in time when they're attempting to spelunk through your code.  Unless this is a throwaway project, the chances are good that a little extra effort learning a new technique today could save yourself, other developers, and possible a future employer considerable time and energy.  Not a bad investment.