
Learn Git's data structure, then understand what a rebase is.

The title of the git man page is:

git - the stupid content tracker

The unique design of git makes it simple, but hard to use if you misunderstand the stupid data structure it relies on.

If it’s stupid, why do so many developers sweat when an old, bearded colleague gently asks:

“Could you rebase your branch on this integration branch? Thanks, see you later!”

Take 10 minutes to learn how commits are stored and you will understand how every git command works. I promise that everything (including rebase) will make sense.

Part 1: What is a commit?

The content of a commit

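A commit is just a small piece of text that git stores, compressed, in its .git/objects directory. It contains:

  • A snapshot of your files, in the form of the hash of a tree object, which in turn references every file of the project. Git doesn't store the diff. It computes the diff on demand (for example, in git show) by comparing the snapshot with the parent's:

    --- a/src/main
    +++ b/src/main
    @@ -1 +1 @@
    -void foo() { }
    +void foo(int i) { }

  • A reference to the parent commit, in the form of a SHA-1 hash. A merge commit has several parents, and the very first commit has none:

    parent 90774924f3b2e08d4b8a7019cd0d99bf09a40f34

  • Some metadata: the author, the committer, their dates, and the commit message with its description.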

Nothing magic.
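To make this concrete, here is a minimal Python sketch of how a commit is serialized and hashed. It assumes git's standard object format: SHA-1 over a header `commit <size>\0` followed by the text that `git cat-file -p` prints for a commit. The helper names `git_object_hash` and `commit_body` are hypothetical, and the author, email and timestamp are made up. On disk git also compresses the object with zlib, but that does not change the hash.

```python
import hashlib
from typing import Optional


def git_object_hash(kind: str, body: bytes) -> str:
    """SHA-1 over git's object header '<kind> <size>\\0' followed by the body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: Optional[str], author: str, message: str) -> bytes:
    """Serialize a commit in git's text format, with a fixed, made-up timestamp."""
    lines = [f"tree {tree}"]
    if parent is not None:  # a root commit has no parent line
        lines.append(f"parent {parent}")
    lines.append(f"author {author} 1700000000 +0000")
    lines.append(f"committer {author} 1700000000 +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


# Sanity check: an empty file gets the id `git hash-object` reports for it.
print(git_object_hash("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# A root commit pointing at git's empty tree; the identity is hypothetical.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
root = commit_body(EMPTY_TREE, None, "Jane Doe <jane@example.com>", "Initial commit")
print(root.decode())
print(git_object_hash("commit", root))
```

A commit really is just a few lines of text. Its hash is simply a fingerprint of those lines.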

A commit is identified by its hash

A commit doesn't have a name or a sequence number. Its hash is its only identifier, and the only way to reference it directly.

The hash is computed from all of its content (which includes the reference to its parent; remember that).

Add a line break at the end of a file, or fix a typo in the commit message, and the hash changes.
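A quick way to convince yourself is to hash two commits that differ only by a typo in the message, or only by their parent. This Python sketch uses git's object format in simplified form. The `commit_hash` helper is hypothetical, and the identity and timestamp are fixed placeholders. The exact hashes don't matter here. What matters is that they differ.

```python
import hashlib


def commit_hash(tree: str, parent: str, message: str) -> str:
    """Git-style commit hash; the identity and timestamp are fixed placeholders."""
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        "author Jane Doe <jane@example.com> 1700000000 +0000\n"
        "committer Jane Doe <jane@example.com> 1700000000 +0000\n"
        f"\n{message}\n"
    ).encode()
    return hashlib.sha1(f"commit {len(body)}\0".encode() + body).hexdigest()


TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"    # git's empty tree
PARENT = "90774924f3b2e08d4b8a7019cd0d99bf09a40f34"  # an arbitrary parent hash

original = commit_hash(TREE, PARENT, "Fix teh bug")
typo_fixed = commit_hash(TREE, PARENT, "Fix the bug")
other_parent = commit_hash(TREE, "0" * 40, "Fix teh bug")

print(original != typo_fixed)    # True: one character changed the hash
print(original != other_parent)  # True: same content, different parent
```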

A commit is immutable

It's impossible to modify a commit without changing its hash. Even the best git expert can't edit a commit. What you can do is make a modified copy of it, and that copy will have a different hash.

Copying and editing a commit is called rewriting it. I avoid the phrase “edit a commit” because it is confusing.

These copies are typically made during a rebase.
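In programming terms, a commit behaves like a frozen value. This Python sketch is a toy model, not git's code. The hypothetical `Commit` class is a frozen dataclass, so assigning to a field raises an error. A "rewrite" is a modified copy whose hash differs, and the original is left untouched. The `sha` property hashes the fields in a simplified way, not in git's exact format.

```python
import dataclasses
import hashlib
from typing import Optional


@dataclasses.dataclass(frozen=True)
class Commit:
    """Toy commit: a snapshot id, an optional parent hash and a message."""
    tree: str
    parent: Optional[str]
    message: str

    @property
    def sha(self) -> str:
        # Toy hash over every field, parent included (not git's exact format).
        raw = f"{self.tree}\n{self.parent}\n{self.message}".encode()
        return hashlib.sha1(raw).hexdigest()


c = Commit(tree="t1", parent=None, message="Add foo")

try:
    c.message = "Add foo()"  # editing in place is forbidden
except dataclasses.FrozenInstanceError:
    print("commits are immutable")

# A "rewrite": a copy of the commit with one field changed.
c2 = dataclasses.replace(c, message="Add foo()")
print(c.message, "|", c2.message)  # Add foo | Add foo()
print(c.sha != c2.sha)             # True
```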

The .git directory is a stupid bag of commits

The .git directory of a repository contains every commit it knows about, whether you created it locally or fetched it, each identified by its hash. There is no hidden database, no hierarchy, not even a list of commits.
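Here is a toy in-memory version of that bag, in Python. It assumes git's loose-object layout, where an object whose hash starts with `e6…` is stored at `.git/objects/e6/…`. Commits are stored exactly like the blob used here. The `ObjectStore` class is hypothetical. Real git also compresses each file with zlib and may bundle objects into pack files.

```python
import hashlib
from typing import Dict


class ObjectStore:
    """A bag of objects keyed by their hash: no hierarchy, no ordering."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, kind: str, body: bytes) -> str:
        """Store an object under its git-style SHA-1 and return that hash."""
        sha = hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()
        self.objects[sha] = body  # same content -> same key, stored once
        return sha

    def get(self, sha: str) -> bytes:
        """The only way in: look an object up by its hash."""
        return self.objects[sha]

    @staticmethod
    def loose_path(sha: str) -> str:
        """Where git would keep this object as a loose file."""
        return f".git/objects/{sha[:2]}/{sha[2:]}"


store = ObjectStore()
h1 = store.put("blob", b"")
h2 = store.put("blob", b"")  # identical content: same hash, no duplicate
print(h1 == h2, len(store.objects))  # True 1
print(ObjectStore.loose_path(h1))    # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```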

What about branches? Do they store a list of commits? No.

A branch is just a reference to a commit

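A branch is just a file that contains the hash of its latest commit, called the tip of the branch. (HEAD is something else: the file .git/HEAD, which records the branch you have checked out.)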

For me, that was an eye-opener! cat the following file:

$ cat .git/refs/heads/master
075de864a4b8b1aef0583cb8be1b5c92b3ac5ed0

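Of course, tags work the same way (see .git/refs/tags). Note that git gc, or a fresh clone, may move some of these references into a single .git/packed-refs file.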
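To show how little is going on, this Python sketch builds a fake .git directory in a temporary folder, without calling git at all. It then follows HEAD to a commit hash. The `resolve_head` helper is hypothetical and only handles loose refs. Real git also reads .git/packed-refs and handles more cases. The hash is just an example value.

```python
import tempfile
from pathlib import Path

EXAMPLE_SHA = "075de864a4b8b1aef0583cb8be1b5c92b3ac5ed0"  # example hash


def resolve_head(git_dir: Path) -> str:
    """Follow .git/HEAD to a commit hash (loose refs only; packed-refs ignored)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):             # HEAD names the checked-out branch...
        ref_file = git_dir / head[len("ref: "):]
        return ref_file.read_text().strip()  # ...and the branch file holds a hash
    return head                              # a detached HEAD holds a hash directly


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    heads = git_dir / "refs" / "heads"
    heads.mkdir(parents=True)
    # A branch is one small file containing a hash.
    (heads / "master").write_text(EXAMPLE_SHA + "\n")
    # Creating another branch is just writing another file.
    (heads / "feature").write_text(EXAMPLE_SHA + "\n")
    (git_dir / "HEAD").write_text("ref: refs/heads/master\n")

    head_sha = resolve_head(git_dir)
    branch_names = sorted(p.name for p in heads.iterdir())

print(head_sha)      # 075de864a4b8b1aef0583cb8be1b5c92b3ac5ed0
print(branch_names)  # ['feature', 'master']
```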

The tagline “stupid content tracker” is well deserved.

Git is a big hash map of commits. Every commit references its parent by hash. A branch is just a name pointing to a commit.

I think the simplicity of this design is beautiful.
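That whole model fits in a couple of dictionaries. In this Python sketch, `commits` maps a hash to its parent hash and message, and `branches` maps a name to a hash. Short fake hashes are used for readability. The shape mirrors the diagram in Part 2. The `log` function is a hypothetical, much simplified cousin of git log that follows parent hashes back to the root. It assumes one parent per commit, so it does not handle merges.

```python
from typing import Dict, List, Optional, Tuple

# Toy repository: hash -> (parent hash, message).
commits: Dict[str, Tuple[Optional[str], str]] = {
    "a1": (None, "A"),
    "b2": ("a1", "B"),
    "c3": ("a1", "C"),
    "d4": ("c3", "D"),
}
branches: Dict[str, str] = {"master": "b2", "feature": "d4"}


def log(tip: str) -> List[str]:
    """Messages from tip back to the root, found by following parent hashes."""
    out = []
    sha: Optional[str] = tip
    while sha is not None:
        parent, message = commits[sha]
        out.append(message)
        sha = parent
    return out


print(log(branches["feature"]))  # ['D', 'C', 'A']
print(log(branches["master"]))   # ['B', 'A']
```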

Part 2: What does rebase do?

git rebase master rewrites all the commits of the current branch that are not on master, on top of master.

# initial situation:
  C -- D # feature
 /
A ---- B # master

$ git checkout feature && git rebase master

# after the rebase:
  C -- D
 /
A --- B # master
       \
        C* -- D* # feature

Let's look closely at what rebase does:

  • Finds the nearest common ancestor of feature and master: A, called the fork point (git calls it the merge base).
  • Rewrites every commit of the current branch (feature) since A, here C and D, on top of master by replaying their diffs. This creates C* and D*.
  • Moves feature to point to the brand-new commit D*.
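The three steps above can be sketched in Python on a toy repository. Here each commit stores a snapshot (filename to content) and a parent hash. This is a simplified model, not git's implementation. The `Repo` class and its methods are hypothetical. The "diff" works on whole files, and deletions and conflicts are ignored. It also assumes the two branches share some history.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Snapshot = Tuple[Tuple[str, str], ...]  # sorted (filename, content) pairs


class Repo:
    """Toy repo: commits in a hash map, branches as name -> hash."""

    def __init__(self) -> None:
        self.commits: Dict[str, Tuple[Optional[str], Snapshot, str]] = {}
        self.branches: Dict[str, str] = {}

    def commit(self, parent: Optional[str], files: Dict[str, str], msg: str) -> str:
        snap = tuple(sorted(files.items()))
        sha = hashlib.sha1(repr((parent, snap, msg)).encode()).hexdigest()
        self.commits[sha] = (parent, snap, msg)
        return sha

    def ancestors(self, sha: Optional[str]) -> List[str]:
        """Commits from sha back to the root, newest first (one parent each)."""
        chain = []
        while sha is not None:
            chain.append(sha)
            sha = self.commits[sha][0]
        return chain

    def rebase(self, branch: str, onto: str) -> None:
        onto_sha = self.branches[onto]
        # 1. Fork point: the newest commit of the branch also in onto's history.
        onto_history = set(self.ancestors(onto_sha))
        chain = self.ancestors(self.branches[branch])
        base = next(s for s in chain if s in onto_history)
        to_replay = list(reversed(chain[: chain.index(base)]))  # oldest first
        # 2. Replay each commit's changes on top of the new parent.
        new_tip = onto_sha
        for sha in to_replay:
            parent, snap, msg = self.commits[sha]
            before = dict(self.commits[parent][1]) if parent else {}
            changes = {f: c for f, c in snap if before.get(f) != c}
            files = dict(self.commits[new_tip][1])
            files.update(changes)
            new_tip = self.commit(new_tip, files, msg)  # a brand-new commit
        # 3. Move the branch; the old commits stay in self.commits, untouched.
        self.branches[branch] = new_tip


repo = Repo()
a = repo.commit(None, {"main.c": "void foo() { }"}, "A")
b = repo.commit(a, {"main.c": "void foo() { }", "README": "hi"}, "B")
c = repo.commit(a, {"main.c": "void foo(int i) { }"}, "C")
d = repo.commit(c, {"main.c": "void foo(int i) { }", "util.c": "int x;"}, "D")
repo.branches.update(master=b, feature=d)

repo.rebase("feature", onto="master")
new_d = repo.branches["feature"]
new_c = repo.commits[new_d][0]
print(repo.commits[new_c][0] == b)  # True: C* sits on top of B
print(new_c != c and new_d != d)    # True: brand-new hashes
print(dict(repo.commits[new_d][1]))
# {'README': 'hi', 'main.c': 'void foo(int i) { }', 'util.c': 'int x;'}
```

Note how C* gets a new hash even though it carries the same change as C. Its parent and snapshot are different.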

Rebase must rewrite history

Even if the diffs in C and C* are exactly the same, their hashes differ because they have different parents. As a consequence, every commit downstream must be rewritten too. D's parent is C, so its copy D* must point to C* instead, and that gives D* a new hash as well.

From git's perspective, a rewritten commit is completely unrelated to the original one. There is no link between C and C*.

After a rebase, the branch feature contains new commits.

What happened to commits C and D? Nothing: they are still there, unchanged, since commits are immutable. They only look ‘lost’ because no branch in your repository points to D anymore. You can still find them with git reflog until git eventually garbage-collects them.
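"Lost" just means "unreachable from any branch". This Python sketch is a toy reachability walk over the post-rebase state in the diagram. Commit names stand in for hashes, the `reachable` helper is hypothetical, and the reflog is ignored. It shows that C and D still exist in the store. They are simply not reachable from any branch.

```python
from typing import Dict, Iterable, Optional, Set

# State after the rebase: commit -> parent (names instead of real SHA-1s).
parents: Dict[str, Optional[str]] = {
    "A": None, "B": "A",
    "C": "A", "D": "C",     # the original feature commits
    "C*": "B", "D*": "C*",  # their rewritten copies
}
branches = {"master": "B", "feature": "D*"}


def reachable(tips: Iterable[str]) -> Set[str]:
    """Every commit reachable from the given tips by following parents."""
    seen: Set[str] = set()
    for tip in tips:
        sha: Optional[str] = tip
        while sha is not None and sha not in seen:
            seen.add(sha)
            sha = parents[sha]
    return seen


live = reachable(branches.values())
print(sorted(set(parents) - live))  # ['C', 'D']: still stored, just unreferenced
```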

Never rebase a shared branch

On the server, and on other developers' machines, the branch feature still points to D. If someone else has pushed new commits on top of D, the two histories have diverged. A plain merge would bring back C and D next to C* and D*, duplicating the same changes.

One way to save the day is to use an interactive rebase, but that is beyond the scope of this post.

Try to avoid this situation by changing your process to clarify the ownership of each branch.

If you are absolutely certain that feature is your branch, you can push --force-with-lease as much as you want.
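--force-with-lease is essentially a compare-and-swap on the remote branch. The push overwrites the branch only if it still points where you expect. By default, the expected position is your remote-tracking branch, e.g. origin/feature. This Python sketch models the server's branches as a plain dict. `push_force_with_lease` and `Rejected` are hypothetical names, and the commit names are illustrative.

```python
from typing import Dict


class Rejected(Exception):
    """Raised when the remote branch moved since we last fetched it."""


def push_force_with_lease(remote: Dict[str, str], branch: str,
                          expected: str, new: str) -> None:
    """Overwrite remote[branch] only if it still holds the hash we expect."""
    current = remote.get(branch)
    if current != expected:
        raise Rejected(f"{branch} is at {current}, expected {expected}")
    remote[branch] = new


server = {"feature": "D"}  # what we last fetched
push_force_with_lease(server, "feature", expected="D", new="D*")  # accepted
print(server["feature"])  # D*

server["feature"] = "E"   # meanwhile, a teammate pushes a new commit E
try:
    push_force_with_lease(server, "feature", expected="D*", new="D**")
except Rejected as err:
    print("rejected:", err)  # rejected: feature is at E, expected D*
```

Unlike a plain --force, the second push fails instead of silently discarding E.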

Always rebase your branch before merging

After a rebase, git can fast-forward the target branch: it simply moves the branch to your latest commit, without creating a merge commit. That's the point of rebasing.
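A fast-forward is possible exactly when the target branch's tip is an ancestor of yours, which is what a rebase guarantees. Git exposes that check as git merge-base --is-ancestor. This Python sketch models it on the post-rebase history. The `is_ancestor` and `merge` helpers are hypothetical, and real non-fast-forward merges are not modeled.

```python
from typing import Dict, Optional

# History after the rebase: commit -> parent (names instead of real SHA-1s).
parents: Dict[str, Optional[str]] = {"A": None, "B": "A", "C*": "B", "D*": "C*"}
branches = {"master": "B", "feature": "D*"}


def is_ancestor(maybe_ancestor: str, sha: Optional[str]) -> bool:
    """True if maybe_ancestor is sha itself or reachable through its parents."""
    while sha is not None:
        if sha == maybe_ancestor:
            return True
        sha = parents[sha]
    return False


def merge(target: str, source: str) -> str:
    """Fast-forward target to source when possible (toy: no merge commits)."""
    if is_ancestor(branches[target], branches[source]):
        branches[target] = branches[source]  # just move the pointer
        return "fast-forward"
    return "needs a merge commit"


print(merge("master", "feature"))  # fast-forward
print(branches["master"])          # D*
```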

Are you still reading?

Wow, you're a real tech person! I hope you found this article helpful. Git is a strange beast that deserves to be tamed.