Learn Git's data structure, then understand what a rebase is
The title of the git man page is:
git - the stupid content tracker
Git's unique design makes it simple, but hard to use if you misunderstand the stupid data structure it relies on.
If it’s stupid, why do so many developers sweat when an old, bearded colleague gently asks:
“Could you rebase your branch on this integration branch? Thanks, see you later!”
Take 10 minutes to learn how commits are stored and you will be able to understand how each git command works. I promise that everything (including rebase) will make sense.
Part 1: What is a commit?
The content of a commit
A commit is just a file. It contains:
- A snapshot of the project files at that point (a reference to a tree object). Git does not store the diff itself; it computes it on demand by comparing the snapshot with the parent's, which is what you see as lines added and removed:
+++ src/main
@@ -1,15 +1,16 @@
- void foo() { }
+ void foo(int i) { }
- A reference to the parent commit, in the form of a hash (SHA-1):
commit 90774924f3b2e08d4b8a7019cd0d99bf09a40f34
- Some metadata, like the commit message, description, date, author…
Nothing magic.
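If you want to see this for yourself, here is a minimal sketch that builds a throwaway repository and prints the raw content of a commit. The file name, messages and identity are made up for the example. Notice the tree line (the snapshot), the parent line and the metadata.

```shell
set -e
# Throwaway repository; file name, messages and identity are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "void foo() { }" > main.c
git add main.c
git commit -q -m "Add foo"

echo "void foo(int i) { }" > main.c
git commit -q -am "Give foo a parameter"

# Print the raw content of the latest commit object:
# a tree line, a parent line, author/committer lines, then the message.
commit_text=$(git cat-file -p HEAD)
echo "$commit_text"
```

Running git show HEAD in the same repository displays the diff, computed against the parent.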
A commit is identified by its hash
A commit doesn't have a name or an ID. The only way to reference it is by its hash.
The hash is calculated from all of its content (which includes the reference to its parent, remember that).
Add a line break at the end of a file, or fix a typo in a commit message? The hash will change.
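You can check that the hash really is derived from the content. The sketch below, in a throwaway repository with made-up file names, feeds the raw commit content back into git's own hashing command and compares the result with the commit's hash.

```shell
set -e
# Throwaway repository; file name, message and identity are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"

# The hash git uses to identify the commit.
actual=$(git rev-parse HEAD)

# Hash the raw commit content again, as an object of type "commit".
recomputed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)

echo "$actual"
echo "$recomputed"
```

Both lines should be identical: same content, same hash.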
A commit is immutable
It's impossible to modify a commit without changing its hash. Even the best git expert can't edit a commit.
But it's possible to modify a copy of it, and the modified copy will have a different hash.
Copying and editing a commit is called a rewrite. I avoid the phrase "edit a commit" because it can be confusing.
These copies are typically made during a rebase.
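A quick way to watch a rewrite happen is git commit --amend. This sketch (throwaway repository, made-up messages) fixes a typo in a commit message and shows that the result is a new commit, while the original is still stored untouched.

```shell
set -e
# Throwaway repository; file name, messages and identity are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "Fix tpyo"
old=$(git rev-parse HEAD)

# "Editing" the message actually creates a modified copy of the commit.
git commit -q --amend -m "Fix typo"
new=$(git rev-parse HEAD)

# The original commit is still in the repository, with its original message.
old_type=$(git cat-file -t "$old")
old_subject=$(git log -1 --format=%s "$old")

echo "old: $old ($old_subject)"
echo "new: $new"
```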
The .git directory is a stupid bag of commits
A git repo's .git directory contains all the commits you have created or fetched. They are identified by their hash.
There is no hidden database, no hierarchy, not even a list of commits.
What about branches? Do they store a list of commits? No.
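A branch is just a reference to a commit
A branch is just a file which contains the hash of its top commit, called the head of the branch. (HEAD, in capitals, is the reference to whatever you currently have checked out.)
For me, that was an eye opener! cat the following file: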
$ cat .git/refs/heads/master
075de864a4b8b1aef0583cb8be1b5c92b3ac5ed0
Of course, tags work the same way (see .git/refs/tags).
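To convince yourself that a branch is nothing more than that file, you can create one by hand. This is only a sketch for a throwaway repository: it assumes git's default file-based ref storage, and the branch name handmade is made up. In real life, use git branch.

```shell
set -e
# Throwaway repository; names and identity are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"
tip=$(git rev-parse HEAD)

# Create a branch by writing a commit hash into a file under refs/heads.
# Assumption: the repository uses the default "files" ref storage.
echo "$tip" > .git/refs/heads/handmade

# Git now treats "handmade" as a regular branch.
handmade_tip=$(git rev-parse refs/heads/handmade)
git branch --list
```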
The tagline “stupid content tracker” is well deserved.
Git is a big hash map of commits. Every commit references its parent by hash. A branch is just a name for a commit.
I think the simplicity of this design is beautiful.
Part 2: What does rebase do?
git rebase master rewrites all the commits of the current branch on top of master.
# initial situation:
    C -- D    # feature
   /
A ---- B      # master

$ git checkout feature && git rebase master

    C -- D
   /
A ---- B            # master
        \
         C* -- D*   # feature
Let's look closely at what rebase does:
- Finds the nearest common commit, A, which is called the fork point.
- Rewrites every commit of the current branch (feature), C--D, on top of master by applying their diffs: C*--D*.
- Makes feature reference the brand new commit D*.
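The steps above can be replayed in a throwaway repository. This is a sketch under assumptions: commit_file is a hypothetical helper written for this example, each commit simply adds a file named after it, and git merge-base is used here to find the nearest common commit.

```shell
set -e
# Throwaway repository reproducing the A, B, C, D history from the diagram.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: creates a file named after the commit and commits it.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file A
a=$(git rev-parse HEAD)
git checkout -q -b feature
commit_file C
commit_file D
old_d=$(git rev-parse HEAD)
git checkout -q master
commit_file B
b=$(git rev-parse HEAD)

# Step 1: the nearest common commit of master and feature is A.
git checkout -q feature
fork_point=$(git merge-base master feature)

# Steps 2 and 3: rewrite C and D on top of master, then move feature to D*.
git rebase -q master
new_d=$(git rev-parse feature)
new_base=$(git rev-parse feature~2)   # D* -> C* -> B

git log --oneline --graph master feature
```

After the rebase, feature points to a new hash, and two commits below it you land on B instead of A.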
Rebase must rewrite history
Even if the diffs in C and C* are exactly the same, their hashes are different because they have different parents. The consequence is that every commit downstream must be rewritten.
From git's perspective, a rewritten commit is completely unrelated to the original one. There is no relation between C and C*.
After a rebase, the branch feature contains new commits.
What happened to the commits C--D? Nothing, they are still there, unchanged. Commits are immutable. They are 'lost' only because, in your repo, there is no branch pointing to D anymore.
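Here is a sketch that checks both claims in a throwaway repository: the old commit is still stored after the rebase, and C and C* carry the same diff under different hashes. The commit_file helper is hypothetical, and git patch-id is used as a fingerprint of a diff.

```shell
set -e
# Throwaway repository with the same A, B, C, D history as above.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: creates a file named after the commit and commits it.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file A
git checkout -q -b feature
commit_file C
commit_file D
git checkout -q master
commit_file B
git checkout -q feature

old_c=$(git rev-parse feature~1)   # C, before the rebase
git rebase -q master
new_c=$(git rev-parse feature~1)   # C*, after the rebase

# The original C is still in the repository, reachable by its hash.
old_type=$(git cat-file -t "$old_c")

# Fingerprint of each commit's diff: same change, different commits.
old_patch=$(git show "$old_c" | git patch-id | cut -d' ' -f1)
new_patch=$(git show "$new_c" | git patch-id | cut -d' ' -f1)

echo "C : $old_c patch $old_patch"
echo "C*: $new_c patch $new_patch"
```

If you did not save the old hashes, git reflog is usually the place to look for them.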
Never rebase a shared branch
On the server, or on other developers' machines, the branch feature still references D. If someone else pushes something to it, you are in a problematic situation. No automatic merge or rebase can fix that.
One way to save the day is to use interactive rebase. But that is out of the scope of this post.
Try to avoid this situation by changing your process to clarify the ownership of each branch.
If you are absolutely certain that feature is your branch, you can push --force-with-lease as much as you want.
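To see why the force is needed, here is a sketch that uses a local bare repository as a stand-in for the server. All paths and names are made up, and commit_file is a hypothetical helper. After the rebase, a plain push is refused because the server's feature is not an ancestor of yours; --force-with-lease replaces it, but only if the server still holds what you last saw.

```shell
set -e
# A bare repository plays the server; "dev" is your local clone.
work=$(mktemp -d)
git init -q --bare "$work/server.git"
git init -q "$work/dev"
cd "$work/dev"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add origin "$work/server.git"

# Hypothetical helper: creates a file named after the commit and commits it.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file A
git push -q origin master
git checkout -q -b feature
commit_file C
git push -q origin feature        # the server's feature now references C
git checkout -q master
commit_file B
git checkout -q feature
git rebase -q master              # the local feature now references C*

# A plain push is refused: the server's branch would lose commit C.
if git push -q origin feature 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi

# Overwrite the server's branch, provided nobody pushed to it in between.
git push -q --force-with-lease origin feature

server_tip=$(git --git-dir="$work/server.git" rev-parse refs/heads/feature)
local_tip=$(git rev-parse feature)
echo "plain push: $plain_push"
```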
Always rebase your branch before merge
After a rebase, git will be able to fast-forward the target branch when you merge, without creating a merge commit. That's the point of rebasing.
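A sketch of that last step, again in a throwaway repository with a hypothetical commit_file helper. The --ff-only flag makes git refuse the merge if a fast-forward is not possible, so it doubles as a check that the rebase did its job.

```shell
set -e
# Throwaway repository with the A, B, C, D history.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: creates a file named after the commit and commits it.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file A
git checkout -q -b feature
commit_file C
commit_file D
git checkout -q master
commit_file B

# Rebase feature on master, then merge it back.
git checkout -q feature
git rebase -q master
git checkout -q master
git merge -q --ff-only feature

# master simply moved forward to D*: no merge commit was created.
master_tip=$(git rev-parse master)
feature_tip=$(git rev-parse feature)
merge_commits=$(git rev-list --merges --count master)
git log --oneline --graph master
```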
Are you still reading?
Wow, you are a real tech person! I hope you found this article helpful. Git is a strange beast that deserves to be tamed.