Git pull dissected

Using git pull by default can lead to some unexpected results when your branch has been rebased.

By default almost everyone is using “git pull” to update their local branch. When someone has forced pushed the result of a rebased to a repository, executing a simple “git pull” will often give you merge conflicts. What is going on and how should you get to the latest truth to continue working?

Disclaimer

Let’s clear something first. Changing commit history is kindof evil. Git warns you and does not let you do this by default. You are required to force push your changes to the repository. Git rebase typically changes existing commits, i.e. the hash of a commit is changed. Another evil command is to amend a commit to the previous commit. The golden rule is, never rewrite history on public branches.

In my current assignment we always rebase, because we like to have a clean commit history. A single straight history line without any side branches. Every feature team has its own branch they work on and when it is time to integrate we rebase our team branch on top of integrate branch. The commit history of our feature branch is changed and in exactly that situation doing a simplegit pull” will give unwanted results.

Git pull

Assuming you have not made any commits on your local branch, but only did not update for a while. Normally, users execute a “git pull” to retrieve the latest commits from a branch. This is equivalent to:

git fetch
git merge FETCH_HEAD.

This actually incorporates changes to this local branch, since the history diverged from the remote branch.

But wait, you try to merge the current state of the local branch with the remote branch. When any of the X commits conflict with any of the A, B or C commits, it will result in a merge.

  B---C---D local
 /
A---X---X---B'---C'---D'---X remote

Note that the commits marked with a *’* are commits of which the hash has changed.

What you actually want to accomplish is, go back to where the two branches diverted (caused by the rebase that was forcefully pushed), and get up-to-date with the remote branch.

So execute a “git pull --rebase” which is is equivalent to:

git fetch
git rebase FETCH_HEAD

Now remote and local are the same again.

A---X---X---B'---C'---D'---X remote/local

What to do

How would you know that the result of a rebase was forcefully pushed to the remote branch? Of course, the integrator of your team should tell every member the feature branch got rebased on top of the integrate branch, but sometimes that just does not happen or the message does not reach you.

The command “git status” gives you a clue that our local branch has 8 changes, which are not on the remote branch.

$ git status
On branch master
Your branch and 'origin/master' have diverged,
and have 8 and 17 different commits each, respectively.
  (use "git pull" to merge the remote branch into yours)
nothing to commit, working directory clean

$

Or use “--ff-only” which makes sure git does not merge, but aborts the operations so you can figure out what is going on and choose whether you want to merge or rebase to get up-to-date.

$ git pull --ff-only
fatal: Not possible to fast-forward, aborting.

$

Rebase

Normally you use “git pull --rebase” when your local branch has commit which are not yet pushed to the remote branch. Your local commits are replayed upon the remote branch.

  B---C---D---Y---Y---Y local
 /
A---X---X---B'---C'---D'---X remote

A “git pull --rebase” will result in the following.

                             Y---Y---Y local
                            /
A---X---X---B'---C'---D'---X remote

--autostash

As I described earlier, the command “git pull --rebase” actually performs a “fetch” and “rebase”. Having pending changes always forces me to commit my changes first, but “git rebase” does have a nifty option “--autostash” that stashes any pending changes first and applies the stash when it is done with the “rebase” action.

--committer-date-is-author-date

When having a rebase strategy, the committer date does not have much value anymore. All rebased commits have the same date, i.e. the moment of the rebase action. Using the --committer-date-is-author-date comes in handy to maintain the original author date as committer date, so the history shows a nicer view of the authoring dates instead of integration dates.

Wrap up

Merges always result in a new commit, so merging by default seems a unfortunate choice to me. I prefer to rebase my own commits to keep my history clean. Also rebasing feature branches on to of the integrate branch seems as a good approach to keep your history clean, but all team members should be aware when a rebase has been done.

Please note: we only use the rebase command to put a series of commits at the end of our upstream branch. Rebase allows much more evil tricks.

# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# These lines can be re-ordered; they are executed from top to bottom.

Maybe it is time to update the golden rule of rebasing…

Thou shalt never rebase an upstream branch.

Using the “git pull --rebase” by default helps the team to prevent a merge mess. Alternativly use “git pull --ff-only” so you can choose your own follow-up action when the pull fails. You can set this up on the command line with the following command.

git config pull.ff only

Abandon git pull

Some people promote to stop using git pull altogether. In that post the ultimate alias is given. First fetch and prune deleted references, then merge with fast-forward only and when that fails rebase your commits preserving merges.

You can add the following alias to your ~/.gitconfig.

update-from = "!f() { git fetch $1 --prune; git merge --ff-only $1/$2 || git rebase --preserve-merges $1/$2; }; f"

Then you can run git update-from «remote» «branch-name» to do all of this in one step:

$ git update-from origin master
fatal: Not possible to fast-forward, aborting.
First, rewinding head to replay your work on top of it...

Or in git >= 1.7.9, you can specify to always use rebase when doing a git pull:

git config --global pull.rebase true

Background

As Linus stated multiple times, his little project initially never intended to be a source control management system (SCM). More a “distribution and archival mechanism” on which it happend to be possible to make a SCM on top. Maybe git is just too versatile/low-level for novice users and more aliasses like this update-from should be created to introduce git more easily within development teams?

Please leave a comment to show whether you agree or not. Maybe you can provide some other insights about git?


Update 28 feb 2018:

Added autostash and force-with-lease rebase options.

Update 7 mar 2018:

Added config option for git pull to always use rebase.