Git history matters (2) - example


Here are additional notes around my previous post Git history matters [1].

dey commented:

Sounds ok since you have only one committer.

But the git history becomes unnecessarily complex since several features are coded at the same time or when features require a base branch merge.

So here is a little scenario, where several contributors perform concurrent commits and use base branch merge...

Initial repository

First, D. Doctor initializes a repository:

# Create repository.
mkdir git-complex-example
cd git-complex-example/
git init

# Author commits as D. Doctor by default.
git config user.name "D. Doctor"
git config user.email "d.doctor@example.com"

# Add some commits, so that history is not empty.
git commit --allow-empty -m "Initialized repository."
git commit --allow-empty -m "Refs #1 - README introduces project."
git commit --allow-empty -m "Release 1.0."

Commits in travis branch

Contributor A. Abracadabra starts working on TravisCI.org integration, in a topic branch named travis:

git checkout master  # Make sure we start from master.
git checkout -b travis

# Commit as A. Abracadabra.
git commit --allow-empty -m "Added TravisCI.org configuration." --author="A. Abracadabra <a.abracadabra@example.com>"

Commits in sphinx branch

In the meantime, C. Cachemire adds Sphinx documentation, in a topic branch named sphinx:

git checkout master  # Make sure we start from master.
git checkout -b sphinx

# Commit as C. Cachemire.
git commit --allow-empty -m "Introduced Sphinx documentation. Work in progress." --author="C. Cachemire <c.cachemire@example.com>"
git commit --allow-empty -m "Added about/ section in documentation." --author="C. Cachemire <c.cachemire@example.com>"
git commit --allow-empty -m "Typos." --author="C. Cachemire <c.cachemire@example.com>"

More commits in travis branch

Later, A. Abracadabra adds another commit to his travis branch:

git checkout travis

# Commit.
git commit --allow-empty -m "Added link to continuous integration platform in README." --author="A. Abracadabra <a.abracadabra@example.com>"

Checkpoint

What's the situation?

  • Master has only the initial commits.
  • We have 2 topic branches.
  • None of the topic branches has been merged in master yet.
  • In topic branches, there are several commits.
  • travis branch has one commit before the sphinx ones, and one commit after. So a flat chronological view of commits will not suggest the branches.

What does history look like at this time?

# Let's configure an alias to improve log format.
$> git config alias.logp 'log --pretty=format:"%s %Cgreenby %an <%ae>"'

# Inspect "master" branch:
$> git logp --graph master
* Release 1.0. by D. Doctor <d.doctor@example.com>
* Refs #1 - README introduces project. by D. Doctor <d.doctor@example.com>
* Initialized repository. by D. Doctor <d.doctor@example.com>

# Inspect what is in "travis" branch and not in "master":
$> git logp --graph master..travis
* Added link to continuous integration platform in README. by A. Abracadabra <a.abracadabra@example.com>
* Added TravisCI.org configuration. by A. Abracadabra <a.abracadabra@example.com>

# Inspect what is in "sphinx" branch and not in "master":
$> git logp --graph master..sphinx
* Typos. by C. Cachemire <c.cachemire@example.com>
* Added about/ section in documentation. by C. Cachemire <c.cachemire@example.com>
* Introduced Sphinx documentation. Work in progress. by C. Cachemire <c.cachemire@example.com>
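
A note on range syntax, as a minimal sketch in a throwaway repository (the temporary directory and the "Hotfix on master." commit are assumptions made for illustration only). The two-dot range master..travis lists commits reachable from travis but not from master. The three-dot form travis...master lists commits reachable from either branch but not from both. At this checkpoint both give the same result, because master has no commits of its own yet:

```shell
# Throwaway repository; path and extra commit are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # Same branch name whatever the local default.
git config user.name "D. Doctor"
git config user.email "d.doctor@example.com"
git commit -q --allow-empty -m "Release 1.0."

git checkout -q -b travis
git commit -q --allow-empty -m "Added TravisCI.org configuration."

# While master has not moved, both forms list the same single commit.
two_dot_before=$(git log --format=%s master..travis)
three_dot_before=$(git log --format=%s travis...master)

# Once master gets a commit of its own (hypothetical hotfix), the forms diverge.
git checkout -q master
git commit -q --allow-empty -m "Hotfix on master."
two_dot_after=$(git rev-list --count master..travis)
three_dot_after=$(git rev-list --count travis...master)

echo "two-dot: $two_dot_after commit(s), three-dot: $three_dot_after commit(s)"
```

So the two-dot form is the one that matches "in the topic branch and not in master" in every situation.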

Fine, let's continue...

Merge branch sphinx

Now, D. Doctor merges sphinx branch into master, with a merge commit.

git checkout master
git merge --no-ff -m "Refs #2 - Added Sphinx documentation." sphinx

Update and merge branch travis

Let's update the travis branch before we merge it.

# As A. Abracadabra
git config user.name "A. Abracadabra"
git config user.email a.abracadabra@example.com

# Psyko-rebase "travis" branch on top of "master" branch.
git checkout travis
git merge master -m "Merge branch 'master' into travis"

Note

See also psykorebase [2] about "merging travis branch on top of master, still in a topic branch".

Finally, D. Doctor merges travis branch.

# As D. Doctor...
git config user.name "D. Doctor"
git config user.email "d.doctor@example.com"

# Merge "travis" branch in "master", with an explicit merge commit.
git checkout master
git merge --no-ff -m "Refs #3 - Enabled continuous integration with TravisCI.org." travis

Raw Git history gets complicated

What's the situation?

  • We have 5 commits explicitly performed on master: 3 initial commits and 2 merges.
  • The 2 topic branches have been merged in master.
  • In topic branches, there are several commits.
  • travis branch has one commit before the sphinx ones, and one commit after. So a flat chronological view of commits does not suggest the branches.

As expected, the flat raw log is no longer easy to understand:

$> git logp
Refs #3 - Enabled continuous integration with TravisCI.org. by D. Doctor <d.doctor@example.com>
Merge branch 'master' into travis by A. Abracadabra <a.abracadabra@example.com>
Refs #2 - Added Sphinx documentation. by D. Doctor <d.doctor@example.com>
Added link to continuous integration platform in README. by A. Abracadabra <a.abracadabra@example.com>
Typos. by C. Cachemire <c.cachemire@example.com>
Added about/ section in documentation. by C. Cachemire <c.cachemire@example.com>
Introduced Sphinx documentation. Work in progress. by C. Cachemire <c.cachemire@example.com>
Added TravisCI.org configuration. by A. Abracadabra <a.abracadabra@example.com>
Release 1.0. by D. Doctor <d.doctor@example.com>
Refs #1 - README introduces project. by D. Doctor <d.doctor@example.com>
Initialized repository. by D. Doctor <d.doctor@example.com>

And the raw graph is getting weird (although it is explicit):

$> git logp --graph
*   Refs #3 - Enabled continuous integration with TravisCI.org. by D. Doctor <d.doctor@example.com>
|\
| *   Merge branch 'master' into travis by A. Abracadabra <a.abracadabra@example.com>
| |\
| |/
|/|
* |   Refs #2 - Added Sphinx documentation. by D. Doctor <d.doctor@example.com>
|\ \
| * | Typos. by C. Cachemire <c.cachemire@example.com>
| * | Added about/ section in documentation. by C. Cachemire <c.cachemire@example.com>
| * | Introduced Sphinx documentation. Work in progress. by C. Cachemire <c.cachemire@example.com>
|/ /
| * Added link to continuous integration platform in README. by A. Abracadabra <a.abracadabra@example.com>
| * Added TravisCI.org configuration. by A. Abracadabra <a.abracadabra@example.com>
|/
* Release 1.0. by D. Doctor <d.doctor@example.com>
* Refs #1 - README introduces project. by D. Doctor <d.doctor@example.com>
* Initialized repository. by D. Doctor <d.doctor@example.com>

Solution 1: adopt "rebase+squash" workflow

Many users stop at this point and complain:

What a mess! Let's alter history!

Once history has been modified, git log gives a nice readable output:

$> git logp --graph master
* Refs #3 - Enabled continuous integration with TravisCI.org. by A. Abracadabra <a.abracadabra@example.com>
* Refs #2 - Added Sphinx documentation. by C. Cachemire <c.cachemire@example.com>
* Release 1.0. by D. Doctor <d.doctor@example.com>
* Refs #1 - README introduces project. by D. Doctor <d.doctor@example.com>
* Initialized repository. by D. Doctor <d.doctor@example.com>
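
The commands used to alter history are not shown here. One possible way to get a log like the one above is git merge --squash, sketched below in a throwaway repository. This is an assumption about the workflow, not the only variant (an interactive git rebase -i is another), and the temporary directory and file names are made up for illustration. Real files are used instead of empty commits, because a squash of empty commits would leave nothing to commit:

```shell
# Throwaway repository; path and file names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # Same branch name whatever the local default.
git config user.name "D. Doctor"
git config user.email "d.doctor@example.com"
git commit -q --allow-empty -m "Release 1.0."

# Topic branch with several small commits by C. Cachemire.
git checkout -q -b sphinx
echo "Sphinx docs" > index.txt
git add index.txt
git commit -q -m "Introduced Sphinx documentation. Work in progress." --author="C. Cachemire <c.cachemire@example.com>"
echo "About" > about.txt
git add about.txt
git commit -q -m "Added about/ section in documentation." --author="C. Cachemire <c.cachemire@example.com>"

# D. Doctor squashes the whole topic branch into a single commit on master,
# keeping C. Cachemire as the author.
git checkout -q master
git merge -q --squash sphinx
git commit -q -m "Refs #2 - Added Sphinx documentation." --author="C. Cachemire <c.cachemire@example.com>"

# One commit per feature; the two original topic commits are not part of master.
git log --format="%s by %an <%ae>" master
```

After this, master holds one commit per feature, and the original sphinx commits survive only as long as the sphinx branch itself is kept.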

Solution 2: custom views of history

Come on! The raw history was not a mess at all: it tells what happened, and that's its primary purpose. Let's use custom views to display what we want to...

Use the --first-parent option to check "features", i.e. commits explicitly performed on master:

$> git logp --first-parent --graph master
* Refs #3 - Enabled continuous integration with TravisCI.org. by D. Doctor <d.doctor@example.com>
* Refs #2 - Added Sphinx documentation. by D. Doctor <d.doctor@example.com>
* Release 1.0. by D. Doctor <d.doctor@example.com>
* Refs #1 - README introduces project. by D. Doctor <d.doctor@example.com>
* Initialized repository. by D. Doctor <d.doctor@example.com>

The filtered log above is what you make via "rebase+squash" workflows, isn't it?

There is a small difference in the authorship:

  • in the first case, rebase+squash action was performed by D. Doctor, but he preserved authorship of commits in topic branches. We do not have the information "who merged".
  • in the second case, merge commits in master are authored by D. Doctor, who performed the merge. Contributors (A. Abracadabra and C. Cachemire) are mentioned in history of topic branches.

Notice the filtered log above is the only thing you can get once you have rebased+squashed. You altered history, you did not keep the original commits.

Using merge commits, you can see the "nice log", but you also get additional features related to the original commits. Here are a few:

  • you can inspect commits in a topic branch (see notes below);
  • in discussions, references to commits and code are not broken;
  • you can revert a subset of a feature. Useful in case of a mistake.
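
As a minimal sketch of the last point, assuming a throwaway repository with made-up file names: once a topic branch has been merged with --no-ff, a single commit of that branch can still be reverted on master, leaving the rest of the feature in place.

```shell
# Throwaway repository; path and file names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # Same branch name whatever the local default.
git config user.name "D. Doctor"
git config user.email "d.doctor@example.com"
git commit -q --allow-empty -m "Release 1.0."

git checkout -q -b sphinx
echo "Sphinx docs" > index.txt
git add index.txt
git commit -q -m "Introduced Sphinx documentation."
echo "About" > about.txt
git add about.txt
git commit -q -m "Added about/ section in documentation."

git checkout -q master
git merge -q --no-ff -m "Refs #2 - Added Sphinx documentation." sphinx

# Find the one topic commit to undo by its subject, then revert only that commit.
about_commit=$(git log --format=%H --grep="Added about/ section" master)
git revert --no-edit "$about_commit" > /dev/null

# index.txt is still there; about.txt is gone.
ls
```

With a squashed feature there is no such commit to point at, so the same fix would have to be made by hand.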

Key differences: time, responsibility

As we saw above, the "nice log" can be displayed whatever the workflow. So this result is not the main difference between the workflows. Where is the value?

I think the value of merge-based workflows is the time you spend performing actions:

  • executing a custom log (e.g. git customlog) is as fast as executing git log. Some people will say git log is the default, so you do not need to learn it or set it up. That's true. But I think raw git log is definitely more readable with some styling of your own... so I would recommend having a custom git log output anyway. Just make it fit your needs ;)
  • editing history takes longer than merging. When merging, you focus on the merge commit: diff and message.
  • in case of a mistake, a merge-based workflow can save you time. You can revert a merge and rewind in the original history. Whereas with a rebase+squash you only have the squashed result, with no way to rewind. You certainly are smart enough to fix the mistake manually, but in some cases it is easier (and safer) with git revert or git reset. The use case may be rare, but the time you spend on it may be big, and the stakes can be big too.
  • in some cases, original history saves you time. Sometimes you need to check details of a topic branch, or some code design has been tried then reverted in the topic branch, or some discussion references a commit in the topic branch... Again, the use case may be rare, but when it occurs, you appreciate having the full topic branch history. Otherwise you have to remember, guess or whatever. Nothing blocking, but time you could save.
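
To illustrate "you can revert a merge", here is a minimal sketch in a throwaway repository (the temporary directory and the .travis.yml content are assumptions made for illustration). The -m 1 option tells git revert to undo the merge relative to its first parent, i.e. master as it was before the merge:

```shell
# Throwaway repository; path and file content are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # Same branch name whatever the local default.
git config user.name "D. Doctor"
git config user.email "d.doctor@example.com"
git commit -q --allow-empty -m "Release 1.0."

git checkout -q -b travis
echo "language: python" > .travis.yml
git add .travis.yml
git commit -q -m "Added TravisCI.org configuration."

git checkout -q master
git merge -q --no-ff -m "Refs #3 - Enabled continuous integration with TravisCI.org." travis

# Undo the whole feature: revert the merge relative to its first parent.
git revert --no-edit -m 1 HEAD > /dev/null

# The feature's file is gone from master, yet the topic commits remain in history.
git log --format=%s master
```

The revert is itself a new commit, so nothing is lost: the merge and the topic commits stay available for inspection.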

It is also a matter of responsibility: I'd rather make a mistake in a merge commit than mess up a rebase+squash. Because I know I can revert a merge, and I know I preserved others' hard work. Whereas doing a rebase+squash, I potentially alter someone's contribution.

That's what I called "do not bother with micro commits" in the previous post Git history matters [1].

Merge-based workflows lack tools

I think merge-based workflows are the way to go... but I must admit I do not know how to implement some features they have.

I mean, I understand why people complain about merge and promote "rebase+squash": rebase -i is a powerful Git builtin. It looks quite simple. It produces simple output. And many people use it. Those are good enough reasons to use rebase+squash workflows.

My point is we should try, develop and promote merge-based workflows. It means collecting or making tools that implement the workflow.

git-flow [3] is one solution. There could be others with one master and N topic branches.

And, about reading Git history, there could be tools that produce nice views out of any complex raw history.

To display commits in main branch (i.e. master), the --first-parent option is fine.

But while googling on "git log branch", I found:

Back to our sample scenario above, as of 2014-02-11, I do not know a simple way to get the list of commits in travis branch after it has been merged. I guess it could be displayed, but the only way I know is reading the full and raw log (easier with gitk or gitg).
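
One candidate, offered as a sketch rather than a definitive answer: a merge commit records its parents, so for a merge commit M the range M^1..M^2 lists what is reachable from the merged topic branch but not from master before the merge. The example below rebuilds a reduced version of the scenario in a throwaway repository (the temporary directory is an assumption, and the sphinx branch is shortened to one commit). It finds the merge by its message, and --no-merges hides the base branch merge made inside travis:

```shell
# Throwaway repository; a reduced version of the scenario above.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # Same branch name whatever the local default.
git config user.name "D. Doctor"
git config user.email "d.doctor@example.com"
git commit -q --allow-empty -m "Release 1.0."

git checkout -q -b travis
git commit -q --allow-empty -m "Added TravisCI.org configuration."
git checkout -q -b sphinx master
git commit -q --allow-empty -m "Introduced Sphinx documentation. Work in progress."
git checkout -q travis
git commit -q --allow-empty -m "Added link to continuous integration platform in README."

git checkout -q master
git merge -q --no-ff -m "Refs #2 - Added Sphinx documentation." sphinx
git checkout -q travis
git merge -q -m "Merge branch 'master' into travis" master
git checkout -q master
git merge -q --no-ff -m "Refs #3 - Enabled continuous integration with TravisCI.org." travis

# Locate the merge commit on the mainline by its message.
merge=$(git log --first-parent --format=%H --grep="Refs #3" master)

# Commits reachable from the second parent (topic branch) but not from the
# first parent (master before the merge), without the base branch merge.
git log --no-merges --format=%s "$merge^1..$merge^2"
```

This relies on the merge having been made with an explicit merge commit. It works after the travis branch itself has been deleted, since only the merge commit is needed.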

Let's stop here today... I presume "git tools for merge-based workflows" is a quite long story. Perhaps another post...

Notes & references

[1] /git-history-matters-en.html
[2] https://github.com/benoitbryon/psykorebase
[3] https://github.com/nvie/gitflow
[4] https://github.com/datagrok/git-history-visualizations