Fix Version/s: None
The question of what makes a "good commit" arises occasionally. I feel that the following document:
presents a decent guideline. Pasting from it:
Structural split of changes
The cardinal rule for creating good commits is to ensure there is only
one "logical change" per commit. There are many reasons why this is an
- The smaller the amount of code being changed, the quicker & easier it
is to review & identify potential flaws.
- If a change is found to be flawed later, it may be necessary to revert
the broken commit. This is much easier to do if there are not other
unrelated code changes entangled with the original commit.
- When troubleshooting problems using GIT's bisect capability, small
well defined changes will aid in isolating exactly where the code
problem was introduced.
- When browsing history using GIT annotate/blame, small well defined
changes also aid in isolating exactly where & why a piece of code came from.
Things to avoid when creating commits
With that in mind, there are some commonly encountered examples of bad
things to avoid
- Mixing whitespace changes with functional code changes.
The whitespace changes will obscure the important functional changes,
making it harder for a reviewer to correctly determine whether the
change is correct. Solution: Create 2 commits, one with the whitespace
changes, one with the functional changes. Typically the whitespace
change would be done first, but that need not be a hard rule.
- Mixing two unrelated functional changes.
Again the reviewer will find it harder to identify flaws if two
unrelated changes are mixed together. If it becomes necessary to later
revert a broken commit, the two unrelated changes will need to be
untangled, with further risk of bug creation.
- Sending large new features in a single giant commit.
It may well be the case that the code for a new feature is only useful
when all of it is present. This does not, however, imply that the entire
feature should be provided in a single commit. New features often entail
refactoring existing code. It is highly desirable that any refactoring
is done in commits which are separate from those implementing the new
feature. This helps reviewers and test suites validate that the
refactoring has no unintentional functional changes. Even the newly
written code can often be split up into multiple pieces that can be
independently reviewed. For example, changes which add new internal
APIs/classes, can be in self-contained commits. Again this leads to
easier code review. It also allows other developers to cherry-pick small
parts of the work, if the entire new feature is not immediately ready
for merge. Addition of new public APIs or RPC interfaces should be done
in commits separate from the actual internal implementation. This will
encourage the author & reviewers to think about the generic API/RPC
design, and not simply pick a design that is easier for their currently
chosen internal implementation.
The basic rule to follow is:
If a code change can be split into a sequence of patches/commits,
then it should be split. Less is not more. More is more.
In light of increasingly allowing/encouraging rebasing to happen, I
think we should set an expectation of what the commits should look like.
I'd motion the SAT to put this onto the agenda as an addition to our