Atomic Commits
During any project with more than one person, there inevitably comes a time when you come across code that just makes no sense whatsoever. It’s not commented, and git blame shows a gigantic commit spanning tens of files and hundreds of line, often all committed under the somewhat terse of message “fixed some bugs”.
There’s so many lines there, that even if the programmer had attempted to write a good commit message, there would be no way of properly summarising his or her work. What the prior developer should have used was atomic commits.
Atomic commits are the standard way of breaking your development work into bite sized pieces. In this article, I’ll focus on the concept of atomic commits as they pertain to working in a professional development shop on product code. Projects for external use such as libraries and external code work may have their own standards for merging code1, but you should still use atomic commits for internal purposes.
In common version control parlance, a commit indicates a single piece of work, such as filling out a method, writing a test, fixing a smaller bug, or creating the skeleton of a class. Features should be built from several branches, and collated in branches.
At this point, you may be asking the purpose of building a feature or change out of a series of commits, when they are all going to be merged into a single master or develop branch anyway. In the same way that good modular code is made up of a series of small methods and functions that are easy to understand, refactor and reuse, good atomic committed code is easy to read, revert, and cherry-pick. The practice is an investment in the future. When you or another programmer has to look at your code six months from now, it’s far more useful to see to see the following commit rather than a 600 line diff that may do one or many things. This is particularly true when you have to implement some less than elegant code to work around platform limitations.
commit dddaaabbbcccceeeff1111222233333444455556
Author: Edward Murrell <edward@ekm.id.au>
Date: Tue Jan 19 17:10:19 2016T+1100:00+10:00
CARD-123: Fix display bug in UserWidget class outputting "Array" instead of empty string for days column.
In the above example, it may turn out later than the original behaviour was the correct one, and the real bug is further down the rendering chain. Rather than adding another commit that returns the code to it’s (hopefully) working state, you can simply revert the original, and be sure that the code is exactly as it was before. If you’ve bundled everything into a single commit, this is not possible, and you are now faced with the unenviable task of remembering which changes were for which bug, and backing them out manually. This is also particular true when it comes to unit testing. Not only does it clearly indicate the states in red/green testing2, but it means that it’s much easier to hard reset back to a known state when it’s clear that a given approach is not working out.
Finally, it makes code re-use much simpler. If you fix a bug during development3, it’s entirely possible that the bug is blocking another programmer from finishing their work. Rather than merging branches that are not finished, or waiting till they’re ready, you can simply cherry-pick the bug fix to the other branch, and when those branches get merged, git is smart enough to know that the cherry-picks are the same. This also works for a series of commits relating to the creation or re-factor of a class4. For this reason, it’s usually a good idea to isolate each your commits to a single class.
As a side benefit of this practice, it can very useful for picking up work on a branch after a long absence or change in developer. A good history should help tell you what the direction of the code. A red test commit, followed by the creation of an empty class or method immediately indicates the next task to be completed.
All of these scenarios are common during team development. A good atomic commit practice helps reduce the friction during these circumstances.
In the same way that learning to recognise good code takes time, learning to build a good commit history also takes time. As a hint, the implicit or explicit use of the word “and” in a commit is a good indication that a commit should be broken up more. The following is a list of anti-pattern commits.
# Commits that are not very atomic.
Commit #1: Created student view controller and added route entries.
Commit #2: Added InvoiceWebDisplay and it's parent AbstractInvoiceDisplay class.
Commit #3: Fixed all the bugs for missing dollar signs in PDFs generators.
Commit #4: Moved Widget classes out of the prototype namespace, updated method signatures to work in production.
Commit #5: Massive re-factor of the LocomotiveView controller methods to use new database structure.
Commit #1 superficially looks like an atomic commit, particularly when the amount of code required for each of these actions is small, but this is actually concatenating two items together, as indicated by the use of the key word ‘and’. In the event that the route is wrong, or needs to be turned off in a hurry, it won’t be possible to only remove the routes with a simple revert.
Commit #2 - Interfaces, abstract classes, and their implementing or extending classes are often committed together. While this is unlikely to cause issues later, it wouldn’t make sense if you were committing an interface and two or more classes that implemented it. Committing these together is also practice smell that you aren’t testing individual components properly, and are building your interfaces as pale reflections of their classes, rather than as contract about how classes of this type should behave.
Commit #3 is a little more surreptitious. It’s the same bug, but in multiple places. Those generators are probably separate files, and thus each fix probably deserves it’s own commit. That way, if it turns out that one of the fixes was an error, it’s easy to revert. Good commit messages there indicating exactly which PDFs were fixed by the changes would also make it collate a list of tests areas for the testing staff.
Commit #4 is a common mistake. It’s important to separate out the changes caused by house keeping of moving the files from the logic changes required by adhering to the new API. This is especially true when multiple files are involved and/or it’s not possible to get your IDE to do the necessary refactoring.
Commit #5 looks like an atomic commit on the surface. In these cases, it helps to think of the methods as units, rather than the class. It might help to think of the difference between the change found in one commit that only affects a single method, versus the change expected for the whole class that is expressed as a branch and subsequent pull request.
Like many good practices, these practices work best if the the whole team is on-board. If you aren’t seeing this practice in your workplace, introduce the concept at your next retro or developer guild meeting. If need be, you can point them at this article as a guide.
-
How to Get Your Change Into the Linux Kernel kernel.org ↩︎
-
Red/green testing is where a failing (red) test is written testing for the desired outcome before the implementation code. When the code is successfully implemented, the test will go green. This is the basis for Test Driven Development. ↩︎
-
When you fix a bug during development, you should write a test for that bug so that if it ever happens again, you’ll know. ↩︎
-
When this happens, it’s a good indication that you should have created a separate branch for that class, possibly with it’s own card. ↩︎