Location: this issue page
AFAIK, we currently have no policy for how we should handle git history when we move files between repositories. I propose the following:
- Files should simply be moved, with no attempt to transfer the commits that built them.
- Files should be added to the destination repository on a single commit (multiple files may be added on one commit), unmodified from their state in the last commit in the original repository. Modifications needed to work with the new repository should take place on subsequent commits, even if this means the destination package is not buildable or fails tests in the interim. The transfer commit message should have the form "Transfer from <orig-pkg> at <orig-sha1>", where <orig-sha1> is the last commit in the original repository in which the files were present.
- Files should be removed from the original repository on a single commit that removes the exact same files that were added in the corresponding commit in the destination repository. This should have a commit message of the form "Transfer to <dest-pkg> at <dest-sha1>", referencing the commit where the code is added to the destination repository.
- These commit message SHA1s must be updated if the destination is a ticket branch that is rebased before being merged to master.
- Additional commit message content may be present (and usually should be) after the "Transfer..." lines.
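To make the proposed message format concrete, here is a minimal sketch of a helper that builds and recognizes the two "Transfer..." lines. Everything in it beyond the message form itself is an assumption for illustration: the function names, the rule that package names contain no whitespace, and the rule that a SHA1 is 7 to 40 lowercase hex digits (abbreviated or full). The names orig_pkg and the SHA1 values are placeholders, not real packages or commits.

```python
import re

# Assumption: package names contain no whitespace, and SHA1s are 7-40
# lowercase hex characters (abbreviated or full).
_TRANSFER_RE = re.compile(
    r"^Transfer (?P<direction>from|to) (?P<pkg>\S+) at (?P<sha1>[0-9a-f]{7,40})$"
)


def transfer_message(direction, pkg, sha1, body=""):
    """Build a commit message whose first line is the 'Transfer...' line.

    direction is "from" (commit in the destination repository) or
    "to" (commit in the original repository). Any extra content goes
    after the 'Transfer...' line, separated by a blank line.
    """
    subject = f"Transfer {direction} {pkg} at {sha1}"
    if _TRANSFER_RE.match(subject) is None:
        raise ValueError(f"not a valid transfer line: {subject!r}")
    return subject if not body else f"{subject}\n\n{body}"


def parse_transfer_message(message):
    """Return (direction, pkg, sha1), or None if this is not a transfer commit."""
    lines = message.splitlines()
    match = _TRANSFER_RE.match(lines[0] if lines else "")
    if match is None:
        return None
    return match.group("direction"), match.group("pkg"), match.group("sha1")
```

A check along these lines could also be used to flag transfer commits on a ticket branch whose SHA1s need updating after a rebase, though that part is not sketched here.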
The main motivation for this proposal is that I think the obvious alternative - using git filter-branch to prune the history of the original repo down to just those commits that reference the files to be moved, then merging those - is too difficult to be worth our time and produces results that are problematic anyway. Perry Gee and I (but especially Perry) recently put a lot of effort into investigating these procedures for DM-420, and here's what we found:
- git filter-branch is very tricky to use, and it runs slowly on large repos. While just copying the files takes minutes, using filter-branch to handle even a fairly simple transfer will typically take an expert hours, and a newbie days. (StackOverflow is helpful as usual here, but it's worth noting that there are very different answers for only slightly different situations, and that this subject frequently produces highly-ranked questions with only poorly-ranked answers).
- git filter-branch does not remove empty merge commits, meaning the "pruned" history is actually anything but - it's something like 95% empty merge commits and 5% actual changes. So far, the best solution we've come up with for pruning those out is to do an interactive rebase on the output of filter-branch, which requires manually re-resolving any conflicts that occurred on merges anywhere in the history. Of course, if we remove the merge commits, we also remove the links to JIRA issue numbers we've otherwise been careful to preserve.
- Our git commit discipline was so poor in even the recent past that most of those old commits aren't worth the effort, and it only gets worse as the history extends back to the svn days.
- By its very nature, filter-branch does quite a bit of violence to the original commits. That makes them hard to interpret at best, and at worst destroys the very history we're trying to preserve. I'm all for having users rewrite their own recent git history before merging to master, but automatically rewriting ancient history seems to defeat the purpose of preserving that history.
Instead, my proposal for commit discipline in transfers of code should provide a reliable workaround for the biggest drawback of not transferring the history: the fact that git blame will no longer give sensible results directly when applied to a piece of code that has been transferred. When a line of code we're interested in is blamed on a transfer commit, we'll have to clone the original repo, check out the commit mentioned in the transfer commit message, and re-run git blame. That's unfortunate, but I think it's better than any alternative I could think of.
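The blame workaround described above can be sketched as a small function that reads a "Transfer from" commit message and returns the git commands to run. This is only an illustration under stated assumptions: the function name, the workdir default, and the example URL and path in the usage note are hypothetical, and the SHA1 pattern (7 to 40 hex digits) is assumed rather than specified by the proposal. The function only builds the commands; it does not run them.

```python
import re


def blame_commands(transfer_message, orig_repo_url, path, workdir="orig-repo"):
    """Return the git commands that re-run blame in the original repository.

    transfer_message is the message of the transfer commit that git blame
    reported in the destination repository; its first line is expected to
    read "Transfer from <orig-pkg> at <orig-sha1>".
    """
    lines = transfer_message.splitlines()
    match = re.match(
        r"^Transfer from (\S+) at ([0-9a-f]{7,40})$", lines[0] if lines else ""
    )
    if match is None:
        raise ValueError("not a 'Transfer from' commit message")
    sha1 = match.group(2)
    return [
        # Clone the original repository into a scratch directory.
        ["git", "clone", orig_repo_url, workdir],
        # Check out the last commit where the files were still present.
        ["git", "-C", workdir, "checkout", sha1],
        # Re-run blame on the file as it existed at that commit.
        ["git", "-C", workdir, "blame", "--", path],
    ]
```

Each returned command could be passed to subprocess.run(cmd, check=True), for example with a placeholder URL such as "https://example.org/orig_pkg.git" and the file's path as it was in the original repository.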