Git does what you tell it

I've spent hours this week helping students fix problems with their git repos. I've now seen the same kinds of problems often enough that I thought I'd offer some advice on how to avoid falling down a git-sized hole. Git does exactly what you tell it to do, whether or not you meant it. It's worth being aware of what you're saying when you talk to it.

1) Never git add files using --all or -A or . or * or any other wildcard. I know why people do this: it feels like it takes forever to type or copy/paste all the file paths you want to add. Why not just say git add . and have it add everything that's changed? The problem is that about 1 in every 10 times you do this, you'll add more than you meant to, for example a submodule change. Whatever time you think you save by not typing out the paths you're adding, I promise you'll lose it when you have to go back and untangle the mess you made by accidentally adding things you should not have.
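Here's a minimal sketch of what adding by explicit path looks like, run in a throwaway repo so it's safe to try. The file names fix.txt and notes.txt are made up for illustration; the point is to name what you want and then check what is actually staged before you commit.

```shell
# Throwaway repository so this is safe to run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

# Two new files: one we mean to commit, one we don't (both names are made up).
echo "real fix" > fix.txt
echo "scratch notes" > notes.txt

# Name the path explicitly instead of using a wildcard.
git add fix.txt

# Review what is staged before committing.
git status --short
staged=$(git diff --cached --name-only)
echo "staged: $staged"
```

Only fix.txt is staged; notes.txt stays untracked until you decide otherwise.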

2) Never git commit on your master branch. Branches in git are cheap, and you should make dozens of them. Make hundreds of them. Don't worry about having too many branches. Working on a bug and want to try some small thing? Make a branch off your branch. Want to try something else? Make a branch off that branch. You should be branching all the time: at least once per bug, feature, etc. You want to commit on every branch except your master branch. On master, you only want to pull changes from upstream and merge them in, ideally with a fast-forward merge. Here's your go-to pattern:

$ git remote add upstream <url-to-upstream-repo>
...
$ git checkout master
$ git pull upstream master
$ git checkout -b fix-bug-123

Before you start any bug fix or experiment, get your master branch up to date first, then branch off and start your work there.
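If you want git to enforce the fast-forward part for you, git pull accepts --ff-only, which refuses to create a merge commit when your master has diverged. The sketch below fakes an upstream with a local repo in a temp directory (the directory names upstream and fork are just for the demo), so treat it as an illustration of the pattern rather than your real setup.

```shell
work=$(mktemp -d)

# A local repo stands in for the real upstream project.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Maintainer"
git config user.email "maintainer@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "first"

# Your fork: a clone with an extra remote named upstream.
git clone -q "$work/upstream" "$work/fork"
cd "$work/fork"
git remote add upstream "$work/upstream"

# Upstream moves ahead while you weren't looking.
git -C "$work/upstream" commit -q --allow-empty -m "second"

# The go-to pattern, with --ff-only so master can only fast-forward.
git checkout -q master
git pull -q --ff-only upstream master
git checkout -q -b fix-bug-123
```

If the pull fails with --ff-only, that's your signal that something got committed on master that shouldn't have been.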

3) Never merge master directly into your current branch. This seems like a good idea at the time: there are changes on master that you need in order to keep working, or to test your code fully. But doing so ends up causing headaches later when you want to land your code on master. Imagine you're working on a bug on branch fix-bug-123 and you also need what's on master. What do you do? Let's explore some options:

Option 1: rebase fix-bug-123 on master:

$ git checkout master
$ git pull upstream master
$ git checkout fix-bug-123
$ git rebase master

Option 2: make a new temporary branch and merge with master:

$ git checkout master
$ git pull upstream master
$ git checkout -b fix-bug-123-master fix-bug-123
$ git merge master

In both cases you will potentially have to deal with merge conflicts. The way you do that depends on whether you're doing a rebase or a merge, but the effect is the same. What's nice about doing the rebase is that you'll probably want to do it later anyway. What's nice about doing the merge on a temporary branch, where you can try things out, is that you don't need to keep this work in sync: it's just a way to test things right now while you're working.
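To make the conflict handling concrete, here's a small sketch that forces a conflict in a throwaway repo and then backs out of it both ways. When a rebase stops on a conflict you either fix the files, git add them, and run git rebase --continue, or bail with git rebase --abort. For a merge you fix, add, and commit, or bail with git merge --abort. The file contents are invented for the demo.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Student"
git config user.email "student@example.com"

echo "original" > file.txt
git add file.txt
git commit -q -m "base"

# The same line changes on both branches, which guarantees a conflict.
git checkout -q -b fix-bug-123
echo "my change" > file.txt
git commit -q -am "change on fix-bug-123"
git checkout -q master
echo "their change" > file.txt
git commit -q -am "change on master"

git checkout -q fix-bug-123
before=$(git rev-parse HEAD)

# Option 1: the rebase stops on the conflict; abort puts the branch back.
git rebase master || git rebase --abort

# Option 2: merge on a temporary branch; abort leaves it untouched.
git checkout -q -b fix-bug-123-master fix-bug-123
git merge master || git merge --abort
```

Either abort returns you to exactly where you started, so trying costs nothing.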

What if you make a mistake and do one of the things I've warned against above? How do you clean things up? Depending on how big a mess you're in, I'd recommend one of the following:

1) If you committed work on your master branch, it's probably best to reset (note the -B vs. -b below) your branch to the upstream, so you can pull changes without conflicts. Before you do, drop a branch at your current master so you can get at old work:

$ git checkout -b broken-master master
$ git fetch upstream
$ git checkout -B master upstream/master

Now your master branch has been reset to what the upstream repo is on, and your current work is on broken-master.
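A quick way to convince yourself nothing was lost is to list the commits that are on broken-master but not on master. The sketch below replays the reset against a fake upstream in a temp directory; the repo names and the "accidental" commit are made up for the demo.

```shell
work=$(mktemp -d)

# Fake upstream with one commit.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Maintainer"
git config user.email "maintainer@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "upstream work"

# Your fork, plus the mistake: a commit made directly on master.
git clone -q "$work/upstream" "$work/fork"
cd "$work/fork"
git config user.name "Example Student"
git config user.email "student@example.com"
git remote add upstream "$work/upstream"
echo "oops" > mistake.txt
git add mistake.txt
git commit -q -m "accidental commit on master"

# Save the old master, then reset master to match upstream.
git checkout -q -b broken-master master
git fetch -q upstream
git checkout -q -B master upstream/master

# Commits that now live only on broken-master.
git log --oneline master..broken-master
```

The log should show just the accidental commit, which you can cherry-pick onto a proper branch later.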

2) If you need to get some commits from an old branch, but don't want everything from that branch due to a failed merge, you can cherry-pick commits. When you cherry-pick you tell git to selectively "copy" commits onto your current branch, one by one. If you have a branch with 3 good commits, and you want them all on a new branch off master, you could do:

$ git checkout master
$ git pull upstream master
$ git checkout -b new-branch
$ git cherry-pick <first-commit-sha-you-want-from-old-branch>
$ git cherry-pick <second-commit-sha-you-want-from-old-branch>
$ git cherry-pick <third-commit-sha-you-want-from-old-branch>
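When the commits you want sit next to each other in history, git cherry-pick also accepts a range, so you don't have to name them one by one. Here's a sketch in a throwaway repo; the branch old-branch and the three "good" commits are invented for the demo. The first^..third form means "everything after the parent of first, up to and including third".

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Student"
git config user.email "student@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "base"

# An old branch with three good commits in a row.
git checkout -q -b old-branch
for n in 1 2 3; do
  echo "good $n" > "good-$n.txt"
  git add "good-$n.txt"
  git commit -q -m "good commit $n"
done
first=$(git rev-parse old-branch~2)
third=$(git rev-parse old-branch)

# Meanwhile master has moved on.
git checkout -q master
echo "newer" > newer.txt
git add newer.txt
git commit -q -m "master moves on"

# Copy all three commits onto a fresh branch off master in one go.
git checkout -q -b new-branch
git cherry-pick "$first^..$third"
git log --oneline master..new-branch
```

If the commits you want are scattered, stick with picking them one at a time as shown above.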

3) If you want the effect of some commits on a broken branch, but not the commits themselves, you can try creating a diff and then git apply it on a new branch:

$ git checkout master
$ git pull upstream master
$ git checkout bad-branch
$ git diff master HEAD > bad-branch.diff

Now you can hand-edit the file bad-branch.diff to remove any hunks you don't want (e.g., changes to files that shouldn't be there) and save it. Then you do this:

$ git checkout -b new-branch master
$ git apply bad-branch.diff

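Note that git apply only changes the files in your working tree. Once you git add and git commit the result, you'll end up with a single new commit that consolidates all the changes you had on the other branch.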
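Here's the whole diff-and-apply flow as a runnable sketch, including the add and commit that turn the applied changes into a commit. Instead of hand-editing the diff, this version limits it to one path (app.txt, a made-up file), which is sometimes enough to leave the junk behind. git apply --stat and --check let you preview the patch and confirm it applies cleanly before touching anything.

```shell
work=$(mktemp -d)
repo="$work/repo"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Student"
git config user.email "student@example.com"
echo "line one" > app.txt
git add app.txt
git commit -q -m "base"

# A bad branch that mixes a wanted change with a file that shouldn't be there.
git checkout -q -b bad-branch
echo "wanted change" >> app.txt
echo "junk" > junk.txt
git add app.txt junk.txt
git commit -q -m "mixed commit"

# Write the diff outside the repo, restricted to the path we care about.
patch="$work/bad-branch.diff"
git diff master bad-branch -- app.txt > "$patch"

# Preview, verify, apply, then commit on a fresh branch off master.
git checkout -q -b new-branch master
git apply --stat "$patch"
git apply --check "$patch"
git apply "$patch"
git add app.txt
git commit -q -m "Consolidate wanted changes from bad-branch"
```

The new branch ends up one commit ahead of master, with the wanted change and without junk.txt.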

4) If you want to keep going with a current PR, but need to blow away what's there (e.g., you merged master and wish you hadn't), make a backup branch, fix things in one of the ways I mention above, then force-push the result:

$ git checkout -b bad-branch-bak bad-branch
$ git checkout bad-branch
...fix bad-branch somehow...
$ git push origin bad-branch -f
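A slightly safer variant of the force push is --force-with-lease, which only overwrites the remote branch if it still points where your local copy of it thinks it does. That matters if someone else may have pushed to your PR branch. The sketch below uses a bare repo in a temp directory as a stand-in for origin, and "fixes" the branch with a hard reset to master; both are assumptions for the demo, not the only way to do it.

```shell
work=$(mktemp -d)

# A bare repo stands in for origin.
git init -q --bare "$work/origin.git"
git init -q "$work/fork"
cd "$work/fork"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Student"
git config user.email "student@example.com"
git remote add origin "$work/origin.git"
echo "base" > file.txt
git add file.txt
git commit -q -m "base"
git push -q origin master

# The PR branch, already pushed, with a commit we regret.
git checkout -q -b bad-branch
echo "bad" > bad.txt
git add bad.txt
git commit -q -m "commit we regret"
git push -q origin bad-branch

# Back it up, rewrite it, and push over the old version.
git checkout -q -b bad-branch-bak bad-branch
git checkout -q bad-branch
git reset -q --hard master
echo "good" > good.txt
git add good.txt
git commit -q -m "clean fix"
git push -q --force-with-lease origin bad-branch
```

The old history is still reachable from bad-branch-bak if you need anything from it.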

To summarize what I'm doing above:

  • git checkout -b <branch-name> [base-commit | HEAD]: when you do a checkout with -b you are saying, "Create a new branch named branch-name based on commit/branch base-commit, or on HEAD (the current commit) if I don't name one, and check it out for me."
    • git checkout -b new-branch makes a branch, new-branch, off the current HEAD commit.
    • git checkout -b new-branch HEAD makes a branch, new-branch, off the current HEAD commit. Identical to the above, but explicitly says HEAD.
    • git checkout -b new-branch another-branch makes a branch, new-branch, off the same commit that another-branch is on.
  • git checkout -B new-branch another-branch makes or resets a branch, new-branch, off the same commit that another-branch is on. If new-branch didn't exist, it gets created; if new-branch did exist, it gets moved. Want to blow away your master branch locally and make it match what's upstream? git reset --hard HEAD && git fetch upstream && git checkout -B master upstream/master (note that reset --hard throws away any uncommitted changes).
  • git fetch upstream downloads commits, branches, etc. but does not merge what is in the upstream repo. It's like forking, but into your current fork.
  • git pull upstream master does a fetch of what's in the upstream master branch and then merges it into your current branch. Want to try merging what's in the upstream master with your current bug? Do a pull into a new temporary branch: git checkout -b merge-attempt && git pull upstream master. Now merge-attempt contains your current branch merged with the upstream master. Want to go back? Use git checkout -, which tells git to go to your previous branch (like doing cd - in bash).
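To see the difference between fetch and pull for yourself, here's a sketch against a fake upstream in a temp directory (the repo names are made up). It splits the pull into its two rough halves, a fetch followed by a merge of upstream/master, so you can look at what arrived before anything is merged.

```shell
work=$(mktemp -d)

# Fake upstream with one commit.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Maintainer"
git config user.email "maintainer@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "first"

# Your fork, then upstream gains a commit you don't have yet.
git clone -q "$work/upstream" "$work/fork"
cd "$work/fork"
git remote add upstream "$work/upstream"
git -C "$work/upstream" commit -q --allow-empty -m "second"

before=$(git rev-parse master)

# fetch downloads the new commit but leaves your branches alone.
git fetch -q upstream
git log --oneline master..upstream/master

# Try the merge on a temporary branch, then hop back.
git checkout -q -b merge-attempt
git merge -q upstream/master
git checkout -q -
```

After this, master is exactly where it was, and merge-attempt holds the merged result for you to test.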