Git Source Control

Git is a software configuration management (SCM) tool. This page offers notes on some of the more useful Git commands and workflows such as merging, rebasing, hunks and patches.

References

  1. GIT SCM.
  2. Pro Git eBook.
  3. Ry's Git Tutorial, Ryan Hodson.
  4. Git Basics - Undoing Things, Git Docs.
  5. Undoing Changes, Atlassian Git Tutorial Page.
  6. How to undo the last commits in Git?, SO Thread.
  7. Visualizing Git Concepts with D3 (an interactive page that is well worth a visit).
  8. Common Git Scenarios.

To Read / To Do

https://stackoverflow.com/questions/35941566/delete-remote-branch-via-git
https://stackoverflow.com/questions/1889559/git-diff-to-ignore-m
https://stackoverflow.com/questions/1822849/what-are-these-ms-that-keep-showing-up-in-my-files-in-emacs
https://stackoverflow.com/questions/1510798/trying-to-fix-line-endings-with-git-filter-branch-but-having-no-luck/1511273
https://stackoverflow.com/questions/20106712/what-are-the-differences-between-git-remote-prune-git-prune-git-fetch-prune

Where Git Finds Your SSH File

I had a little problem on Windows. When I connected from my office location Git would fail to SSH into GitHub. When I connected from home Git would connect to GitHub with no problem whatsoever.

The reason for this is that when connecting from the office my HOME DIRECTORY changed and git was looking for my key files relative to the home directory!

The solution was found in this SO thread answer. The solution is to create a shell script that will invoke the ssh client with a parameter to tell it where to look for key files: give this parameter an absolute path that is not relative to HOME.

#!/bin/sh
ssh -i /absolute/path/to/key/folder/id_rsa "$@"

In the above script id_rsa is the key filename. You might have saved your private key with a different file name. Save the script somewhere and give it executable permissions (chmod +x filename).

Then set the environment variable GIT_SSH to the absolute path to the script you just created.

export GIT_SSH=/path/to/your/script.sh

Now your git push operations etc should work just fine.

Create A Public/Private Keypair And Add To Your Keyring

Check your home directory (if on a corporate network it might not be what you expect).

echo $HOME
echo ~

To create SSH key pair:

ssh-keygen

To see your public key type:

cat ~/.ssh/id_rsa.pub

To check that SSH is working...

ssh -T git@bitbucket.org
# or...
ssh -T git@github.com

If you see an error message saying "Could not open a connection to your authentication agent", make sure that your ssh agent is running. If it isn't, run eval `ssh-agent -s` to start it [Ref].

For more detailed debugging use:

ssh -vvv -T git@bitbucket.org

To see if the key has been added to the client keyring...

ssh-add -l
# or...
ssh -vT address-of-server

I followed the GitHub instructions to create the private/public RSA key pair. Once you have the key, it can be added to the keyring using the following, but the addition is not permanent.

ssh-add ~/.ssh/some_key_name

So to make the addition permanent I consulted this SO thread. When you created the OpenSSH key pair you probably stored it as ~/.ssh/some_key_name (the private key) and ~/.ssh/some_key_name.pub (the public key). To add these permanently to your keyring, edit the file ~/.ssh/config (if it does not exist, just create it). Add the following line (replicate once for each key you wish to permanently add):

IdentityFile ~/.ssh/some_key_name

Debugging A Git Connection

First try connecting using SSH over the HTTPS port (443):

ssh -T -p 443 git@ssh.github.com

If this works you can then try to debug your SSH connection. You can use GIT_SSH_COMMAND [Ref] like so, replacing the git command with whatever you like:

GIT_SSH_COMMAND="ssh -vvv" git push

If this isn't playing ball, check your ~/.ssh/config file. It should contain this [Ref]:

Host github.com
  Hostname ssh.github.com
  Port 443

Git Workspace vs. Index vs. Repository

Diagram of Git workspace vs index vs repository

Configuring Git

Set up your user name and email globally:

git config --global user.name "name"
git config --global user.email "email"

You can override the global settings for specific repos by changing to the repo's root and running:

git config user.name "name"
git config user.email "email"

Set up your editor and merge tool:

git config --global core.editor gvim
git config --global merge.tool vimdiff

To see your config setup:

git config --list

To see a specific key’s config:

git config key-name

For example...

git config user.name

Search (Grep) Your Repo

Using git grep searches only tracked files, so it ignores Git internal files and untracked generated files, which can make your grepping life a lot easier!
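As a rough illustration (a sketch in a throwaway repo; the file names and the TODO search string are made up), git grep only looks at tracked files, so anything untracked, such as build output, is skipped:

```shell
# Throwaway repo; all names below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "TODO: tracked note"   > tracked.txt
echo "TODO: untracked note" > untracked.txt
git add tracked.txt   # only this file becomes tracked

# -l lists matching file names; untracked.txt is not searched.
matches=$(git grep -l "TODO")
echo "$matches"

# -n adds line numbers; a trailing pathspec limits the search.
git grep -n "TODO" -- "*.txt"
```

A plain grep -r TODO . in the same directory would also report untracked.txt (and would descend into .git too).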

Cheat Sheet Of Basic Git Commands

Creating Git Repos

To create a repo in the CWD:

git init

Cloning Git Repos

To clone a repo:

git clone https://repo_addr [dirname]
git clone git://repo_addr [dirname]

This pulls down data for the repo into dirname, if provided, or to an automatically decided dirname in the CWD otherwise. Note you get the entire repo including all the history copied locally. A remote called "origin" will be automatically added to point to https://repo_addr.

Checking Out Repos

Checking out a commit makes the entire working directory match that commit. This can be used to view an old state of your project without altering your current state in any way. Checking out a file lets you see an old version of that particular file, leaving the rest of your working directory untouched.

git checkout commit-hash | tag-name | branch-name
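The file form is not shown above. A small sketch, assuming a scratch repo with two hypothetical files, a.txt and b.txt: checking out a.txt from the previous commit restores just that file and leaves b.txt alone.

```shell
# Scratch repo; the identity and file names are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "old text"  > a.txt
echo "untouched" > b.txt
git add a.txt b.txt
git commit -q -m "first"

echo "new text, edited" > a.txt
echo "also edited"      > b.txt
git commit -q -a -m "second"

# Bring back a.txt as it was one commit ago (this also stages it).
git checkout HEAD~1 -- a.txt

a=$(cat a.txt)   # old version restored
b=$(cat b.txt)   # still the latest version
```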

Git Branches

A branch is a lightweight movable pointer to a commit, and the default branch name is "master". HEAD points to the current branch head, so it changes when you change branch.

To list all the branches in the repo type:

git branch

You can filter the list with the --merged and --no-merged options to see branches that are not merged back in etc.

To create and checkout a new branch use:

git branch new-branch-name
git checkout new-branch-name

The shorthand for the above is:

git checkout -b new-branch-name

To "delete" a branch (branch history still exists and is recoverable):

git branch -d branch-name

To switch branches use:

git checkout branch-name

Now HEAD will point to the new branch and NOT the previous branch.

Note that switching branches changes files in your working directory. When you switch branches, Git resets your working directory to look like it did the last time you committed on that branch.
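A minimal sketch of this, assuming a scratch repo with one hypothetical file and a branch called feature: the same file shows different contents depending on which branch is checked out.

```shell
# Scratch repo; identity, file and branch names are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "base"
base=$(git rev-parse --abbrev-ref HEAD)   # whatever the default branch is called

git checkout -q -b feature
echo "feature work" > file.txt
git commit -q -a -m "feature work"

git checkout -q "$base"
on_base=$(cat file.txt)      # contents as last committed on the default branch

git checkout -q feature
on_feature=$(cat file.txt)   # contents as last committed on feature
```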

Adding Files To Your Git Repo

You must add files from your workspace into the Git index so that Git knows what to track...

git add .  # adds all files (tracked or not) in cwd
git add -u # puts all changed tracked files into staging area (i.e., won't add new files)
git add -A # does both of the above

Note that if you modify a file after staging, the modifications after the stage will not be committed in this commit unless added again.
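To see this for yourself, here is a small sketch in a scratch repo (notes.txt is a made-up file name). The index keeps the version from the time of git add, and the later edit shows up as an unstaged change.

```shell
# Scratch repo; the file name is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "first edit" > notes.txt
git add notes.txt                 # stage the file as it is right now
echo "second edit" >> notes.txt   # change it again after staging

staged_content=$(git show :notes.txt)   # the version held in the index
unstaged=$(git diff --name-only)        # files with edits not yet staged

git status --porcelain   # prints "AM notes.txt": added in index, modified since
```

Running git add notes.txt again would bring the second edit into the index as well.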

Committing Files

Each commit is a pointer to a snapshot plus author/committer details, a message etc., and a pointer to the previous commit (or commits, if it is the result of a merge).

To commit all staged changes:

git commit -m "your msg here"

To commit all tracked files, staged or not, use:

git commit -a -m "..."

Any modified tracked file that is not staged will be automatically staged. New (untracked) files are not included.

To amend the last commit, adding in all currently staged files, use:

git commit --amend

This will give you the chance to modify the last commit (launches text editor) and will also add any currently staged files into the last commit. For example:

git commit -m "a commit where I forgot to add a file"
git add forgotten_file
git commit --amend

Git Logs: Viewing Your History

To get a summary of changes made with author, date, full commit hash and description:

git log

To do the same but for a particular directory:

git log <options> -- <dirname>

To get a one-line-per-commit summary (short hash and description):

git log --oneline

To get a log between versions/tags/commits/branches etc:

git log chng..chng

where chng can be a tag name, a commit's hash, a branch name etc.
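As a sketch of the range form (scratch repo, invented tag name v1): a..b shows the commits reachable from b that are not reachable from a.

```shell
# Scratch repo; identity, tag name and messages are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "one"
git tag v1                                 # lightweight tag on the first commit
git commit -q --allow-empty -m "two"
git commit -q --allow-empty -m "three"

git log --oneline v1..HEAD                 # the two commits made after v1
subjects=$(git log --format=%s v1..HEAD)   # subjects only, newest first
```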

To limit the log to the last 5 entries:

git log -5

To view the differences introduced in each commit:

git log -p

To view the logs for the last two weeks:

git log --since=2.weeks

Use "<number>.<time-period-type>". So, for example, the time period could also be "days" or "hours".

To search the log for commits that added or removed a search string in the code:

git log -S<search-string>

To really pretty-print your logs:

git log --oneline --decorate --graph --all

To show the log with full path names and status of changed files:

git log --name-status

To show the log with abbreviated pathnames and a diffstat of changed files:

git log --stat [-M]

Add the -M option to detect and show moves.

Tagging Files With Git

Tags are "nice" names by which you can identify snapshots of a repo. For example, you will likely tag releases.

git tag -a <tag-name> -m "Description of tag"
        ^^
        -a means create an annotated tag (stores name, date and msg)

To list all the available tags in your repo:

git tag

To search for tags with names matching a given pattern:

git tag -l <pattern>

An important point to note is that git push does NOT transfer tags to remote servers! You must do this explicitly using git push origin <tagname>.
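A sketch of this behaviour, using a local bare repository as a stand-in for the remote server (the paths, identity and the tag v1.0 are all made up, and it assumes push.followTags has not been switched on in your config):

```shell
# A bare repo plays the part of the remote; everything lives in a temp dir.
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$work/remote.git"

git commit -q --allow-empty -m "release candidate"
git tag -a v1.0 -m "First release"

git push -q origin HEAD                         # pushes the branch only
before=$(git ls-remote --tags origin | wc -l)   # tags on the remote so far

git push -q origin v1.0                         # push the tag explicitly
after=$(git ls-remote --tags origin | wc -l)    # the tag is now listed
```

To push every local tag in one go, git push origin --tags can be used instead.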

Undoing Things

This section remains a little cheat sheety in style. I'm going to expand on it in a later section...

Revert unstaged changes:

Kinda like reverting a file in the SVN sense

git checkout -- filename

By referring to the diagram describing the working directory vs the index we can understand what this is doing. We have modified a local file but not staged the changes yet. By checking out the file we're just overwriting the changes in the working directory, which Git, because we haven't staged these changes, knows nothing about.

To unstage a file:

git reset HEAD -- filename

This resets the index entry for "filename" to its state at HEAD, which removes the staged changes from the index. Note this changes the index entry, which is distinct from your working directory. The file in the working directory is not touched.

To undo all uncommitted changes to tracked files:
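A short sketch of unstaging, in a scratch repo with a made-up file.txt: after the reset nothing is staged, but the edit is still sitting in the working directory.

```shell
# Scratch repo; identity and file name are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "add file"

echo "version 2" > file.txt
git add file.txt                # stage the edit
git reset -q HEAD -- file.txt   # unstage it again

staged=$(git diff --staged --name-only)   # empty: nothing is staged any more
content=$(cat file.txt)                   # the edit survives in the working copy
```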

git reset --hard

The option --hard resets (to the most recent commit) the index and the working tree.

A Note About Reset:

A git reset actually alters your history and can only work backwards from the current commit. By altering the history you could potentially lose history. Also, once a change is pushed to a shared repository, resetting the change afterwards can become troublesome!

If git revert is a "safe" way to undo changes, you can think of git reset as the dangerous method. When you undo with git reset (and the commits are no longer referenced by any ref or the reflog), there is no way to retrieve the original copy: it is a permanent undo. Care must be taken when using this tool, as it's one of the only Git commands that has the potential to lose your work.

To remove a file from the repo without deleting it locally:

git rm --cached myfile.name

To remove untracked files use:

git clean -f

To undo a commit use:

git revert commit-id

The git revert command undoes a committed snapshot. But, instead of removing the commit from the project history, it figures out how to undo the changes introduced by the commit and appends a new commit with the resulting content. This prevents Git from losing history, which is important for the integrity of your revision history and for reliable collaboration.
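A sketch of a revert in a scratch repo (file name and commit messages are invented): the bad commit stays in the history and a third commit is appended that undoes it.

```shell
# Scratch repo; identity, file name and messages are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "good" > app.txt
git add app.txt
git commit -q -m "good change"

echo "bad" >> app.txt
git commit -q -a -m "bad change"

# --no-edit accepts the default message instead of opening an editor.
git revert --no-edit HEAD > /dev/null

count=$(git rev-list --count HEAD)   # history has grown, not shrunk
content=$(cat app.txt)               # the file is back to the good version
git log --oneline                    # newest entry is the revert commit
```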

Listing Files In The Git Repo

To list files currently being tracked on branch master:

git ls-tree -r master [--name-only]

The above will list all files in master, recursively, that are being tracked. Each output line is for one file and consists of the file's mode, object type, blob hash and path. If --name-only is used then only the path is shown.

To do the same, but list on the current branch:

git ls-tree -r HEAD --name-only

Diffing With Git

git diff: What you have changed but NOT staged
git diff --staged: What will go into next commit
git diff --cached: Synonym for --staged
git difftool:      To use external diff tool

To view only the names of files changed between two commits:

git diff 99f6eae 0544f6a --name-only

To produce a diff between commits but only for a subset of files, for example do:

diffstart=99f6eae
diffend=0544f6a
wanted=$(git diff $diffstart $diffend --name-only  | grep -v "\(xls\|tsv\)$")
git diff $diffstart $diffend --full-index  -- $wanted

Diffing Between 2 Repos

From this SO thread:

Question:

How can we get the difference between two git repositories?

The scenario: We have a repo_a and repo_b. The latter was created as a copy of repo_a. There have been parallel development in both the repositories afterwards. Is there a way we can list the differences of the current versions of these two repositories?

Answer:

In repo_a:

git remote add -f b path/to/repo_b.git
git remote update
git diff master remotes/b/master
git remote rm b

Git Parlance & Internal Structure

What Are Commits And How Do They Store Snapshots?

The Git Book on the internals of Git (Git Objects) has a good explanation of how Git repos are organised [Ref] [Ref].

To summarise a little:

  1. Commit - stores metadata (author, committer, message) and a pointer to the top-level tree snapshot as well as to the previous commit (or commits, if a merge commit). The commit object itself is identified by a SHA-1 hash. Note that we said "top level": the commit is a snapshot of your entire repo.
  2. Tree - corresponds to a directory - a collection of pointers (pairs of SHA-1 hash and file name) to file blobs and trees.
  3. Blob - the binary contents of a file, identified by its SHA-1 hash.
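The three object types can be poked at with git cat-file. The following is only a sketch in a scratch repo; the src/hello.txt layout is made up for the demo.

```shell
# Scratch repo; identity and paths are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

mkdir src
echo "hello" > src/hello.txt
git add src/hello.txt
git commit -q -m "add hello"

git cat-file -p HEAD                 # commit: tree hash, author, committer, message
git cat-file -p 'HEAD^{tree}'        # top-level tree: one entry, the src tree
git cat-file -p HEAD:src             # tree for src/: one entry, the hello.txt blob
git cat-file -p HEAD:src/hello.txt   # blob: the file contents

# -t prints the object type rather than its contents.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t HEAD:src)
blob_type=$(git cat-file -t HEAD:src/hello.txt)
```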

Diagram of GIT commits, trees and blobs

Commit-ish & Tree-ish?!

Two terms you might see in the documentation are "commit-ish" and "tree-ish" [Ref].

"Commit-ish" means an identifier that references a commit object in the repo. For example, a tag is an identifier that references a commit, as is the SHA-1 hash for that commit.

"Tree-ish" means an identifier that references part of the repo's tree. Again, a tag can be "tree-ish" because it references a commit, from which we can access the files and folders that the commit is a snapshot of.

What is HEAD?

TODO:
    https://stackoverflow.com/questions/5772192/how-can-i-reconcile-detached-head-with-master-origin
    https://git-scm.com/book/tr/v2/Git-Internals-Git-References

The HEAD file is a symbolic reference to the branch you’re currently on. By symbolic reference, we mean that unlike a normal reference, it doesn’t generally contain a SHA-1 value but rather a pointer to another reference.
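A sketch that looks at the HEAD file directly, in a scratch repo. It also shows the exception hinted at by "generally": in a detached-HEAD state the file holds a commit hash instead of a pointer to a branch.

```shell
# Scratch repo; the identity is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "first"

branch=$(git symbolic-ref --short HEAD)   # name of the current branch
head_file=$(cat .git/HEAD)                # e.g. "ref: refs/heads/master"
sha=$(git rev-parse HEAD)                 # the commit that the branch points at

# Detach HEAD: it now records the commit hash itself.
git checkout -q --detach HEAD
detached=$(cat .git/HEAD)
```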

Revert & Reset

As we have seen in the quotes above, revert is a safer alternative to reset. The former creates a new commit that represents the changes needed to undo an earlier commit. It does not change any of the previous commits.

A reset, on the other hand, has the potential to obliterate information.

Resetting The Index (aka. The Staging Area)

One of the cheat-sheet snippets we saw earlier was git reset HEAD -- filename, which unstaged a file. From the Git help we get the following explanation:

git reset [-q] [<tree-ish>] [--] <paths>...

This form resets the index entries for all <paths> to their state at <tree-ish>. (It does not affect the working tree or the current branch.)

This means that git reset <paths> is the opposite of git add <paths>.

Output of "git reset --help"

The effect can be visualised as shown in the diagram below:

Diagram showing effect of git reset HEAD -- filename

Moving Your Branch HEAD Around & Maybe Modify The Working Tree + Index

Another form of the git reset command can be used to move the HEAD of the current branch around. In all forms the HEAD of the branch is moved to a specific commit. What happens to the index and working tree depends on the mode in which it is used...

git reset [<mode>] [<commit>]

This form resets the current branch head to <commit> and possibly updates the index (resetting it to the tree of <commit>) and the working tree depending on <mode>. If <mode> is omitted, defaults to "--mixed". The <mode> must be one of the following:

  • --soft:

    Does not touch the index file or the working tree at all (but resets the head to <commit>, just like all modes do). This leaves all your changed files "Changes to be committed", as git status would put it.

  • --mixed

    Resets the index but not the working tree (i.e., the changed files are preserved but not marked for commit) and reports what has not been updated. This is the default action.

    If -N is specified, removed paths are marked as intent-to-add.

  • --hard

    Resets the index and working tree. Any changes to tracked files in the working tree since <commit> are discarded.

  • --merge

    Resets the index and updates the files in the working tree that are different between <commit> and HEAD, but keeps those which are different between the index and working tree (i.e. which have changes which have not been added). If a file that is different between <commit> and the index has unstaged changes, reset is aborted.

    In other words, --merge does something like a git read-tree -u -m <commit>, but carries forward unmerged index entries.

  • --keep

    Resets index entries and updates files in the working tree that are different between <commit> and HEAD. If a file that is different between <commit> and HEAD has local changes, reset is aborted.

Output of "git reset --help"
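The three most common modes can be compared side by side. This is only a sketch in a scratch repo with a made-up file.txt; it undoes the same change with --soft, then --mixed, then --hard, and records what git status reports after each step.

```shell
# Scratch repo; identity, file name and contents are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "version one" > file.txt
git add file.txt
git commit -q -m "first"
echo "version two, longer" > file.txt
git commit -q -a -m "second"

git reset -q --soft HEAD~1         # move the branch head back one commit only
soft=$(git status --porcelain)     # "M  file.txt": change is still staged

git reset -q --mixed HEAD          # additionally reset the index
mixed=$(git status --porcelain)    # " M file.txt": change only in working tree

git reset -q --hard HEAD           # additionally reset the working tree
hard=$(git status --porcelain)     # empty: the change has been discarded
content=$(cat file.txt)            # back to the first version
```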

Soft Reset: Move HEAD, Touch Nothing!

Mixed Reset: Move HEAD, Touch Only The Index

Diagram of a git reset --mixed command

Hard Reset: Move HEAD, Touch The Index And The Working Tree

Diagram of a git reset --hard command

Branches And Commits (vs. SVN-like SCMs)

Creating branches in Git vs SVN

Add & Commit A Subset Of Changes: Hunks

I enjoyed learning about this because it is quite often that I find I've fixed two bugs or more in the same file. This can be because the two bugs are so tightly coupled that fixing one depended on another, or it was a quick fix for something I noticed on the fly and was so simple it wasn't worth creating a branch just for this fix. Either way, I end up with a file that contains changes that address multiple bugs/tickets.

Note: you can also accomplish the same thing using an interactive rebase.

Let's test it out. Create a new repo and add a test file...

$ git init learngit
Initialized empty Git repository in C:/Users/jh/Documents/Sandbox/learngit/.git/

$ cd learngit

$ echo "#include <stdio.h>
>
> int main(int argc)
> {
>         return 0;
> }
> " > test.c

$ git add test.c

$ git commit -m "Create a test file.

Create a test file so that in a future commit I can test patch adds"
[master (root-commit) f573b91] Create a test file.
 1 file changed, 6 insertions(+)
 create mode 100644 test.c

We've created a little Git repository and added one file called test.c to it. Very, very simple. Now let's spoof making two bug fixes by adding the following comments just above the return 0 statement:

/* This line fixes bug A*/
/* This line fixes bug B*/
/* This line fixes bug A*/

Okay, now I want to make two separate commits, as I'd like one commit that deals solely with the changes I've made to fix bug A and another that deals solely with the changes I've made to fix bug B. This way, anyone looking back through the logs will be able to easily understand which changes fix which bug.

To do this I use the following command:

git add -p test.c

This launches an