
Version control at your fingertips: a quick start with Git

Posted on February 28, 2018 in computer-science

Comparing files or directories

The most basic task is to compare two files. Your text editor may already have this function built in. For example, in Emacs it is available through the command ediff (or the menu Tools/Compare).

You can also compare two files on the command line with the command diff:

diff file1 file2

Another program, meld, gives a more user-friendly, graphical output:

meld file1 file2

(You may need to install meld first, for example with sudo apt-get install meld on a Debian-based system.)

These tools also work on directories. To quickly check if there is any difference between two directories:

diff -r -q dir1 dir2

For a detailed, graphical comparison:

meld dir1 dir2
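To see what diff -r -q reports, here is a minimal sketch run in a throwaway temporary directory. The directory and file names (dir1, dir2, a.txt, b.txt) are made up for the demonstration. Because diff exits with a non-zero status when it finds differences, the call is wrapped in an if.

```shell
# Work in a scratch directory so that nothing existing is touched
cd "$(mktemp -d)"
mkdir dir1 dir2

# Hypothetical files: a.txt differs between the two directories,
# b.txt exists only in dir1
echo 'hello' > dir1/a.txt
echo 'hello, world' > dir2/a.txt
echo 'only here' > dir1/b.txt

# -r: descend into subdirectories; -q: only report which files differ
if diff -r -q dir1 dir2; then
  echo 'The directories are identical'
else
  echo 'The directories differ'
fi
```

With GNU diff, the two report lines are "Files dir1/a.txt and dir2/a.txt differ" and "Only in dir1: b.txt".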

When comparing text files that contain natural language, the recommended tool is wdiff, which compares files word by word and therefore ignores changes in whitespace (line breaks, etc.). For LaTeX files, latexdiff produces a LaTeX document in which the textual differences are clearly marked.
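If wdiff is not installed, Git itself (not wdiff) can produce a similar word-level comparison of any two files, even outside a repository, with git diff --no-index --word-diff=plain. A minimal sketch with two hypothetical one-line files; removed words appear as [-...-] and added words as {+...+}.

```shell
cd "$(mktemp -d)"
# Two hypothetical versions of the same sentence
echo 'The quick brown fox jumps over the dog.' > old.txt
echo 'The quick red fox jumps over the lazy dog.' > new.txt

# --no-index compares two paths that are not in a repository;
# like diff, it exits with status 1 when the files differ, hence "|| true"
git diff --no-index --word-diff=plain old.txt new.txt || true
```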

Introduction to version control

A version control system keeps track of the history of changes made to a set of documents and allows you to recall specific versions later.

Many people keep track of the evolving versions of their files with a numbering scheme in the file names, but this is not a good idea, especially when several people collaborate.
To keep track of changes made to files in a directory, I highly recommend that you use version control software. Personally, I use Git.

You can read the Git Parable to understand the principles of Git. Here, I will describe just a few basic git commands. The Git Book is the definitive documentation.

To install git:

  • For Windows users, first consider replacing Windows with Linux on your computer. If you cannot, download the installer from https://gitforwindows.org and execute it (accept all the proposed default settings, notably the options to use Git Bash and the MinTTY terminal).

  • For a Debian-based Linux system (e.g. Ubuntu):

    sudo apt-get install git gitk

  • For other systems, follow the instructions on https://www.atlassian.com/git/tutorials/install-git
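Once Git is installed, you can check that it works and, before your first commit, tell Git which name and e-mail address to record in your commits. A minimal sketch; the name and address below are placeholders to replace with your own.

```shell
# Print the installed version to confirm that git is on your PATH
git --version

# Placeholders: replace with your own name and e-mail address.
# --global stores them in your user configuration (~/.gitconfig),
# so this is needed only once per machine.
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Show the values git will now use
git config --global user.name
git config --global user.email
```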

Creating a local repository

From scratch:

mkdir project
cd project
git init
Initialized empty Git repository in /home/pallier/cours/Python/version_control/git-test/.git/
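git init only creates the hidden .git directory and leaves your files untouched. A quick sketch to check that a directory is now under version control; the directory is created in a temporary location for the demonstration.

```shell
cd "$(mktemp -d)"
mkdir project
cd project
git init

# The whole repository lives in the hidden .git directory
ls -a
# Prints "true" when the current directory is inside a git working tree
git rev-parse --is-inside-work-tree
```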

Importing an already existing repository

Alternatively, you can import (clone) an existing repository, either from another directory or from the Internet:

git clone https://github.com/chrplr/pyepl_examples
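Cloning from another directory works exactly like cloning from a URL: you just give a path. A minimal sketch in a temporary directory, where the repository named original is hypothetical and the identity is a placeholder set locally so that the commit succeeds.

```shell
cd "$(mktemp -d)"
# A hypothetical existing repository with one commit
git init original
cd original
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo 'hello' > readme.txt
git add readme.txt
git commit -m "Initial commit"
cd ..

# Clone it from a local path instead of a URL
git clone original copy
# The clone records where it came from under the name "origin"
git -C copy remote -v
```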

If you plan to share your repository, it is a good idea to first create a repository on https://github.com or https://bitbucket.org, and then clone it on your local hard drive. The internet location will be added to the list of remote repositories under the name origin (see below for remote repositories).

Importantly, with Git you can still do version control locally, and transfer your changes to the remote repository only when you want, or never. This is because Git is a decentralized version control system: every repository holds the complete history of the project, and all repositories are equal.
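To illustrate this, here is a sketch of a repository that is committed to locally and only later connected to a remote. A local "bare" repository (shared.git, a hypothetical name) stands in for the GitHub or Bitbucket repository, and the identity is a placeholder.

```shell
cd "$(mktemp -d)"
base=$(pwd)
# Stand-in for a hosted repository: a bare repository has no working files
git init --bare "$base/shared.git"

# Work and commit locally, without any remote
git init work
cd work
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo 'draft' > notes.txt
git add notes.txt
git commit -m "Local commit"

# Later (or never): register the remote and send the commits to it.
# The branch name (master or main) depends on your git configuration.
branch=$(git symbolic-ref --short HEAD)
git remote add origin "$base/shared.git"
git push -u origin "$branch"
```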

Adding files to the local repository

While working in the project directory, you tell Git which files to track with the git add command:

echo 'essai1' > readme.txt
git add readme.txt

Note that you can add entire directories. For example, to add everything in the current directory (recursively):

git add .

It is also possible to prevent certain files from being tracked by listing them in a .gitignore file (see https://help.github.com/articles/ignoring-files).
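A minimal sketch of a .gitignore file, in a temporary repository. Each line of .gitignore is a pattern; the pattern *.log and the file debug.log are only examples.

```shell
cd "$(mktemp -d)"
git init
echo 'essai1' > readme.txt
echo 'temporary' > debug.log      # hypothetical file we do not want to track

# Ignore every file whose name ends in .log
echo '*.log' > .gitignore

# debug.log no longer appears among the untracked files
git status --short
# Show which pattern of which file causes debug.log to be ignored
git check-ignore -v debug.log
```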

To check which changes are staged (that is, will be included in the next commit) and which files are not tracked, use the command git status:

git status
# On branch master
# Initial commit
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#       new file:   readme.txt

Creating a project and a first snapshot (committing)

Once you are satisfied with the files in your working directory, you can take a snapshot, that is, make a permanent copy of all the tracked files. This operation is called committing your changes:

git commit
[master (root-commit) a7a3a47] First commit
1 file changed, 1 insertion(+)
create mode 100644 readme.txt

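This saves a snapshot of the staged files in the hidden directory .git at the root of your project. Unless you delete this directory, this version of your files is saved there forever and will always be accessible. Note that git commit opens a text editor in which you type a short message describing the changes (here, "First commit").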
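If you prefer not to go through the editor, the -m option passes the message directly on the command line. A minimal sketch reproducing the first commit in a temporary repository; the identity is a placeholder set locally so that the example runs even without global configuration.

```shell
cd "$(mktemp -d)"
git init
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'essai1' > readme.txt
git add readme.txt
# -m supplies the commit message instead of opening an editor
git commit -m "First commit"

# One line per commit: abbreviated hash and message
git log --oneline
```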

Before committing, it is useful to run the command

git status

to check which changes are staged and which files are not tracked.

Modifying the project

Let us now modify the file readme.txt in the working directory:

echo 'line2' >readme.txt

The command git status allows us to check the state of the files in the working directory:

git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#       modified:   readme.txt
no changes added to commit (use "git add" and/or "git commit -a")
git add readme.txt
git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#       modified:   readme.txt

Let us create a new file, readme2.txt:

echo 'trial2' >readme2.txt
ls
readme2.txt  readme.txt
git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#       modified:   readme.txt
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#       readme2.txt

We now add readme2.txt and commit both changes:

git add readme2.txt
git commit
[master a7e25a1] First revision; added readme2.txt
2 files changed, 2 insertions(+), 1 deletion(-)
create mode 100644 readme2.txt

Let us consult the history of the project:

git log
commit a7e25a158ce52a75c62381420f7dc375de631b1b
Author: Christophe Pallier <christophe@pallier.org>
Date:   Mon Aug 27 10:49:54 2012 +0200

First revision; added readme2.txt

commit a7a3a47edfae9d7c720356b691000a81ded73906
Author: Christophe Pallier <christophe@pallier.org>
Date:   Mon Aug 27 10:47:32 2012 +0200

First commit

git status
# On branch master
nothing to commit (working directory clean)
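Because every commit is kept, you can print a file as it was in an earlier commit with git show commit:path, where HEAD~1 means "the parent of the last commit". A sketch that rebuilds the two commits of this tutorial in a temporary repository (placeholder identity) and then inspects the history.

```shell
cd "$(mktemp -d)"
git init
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Reproduce the two commits of the tutorial
echo 'essai1' > readme.txt
git add readme.txt
git commit -m "First commit"
echo 'line2' > readme.txt
echo 'trial2' > readme2.txt
git add readme.txt readme2.txt
git commit -m "First revision; added readme2.txt"

# Compact history, one commit per line
git log --oneline
# Content of readme.txt in the first commit
git show HEAD~1:readme.txt
```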

Renaming a file

To rename a tracked file, you should use git mv rather than just mv:

git mv file.ori file.new
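git mv renames the file on disk and stages the rename in one step. A minimal sketch in a temporary repository, using the file names from the command above and a placeholder identity.

```shell
cd "$(mktemp -d)"
git init
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo 'content' > file.ori
git add file.ori
git commit -m "Add file.ori"

# Rename on disk and in the staging area at once
git mv file.ori file.new
git status --short          # shows the staged rename file.ori -> file.new
git commit -m "Rename file.ori to file.new"
```

Git does not store renames as such; it detects them from file contents. A plain mv followed by git rm file.ori and git add file.new therefore ends up with the same commit, but git mv saves those extra steps.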

Recovering a file deleted by accident

Let us delete readme2.txt "by accident":

rm readme2.txt # oops
git status
# On branch master
# Changes not staged for commit:
#   (use "git add/rm <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#       deleted:    readme2.txt
no changes added to commit (use "git add" and/or "git commit -a")

To recover it from the repository:

git checkout -- readme2.txt
ls
readme2.txt  readme.txt
cat readme2.txt
trial2
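The whole accident-and-recovery sequence can be replayed safely in a temporary repository (placeholder identity). Since Git 2.23, git restore readme2.txt is an alternative to git checkout -- readme2.txt for this task.

```shell
cd "$(mktemp -d)"
git init
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo 'trial2' > readme2.txt
git add readme2.txt
git commit -m "Add readme2.txt"

rm readme2.txt   # oops
# Bring the file back from the staging area (here identical to the last commit)
git checkout -- readme2.txt
cat readme2.txt
```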

Checking for changes

Let us now modify readme2.txt and then compare the files in the working directory with the staging area (since nothing has been staged yet, this amounts to comparing them with the last commit):

echo 'line2 of 2' > readme2.txt
git diff
diff --git a/readme2.txt b/readme2.txt
index 33d1e15..e361691 100644
--- a/readme2.txt
+++ b/readme2.txt
@@ -1 +1 @@
-trial2
+line2 of 2
git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#       modified:   readme2.txt
no changes added to commit (use "git add" and/or "git commit -a")

If you prefer meld, you can use:

git difftool -t meld

To compare the working version of a file (staged or not) with the one in the last commit:

git diff HEAD -- readme2.txt

(Without a file name, git diff HEAD compares all files.)
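The three forms of git diff differ in what they compare, which matters once a change has been staged. A sketch in a temporary repository (placeholder identity): after git add, plain git diff shows nothing, while git diff --staged and git diff HEAD both show the change.

```shell
cd "$(mktemp -d)"
git init
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo 'trial2' > readme2.txt
git add readme2.txt
git commit -m "Add readme2.txt"

echo 'line2 of 2' > readme2.txt
git add readme2.txt          # stage the change

git diff                     # working directory vs staging area: empty here
git diff --staged            # staging area vs last commit
git diff HEAD                # working directory vs last commit
```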

Compare two branches

For visual diffs, I use meld:

sudo apt install meld
git config --global diff.tool meld

To list all branches:

git branch -a

Then, to see the differences between two branches:

git difftool -d branch1..branch2

To compare a specific file:

git difftool branch1..branch2 -- filename
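The same comparisons can be made in text mode with git diff. A sketch in a temporary repository (placeholder identity) that creates the two hypothetical branches branch1 and branch2 used above.

```shell
cd "$(mktemp -d)"
git init
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo 'essai1' > readme.txt
git add readme.txt
git commit -m "First commit"
# Name the initial branch branch1, whatever git's default name is
git branch -M branch1

# Create branch2 from branch1 and change readme.txt there
git checkout -b branch2
echo 'line2' >> readme.txt
git commit -am "Change readme.txt on branch2"

git branch -a
# Names of the files that differ, then the diff of one file
git diff --name-status branch1..branch2
git diff branch1..branch2 -- readme.txt
```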

Another approach is to merge one branch into the other without committing, and to inspect the result, for example with git gui:

git checkout branchA
git merge --no-commit --no-ff branchB
git gui

When done, cancel the merge to bring branchA back to its previous state:

git merge --abort
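Here is the same sequence in a temporary repository (placeholder identity), with git diff --staged --stat replacing the graphical git gui to list what the merge would bring in. The branches branchA and branchB are created for the demonstration.

```shell
cd "$(mktemp -d)"
git init
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo 'base' > readme.txt
git add readme.txt
git commit -m "Base"
git branch -M branchA
git checkout -b branchB
echo 'from B' > notes.txt
git add notes.txt
git commit -m "Add notes.txt on branchB"

git checkout branchA
before=$(git rev-parse HEAD)
# Perform the merge but stop before creating the merge commit
git merge --no-commit --no-ff branchB
git diff --staged --stat     # changes the merge would introduce
# Cancel: branchA returns to the state it had before the merge
git merge --abort
```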

Inspecting the history of the project

For a graphical view of the history of the project:

gitk
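If no graphical display is available, git log can draw a text version of the history graph. A sketch in a temporary repository (placeholder identity) with a hypothetical side branch named experiment, so that the graph has a fork.

```shell
cd "$(mktemp -d)"
git init
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo 'essai1' > readme.txt
git add readme.txt
git commit -m "First commit"

git checkout -b experiment   # hypothetical side branch
echo 'idea' > idea.txt
git add idea.txt
git commit -m "Try an idea"
git checkout -               # back to the previous branch
echo 'line2' >> readme.txt
git commit -am "Second commit"

# Text-mode history of all branches, with branch names
git log --oneline --graph --all --decorate
```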

Downloading the most recent changes from the remote repository

If you imported your repository from the internet with git clone, you can download and merge the most recent changes with:

git pull

Pushing your changes to the remote repository

After committing, you can send your new commits to the remote repository you cloned from:

git push

To compare your working directory with the remote branch origin/master, first download the remote changes (without merging them), then compare:

git fetch
git diff origin/master
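The push, fetch and pull cycle can be rehearsed entirely on your own machine. In this sketch, a local bare repository stands in for the remote, and two clones play two hypothetical collaborators, alice and bob (placeholder identities). The branch name is read from git rather than assumed to be master.

```shell
cd "$(mktemp -d)"
base=$(pwd)
git init --bare "$base/remote.git"         # stand-in for the remote

# alice clones the (empty) remote and publishes a first commit
git clone "$base/remote.git" alice
cd alice
git config user.name "Alice"               # placeholder identity
git config user.email "alice@example.com"
echo 'v1' > readme.txt
git add readme.txt
git commit -m "v1"
branch=$(git symbolic-ref --short HEAD)
git push origin "$branch"
cd ..

# bob clones it; then alice publishes a second change
git clone "$base/remote.git" bob
cd alice
echo 'v2' > readme.txt
git commit -am "v2"
git push origin "$branch"
cd ../bob

# fetch downloads alice's commit without touching bob's files
git fetch
git diff "origin/$branch"    # bob's working directory vs the remote branch
# pull fetches and merges into bob's current branch
git pull
cat readme.txt
```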

Working with several remotes

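To add a remote, here named b:

git remote add -f b path/to/repo_b.git
git remote update

(The -f option fetches the new remote immediately; git remote update later fetches the changes from all remotes.)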

To list the remotes:

git remote -v

To compare the current branch with one in a remote:

git diff master remotes/b/master

To see the branches on remotes:

git branch -r

(To see local branches, use git branch -l; to see all branches, git branch -a.)
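A sketch of these commands with two local repositories in a temporary directory (placeholder identities): repo_b plays the second repository, and the remote is named b. The branch of repo_b is read from git, since it may be called master or main depending on your configuration.

```shell
cd "$(mktemp -d)"
base=$(pwd)
# Hypothetical second repository with one commit
git init repo_b
git -C repo_b config user.name "Example User"       # placeholder identity
git -C repo_b config user.email "user@example.com"
echo 'from b' > repo_b/b.txt
git -C repo_b add b.txt
git -C repo_b commit -m "Commit in repo_b"
branch_b=$(git -C repo_b symbolic-ref --short HEAD)

# Our own repository
git init mine
cd mine
git config user.name "Example User"
git config user.email "user@example.com"
echo 'mine' > a.txt
git add a.txt
git commit -m "Commit in mine"

# Register repo_b under the name b and fetch it right away
git remote add -f b "$base/repo_b"
git remote -v
git branch -r
# Compare our current branch with b's branch
git diff --name-status HEAD "b/$branch_b"
```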

Handling very large files (e.g. data)

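git-annex lets you manage large files with Git without committing their contents: Git tracks only links to the files, and the actual content can be kept in some of the repositories and left out of others.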

See https://writequit.org/articles/getting-started-with-git-annex.html and https://git-annex.branchable.com/walkthrough/