Version control

Introduction to git & GitHub

June 2025

Nicolas Casajus

Romain Frelat

Introduction

What is reproducibility?


Reproducibility is about results that can be obtained by someone else (or you in the future) given the same data and the same code. This is a technical problem.


 We talk about Computational reproducibility

Reproducibility spectrum


Source: Peng (2011)[^peng]


Each degree of reproducibility requires additional skills and time. While some of those skills (e.g. literate programming, version control, setting up environments) pay off in the long run, they can require a high up-front investment.

Concepts

According to Wilson et al. (2017), good practices for better reproducibility can be organized into the following six topics:




 Data management

 Project organization

 Tracking changes


 Collaboration

 Manuscript

 Code & Software

Version control

Motivations

Project content (without git)

Questions

  • Which version of analyses.R is the final one?
  • What about data.csv?
  • What are the differences between versions?
  • Who has contributed to these versions? When?

 We need a tool that deals with versions for us

Project content (with git)

Presentation of git

git is a Version Control System (VCS). With git you can:

  • keep your working copy clean
  • make contributions transparent
    (what | who | when | why)
  • keep the entire history of a file (and project)
  • inspect a file throughout its lifetime
  • revert back to a previous version
  • handle multiple versions (branches)
  • facilitate collaborations w/ code hosting platforms
    (GitHub, GitLab, Bitbucket, etc.)
  • backup your project



A word of warning

git and GitHub are not the same thing

  • git is free and open-source software
  • GitHub (and co) is a web platform to host and share projects tracked by git


In other words:

You do not need GitHub to use git but you cannot use GitHub without using git

git as a CLI


  • git is a command-line tool (it provides a command-line interface, CLI)
  • You interact with git using a terminal
  • All commands start w/ the keyword git
    (git status / log / add / commit)

 But a lot of third-party tools provide a graphical interface to git
(e.g. RStudio, GitKraken, GitHub Desktop, extensions for VSCode, VSCodium, neovim, etc.)


Just keep in mind that for some operations you will need to use the terminal

RStudio and git

Git main panel

Stage files, view differences and commit changes

View history and versions

Installation and configuration

Check out and follow the tutorial from N. Casajus, 2024, Setting up R: https://frbcesab.github.io/rsetup/.

Using git for tracking changes

How does git work?

  • git takes a sequence of snapshots
  • Each snapshot can contain changes for one or many file(s)
  • User chooses which files to ‘save’ in a snapshot and when
    (!= file hosting services like Dropbox, Google Drive, etc.)


 In the git universe, a snapshot is a version, i.e. the state of the whole project at a specific point in time


A snapshot is a two-step process:

  • Stage files: select which files to add to the version
  • Commit changes: save the version and add metadata (commit message)

Basic workflow

 Initialize git in an (empty) folder (repository)


git init


The three areas of a git repository:

  • working copy: current state of the directory (what you actually see)
  • staging area: selected files that will be added to the next version
  • repository: area w/ all the versions
    (the .git/ subdirectory)

Basic workflow

 Add new files in the repository


git status

# On branch main
# 
# No commits yet
# 
# Untracked files:
#   README.md
#   analyses.R
#   data.csv
# 
# Nothing added to commit but untracked files present
# Use "git add <file>..." to track

Basic workflow

 Stage (select) one file


git add data.csv


git status

# On branch main
# 
# No commits yet
# 
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#   new file:   data.csv
# 
# Untracked files:
#   (use "git add <file>..." to track)
#   README.md
#   analyses.R

Basic workflow

 Stage (select) several files


git add data.csv analyses.R


git status

# On branch main
# 
# No commits yet
# 
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#   new file:   analyses.R
#   new file:   data.csv
# 
# Untracked files:
#   (use "git add <file>..." to track)
#   README.md

Basic workflow

 Stage (select) all files


git add .


git status

# On branch main
# 
# No commits yet
# 
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#   new file:   analyses.R
#   new file:   data.csv
#   new file:   README.md

Basic workflow

 Commit changes to create a new version


git commit -m "a good commit message"

Basic workflow

 Now we are up-to-date


git status

# On branch main
# nothing to commit, working tree clean
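
The basic workflow above can be replayed end to end in a throwaway folder. This is only a sketch: the temporary directory, the placeholder identity and the file content are assumptions made for the demo, not part of the original project.

```shell
# Throwaway folder so that no real project is touched (demo setup)
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.org"

# Working copy: create a file; git reports it as untracked ("??")
echo "id,value" > data.csv
git status --short

# Staging area: select the file for the next version ("A")
git add data.csv
git status --short

# Repository: save the snapshot with a message
git commit -q -m "add data.csv"
git status --short    # prints nothing: working tree clean
git log --oneline
```

`git status --short` is a compact form of `git status`, convenient for following a file from one area to the next.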

Commits

When committing a new version (w/ git commit), the following information must be added:

  • WHO - the person who has made the changes
    (automatically added by git)
  • WHEN - the date of the commit
    (automatically added by git)
  • WHAT - the files that have been modified
    (selected by the user w/ git add)
  • WHY - the reason for the commit, i.e. what has been done compared to the previous version
    (added by the user w/ git commit)
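
As an illustration, the four pieces of information can be read back from a commit. The sketch below assumes a throwaway repository with a placeholder identity and a single demo file.

```shell
# Throwaway repository with one commit (demo setup)
cd "$(mktemp -d)" && git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.org"
echo "id,value" > data.csv
git add data.csv
git commit -q -m "add data.csv"

# WHO, WHEN and WHY of the last commit
git log -1 --format="who:  %an <%ae>%nwhen: %ad%nwhy:  %s"

# WHAT: files touched by the last commit
git show --stat --oneline HEAD
```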

A commit message has a title line, and an optional body

# Commit message w/ title
git commit -m "title"
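
One possible way to add the optional body from the command line is to pass `-m` twice: git uses the first value as the title and the second as a separate paragraph. The repository, identity and messages below are placeholders for the demo.

```shell
# Throwaway repository (demo setup)
cd "$(mktemp -d)" && git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.org"
echo "id,value" > data.csv
git add data.csv

# Commit message w/ title and body
git commit -q -m "add raw survey data" \
  -m "Exported from the field database; no cleaning applied yet."

# Read the title (%s) and the body (%b) back
git log -1 --format="title: %s%nbody:  %b"
```

Running `git commit` without `-m` opens a text editor instead, which is often more comfortable for longer bodies.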


What is a good commit message?

A good commit title:

  • should be short (less than 50 characters)
  • should be informative and unambiguous
  • should use active voice and present tense

# Print git history
git log --oneline

# f960dd3 (HEAD -> main) add data cleaning script
# dd4472c update data.csv
# 2bb9bb4 add README.md
# 2d79e7e first commit

When should you commit?


  • Commit a new version when you reach a milestone
  • Create small and atomic commits
  • Commit a state that is actually working

Undoing things

  1. Undo recent, uncommitted and unstaged changes

You have modified a file but have not staged changes and you want to restore the previous version


git status

# On branch main
# Changes not staged for commit:
#   (use "git add <file>..." to stage changes)
#   (use "git restore <file>..." to discard changes)
#   modified:   data.csv
#
# No changes added to commit

# Restore one file (discard unstaged changes)
git restore data.csv


git status

# On branch main
# Nothing to commit, working tree clean


 To discard all changes:

# Cancel all non-staged changes
git restore .

Undoing things

  2. Unstage staged but uncommitted files

You have modified and staged file(s) but have not committed changes yet and you want to unstage file(s) and restore the previous version


git status

# On branch main
# Changes to be committed:
#   (use "git restore --staged <file>..." to unstage)
#   modified:   data.csv

# Unstage one file
git restore --staged data.csv


git status

# On branch main
# Changes not staged for commit:
#   (use "git add <file>..." to stage changes)
#   (use "git restore <file>..." to discard changes)
#   modified:   data.csv
#
# No changes added to commit


You can now restore the previous version w/:

# Discard changes (restore previous version)
git restore data.csv
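
The two steps (unstage, then discard) can be tried safely in a throwaway repository. This is a sketch under assumed names and content; only the two `git restore` commands come from the slides.

```shell
# Throwaway repository with one committed file (demo setup)
cd "$(mktemp -d)" && git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.org"
echo "id,value" > data.csv
git add data.csv && git commit -q -m "add data.csv"

# Modify the file and stage the change
echo "oops" >> data.csv
git add data.csv
git status --short      # "M  data.csv": staged

# Step 1: unstage (the edit is still in the working copy)
git restore --staged data.csv
git status --short      # " M data.csv": modified, not staged

# Step 2: discard the edit (this cannot be undone)
git restore data.csv
cat data.csv
```

`git restore` requires git 2.23 or later. With older versions, `git reset HEAD data.csv` unstages the file and `git checkout -- data.csv` discards the edit.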

Undoing things

  3. Revert one commit

You want to reverse the effects of a commit: use git revert


# Print git history
git log --oneline

# f960dd3 (HEAD -> main) commit 4
# dd4472c commit 3
# 2bb9bb4 commit 2
# 2d79e7e commit 1

# Revert commit dd4472c
git revert dd4472c


# Print git history
git log --oneline

# d62ad3e (HEAD -> main) Revert "commit 3"
# f960dd3 commit 4
# dd4472c commit 3
# 2bb9bb4 commit 2
# 2d79e7e commit 1

git revert does not alter the history: it creates a new commit that undoes the changes of the reverted commit
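
A minimal way to observe this behaviour, assuming a throwaway repository where each commit adds its own file (so that the revert cannot conflict). The file names and messages are made up for the demo.

```shell
# Throwaway repository with three commits (demo setup)
cd "$(mktemp -d)" && git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.org"
for n in 1 2 3; do
  echo "content $n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "commit $n"
done

# Revert the middle commit; --no-edit keeps the default message
git revert --no-edit HEAD~1

git log --oneline   # four commits: the revert sits on top
ls                  # file2.txt is gone, file1.txt and file3.txt remain
```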

Undoing things

  4. Deleting commits

You want to delete one or more commits: use git reset --hard


# Print git history
git log --oneline

# f960dd3 (HEAD -> main) commit 4
# dd4472c commit 3
# 2bb9bb4 commit 2
# 2d79e7e commit 1


# Delete the two most recent commits
git reset --hard 2bb9bb4


# Print git history
git log --oneline

# 2bb9bb4 (HEAD -> main) commit 2
# 2d79e7e commit 1

git reset --hard alters the history and discards uncommitted changes to tracked files. Be careful with this command
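
One way to limit the risk, offered here as a suggestion rather than as part of the original workflow, is to create a branch at the current commit before resetting. The sketch assumes a throwaway repository; the branch name `backup` is arbitrary.

```shell
# Throwaway repository with four commits (demo setup)
cd "$(mktemp -d)" && git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.org"
for n in 1 2 3 4; do
  echo "content $n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "commit $n"
done

# Safety net: a branch that keeps pointing at the current commit
git branch backup

# Delete the two most recent commits from the current branch
git reset -q --hard HEAD~2

git log --oneline           # commit 2 and commit 1
git log --oneline backup    # all four commits are still reachable
```

Without such a branch, `git reflog` still lists the previous positions of HEAD for a limited time, which can help recover a commit deleted by mistake.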

 Using GitHub for collaboration

Code hosting platforms

GitHub and co are cloud-based git repository hosting services

  Perfect solutions to collaborate on projects tracked by git


Services

  • Full integration of version control (commits, history, differences)
  • Easy collaboration w/ branches, forks, pull requests
  • Issue tracking system
  • Enhanced documentation rendering (README, Wiki)

Presentation of GitHub

Overview

  • Created in 2008
  • For-profit company (owned by Microsoft since 2018)
  • Used by more than 100 million developers around the world


Advantages

  • User-friendly interface for git
  • Free account w/ unlimited public/private repositories
  • Organization account (w/ free plan)
  • Advanced tools for collaboration
  • Static website hosting

GitHub - Account homepage

GitHub - Organization homepage

GitHub - Repository homepage

Create a repository

Get the URL to clone

Clone a repository w/ RStudio


Select Version Control

Select Git

Copy the URL and fill all the fields

Local copy of a repository

Working w/ GitHub

 Add a new file: README.md


git status

# On branch main
# Your branch is up to date with 'origin/main'
#
# Untracked files:
#   README.md
# 
# Nothing added to commit but untracked files present
# Use "git add <file>..." to track

Working w/ GitHub

 Stage changes


git add .


git status

# On branch main
# Your branch is up to date with 'origin/main'
#
# Changes to be committed:
#   (use "git restore --staged <file>..." to unstage)
#   new file:   README.md

Working w/ GitHub

 Commit changes


git commit -m "add README"


git status

# On branch main
# Your branch is ahead of 'origin/main' by 1 commit.
#   (use "git push" to publish your local commits)
# 
# nothing to commit, working tree clean

Working w/ GitHub

 Push changes to remote


git push

# The first push of a branch may need to set its upstream:
git push -u origin main


git status

# On branch main
# Your branch is up to date with 'origin/main'.
# 
# nothing to commit, working tree clean
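
The push step can be rehearsed without a GitHub account by letting a local bare repository play the role of the remote. Everything below (folder names, identity, file content) is an assumption made for the demo; with GitHub, the remote URL would be the one copied from the repository page.

```shell
cd "$(mktemp -d)"
# Placeholder identity for the demo
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.org"

# A bare repository stands in for GitHub (local stand-in, not a real server)
git init -q --bare remote.git

# Local project with one commit
git init -q project && cd project
echo "# projectname" > README.md
git add README.md
git commit -q -m "add README"

# Declare the remote, then push and record it as upstream (-u)
git remote add origin ../remote.git
git push -q -u origin HEAD

# Later pushes need no arguments
echo "More text" >> README.md
git commit -q -am "update README"
git push -q
git status -sb    # first line shows the branch and its upstream
```

`HEAD` is used instead of `main` only because the default branch name depends on the local git configuration.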

Working w/ GitHub

 Pull changes from remote


git pull


git status

# On branch main
# Your branch is up to date with 'origin/main'.
# 
# nothing to commit, working tree clean

Working w/ GitHub

  Make local changes


git status

# On branch main
# Your branch is up to date with 'origin/main'.
# 
# Changes not staged for commit:
#  (use "git add <file>..." to update what will be committed)
#  (use "git restore <file>..." to discard changes in working directory)
#    modified:   data.csv
#    modified:   README.md

Working w/ GitHub

  Stage changes


git add .


git status

# On branch main
# Your branch is up to date with 'origin/main'
#
# Changes to be committed:
#  (use "git restore --staged <file>..." to unstage)
#   modified:   data.csv
#   modified:   README.md

Working w/ GitHub

  Commit changes


git commit -m "update dataset and README"


git status

# On branch main
# Your branch is ahead of 'origin/main' by 1 commit.
#   (use "git push" to publish your local commits)
# 
# nothing to commit, working tree clean

Working w/ GitHub

  Don’t forget to push changes to remote


git push


git status

# On branch main
# Your branch is up to date with 'origin/main'.
# 
# nothing to commit, working tree clean

Help me, I can’t push!

When you try to push, you might see the following error message:

git push

# To github.com:ahasverus/projectname.git
#  ! [rejected]        main -> main (fetch first)
#
# error: failed to push some refs to 'github.com:ahasverus/projectname.git'
#
# hint: Updates were rejected because the remote contains work that you do
# hint: not have locally. This is usually caused by another repository pushing
# hint: to the same ref. You may want to first integrate the remote changes
# hint: (e.g., 'git pull ...') before pushing again.
# hint: See the 'Note about fast-forwards' in 'git push --help' for details.


 Just git pull and try to git push again
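
The situation can be reproduced locally with two clones of the same repository. This is a sketch under assumptions: the clones are named after two hypothetical collaborators (alice and bob), a bare repository stands in for GitHub, and `git pull --rebase` is used to match the rebase-style messages shown in these slides (a plain `git pull` may behave differently depending on the local configuration).

```shell
cd "$(mktemp -d)"
# Placeholder identity for the demo
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.org"

# Seed repository, a bare "remote" and two clones (all names hypothetical)
git init -q seed
(cd seed && echo "# projectname" > README.md \
  && git add README.md && git commit -q -m "add README")
git clone -q --bare seed remote.git
git clone -q remote.git alice
git clone -q remote.git bob

# Alice pushes first
(cd alice && echo "x,y" > data.csv \
  && git add data.csv && git commit -q -m "add data.csv" && git push -q)

# Bob commits a different file, unaware of Alice's push
cd bob
echo "print('hello')" > analyses.R
git add analyses.R
git commit -q -m "add analyses.R"
git push || echo "Push rejected: the remote has commits we do not have"

# Integrate the remote commits, then push again
git pull -q --rebase
git push -q
git log --oneline
```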

Help me, I can’t pull!

When you try to pull, you might see the following error message:

git pull

# [...]
# Auto-merging README.md
# CONFLICT (content): Merge conflict in README.md
#
# error: could not apply b8302e6... edit README
#
# hint: Resolve all conflicts manually, mark them as resolved with
# hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
# hint: You can instead skip this commit: run "git rebase --skip".
# hint: To abort and get back to the state before "git rebase", 
# hint: run "git rebase --abort".


 Welcome to the wonderful world of git conflicts

A git conflict appears when two versions cannot be merged automatically by git because changes have been made to the same lines.


 You have to decide which version you want to keep.
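
A conflict and its resolution can be rehearsed locally. The sketch below assumes two hypothetical collaborators (alice and bob) who edit the same line of README.md, a bare repository standing in for GitHub, and a rebase-style pull as in the message above. Which text to keep is a human decision; the choice made here is arbitrary.

```shell
cd "$(mktemp -d)"
# Placeholder identity for the demo
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.org"

# Seed repository, a bare "remote" and two clones (all names hypothetical)
git init -q seed
(cd seed && echo "Project title" > README.md \
  && git add README.md && git commit -q -m "add README")
git clone -q --bare seed remote.git
git clone -q remote.git alice
git clone -q remote.git bob

# Alice and Bob edit the same line; Alice pushes first
(cd alice && echo "Title by Alice" > README.md \
  && git commit -q -am "edit README (Alice)" && git push -q)
cd bob
echo "Title by Bob" > README.md
git commit -q -am "edit README (Bob)"

# The pull stops on a conflict and writes markers into README.md
git pull --rebase || true
cat README.md     # sections delimited by <<<<<<<, ======= and >>>>>>>

# Decide what to keep (here Bob's wording) and remove the markers
echo "Title by Bob" > README.md
git add README.md                        # mark the conflict as resolved
GIT_EDITOR=true git rebase --continue    # keep the commit message as is
git push -q
```

In practice the file is edited by hand (or in RStudio) rather than overwritten with `echo`; what matters is that no conflict marker remains before `git add`.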

The .gitignore

 We can also tell git to ignore specific files: it’s the purpose of the .gitignore file


Which files? For instance:

  • passwords, tokens and other secrets
  • temporary files
  • large files

The syntax is simple:

# Ignore a specific file
README.html

# Ignore all PDF files
*.pdf

# Ignore a folder
data/

# Ignore a subfolder
data/raw-data/

# Ignore a specific file in a subfolder
data/raw-data/raw-data.csv
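
To check that the rules behave as expected, `git status` and `git check-ignore` can be used. The sketch below assumes a throwaway repository and made-up file names.

```shell
# Throwaway repository (demo setup)
cd "$(mktemp -d)" && git init -q

# Two ignore rules, same syntax as above
printf '%s\n' '*.pdf' 'data/' > .gitignore

# Files that match the rules, and one that does not
mkdir data
touch report.pdf data/raw-data.csv analyses.R

# Only .gitignore and analyses.R are listed as untracked
git status --short

# Show which rule matches each ignored path
git check-ignore -v report.pdf data/raw-data.csv
```

Note that .gitignore only affects untracked files: a file that is already tracked stays tracked until it is removed from the index with `git rm --cached <file>`.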


 Template for projects available here

GitHub as a gateway to open source projects

You can access millions of open source projects and contribute to their development.

And if your GitHub repository is public, everyone can use and contribute to your project.

  • Fork an existing project: create an independent copy of a repository
  • Pull request: propose your changes to the development of a repository
  • Merge branches: accept development from other branches
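
Forks and pull requests happen on GitHub, but the branch-and-merge step underneath can be sketched locally. The example assumes a throwaway repository; the branch name `improve-readme` is hypothetical.

```shell
# Throwaway repository with one commit (demo setup)
cd "$(mktemp -d)" && git init -q
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.org"
echo "# projectname" > README.md
git add README.md && git commit -q -m "add README"

# Start a branch for the new work
git switch -q -c improve-readme
echo "How to run the analyses" >> README.md
git commit -q -am "document how to run analyses"

# Return to the previous branch and merge the work
git switch -q -
git merge -q improve-readme
git log --oneline
```

`git switch` requires git 2.23 or later; `git switch -` returns to the branch that was checked out before.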

GitHub as a social platform

  • Watch/Star existing repositories or Follow colleagues/developers
  • Issues: anyone can file an issue
    • good for keeping track of the to-do list and future development
    • issues can be assigned to someone and categorized
    • awesome tool to receive feedback from ‘users’ / colleagues
  • Wiki and README: help to organize the documentation of your code

Take home message



  • git and GitHub are tools that help you develop and maintain the code of your ecological data analyses.

  • Key git/GitHub commands to remember:

pull
stage and commit
push



Many resources are available online.
Please contact me if you have any issue using git/GitHub: romain.frelat@fondationbiodiversite.fr