Jon Dehdari








Research Collaboration using Git



Git

Git is popular software for version control and collaboration. It tracks changes and helps you synchronize files across multiple computers. With branches, it's easy to try out crazy experimental changes and see if they work out.
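As a minimal sketch of the branch workflow, the commands below build a throwaway repo, try a change on a separate branch, and merge it once it has worked out. The branch name crazy-idea, the file notes.txt, and the user name and email are made up for illustration.

```shell
set -e
repo=$(mktemp -d)                      # throwaway directory for the demo
cd "$repo"
git init -q
git config user.name "Example User"    # hypothetical identity, local to this repo
git config user.email "user@example.com"
main_branch=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on your setup

echo "baseline results" > notes.txt
git add notes.txt
git commit -q -m "Add baseline notes"

# Try the risky idea on its own branch
git checkout -q -b crazy-idea
echo "crazy experimental change" >> notes.txt
git commit -q -am "Try crazy idea"

# Back on the main branch the file is untouched
git checkout -q "$main_branch"
cat notes.txt

# If the idea worked out, merge it. If not, discard it with: git branch -D crazy-idea
git merge -q crazy-idea
```

Until the final merge, nothing on the main branch changes, which is what makes the experiment cheap to abandon.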

If you don't know how to use it, well, now's a good time to learn!

In addition to keeping your Git repository (repo) on your own computer, you can host it on one of several sites, including GitHub, Bitbucket, and GitLab, and synchronize between the copy on your computer and the hosted one. Since every copy of your Git repo contains the full history, it's not very important (technologically) which website, if any, you use. They each have pluses and minuses.
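Here is a rough sketch of that synchronization, using a local bare repo as a stand-in for the hosted site so it runs without an account or network access. With a real host, the remote would be the URL the site gives you. The directory names laptop and desktop, the file paper.txt, and the identities are hypothetical.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/hosted.git"     # stand-in for the repo on a hosting site

# First computer: create the repo and upload its history
git init -q "$work/laptop"
cd "$work/laptop"
git config user.name "Example User"
git config user.email "user@example.com"
echo "draft" > paper.txt
git add paper.txt
git commit -q -m "First draft"
git remote add origin "$work/hosted.git"
git push -q -u origin HEAD

# Second computer: download a full copy, change it, and upload the change
git clone -q "$work/hosted.git" "$work/desktop"
cd "$work/desktop"
git config user.name "Example User"
git config user.email "user@example.com"
echo "second paragraph" >> paper.txt
git commit -q -am "Add paragraph on desktop"
git push -q

# First computer again: pick up the change made elsewhere
cd "$work/laptop"
git pull -q --ff-only
cat paper.txt
```

Both working copies and the hosted copy end up with the same two commits, which is why the choice of site matters so little technically.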

Public Code

GitHub is a very popular site to share your software code. Think of it as a social network for programmers. You can have an unlimited number of public Git repositories, but you must pay for any private repos.

You can create a repo called yourusername.github.io and put web pages in that repo. It will then be accessible at http://yourusername.github.io . Each of your software repos can also have an accompanying website, using a new branch called gh-pages. The project's website will then be at http://yourusername.github.io/projectname .
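One possible way to set up such a gh-pages branch is sketched below, assuming you want the website branch to start with no history and none of the project's code files. The demo builds a throwaway repo first. The file names main.c and index.html are placeholders, and the final push is left as a comment because it needs a real GitHub remote.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity
git config user.email "user@example.com"
echo "int main(void) { return 0; }" > main.c
git add main.c
git commit -q -m "Add project code"

# Start a branch with no parent commits, then clear out the project files
git checkout -q --orphan gh-pages
git rm -rf -q .
echo "<h1>projectname</h1>" > index.html
git add index.html
git commit -q -m "Add project website"

# With a GitHub remote named origin configured, you would then publish with:
#   git push origin gh-pages
```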

Private Paper and Experimental Results

Both Bitbucket and GitLab allow you to have unlimited free private repos. Bitbucket is more popular, but you can share your private repos with at most 5 people before you must pay. GitLab has no such restriction, but it is less popular.

For example, let's say that your username is myusername, your repo is named lm-paper, and you have a collaborator named Bob, whose Bitbucket ID is bobusername :

  1. For private repos, first invite your collaborator(s) to your private repo (see the top-right box "Invite users to this repo"). Then have them share their fork back with you.
  2. Now, for either public or private repos, type the following:
  3. Bookmark Bob's repo as "bob":
    git remote add bob https://myusername@bitbucket.org/bobusername/lm-paper.git
  4. Show all bookmarks (Git calls these remotes) :
    git remote -v
  5. Grab his changes:
    git fetch bob
  6. Merge his changes into your repo:
    git merge bob/master
  7. Push the combined result upstream:
    git push
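To rehearse the steps above without touching a hosted service, the sketch below uses a local clone as a stand-in for Bob's fork. The directory names mine and bob-fork, the file names, and the identities are hypothetical, and the branch name is looked up rather than assumed to be master.

```shell
set -e
work=$(mktemp -d)

# Your repo
git init -q "$work/mine"
cd "$work/mine"
git config user.name "My Name"
git config user.email "me@example.com"
branch=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on your setup
echo "Introduction" > paper.txt
git add paper.txt
git commit -q -m "Start paper"

# Bob's fork: a local clone standing in for his hosted copy
git clone -q "$work/mine" "$work/bob-fork"
cd "$work/bob-fork"
git config user.name "Bob"
git config user.email "bob@example.com"
echo "Results" > results.txt
git add results.txt
git commit -q -m "Add results"

# Back in your repo: bookmark, inspect, fetch, and merge
cd "$work/mine"
git remote add bob "$work/bob-fork"
git remote -v
git fetch -q bob
git merge -q "bob/$branch"
ls
```

Here the merge is a simple fast-forward because only Bob made new commits. If both of you had committed, Git would create a merge commit or ask you to resolve conflicts.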

Readme Files

Your Git repo should always include two files: README.md and LICENSE.txt .

The README.md file gives an overview of your Git project, including the software name, how to install and compile it, command-line usage, copyright, etc. The .md suffix indicates that it is in the Markdown format, which is like plaintext but includes some simple additions to make it look nice in your web browser. See the Wikipedia article on Markdown for details.

The LICENSE.txt file tells others how they can use your software. If you don't specify a license, then you're not granting the right for others to use your code. Consider using a Free software license to encourage others to use and build upon your software.
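As a starting point, the sketch below creates both files in a throwaway repo and commits them. The section names, the make and ./lm-paper commands, and the placeholder license line are illustrative assumptions. Replace them with your project's real details and the full text of the license you choose.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity
git config user.email "user@example.com"

# A skeleton README in Markdown; every section here is a placeholder
cat > README.md <<'EOF'
# lm-paper

One-paragraph overview of what the project does.

## Installation

    make

## Usage

    ./lm-paper --help

## License

See LICENSE.txt.
EOF

# Placeholder only: paste in the full text of your chosen license
echo "Replace this line with the full text of your chosen license." > LICENSE.txt

git add README.md LICENSE.txt
git commit -q -m "Add README and license"
```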

Large Data Files

Really large data files should not be in a Git repo. Instead, you can use related tools like Git LFS or git-annex. These tools will track changes to the file, but they won't store the file within the Git repo itself. The Git repo will store the fact that a given large data file has changed, and help you to easily synchronize data changes across computers.
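For Git LFS, a minimal setup might look like the following sketch. It assumes the git-lfs extension is installed separately and simply prints a message if it is not. The *.bin pattern is an example. Use whatever matches your data files.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

if git lfs version >/dev/null 2>&1; then
    git lfs install --local      # enable the LFS filters and hooks for this repo only
    git lfs track "*.bin"        # record the pattern in .gitattributes
    git add .gitattributes       # commit this file so collaborators get the same rule
    cat .gitattributes
else
    echo "git-lfs is not installed; install it before tracking large files"
fi
```

After this, files matching the pattern are added and committed with the usual git add and git commit, while Git itself stores only a small pointer to each one.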


I'm an egotistical bastard, and I name all my projects after myself. First Linux, now git.
–Linus Torvalds