After completing this page, you will be able to:
- Explain how a GitHub repository stores and tracks changes to files.
- Create a copy of (i.e. fork) other users’ files on GitHub.com.
- Use the git clone command to download a copy of a GitHub repository to your computer.
About Git and GitHub
Previously in this textbook, you learned that git is a tool that is used to track changes in files (a process called version control) through a suite of commands that you can execute in the terminal. You also learned that GitHub allows you to store files in the cloud to access them from any computer and to share them with others.
You can use git and GitHub together in a workflow to make changes to files locally with git and to store and share your files on GitHub.com.
To work together, git and GitHub use repositories (i.e. directories of files) to manage and store files.
Data Tip: A GitHub repository is a directory of files and folders that is hosted on GitHub.com.
Keeping a copy of your files in a GitHub repository in the cloud is ideal because:
- There is a backup: If something happens to your computer, the files are still available online.
- You can share the files with other people easily.
- You can even create a Digital Object Identifier (DOI) using third-party tools like Zenodo to cite your files or ask others to cite your files. You can also add these DOIs to your resume or C.V. to promote your work.
Directory Structure of Repositories
In essence, a repository is a directory for a specific project that is identified as a repository by git and GitHub because it contains a subdirectory called .git.
The .git subdirectory is created automatically, either by GitHub if it is created on GitHub.com or by git if the repository is created locally on a computer first (i.e. initialized as a repository). This .git subdirectory is used by these tools to manage and track the various tasks that are run on this directory (e.g. tracking changes to files in the repository).
Thus, you never need to access or modify the files in the .git subdirectory.
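To see where the .git subdirectory comes from, here is a minimal sketch that initializes an empty repository locally and lists its hidden contents. It assumes git is installed and works in a throwaway temporary directory (created with mktemp) so nothing in your own files is touched; project-name is only a placeholder.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Create a throwaway parent directory (placeholder location for this sketch only)
workdir="$(mktemp -d)"
mkdir "$workdir/project-name"
cd "$workdir/project-name"

# Initialize the directory as a git repository; git creates the hidden .git subdirectory
git init --quiet

# List all entries, including hidden ones: besides . and .. the only entry is .git
ls -a

# Ask git whether the current directory is inside a repository (prints "true")
git rev-parse --is-inside-work-tree
```

You only run git init when you create a repository locally first; a repository created on GitHub.com and then cloned already contains its .git subdirectory.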
A typical repository (e.g. project-name) is structured as follows:
project-name
    .git/
    data/
    scripts/
    .gitignore
    README.md
In addition to the .git subdirectory, it is common to have subdirectories for specific files of a workflow (e.g. data, scripts).
The README.md is a Markdown file that is used to provide a description of the repository (i.e. its contents, purpose, etc.), so that others can learn how to use the files in the repository.
The .gitignore file can be used to list the files that you do not want git to track (i.e. monitor via version control). You will learn more about both of these useful files later in this chapter.
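As a minimal sketch, assuming git is installed, the layout above can be reproduced in a temporary directory to see .gitignore at work. The ignore rule *.tmp and the file data/scratch.tmp are hypothetical examples chosen only for illustration; they are not part of the textbook's repositories.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build the example layout in a throwaway location (placeholder path)
workdir="$(mktemp -d)"
mkdir -p "$workdir/project-name/data" "$workdir/project-name/scripts"
cd "$workdir/project-name"
git init --quiet

# README.md describes the repository for other users
printf '# project-name\n\nDescription of the files in this repository.\n' > README.md

# .gitignore lists patterns that git should not track (hypothetical rule)
printf '*.tmp\n' > .gitignore

# A hypothetical scratch file that matches the ignore rule
touch data/scratch.tmp

# git check-ignore prints the path because it matches a rule in .gitignore
git check-ignore data/scratch.tmp

# git status lists README.md and .gitignore as untracked, but not data/scratch.tmp
# (the empty scripts/ folder is not listed either, because git does not track empty directories)
git status --short
```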
URL of Repositories on GitHub.com
When a repository is stored on GitHub.com, it is assigned a unique URL (i.e. link on the GitHub.com website) that can be used to find the repository and access its files.
While repositories on GitHub.com can be made either public or private, the default is public for free GitHub accounts.
In either case (public or private), the URL of a GitHub repository always follows the same format:
https://github.com/username/repository-name
The username is the username of the creator (i.e. owner) of the repository, and repository-name is the name of the repository. The username can either belong to an individual such as eastudent (or your GitHub username!) or represent an organization. For example, the repositories that you will work with throughout this textbook will be owned by the organization earthlab-education, and thus, will have URLs that look as follows:
https://github.com/earthlab-education/repository-name
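Because the format is fixed, a repository URL can be assembled from just two pieces: the owner's username and the repository name. The helper below is a hypothetical illustration (repo_url is not a git or GitHub command), and example-repository is a placeholder repository name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: build a GitHub.com repository URL from an owner and a repository name
repo_url() {
  local owner="$1"
  local repo="$2"
  printf 'https://github.com/%s/%s\n' "$owner" "$repo"
}

# A repository owned by an individual user
repo_url eastudent example-repository

# A repository owned by the earthlab-education organization
repo_url earthlab-education example-repository
```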
Create a Copy of Other Users’ Files on GitHub.com (Forking)
Using GitHub.com, you can make a copy of a GitHub repository owned by another user or organization (a task referred to as forking a repository).
Because forking copies a repository owned by someone else, you do not need to fork a repository that you already own. Instead, other users can fork your repository if they would like a copy to work with, and your original files will not be modified!
The ability to fork a repository is a great benefit of using GitHub repositories because the forked repository is linked to the original.
This means that you (or other users) can download new updates from the original to your (or their) forked repository as well as suggest changes to the original repository, which can be reviewed by the owner of that repository.
Thus, forking facilitates collaboration while protecting the original versions of files by allowing users to work with copies of the original.
You can fork an existing GitHub repository from the main GitHub.com page of the repository that you want to copy.
On the main GitHub.com page of the repository, you will see a button on the top right that says Fork. The number next to Fork shows how many times the repository has been copied (i.e. forked).
Click on the Fork button and select your GitHub.com account as the home of the forked repository.
Once you have forked a repository, you will have a copy of that repository in your GitHub account (i.e. a fork), and the URL to your fork will contain your username:
https://github.com/your-username/repository-name
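Since only the owner changes (the fork keeps the original repository name by default), the URL of your fork can be derived from the URL of the original by swapping in your username. The function below is a hypothetical sketch using bash parameter expansion; fork_url, the original URL, and the username eastudent are placeholders, and it assumes the original URL has the form https://github.com/owner/repository-name with no trailing slash or .git suffix.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: given the original repository URL and your username,
# print the URL that your fork will have (same repository name, new owner).
# Assumes a URL of the form https://github.com/owner/repository-name
fork_url() {
  local original="$1"
  local username="$2"
  # Keep only the text after the last "/" (the repository name)
  local repo="${original##*/}"
  printf 'https://github.com/%s/%s\n' "$username" "$repo"
}

fork_url https://github.com/earthlab-education/example-repository eastudent
```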
Later in this textbook, you will learn how to take advantage of the link between the original repository and the forked repository to suggest changes to the original repository, receive updates from the original repository to your fork, and collaborate with others.
Copy Files From GitHub.com to Your Local Computer (Cloning)
To work locally with a GitHub repository (including forked repositories), you need to create a local copy of that repository on your computer (a task referred to as cloning a repository).
You can clone GitHub repositories that you own or that are owned by others (e.g. repositories that you have forked to your GitHub account).
In either case, cloning allows you to create a local copy of a GitHub repository, so that you can work with the files locally on your computer.
Cloning a repository to your computer is a great way to work on your files locally, while still having a copy of your files in the cloud on GitHub.com.
Following the steps below, you will use the git clone command in the terminal to clone GitHub repositories.
Use Bash to Change to Your Desired Working Directory
The first step to using any git command is to change the current working directory to your desired directory.
In the case of git clone, the current working directory needs to be where you want to download a local copy of a GitHub repository.
For this textbook, you will use the earth-analytics directory that you created under your home directory.
$ cd ~
$ cd earth-analytics
$ pwd
/users/jpalomino/earth-analytics
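The two cd commands above can also be combined into one. The sketch below assumes bash and that earth-analytics should live directly under your home directory; mkdir -p creates the directory only if it does not already exist, so it is safe to run even if you created earth-analytics earlier.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Create earth-analytics under your home directory if it is missing (no-op otherwise)
mkdir -p ~/earth-analytics

# Change to it in a single step; ~ expands to your home directory
cd ~/earth-analytics

# Print the current working directory to confirm where git clone will download files
pwd
```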
Copy a Repository URL From GitHub.com
To run the git clone command, you need the URL for the repository that you want to clone (i.e. either a repository owned by you or a fork that you created of another user’s repository).
On the main GitHub.com page of the repository, you can click on the green Clone or download button and copy the URL provided in the box, which will look like:
https://github.com/username/repository-name.git
Data Tip: You can also copy the URL directly from your web browser, or in some cases, you might already know the URL. However, in many cases, you will come across a new GitHub.com repository on your own and will need to follow these instructions to copy the URL for future use.
Run the Git Clone Command in the Terminal
Now that you have the URL for a repository that you want to copy locally, you can use the terminal to run the git clone command followed by the URL that you copied:
git clone https://github.com/username/example-repository
You have now made a local copy of a repository under your earth-analytics directory. You can double check that the directory exists using the ls command in the terminal.
$ ls
example-repository
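To practice git clone without network access, note that it also accepts the path of a repository on your own computer in place of a GitHub.com URL. The sketch below assumes git is installed; source-repository, the commit author details, and the temporary directory are hypothetical stand-ins for a real GitHub.com repository. After cloning, ls -a shows the copied files plus the clone's own .git subdirectory, and git remote get-url origin shows where the clone came from (with a real clone, this is the GitHub.com URL you copied).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway location so the sketch does not touch earth-analytics (placeholder)
workdir="$(mktemp -d)"
cd "$workdir"

# Hypothetical source repository standing in for a repository on GitHub.com
git init --quiet source-repository
printf '# example-repository\n' > source-repository/README.md
git -C source-repository add README.md
git -C source-repository -c user.name="Example User" -c user.email="user@example.com" \
  commit --quiet -m "Add README"

# Clone it; a path works in place of a URL, and the optional
# second argument sets the name of the new local directory
git clone --quiet source-repository example-repository

# The clone holds the committed files plus its own .git subdirectory
ls -a example-repository

# git clone records where the copy came from as a remote named origin
git -C example-repository remote get-url origin
```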