Intro, Git, and GitHub

Why we use code and why we should use version control

Matthew DeHaven

January 1, 2025

Welcome to the Course

Welcome to Econ 2020

Applied Economics Analysis

Instructor

  • Matthew DeHaven (4th year graduate student)

TA

  • Ruchi Mahadeshwar (on the job market)

Introductions

  • What is your name?

  • Where are you from?

  • What do you think your field in economics will be?

    • (purely speculative, not holding you to it!)

A fun one:

  • What is one of your favorite food spots in Providence?

What is this course?

  • Course goals
  • Longer-term assignments
  • Weekly problem sets
  • Lecture structure
  • Class feedback
  • Class website

Course Goals

  • Able to replicate published papers in economics
  • Learn how to program
    • Specifically, R and a bit of Python and Julia
  • Write clean, documented, reproducible code
  • Apply software tools and best practices to economic research projects

The aim is to prepare you with practical skills for your second year onward.

Longer-term assignments

Replication 1

  • Replicate a published paper in economics

Final Project and Presentation

  • Apply what we learn to exploring a research project idea

Replication 2

  • Replicate a classmate’s final project

I will give more information on each of these assignments later.

Weekly Assignments

One “problem set” (really, a coding exercise) due each week.

These will be due:

  • start of Monday class time

The goal is to…

  • practice the material from lectures
  • learn some new methods on your own

The first assignment is due next class.

Lecture Structure

The goal for lectures:

  1. Lecture slides
  2. Live coding example
  3. In-class coding problem

This will adjust depending on the topic being covered.

Please bring your computers to class!

Class Feedback

You will fill out a survey at the end of each lecture.

These will ask some questions about…

  • material comprehension
  • teaching feedback (i.e. things I can do better)

These are not graded, but filling them out counts as your participation grade.

Course Website

All of the material for this course lives on a course website.

  • course schedule
  • lecture slides
  • coding examples
  • assignments
  • guides

I will be updating the website throughout the course.

Course Website Tour

Why code?

Why do we program instead of simply opening Excel, highlighting the right data, and hitting the “regress” button?

Many reasons, but a main one…

Programs are reproducible documents of the steps we took and decisions we made.

Reproducibility is becoming more and more important in economics.

Importance of Replication

Various “Replication Crises” in economics

  • the latest has been in behavioral economics

Writing reproducible code allows

  • others to trust your results.
  • you to know what you did six months ago.

Before the Current Version

Code keeps a reproducible state of our current project.

But what about the decisions we made beforehand?

What if we decide we preferred the model we ran 6 months ago?

One solution: make copies of all your files with “_vXX”

Better solution: use version control.

Version Control and Git

Version Control

Version control keeps track of files at different states (“commits”),

by storing the differences between the file in one “commit” and the next.

This allows us to

  • see the history of any file
  • revert to any point in that history
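Both abilities can be sketched on the command line. This is a minimal illustration, assuming a throwaway folder made with mktemp; the file name, the commit messages, and the name and email given to git config are placeholders.

```shell
# Throwaway repository so nothing real is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"        # placeholder identity, this repo only
git config user.email "student@example.com"

# Two commits that change the same file.
echo "model <- lm(y ~ x)" > run_regs.R
git add run_regs.R
git commit -q -m "First model"

echo "model <- lm(y ~ x + z)" > run_regs.R
git commit -q -a -m "Add control variable z"

# See the history of one file.
git log --oneline -- run_regs.R

# Bring back the file as it was one commit earlier (HEAD~1).
git checkout HEAD~1 -- run_regs.R
cat run_regs.R
```

The restored file shows up as a pending change, so committing it records the decision to go back.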

Why do we need version control?

  • Below, a hypothetical project directory
  • It runs some regressions and makes output LaTeX files
  • Which file creates the second version of the output?
  • Is the “_CA” or the “_MD” file more recent?
  • Which is the current file to use?

  • project/
    • code/
      • run_regs.R
      • run_regs_v2.R
      • run_regs_v2_MD.R
      • run_regs_v2_CA.R
      • run_regs_20240101.R
    • output/
      • reg_results.tex
      • reg_results_v2.tex

What is Git?

  • An implementation of version control
  • Very popular among programmers
  • Operated through the command line or through one of many easy-to-use GUIs
    • We will be using VS Code’s built-in Git extension
  • Additional tools (namely, GitHub) for collaborating with others

How does command line Git work?

When you start a new Git repository,

git init

Git will create a hidden folder .git in your project where it stores versions of your documents.

You can then edit your documents and, when ready, store the current file versions.

git commit -a -m "Informative message about edits"

Later on, you can view all of these commit messages along with their time stamps.

git log

commit 988ca5180eb7b3ff2dd0dbb32f7be373671cc644
Author: matdehaven <matthew_dehaven@brown.edu>
Date:   Mon Jan 13 10:28:52 2025 -0500

    Adding homepage

commit b3ec056cbdb28afe646ef25fcb7e27f0bde3485d
Author: Matthew DeHaven <98497348+matdehaven@users.noreply.github.com>
Date:   Mon Jan 13 10:27:32 2025 -0500

    Initial commit
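The three commands above can be run end to end as a minimal sketch. It assumes a throwaway folder made with mktemp; the file name and the name and email given to git config are placeholders. One detail worth noting: commit -a only picks up files Git already tracks, so a brand-new file needs git add first.

```shell
# Work in a throwaway folder so nothing real is touched.
repo="$(mktemp -d)"
cd "$repo"

git init -q                                   # creates the hidden .git folder
git config user.name "Example Student"        # placeholder identity, this repo only
git config user.email "student@example.com"

# A brand-new file has to be added once before Git tracks it.
echo "# Homepage" > index.md
git add index.md
git commit -q -m "Initial commit"

# From then on, `commit -a` includes edits to files Git already tracks.
echo "Welcome!" >> index.md
git commit -q -a -m "Adding homepage"

git log                                       # newest commit is listed first
```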

Git Commit Graph

Once you have some commits, you can view the history of your repository.
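On the command line, a text version of the commit graph is available from git log. A small sketch, assuming a throwaway repository with three made-up commits (all names and messages are illustrative):

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"        # placeholder identity, this repo only
git config user.email "student@example.com"

# Three small commits to give the graph something to show.
for step in "Load raw data" "Clean data" "Run regressions"; do
  echo "$step" >> notes.txt
  git add notes.txt
  git commit -q -m "$step"
done

# One line per commit, with the graph drawn in the left margin.
git log --oneline --graph
```

With a single line of history the graph is just a column of markers; it becomes more informative once branches are involved.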

Git File Revisions

And we can look at the difference between the current version of the file, and any of the saved commits.
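The same comparisons can be made from the command line with git diff and git show. A minimal sketch, assuming a throwaway repository; the script name and its contents are invented for illustration.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"        # placeholder identity, this repo only
git config user.email "student@example.com"

echo 'data <- read.csv("old_source.csv")' > load_data.R
git add load_data.R
git commit -q -m "Initial data load"

echo 'data <- read.csv("new_source.csv")' > load_data.R
git commit -q -a -m "Updated data source"

echo 'summary(data)' >> load_data.R           # an edit that is not committed yet

git diff                        # unstaged edits; nothing is staged here, so this is relative to the last commit
git diff HEAD~1 -- load_data.R  # current file versus the commit before the last one
git show HEAD                   # the changes introduced by the latest commit
```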

The changes from the “Updated data source” commit:

GitHub

How is GitHub different from Git?

Git: version control stored on your computer

GitHub: version control stored online

  • backs up your project
  • allows collaboration with others
  • lets you share your work

GitHub in this course

We will be using GitHub extensively in this course.

All of your problem sets will be submitted on GitHub.

Your final project will be a GitHub repository,

and will be graded in part by its commit history.

And we will see how GitHub also makes it easy to host websites.

Git and GitHub

From now on, think of Git and GitHub grouped together.

We won’t use one without the other.

Git and GitHub Details

  • Initializing Git version control
    • Linking to GitHub
  • Committing file changes
    • Staging
    • Commit messages
    • Committing
    • Pushing to GitHub
  • .gitignore

Initializing Git

When you initialize Git in a folder,

  • a hidden .git folder is created
  • nothing is tracked yet; copies of your files are stored once you stage and commit them

Linking to GitHub

If you already have a project locally using Git, you can link it to GitHub.

  • Create a new, empty repository on GitHub.
  • Copy the repository URL
  • Add that repository as the “remote” origin for your project

This can be done from the command line with git remote add origin __ or from VS Code’s Git user interface.

Details about this process are in GitHub’s documentation.
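A sketch of the linking step, with one loud assumption: a local bare repository stands in for the empty GitHub repository so the example runs offline. With GitHub, the argument to git remote add would be the URL copied from the repository page. Folder names and the identity are placeholders.

```shell
work="$(mktemp -d)"

# Stand-in for the new, empty repository on GitHub (assumption for this sketch).
git init -q --bare "$work/remote.git"

# An existing local project that already uses Git.
git init -q "$work/project"
cd "$work/project"
git config user.name "Example Student"        # placeholder identity, this repo only
git config user.email "student@example.com"
echo "# Project" > README.md
git add README.md
git commit -q -m "Initial commit"

# Register the remote under the conventional name "origin" and check it.
git remote add origin "$work/remote.git"
git remote -v

# Upload the current branch and remember origin as its upstream.
git push -q -u origin HEAD
```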

Starting on GitHub

It is easier to start from GitHub.

  • Create a new repository on GitHub
  • Copy the repository URL
  • Clone the repository to your computer

This is the process you will be using for your homework assignments.
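The clone step can be sketched as follows. As an assumption for the sake of a runnable example, a local bare repository plays the role of the repository created on GitHub; in practice the first argument to git clone is the copied repository URL. Git prints a warning when the cloned repository is still empty, which is expected here.

```shell
work="$(mktemp -d)"

# Stand-in for a repository created on GitHub (assumption for this sketch).
git init -q --bare "$work/assignment.git"

# Clone it to a working folder on your computer.
git clone -q "$work/assignment.git" "$work/assignment"
cd "$work/assignment"

# The clone is already linked back to where it came from.
git remote -v
```

Because the link to the remote is set up by the clone itself, there is no separate git remote add step.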

Committing file changes

Once you edit a file, Git will notice a change.

I edited my previous script “run_regressions.R” and added “new_script.R”.
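On the command line, git status reports the same thing. A minimal sketch that recreates the situation in a throwaway repository; the file contents and the identity are made up.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"        # placeholder identity, this repo only
git config user.email "student@example.com"

echo "lm(y ~ x)" > run_regressions.R
git add run_regressions.R
git commit -q -m "Add regression script"

echo "lm(y ~ x + z)" >> run_regressions.R     # edit an existing, tracked file
echo "plot(x, y)" > new_script.R              # add a new, untracked file

# Short status: M marks a modified file, ?? marks a file Git is not tracking yet.
git status --short
```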

Staging changes

Once you are at a point to commit, first you stage your changes.

This allows you to select the changes you wish to commit at this time.
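Staging from the command line is done with git add. The sketch below, using a throwaway repository and invented file contents, stages one of two changed files and leaves the other for a later commit.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"        # placeholder identity, this repo only
git config user.email "student@example.com"

echo "lm(y ~ x)" > run_regressions.R
git add run_regressions.R
git commit -q -m "Add regression script"

echo "lm(y ~ x + z)" >> run_regressions.R     # change 1
echo "plot(x, y)" > new_script.R              # change 2

# Stage only the first change.
git add run_regressions.R

git diff --staged --name-only    # files that will go into the next commit
git status --short               # new_script.R is still listed as untracked (??)
```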

Commit Messages

You always have to write a message with every commit.

By convention the summary line is kept short (roughly 50 characters), so they have to be concise.

But try to make them useful!

Don’t use:

  • “edits”
  • “hi”
  • “acdfasdfadaf”

Do use:

  • “Reran with data for 2020”
  • “Robust check for table 1”
  • “Made reg loop more efficient”

Commit Messages

Here’s one for our example:

Remember, you will be looking back at your Git messages, so try to write something helpful!

Committing

Then you hit “commit”!

And a new commit is added to the history.

Pushing commits

Now that you have a new commit you can push it to GitHub.

You can push after one or multiple local commits.
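A sketch of pushing several local commits at once. A local bare repository stands in for GitHub (an assumption so the example runs offline), and the file name and identity are placeholders.

```shell
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"         # stand-in for the GitHub repository
git clone -q "$work/remote.git" "$work/project"
cd "$work/project"
git config user.name "Example Student"        # placeholder identity, this repo only
git config user.email "student@example.com"

echo "lm(y ~ x)" > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"
git push -q -u origin HEAD                    # first push also sets the upstream

# Two more commits that exist only on this computer for now.
echo "# 2020 data" >> analysis.R
git commit -q -a -m "Reran with data for 2020"
echo "# robustness" >> analysis.R
git commit -q -a -m "Robust check for table 1"

# How many local commits has the remote not seen yet?
unpushed="$(git rev-list --count '@{u}..HEAD')"
echo "Unpushed commits: $unpushed"

git push -q                                   # sends both commits in one go
```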

.gitignore

Some files you will not want to track.

  • very large datasets
  • private information (API keys, etc.)
  • large models or output

These can be listed in the .gitignore file

large-output-file.RDS

some-dataset.csv

a-whole-folder/

Git will then, well, ignore those files.
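To see the effect, the sketch below writes the three entries above into a .gitignore in a throwaway folder, creates matching files, and asks Git what it sees. The extra files run_regs.R and notes.txt are invented for the example.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

# The .gitignore entries from above, one pattern per line.
printf '%s\n' "large-output-file.RDS" "some-dataset.csv" "a-whole-folder/" > .gitignore

# Files that match the patterns, plus one script that should be tracked.
touch large-output-file.RDS some-dataset.csv run_regs.R
mkdir a-whole-folder
touch a-whole-folder/notes.txt

git status --short                       # lists only .gitignore and run_regs.R
git check-ignore -v some-dataset.csv     # shows which rule causes the file to be ignored
```

Note that .gitignore only affects files Git is not already tracking; a file committed earlier stays tracked until it is removed from the repository.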

What files should be tracked?

The hard-core camp says

“Code is truth”

Only track

  • raw data
  • code

Never track

  • intermediate data
  • output

What files should be tracked?

But it’s super useful to track intermediate steps and output!

In particular because you can easily look at different versions of data and results using the commit history.

You have to be a bit careful about the size of files, however.

Different types of file storage

It’s more costly for Git to store certain files.

Text files (all code files are basically just text) are super easy.

  • Git just stores the differences, which we’ve seen.

Other files, like PDFs, are compressed binary formats (usually a good thing).

But that means Git can’t calculate a meaningful line-by-line difference.

Instead it saves a whole new copy.

Saving a lot of new copies can eventually add up.
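The contrast between text and binary files is visible in git diff. In this sketch a few raw bytes written with printf stand in for a binary file such as a PDF (an assumption, not a real PDF), and the file names are illustrative.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"        # placeholder identity, this repo only
git config user.email "student@example.com"

echo "coef: 1.0" > results.txt
printf 'PDF\000\001\002' > figure.bin         # stand-in for a binary file
git add results.txt figure.bin
git commit -q -m "Add results and figure"

# Change both files.
echo "coef: 1.1" > results.txt
printf 'PDF\000\003\004\005' > figure.bin

git diff                  # text file: changed lines; binary file: only "Binary files ... differ"
git count-objects -vH     # rough view of how much space the stored objects take
```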

Coding Example

  • Take a look at Assignment 1 together

  • A tour of github.com

  • Showing the commit process again:

    • edit files
    • stage changes
    • write message
    • commit
    • push