Intro, Git, and GitHub

Matthew DeHaven

Course Home Page

2024-01-01

Welcome to the Course

Welcome to Econ 2020

Applied Economics Analysis

Instructor

Matthew DeHaven (3rd year graduate student)

Eddie Wu (2nd year graduate student)

Introductions

What is your name?
Where are you from?
What do you think your field in economics will be?
- (purely speculative, not holding you to it!)

A fun one:

What is one of your favorite food spots in Providence?

What is this course?

Course goals
Longer-term assignments
Weekly problem sets
Lecture structure
Class Feedback
Course Website

Course Goals

Be able to replicate a published economics paper
Learn how to organize and write reproducible projects
Learn how to program
- Specifically, R and a bit of Python and Julia

Prepare you with practical skills for your 2nd year and beyond.

Longer-term assignments

Replication 1

Replicate a published paper in economics

Final Project and Presentation

Apply what we learn to exploring a research project idea

Replication 2

Replicate a classmate’s final project

I will give more information on each of these assignments later.

Weekly Assignments

One “problem set” (really, a coding exercise) due each week.

These will be due:

start of Monday class time

The goal is to…

practice the material from lectures
learn some new methods on your own

The first assignment is due next class, but it only involves installing software and creating a GitHub account, so that you are set up for the course.

Lecture Structure

The goal for lectures:

Lecture slides for ~ 1/2 the time

Live coding examples for ~ 1/2 the time

This will adjust depending on the topic being covered.

Please bring your computers to class! I don’t expect you to code along to the slides, but I hope you will for the live coding examples.

Class Feedback

You will fill out a survey at the end of each lecture.

These will ask some questions about…

material comprehension
teaching feedback (i.e. things I can do better)

These are not graded, but filling them out counts as your participation grade.

Course Website

All of the material for this course lives on a website:

Course Home Page

course schedule
lecture slides
coding examples
assignments

I will be updating the website throughout the course.

Please visit it for the most up-to-date information.

Why code?

Why do we program instead of simply opening Excel, highlighting the right data, and hitting the “regress” button?

Many reasons, but a main one…

Programming is a reproducible document of the steps we took and decisions we made.

Reproducibility is becoming more and more important in economics.

Importance of Replication

Various “Replication Crises” in economics

the latest has been in behavioral economics

Writing reproducible code allows

others to trust your results.

you to know what you did six months ago.

Before the Current Version

Code keeps a reproducible state of our current project.

But what about the decisions we made beforehand?

What if we decide we preferred the model we ran 6 months ago?

One solution: make copies of all your files with “_vXX”

Better solution: use version control.

Version Control and Git

Version Control

Version control keeps track of files at different states (“commits”),

by storing the differences between the file in one “commit” and the next.

This allows us to

see the history of any file
revert to any point in that history
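A minimal sketch of what "the differences between versions" means, using the plain diff tool rather than Git itself. The file names and their contents are hypothetical, and this illustrates the idea only, not how Git lays out its storage internally.

```shell
# Two hypothetical versions of the same script
work=$(mktemp -d)
printf 'data <- read.csv("data_2019.csv")\nsummary(data)\n' > "$work/old.R"
printf 'data <- read.csv("data_2020.csv")\nsummary(data)\n' > "$work/new.R"

# diff exits with status 1 when the files differ, so guard it with || true
changes=$(diff -u "$work/old.R" "$work/new.R" || true)

# Lines starting with - were removed, lines starting with + were added
echo "$changes"
```

Only the first line changes between the two versions, so it is the only line marked with - and +.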

Why do we need version control?

Below is a hypothetical project directory
It runs some regressions, makes output LaTeX files
Which file creates the second version of the output?
Is the “_CA” or the “_MD” file more recent?
Which is the current file to use?

project/
- code/
  - run_regs.R
  - run_regs_v2.R
  - run_regs_v2_MD.R
  - run_regs_v2_CA.R
  - run_regs_20240101.R
- output/
  - reg_results.tex
  - reg_results_v2.tex

What is Git?

An implementation of version control
Very popular among programmers
Operate through the command line or choose from many easy-to-use GUIs
- We will be using VS Code’s built-in Git extension
Additional tools (namely, GitHub) for collaborating with others

How does command line Git work?

When you start a new git repository,

git init

Git will create a hidden folder .git in your project where it stores versions of your documents.

You can then proceed to make document changes and when ready store the current file versions.

git commit -a -m "Informative message about edits"

Later on, you can view all of these commit messages along with their time stamps.

git log


commit 4a8ac66154034745cbaad4e61d32b36aa7e63606
Author: matdehaven <matthew_dehaven@brown.edu>
Date:   Sat Dec 2 17:43:30 2023 -0500

    Course website init, renv init

commit 74a769e5b2b916e4c8a1a65073765f79ce90dcb4
Author: Matthew DeHaven <98497348+matdehaven@users.noreply.github.com>
Date:   Sat Dec 2 17:04:56 2023 -0500

    Initial commit
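The three commands above can be tried end to end in a throwaway folder. This is a sketch under some assumptions: the file name, the commit messages, and the name and email are all placeholders. One detail worth knowing is that the -a flag only picks up files Git already tracks, so a brand-new file needs git add first.

```shell
# Hypothetical project in a temporary folder
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Placeholder identity, stored only in this repository's config
git config user.name "Example Student"
git config user.email "student@example.com"

# A new file must be added once before `git commit -a` will see it
echo 'print("hello")' > analysis.R
git add analysis.R
git commit --quiet -m "Add first analysis script"

# Later edits to tracked files can be committed directly with -a
echo 'print("hello again")' >> analysis.R
git commit --quiet -a -m "Print a second message"

# One line per commit, newest first
git log --oneline
```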

Git Commit Graph

Once you have some commits, you can view the history of your repository.
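From the command line, a text version of the commit graph is available through git log. The sketch below builds a tiny hypothetical history first; the identity and the commit messages are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"

# --allow-empty lets this sketch make commits without creating any files
git commit --quiet --allow-empty -m "Initial commit"
git commit --quiet --allow-empty -m "Updated data source"

# One line per commit, newest first, with a text drawing of the graph
git log --oneline --graph
```

With a single line of history the graph is just a column of asterisks; it becomes more informative once branches are involved.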

Git File Revisions

And we can look at the difference between the current version of the file, and any of the saved commits.

The changes from the “Updated data source” commit:
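On the command line, the same comparison is done with git diff. A minimal sketch, assuming a hypothetical script load_data.R and placeholder identity:

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"

echo 'data <- read.csv("old_source.csv")' > load_data.R
git add load_data.R
git commit --quiet -m "Load data"

# Edit the file without committing
echo 'data <- read.csv("new_source.csv")' > load_data.R

# Compare the working copy against the last commit (HEAD)
git diff HEAD -- load_data.R
```

HEAD can be replaced by any commit hash from git log. To see the changes a past commit introduced, git show followed by its hash works similarly.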

GitHub

How is GitHub different from Git?

Git: version control stored on your computer

GitHub: version control stored online

backs up your project

allows collaboration with others

lets you share your work

GitHub in this course

We will be using GitHub extensively in this course.

All of your assignments will be submitted on GitHub.

Your final project will be a GitHub repository,

and will be graded in part by its commit history.

And we will see how GitHub also makes it easy to host websites.

Git and GitHub

From now on, think of Git and GitHub grouped together.

We won’t use one without the other.

We are now going to go into the details of Git and GitHub.

Git and GitHub Details

Initializing Git version control
- Linking to GitHub
Committing file changes
- Staging
- Commit messages
- Committing
- Pushing to GitHub
.gitignore

Initializing Git

When you initialize Git in a folder,

a hidden .git folder is created
no files are stored yet; that happens when you stage and commit them

Linking to GitHub

If you already have a project locally using Git, you can link it to GitHub.

Create a new, empty repository on GitHub.

Copy the repository URL

Add that repository as the “remote” origin for your project

This can be done from the command line with git remote add origin __ or from VS Code’s Git user interface.

Details about this process are in GitHub’s documentation.
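A sketch of the linking step that runs without a GitHub account. It assumes a local bare repository as a stand-in for the empty GitHub repository; in real use, the last argument of git remote add would be the URL copied from GitHub. All folder names are hypothetical.

```shell
work=$(mktemp -d)

# Stand-in for the new, empty GitHub repository (a local bare repository)
git init --quiet --bare "$work/remote.git"

# An existing local project that already uses Git
git init --quiet "$work/project"
cd "$work/project"

# Register the stand-in as the remote named "origin"
git remote add origin "$work/remote.git"

# List the configured remotes and their URLs
git remote -v
```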

Starting on GitHub

It is easier to start from GitHub.

Create a new repository on GitHub

Copy the repository’s HTTPS URL

Clone the repository to your computer

This is the process you will be using for your homework assignments.
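The clone step can be sketched the same way, again with a local bare repository standing in for GitHub. In real use, the first argument to git clone would be the HTTPS URL copied from the repository page; the folder names here are hypothetical.

```shell
work=$(mktemp -d)

# Stand-in for a repository created on GitHub
git init --quiet --bare "$work/assignment.git"

cd "$work"
# Cloning an empty repository prints a harmless warning
git clone --quiet "$work/assignment.git" assignment

# The clone already has "origin" pointing back at the source
git -C assignment remote -v
```

Because the clone comes with origin already set, there is no separate linking step, which is why starting from GitHub is the easier route.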

Committing file changes

Once you edit a file, Git will notice a change.

I edited my previous script “run_regressions.R” and added “new_script.R”.
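On the command line, git status reports the same information. A minimal sketch that recreates the situation, with placeholder file contents and identity:

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"

echo 'lm(y ~ x)' > run_regressions.R
git add run_regressions.R
git commit --quiet -m "Add regression script"

# Edit the tracked script and add a brand-new one
echo 'lm(y ~ x + z)' > run_regressions.R
echo 'plot(x, y)' > new_script.R

# Short status: " M" marks a modified file, "??" an untracked one
git status --short
```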

Staging changes

Once you are at a point to commit, first you stage your changes.

This allows you to select the changes you wish to commit at this time.
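The command-line equivalent of staging is git add. The sketch below stages only one of two changed files, assuming the same hypothetical scripts as before and a placeholder identity.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"

echo 'lm(y ~ x)' > run_regressions.R
git add run_regressions.R
git commit --quiet -m "Add regression script"

# Two separate changes in the working folder
echo 'lm(y ~ x + z)' > run_regressions.R
echo 'plot(x, y)' > new_script.R

# Stage only the edited regression script
git add run_regressions.R

# Names of the staged files: new_script.R is left out of the next commit
git diff --cached --name-only
```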

Commit Messages

Git requires a message with every commit.

There is no hard length limit, but by convention the first line is kept short (roughly 50 characters).

But try to make them useful!

Don’t use:

“edits”
“hi”
“acdfasdfadaf”

Do use:

“Reran with data for 2020”
“Robust check for table 1”
“Made X more efficient”

Commit Messages

Here’s one for our example:

Remember, you will be looking back at your Git messages, so try to write something helpful!

Committing

Then you hit “commit”!

And a new commit is added to the history.

Pushing commits

Now that you have a new commit, you can push it to GitHub.

You can push after one or multiple local commits.

I suggest you push after each one.
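A sketch of the push step, with a local bare repository standing in for GitHub so that it runs offline. The folder names, file, and identity are placeholders. Pushing HEAD sends whichever branch is currently checked out, so the example does not depend on the branch being called main or master.

```shell
work=$(mktemp -d)

# Stand-in for the GitHub repository
git init --quiet --bare "$work/remote.git"
git clone --quiet "$work/remote.git" "$work/project"
cd "$work/project"
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"

echo 'lm(y ~ x)' > run_regressions.R
git add run_regressions.R
git commit --quiet -m "Add regression script"

# Push the current branch and remember origin as its upstream
git push --quiet -u origin HEAD

# The remote now lists the branch that was just pushed
git ls-remote origin
```

After the first push with -u, a plain git push is enough for later commits on the same branch.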

`.gitignore`

Some files you will not want to track.

very large datasets
private information (API keys, etc.)
large models or output

These can be listed in the .gitignore file

large-output-file.RDS

some-dataset.csv

a-whole-folder/

Git will then, well, ignore those files.
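A quick way to check that the patterns behave as intended, sketched in a temporary folder. The three ignored entries are the ones listed above; analysis.R and notes.txt are hypothetical extras added for contrast.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Write the three patterns into .gitignore, one per line
printf '%s\n' 'large-output-file.RDS' 'some-dataset.csv' 'a-whole-folder/' > .gitignore

# Create files that match the patterns, plus one that does not
mkdir a-whole-folder
touch large-output-file.RDS some-dataset.csv a-whole-folder/notes.txt analysis.R

# Ignored files are missing from the status listing
git status --short

# check-ignore -v names the pattern responsible for ignoring a file
git check-ignore -v some-dataset.csv
```

Note that .gitignore itself shows up as untracked: it is a normal file and is usually committed along with the code.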

What files should be tracked?

The hard-core camp says

“Code is truth”

Only track

raw data
code

Never track

intermediate data
output

What files should be tracked?

But it’s super useful to track intermediate steps and output!

In particular because you can easily look at different versions of data and results using the commit history.

You have to be a bit careful about the size of files, however.

Different types of file storage

It’s more costly for Git to store certain files.

Text files (all code files are basically just text) are super easy.

Git just stores the differences, which we’ve seen.

Other files, like a PDF, are compressed (usually good).

But that means Git can’t calculate a difference.

Instead it saves a whole new copy.

Saving a lot of new copies can eventually add up.

Coding Example

Take a look at Assignment 1 together
A tour of GitHub.com
Showing the commit process again:
- edit files
- stage changes
- write message
- commit
- push