Replication 1: Published Economics Paper

Author

Matthew DeHaven

Due

February 28, 2024

Modified

March 31, 2024

1 Assignment Submission

  • Submit a pdf of the presentation on Canvas
    • by 9:00 am so I can have them all loaded before class!
  • Submit a 1 page pdf write up on Canvas

2 Presentation Requirements

  • You will have 5 minutes to present
  • Please prepare no more than 5 slides

3 Replication Task

For this assignment you will have a few weeks to replicate a published paper in economics. This is something you will often have to do in the 2nd year of the PhD program for field courses. However, the goal for this replication is slightly different from what you will see next year. Our goal is to replicate a published paper in order to understand its project structure, the readability of its code, and the quality of its documentation.

3.1 Picking a Published Paper to Replicate

You will need to choose a paper to replicate.

Any paper, in any field of economics, written by any authors is fine to choose. However, please choose one that meets the following conditions:

  • Uses real data (i.e. not just simulations)
  • Has at least some publicly available data sources
Note

You don’t need to choose a paper that uses R. It can use any language or combination of languages. I actually suggest trying not to think about the languages the paper uses until after you choose.

I would suggest choosing a paper from one of the following journals. These are the notorious “top 5” in economics, and I know they have repositories for replication files: 1

1 But feel free to choose a top field journal paper if it excites you and you can find the files for it!

Journal        Replication Repository
AER            openICPSR
QJE            Harvard Dataverse: QJE
Econometrica   Zenodo: Econometrica
JPE            Harvard Dataverse: JPE
ReStud         Zenodo: ReStud

3.2 “Replicating” the Results

Once you have the files, you should…

  1. Run the code!

Try to do this step with minimal reading of the documentation and minimal exploring of the code files. It should be straightforward to find the correct files to run. Find the directions to recreate the “main” result, and run the code necessary to do so. You do not need to run any of the robustness checks, if those are clearly separated.

  2. Did you get the same results? Did you get a bunch of errors?

If you got errors, try to spend time fixing them, getting the right languages installed, downloading the correct versions of packages, etc.

Important

Don’t spend more than a few hours on this step. If you cannot get the code to run in that amount of time, then that says something about the reproducibility of the paper and its documentation.

3.3 Find the Original Data

  1. Find the “raw” data in the replication files

This should be the data before any code has operated on it.

  2. Find any documentation about where that data came from 2

  3. Try to find that raw data from the sources listed. Redownload the data directly from the source.

  4. Does the data match?

2 If your paper has some private data, just do this for the publicly available data.

Important

Don’t spend more than a few hours on this step either.
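
One way to answer “does the data match?” is to compare the file from the replication package with the file you redownloaded. The following is a minimal sketch in Python, assuming both files are CSVs; the function names (`sha256_of_file`, `differing_rows`, `compare_data`) are hypothetical helpers written for this illustration, not part of any replication package. It first checks whether the two files are byte-for-byte identical and, if they are not, reports which rows differ.

```python
import csv
import hashlib


def sha256_of_file(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_rows(path_a, path_b):
    """Return the 1-based row numbers where two CSV files differ."""
    with open(path_a, newline="") as a, open(path_b, newline="") as b:
        rows_a = list(csv.reader(a))
        rows_b = list(csv.reader(b))
    diffs = []
    for i in range(max(len(rows_a), len(rows_b))):
        # A row missing from the shorter file counts as a difference.
        row_a = rows_a[i] if i < len(rows_a) else None
        row_b = rows_b[i] if i < len(rows_b) else None
        if row_a != row_b:
            diffs.append(i + 1)
    return diffs


def compare_data(path_a, path_b):
    """Summarize how two CSV files compare."""
    if sha256_of_file(path_a) == sha256_of_file(path_b):
        return "identical"
    diffs = differing_rows(path_a, path_b)
    if not diffs:
        # Same parsed rows but different bytes, e.g. different line endings.
        return "same rows, different bytes"
    return f"{len(diffs)} differing rows"
```

You would call `compare_data` with the path to the raw file in the replication package and the path to your fresh download. Keep in mind that a mismatch is not necessarily an error in the paper: many data sources revise their series over time, so differing rows may simply reflect a newer vintage of the data.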

3.4 Understand the Code and Project Structure

Now we will spend some more time reading the documentation and trying to understand the flow of the coding files.

Some questions to ask:

  • Is there a “main” script that calls everything else?
  • Are they using packages/functions from outside the project?
  • Have they created their own functions inside the project?
  • Do they repeat large blocks of code?
  • Can you “read” their code? 3
  • What is their folder structure?
    • Are input files clearly separated from output files?
    • Are different stages of analysis separated?

3 i.e. Can you roughly understand what the code is doing without running it line-by-line?
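
To get a first look at the folder structure before reading any code, it can help to count the file types in each top-level folder. The following is a rough sketch in Python; `summarize_project` and `print_summary` are hypothetical helper names chosen for this example, and the sketch assumes the replication files have been unpacked into a single directory.

```python
from collections import Counter
from pathlib import Path


def summarize_project(root):
    """Count files by extension within each top-level folder of root."""
    root = Path(root)
    summary = {}
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        # Files sitting directly in the root are grouped under ".".
        folder = relative.parts[0] if len(relative.parts) > 1 else "."
        extension = path.suffix.lower() or "(none)"
        summary.setdefault(folder, Counter())[extension] += 1
    return summary


def print_summary(summary):
    """Print one line per top-level folder with its file-type counts."""
    for folder in sorted(summary):
        counts = ", ".join(
            f"{ext}: {n}" for ext, n in sorted(summary[folder].items())
        )
        print(f"{folder}/  {counts}")
```

A summary like this will not tell you whether the code is readable, but it can quickly suggest whether inputs and outputs live in separate folders and which languages the project uses.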

3.5 Optional: Change Something in the Code

If all of the previous steps went quickly for you 4, try to identify something you could change in the code. For example, you could…

4 i.e. they only took an hour or two

  • Subset the data to a different sample or time period
  • Change the standard errors
  • Change the model specification slightly (add an interaction, or a nonlinear term)

After having done that…

  1. Rerun the code.
  • Do the results change as you expected?
  • Did the change get reflected in every table/chart, or just in one?
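
To see which tables and charts a change actually reached, one option is to record a fingerprint of every output file before and after rerunning the code. The sketch below is in Python and assumes the project writes its results into a single output folder; `snapshot` and `changed_files` are hypothetical helper names for this illustration.

```python
import hashlib
from pathlib import Path


def snapshot(output_dir):
    """Map each file under output_dir to the SHA-256 digest of its contents."""
    output_dir = Path(output_dir)
    return {
        path.relative_to(output_dir).as_posix(): hashlib.sha256(
            path.read_bytes()
        ).hexdigest()
        for path in sorted(output_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(before, after):
    """List files that were added, removed, or modified between snapshots."""
    names = sorted(set(before) | set(after))
    return [name for name in names if before.get(name) != after.get(name)]
```

Take one snapshot before your change, rerun the code, take a second snapshot, and pass both to `changed_files`. Treat the list as a starting point rather than a verdict: some output formats may embed timestamps or other metadata, so a file can differ byte-for-byte even when the reported numbers are the same.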

4 Write Up

Now that you have finished the “replication” you can write up how it went for you.

  1. Write a few paragraphs (no more than a page in total) about each of the steps in the replication (getting the code to run, finding the original data, understanding the project structure) and your thoughts on it.
  • What went well? What did not?
  • What would you like to emulate in your own projects?
  • What would you change?

5 Presentation

  1. Create a presentation with no more than 5 slides that shows some highlights from your replication work:
  • Did you successfully replicate the main result?
  • Did you find the original data? Did it match?
  • What was the project structure? What was good about it? Bad?
  • Was the code “readable”? What made it clear? What made it confusing?

Again, please remember that we only have 5 minutes per presentation. This is meant as a time for your classmates to learn how other replication attempts went so the class can get a broader perspective on the task. It would be fine to pick one of the tasks that was interesting/challenging and spend all five minutes on that for your presentation.