Copy the “UNRATE.csv” file into the raw-data folder
Run “clean-unrate.r” to clean the data
Overly Descriptive?
These may seem overly descriptive.
You could just say, “Downloaded unemployment rate from FRED”.
But this leaves it unclear which unemployment rate series you downloaded.
And for other data sources, the steps to get to the correct series and “download” button may be quite difficult for someone to replicate without explicit instructions.
A Better Option?
Instead of having to choose between:
short, unclear data acquisition documentation
long, cumbersome data acquisition documentation
We can use FRED’s API.
APIs
What is an API?
API stands for “application programming interface”.
It details how a program will talk to other programs.
For us, APIs determine how data servers will respond to requests for data.
I think it’s easiest to contrast it with the UI or “user interface”:
UIs: how programs respond to human requests
APIs: how programs respond to requests from other programs
Requesting Data from a Server
When you go to FRED and click on “Unemployment Rate”, you are sending a request over “https”. A request to FRED’s API for the same series gets back a response like this:
<seriess realtime_start="2024-02-19" realtime_end="2024-02-19"><series id="UNRATE" realtime_start="2024-02-19" realtime_end="2024-02-19" title="Unemployment Rate" observation_start="1948-01-01" observation_end="2024-01-01" frequency="Monthly" frequency_short="M" units="Percent" units_short="%" seasonal_adjustment="Seasonally Adjusted" seasonal_adjustment_short="SA" last_updated="2024-02-02 07:49:02-06" popularity="94" notes="The unemployment rate represents the number of unemployed as a percentage of the labor force. Labor force data are restricted to people 16 years of age and older, who currently reside in 1 of the 50 states or the District of Columbia, who do not reside in institutions (e.g., penal and mental facilities, homes for the aged), and who are not on active duty in the Armed Forces. This rate is also defined as the U-3 measure of labor underutilization. The series comes from the 'Current Population Survey (Household Survey)' The source code is: LNS14000000"/></seriess>
This is just the metadata for the unemployment rate series.
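As a rough sketch of what such a request looks like from R, the snippet below uses httr2 to build (but not send) a request to FRED's `fred/series` endpoint. The helper name `fred_series_request()` and the placeholder key are assumptions for illustration; a real call needs your own FRED API key.

```r
library(httr2)

# Hypothetical helper: build a request for a series' metadata.
# Nothing is sent to the server until req_perform() is called.
fred_series_request <- function(series_id, api_key) {
  request("https://api.stlouisfed.org/fred/series") |>
    req_url_query(series_id = series_id, api_key = api_key)
}

req <- fred_series_request("UNRATE", api_key = "YOUR_API_KEY")
req$url  # the full URL that would be requested over https

# To actually send it (needs a valid key and network access):
# resp <- req_perform(req)
# resp_body_xml(resp)  # parsing XML requires the xml2 package
```

The point is that the request is just a URL plus a few query parameters, which is what makes it easy to document and replicate.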
Requesting the Actual Data Observations
If we wanted the actual data observations, we’d have to adjust our request to point at FRED’s “series/observations” endpoint instead of the “series” endpoint.
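A minimal sketch of a request for the observations themselves, assuming FRED's `fred/series/observations` endpoint and its `observation_start` and `file_type` query parameters. The helper name `fred_observations_request()`, the start date, and the placeholder key are illustrative assumptions, and the request is only built here, not sent.

```r
library(httr2)

# Hypothetical helper: same idea as a metadata request,
# but pointed at the observations endpoint.
fred_observations_request <- function(series_id, api_key,
                                      start = "2000-01-01",
                                      file_type = "json") {
  request("https://api.stlouisfed.org/fred/series/observations") |>
    req_url_query(
      series_id = series_id,
      api_key = api_key,
      observation_start = start,  # YYYY-MM-DD
      file_type = file_type       # FRED returns XML unless asked otherwise
    )
}

obs_req <- fred_observations_request("UNRATE", api_key = "YOUR_API_KEY")
obs_req$url

# To actually send it (needs a valid key and network access):
# resp <- req_perform(obs_req)
# resp_body_json(resp)  # parsing JSON requires the jsonlite package
```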
Cleaning the Response
Now that we can make our request, we have to parse the response and clean it into a format we want to use.
FRED by default returns “XML” responses.
Another common type of response is “JSON”.
Both of these can just be thought of as different types of lists.
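To make the "different types of lists" idea concrete, here is a small sketch that parses an XML string into a data frame with the xml2 package. The XML below is hand-written to resemble an observations response; the dates and values are placeholders, not real data, and the attribute names are an assumption about the response layout.

```r
library(xml2)

# Hand-written stand-in for a response body (placeholder values).
xml_string <- paste0(
  "<observations>",
  "<observation date=\"2000-01-01\" value=\"1.0\"/>",
  "<observation date=\"2000-02-01\" value=\"2.0\"/>",
  "<observation date=\"2000-03-01\" value=\".\"/>",
  "</observations>"
)

doc <- read_xml(xml_string)
obs <- xml_find_all(doc, ".//observation")

# Pull the attributes out of each node as character vectors.
dates  <- xml_attr(obs, "date")
values <- xml_attr(obs, "value")

# FRED marks missing values with "."; turn those into NA before converting.
values[values == "."] <- NA

unrate <- data.frame(
  date  = as.Date(dates),
  value = as.numeric(values)
)
unrate
```

Each `<observation>` node behaves like one element of a list, and cleaning mostly means pulling out the fields we care about and fixing their types.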
If you find yourself making requests directly to an API, I suggest using the httr2 package. It has a great article about APIs.
But luckily for us, someone has already written a package to make and parse requests to FRED!
fredr: FRED API Wrapper for R
I wanted you to see the internals of how an API works, but usually if there is an API, someone has made a wrapper package to communicate with it.
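A short sketch of what using the wrapper might look like, assuming the fredr package is installed and you have a FRED API key. The helper name `fetch_unrate()` and the default start date are illustrative choices, not part of fredr.

```r
# Hypothetical helper wrapping two fredr calls.
# Calling it requires the fredr package, a valid API key, and network access.
fetch_unrate <- function(api_key, start = as.Date("2000-01-01")) {
  fredr::fredr_set_key(api_key)  # register the key for this session
  fredr::fredr(
    series_id = "UNRATE",
    observation_start = start    # must be a Date
  )
}

# unrate <- fetch_unrate("YOUR_API_KEY")
# head(unrate)  # a tibble with columns including date, series_id, and value
```

The wrapper handles building the request and parsing the response, so the data acquisition step is a few lines of code that anyone with a key can rerun.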
# Source:   table<djdaily> [?? x 9]
# Database: postgres [mdehaven@wrds-pgdata.wharton.upenn.edu:9737/wrds]
   date         djc  djct   dji  djit   djt  djtt   dju  djut
   <date>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 1896-05-26    NA    NA  40.9    NA    NA    NA    NA    NA
 2 1896-05-27    NA    NA  40.6    NA    NA    NA    NA    NA
 3 1896-05-28    NA    NA  40.2    NA    NA    NA    NA    NA
 4 1896-05-29    NA    NA  40.6    NA    NA    NA    NA    NA
 5 1896-06-01    NA    NA  40.6    NA    NA    NA    NA    NA
 6 1896-06-02    NA    NA  40.0    NA    NA    NA    NA    NA
 7 1896-06-03    NA    NA  39.8    NA    NA    NA    NA    NA
 8 1896-06-04    NA    NA  39.9    NA    NA    NA    NA    NA
 9 1896-06-05    NA    NA  40.3    NA    NA    NA    NA    NA
10 1896-06-08    NA    NA  39.8    NA    NA    NA    NA    NA
# ℹ more rows
How Big is the Table?
One way to figure out the size of a table would be to download all the data.
This could be very bad. These datasets can be huge!
Instead, let’s ask the database how big the table is, and what the min and max dates are,