2024-01-01
ggplot2
to create simple plots
patchwork
ggplot2
One of the most used packages in R.
Developed by Hadley Wickham (we’ve seen his name before).
“gg” stands for “grammar” of “graphics”.
You should already have ggplot2 installed, as it is part of the tidyverse.
If not:
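The install command did not survive on this slide. A minimal sketch, assuming a standard CRAN setup, would be:

```r
# Install ggplot2 from CRAN only if it is not already available.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Load the package for this session.
library(ggplot2)
```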
We’ve seen the built-in dataset mtcars before. It has values for 32 different cars.
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
Let’s look in detail at how this works.
We will show each of these in our example plot.
data = mtcars
mapping = aes(x = hp, y = mpg, color = cyl)
Once we have that, we can add a “geom”…
geom_point()
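Putting the data, mapping, and geom pieces above together might look like the sketch below. The variable names base and points_plot are only for illustration.

```r
library(ggplot2)

# Data and aesthetic mappings only: this draws an empty canvas with axes.
base <- ggplot(data = mtcars, mapping = aes(x = hp, y = mpg, color = cyl))

# Adding a geom draws one point per car, colored by number of cylinders.
points_plot <- base + geom_point()
points_plot
```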
We could have used geom_line() instead, but it would have looked pretty silly.
geom_line() + geom_point()
We would get both geometric objects drawn on the chart!
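A short sketch of stacking two geoms on the same plot, assuming the same mtcars mapping as before:

```r
library(ggplot2)

# Each geom added with + becomes its own layer, drawn in order.
both <- ggplot(mtcars, aes(x = hp, y = mpg, color = cyl)) +
  geom_line() +   # layer 1: line connecting the cars
  geom_point()    # layer 2: points drawn on top of the line
both
```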
Notice that our geom_line() is also using the color aesthetic.
What if we wanted it to be black instead?
Instead of having one plot-wide aesthetic, we can set aesthetics for each layer.
Or we could set the common aesthetics in the ggplot() call, and add color only for geom_point().
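Both options can be sketched as follows. The two plots should look the same; they differ only in where the mappings are declared.

```r
library(ggplot2)

# Option 1: no plot-wide aesthetics; every layer declares its own.
per_layer <- ggplot(mtcars) +
  geom_line(aes(x = hp, y = mpg)) +
  geom_point(aes(x = hp, y = mpg, color = cyl))

# Option 2: x and y are shared plot-wide; only the points add color.
shared <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_line() +
  geom_point(aes(color = cyl))

per_layer
shared
```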
Or we can override plot-wide aesthetics for individual layers.
Notice that we don’t use aes(color = "black").
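A sketch of the override: "black" is a fixed value rather than a column in the data, so it goes outside aes() as a plain argument to the geom.

```r
library(ggplot2)

# color = cyl is plot-wide, but the line layer overrides it with a constant.
override <- ggplot(mtcars, aes(x = hp, y = mpg, color = cyl)) +
  geom_line(color = "black") +  # fixed color, set outside aes()
  geom_point()                  # still inherits color = cyl
override
```

Writing aes(color = "black") instead would treat "black" as a one-level data value and map it through the color scale, which is usually not what you want.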
Possible geoms and the aesthetics they require:
geom_point(), geom_line(): x, y
geom_histogram(), geom_density(): x
geom_col(): x, y
geom_ribbon(): x, ymin, ymax
geom_text(), geom_label(): x, y, label
And many more. See the ggplot2 reference page for more.
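As a quick illustration of how required aesthetics differ by geom, here is a sketch with one geom that needs only x and one that also needs label. Using the row names of mtcars as labels is an assumption made for this example.

```r
library(ggplot2)

# geom_histogram() needs only x; the counts on the y-axis are computed for us.
hist_plot <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# geom_text() needs x, y, and label; here the label is the car name.
label_plot <- ggplot(mtcars, aes(x = hp, y = mpg, label = rownames(mtcars))) +
  geom_text(size = 3)

hist_plot
label_plot
```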
We have seen how to build a ggplot with data, aesthetics, geoms, and layers.
Now, we will look at how to adjust other parts of the plot:
Every ggplot aesthetic has a scale.
x-axis
y-axis
color
Note: I’ve saved the plot to a variable, p, which we can print later. We can also add layers, scales, or themes to it without repeating the initial construction.
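A minimal sketch of saving a plot to a variable and building on it later. The name p2 is only for illustration.

```r
library(ggplot2)

# Build the plot once and keep it in a variable.
p <- ggplot(mtcars, aes(x = hp, y = mpg, color = cyl)) +
  geom_point()

# Printing the variable draws the plot.
p

# Adding to p returns a new plot; p itself is unchanged.
p2 <- p + geom_line()
p2
```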
The most used scales are for the x- and y-axes.
Here we are using _continuous() scales because X and Y are both numeric.
For discrete data, you would use scale_x_discrete() or scale_y_discrete().
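A sketch of both kinds of axis scale. The axis names and the choice to treat cyl as a factor in the second plot are assumptions made for illustration.

```r
library(ggplot2)

# Numeric x and y: continuous scales.
continuous_axes <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  scale_x_continuous(name = "Horsepower") +
  scale_y_continuous(name = "Miles per gallon")

# Converting cyl to a factor makes x discrete, so it takes a discrete scale.
discrete_axis <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point() +
  scale_x_discrete(name = "Cylinders")

continuous_axes
discrete_axis
```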
scale_*_continuous() options
limits = c(0, 50)
expand = c(0, 0)
c(0.05, 0.05)
breaks = c(0, 10, 20, 30, 40, 50)
labels = c(0, 10, 20, 30, 40, 50)
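The options listed above could be applied to the y-axis like this. It is only a sketch: which axis the original slide adjusted is not recoverable, and y is used here because every mpg value fits inside 0 to 50.

```r
library(ggplot2)

tick_positions <- c(0, 10, 20, 30, 40, 50)

scaled <- ggplot(mtcars, aes(x = hp, y = mpg, color = cyl)) +
  geom_point() +
  scale_y_continuous(
    limits = c(0, 50),                    # axis runs from 0 to 50
    expand = c(0, 0),                     # no padding beyond the limits
    breaks = tick_positions,              # where the ticks go
    labels = as.character(tick_positions) # text shown at each tick
  )
scaled
```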
The default continuous color scale is
For low-mid-high color scales, use scale_color_gradient2().
For n-level color scales, use scale_color_gradientn().
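A sketch of the two gradient scales. The specific colors and the midpoint of 6 cylinders are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = hp, y = mpg, color = cyl)) +
  geom_point()

# Low-mid-high: diverges around a midpoint in the data.
diverging <- p +
  scale_color_gradient2(low = "blue", mid = "gray80", high = "red", midpoint = 6)

# n-level: interpolates through any number of colors.
multi <- p +
  scale_color_gradientn(colors = c("blue", "green", "yellow", "red"))

diverging
multi
```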
But is a continuous scale the right one for this data?
For changing discrete colors, use scale_color_manual().
Use the “values=” argument to provide your own colors.
Use a named vector to match each color to a specific value in the data.
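A sketch with a discrete color scale. It assumes cyl is converted with factor() so that ggplot2 treats it as discrete; the color choices are arbitrary.

```r
library(ggplot2)

# Named vector: names are the data values, elements are the colors.
cyl_colors <- c("4" = "steelblue", "6" = "darkorange", "8" = "firebrick")

manual <- ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) +
  geom_point() +
  scale_color_manual(values = cyl_colors)
manual
```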
There is a scale in your plot for each aesthetic.
The defaults can always be adjusted, with the right scale function.
It is important to remember whether your data is continuous or discrete.
There are many more built-in scales; see the ggplot2 reference page for more.
These labels default to your variable names:
And these labels are optional:
plot title
plot subtitle
plot caption
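All of these labels can be set in one labs() call. The label text below is made up for illustration.

```r
library(ggplot2)

labelled <- ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(
    x = "Horsepower",            # replaces the default "hp"
    y = "Miles per gallon",      # replaces the default "mpg"
    color = "Cylinders",         # legend title
    title = "Fuel Efficiency",
    subtitle = "32 cars from the mtcars dataset",
    caption = "Source: mtcars"
  )
labelled
```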
ggplot themes
Want to quickly change how your plot looks? Change the theme!
ggplot built-in themes
theme_gray(): the default, with a gray plot background
theme_bw(): black and white (my preference)
theme_minimal(): no axis lines
theme_classic(): no grid lines
theme_dark(): dark background
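Switching themes is a one-line change. A sketch, assuming the same scatter plot as before:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) +
  geom_point()

# The same plot under different built-in themes.
p_bw      <- p + theme_bw()
p_minimal <- p + theme_minimal()
p_classic <- p + theme_classic()
p_dark    <- p + theme_dark()

p_bw
```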
All elements of a theme can be edited using + theme().
To see the full list of options, see the ggplot2::theme reference or type ?theme.
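A small sketch of editing individual theme elements on top of a built-in theme. The three settings shown are arbitrary examples.

```r
library(ggplot2)

p_custom <- ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) +
  geom_point() +
  theme_bw() +
  theme(
    legend.position = "bottom",              # move the legend below the plot
    panel.grid.minor = element_blank(),      # drop the minor grid lines
    plot.title = element_text(face = "bold") # bold title, if one is set
  )
p_custom
```

Put theme() after the built-in theme; a complete theme added later would overwrite these settings.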
Many other packages and organizations share their own ggplot themes.
And then we can make our plot look like it’s from…
The Economist!
The Wall Street Journal!
… or Stata??
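The package behind these themes is not named on the slide. One package that provides themes in these styles is ggthemes, so the sketch below assumes it and skips that step if it is not installed.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) +
  geom_point()

# Assumption: ggthemes is the source of these themes; it is not named in the slides.
if (requireNamespace("ggthemes", quietly = TRUE)) {
  p_economist <- p + ggthemes::theme_economist()
  p_wsj       <- p + ggthemes::theme_wsj()
  p_stata     <- p + ggthemes::theme_stata()
}
```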
Notice that a large part of what changed in each of those themes was the fonts.
With themes you can change just about any non-data part of your plot.
But that means all the options can be hard to figure out.
Using pre-built themes is a good way to get what you want without digging into the details.
No background color
No grid lines
Start and end ticks, if possible
A box around the plot (i.e. top and right axis lines)
Legends within the plot, if possible
Make colors colorblind friendly
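Part of this checklist can be sketched with built-in pieces. The example covers only the background, grid lines, box, and colorblind-friendly colors; tick placement and legend position are left out, and the viridis palette is one possible choice rather than the one used in the original slides.

```r
library(ggplot2)

p_clean <- ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) +
  geom_point() +
  scale_color_viridis_d(name = "Cylinders") + # palette designed to be colorblind friendly
  theme_bw() +                                # white background, box around the panel
  theme(panel.grid = element_blank())         # no grid lines
p_clean
```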
Facets allow for easily plotting multiple cuts of the data.
You can think of it as adding another “z” dimension to your plot.
For example, for our Fuel Efficiency plot, instead of using color to show “cylinders” we could have used facets.
facet_wrap() constructs plot panels from one variable.
Use scales="free" to let the scales vary by panel.
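A sketch of faceting the fuel efficiency plot by cylinders instead of coloring by it:

```r
library(ggplot2)

# One panel per value of cyl; each panel gets its own x and y ranges.
wrapped <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl, scales = "free")
wrapped
```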
You can create a grid of facets using facet_grid() and two variables.
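A sketch of a two-variable grid. Using am (transmission) for rows and cyl for columns is an arbitrary choice for illustration.

```r
library(ggplot2)

# Rows are split by am, columns by cyl.
gridded <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl)
gridded
```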
Facets are very useful for looking at lots of data.
But you lose some of the control over each individual panel.
facet_grid
Sometimes you want to create a single image from two or more charts.
There are multiple packages that allow you to do this.
We will use patchwork.
We can combine two plots side by side with |.
We can combine two plots in a column with /.
And we can mix and match to get complicated layouts.
You can set empty spaces using plot_spacer()
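The layouts described above can be sketched as follows. The three small plots (p1, p2, p3) are made up for this example.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p3 <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 10)

side_by_side <- p1 | p2                  # two plots in a row
stacked      <- p1 / p2                  # two plots in a column
mixed        <- (p1 | p2) / p3           # two on top, one wide plot below
with_gap     <- p1 | plot_spacer() | p2  # empty cell between the plots

mixed
```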
patchwork
A super powerful package.
See the vignettes for a great guide.
ggplots are easy to save with ggsave()
Two main choices:
Use vector graphic format if possible.
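A sketch of saving the same plot in a vector format (PDF) and a raster format (PNG). The file names, sizes, and the use of a temporary directory are assumptions for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) +
  geom_point()

out_dir <- tempdir()
pdf_path <- file.path(out_dir, "fuel_efficiency.pdf")
png_path <- file.path(out_dir, "fuel_efficiency.png")

# Vector format: scales to any size without losing sharpness.
ggsave(pdf_path, plot = p, width = 6, height = 4)

# Raster format: fixed resolution, so set dpi for the intended use.
ggsave(png_path, plot = p, width = 6, height = 4, dpi = 300)
```

ggsave() picks the output format from the file extension; width and height are in inches by default.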