4 minute read

I often use {ggplot2} to create graphs but there are certain things I always have to Google. I figured I’d create a post for quick reference for myself but I’d love to hear what you always have to look up!

library(tidyverse)

knitr::opts_chunk$set(out.width = '100%') 

To showcase what’s happening, I am going to use a TidyTuesday dataset: Spotify songs! Let’s start by creating a simple graph.

# Load Data
spotify_songs <- 
  readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-21/spotify_songs.csv')

spotify_songs %>% 
  ggplot(aes(x = playlist_genre)) +
  geom_histogram(stat = "count")

Remove the legend

theme(legend.position = “none”)

Ahh… this one always gets me. Sometimes when your color is mostly just for aesthetics, it doesn’t make sense to also have a color legend. This removes the legend and makes the graph look cleaner.

spotify_songs %>% 
  ggplot(aes(x = playlist_genre, fill = playlist_genre)) +
  geom_histogram(stat = "count") +
  theme(legend.position = "none")

Change Legend Title and Labels

scale_fill_discrete(name = “New Legend Title”, labels = c(“lab1” = “Label 1”, “lab2” = “Label 2”))

Alright, say I do want the legend. How do I make it something readable?

spotify_songs %>% 
  ggplot(aes(x = playlist_genre, fill = playlist_genre)) +
  geom_histogram(stat = "count") +
  scale_fill_discrete(name = "Playlist Genre", 
                      labels = c("edm" = "EDM", 
                                 "latin" = "Latin", 
                                 "pop" = "Pop", 
                                 "r&b" = "R&B", 
                                 "rap" = "Rap", 
                                 "rock" = "Rock"))

Manually Change Colors

scale_fill_manual(“New Legend Title”, values = c(“lab1” = “#000000”, “lab2” = “#FFFFFF”))

This is a bit tricker, in that you cannot use scale_fill_manual and scale_fill_discrete separately on the same plot as they override each other. However, if you want to change the labels and the colors together, you can use scale_fill_manual like below.

spotify_songs %>% 
  ggplot(aes(x = playlist_genre, fill = playlist_genre)) +
  geom_histogram(stat = "count") +
  scale_fill_manual(name = "Playlist Genre", 
                    labels = c("edm" = "EDM", 
                               "latin" = "Latin", 
                               "pop" = "Pop", 
                               "r&b" = "R&B", 
                               "rap" = "Rap", 
                               "rock" = "Rock"),
                    values = c("edm" = "#68B39B", 
                               "latin" = "#F6C7FF", 
                               "pop" = "#ADFFE5", 
                               "r&b" = "#CCB576", 
                               "rap" = "#B3A070", 
                               "rock" = "#d3d3d3"))

Remove X Axis Labels

theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank())

In this case, since we have a legend, we don’t need any x axis labels. Sometimes I use this if there’s redundant information or if it otherwise makes the graph look cleaner.

spotify_songs %>% 
  ggplot(aes(x = playlist_genre, fill = playlist_genre)) +
  geom_histogram(stat = "count") +
  scale_fill_manual(name = "Playlist Genre", 
                    labels = c("edm" = "EDM", 
                               "latin" = "Latin", 
                               "pop" = "Pop", 
                               "r&b" = "R&B", 
                               "rap" = "Rap", 
                               "rock" = "Rock"),
                    values = c("edm" = "#68B39B", 
                               "latin" = "#F6C7FF", 
                               "pop" = "#ADFFE5", 
                               "r&b" = "#CCB576", 
                               "rap" = "#B3A070", 
                               "rock" = "#d3d3d3")) +
  theme(axis.title.x = element_blank(),
         axis.text.x = element_blank(),
         axis.ticks.x = element_blank())

Start the Y Axis at a Specific Number

scale_y_continuous(name = “New Y Axis Title”, limits = c(0, 1000000))

Often times, we want our graph’s y axis to start at 0. In this example it already does, but this handy parameter allows us to set exactly what we want our y axis to be.

spotify_songs %>% 
  ggplot(aes(x = playlist_genre, fill = playlist_genre)) +
  geom_histogram(stat = "count") +
  scale_fill_manual(name = "Playlist Genre", 
                    labels = c("edm" = "EDM", 
                               "latin" = "Latin", 
                               "pop" = "Pop", 
                               "r&b" = "R&B", 
                               "rap" = "Rap", 
                               "rock" = "Rock"),
                    values = c("edm" = "#68B39B", 
                               "latin" = "#F6C7FF", 
                               "pop" = "#ADFFE5", 
                               "r&b" = "#CCB576", 
                               "rap" = "#B3A070", 
                               "rock" = "#d3d3d3")) +
  theme(axis.title.x = element_blank(),
         axis.text.x = element_blank(),
         axis.ticks.x = element_blank()) +
  scale_y_continuous(name = "Count", limits = c(0, 10000))

Use scales on the Y Axis

scale_y_continuous(label = scales::format)

Depending on our data, we may want the y axis to be formatted a certain way (using dollar signs, commas, percentage signs, etc.). The handy {scales} package allows us to do that easily.

spotify_songs %>% 
  ggplot(aes(x = playlist_genre, fill = playlist_genre)) +
  geom_histogram(stat = "count") +
  scale_fill_manual(name = "Playlist Genre", 
                    labels = c("edm" = "EDM", 
                               "latin" = "Latin", 
                               "pop" = "Pop", 
                               "r&b" = "R&B", 
                               "rap" = "Rap", 
                               "rock" = "Rock"),
                    values = c("edm" = "#68B39B", 
                               "latin" = "#F6C7FF", 
                               "pop" = "#ADFFE5", 
                               "r&b" = "#CCB576", 
                               "rap" = "#B3A070", 
                               "rock" = "#d3d3d3")) +
  theme(axis.title.x = element_blank(),
         axis.text.x = element_blank(),
         axis.ticks.x = element_blank()) +
  scale_y_continuous(name = "Count", limits = c(0, 10000),
                     labels = scales::comma)

There we have it! Six things I always eventually end up Googling when I am making plots using {ggplot2}. Hopefully now I can just look at this page instead of searching each and every time!