5 Graphics with ggplot2

5.1 Basic components of a ggplot2 Plot

There are two ways to produce plots in ggplot2: qplot() and ggplot(). In this seminar we focus on ggplot().

For a quick reference, see the Data Visualization with ggplot2 Cheat Sheet from RStudio.

ggplot2 is based on the grammar of graphics, the idea that you can build every graph from the same few components: a data set, a set of geoms—visual marks that represent data points, and a coordinate system.
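As a minimal sketch of these three components, the block below builds a scatter plot from a small made-up data frame. The name toy and its values are hypothetical and are not part of the seminar data. coord_cartesian() is written out only to make the coordinate system visible; ggplot2 applies it by default.

```r
library(ggplot2)

# Made-up data, purely for illustration
toy <- data.frame(x = c(1, 2, 3, 4), y = c(2, 4, 3, 6))

p <- ggplot(data = toy, mapping = aes(x = x, y = y)) +  # data set and aesthetic mapping
  geom_point() +                                        # geom: one point per row
  coord_cartesian()                                     # coordinate system (the default)
p
```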

5.2 Building a simple ggplot plot

Now let us start by creating a scatter plot using ggplot(). The scatter plot explores the relationship between transaction price and total floor area in two different local authorities.

5.2.1 Make sure your datasets are data frames

Since ggplot only works with data frames, we use is.data.frame() to check whether the data is a data frame. If it is not, you need to convert it to a data frame.

# test whether each data set is a data frame (returns TRUE or FALSE)
is.data.frame(housedata1)
is.data.frame(housedata2)
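If the check returns FALSE, as.data.frame() is one way to do the conversion. The sketch below uses a small made-up matrix (the object names and numbers are hypothetical) to show the check before and after converting.

```r
# Made-up values stored in a matrix rather than a data frame
m <- matrix(c(55, 120000,
              80, 185000,
              110, 240000),
            ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("tfarea", "price")))

is.data.frame(m)              # FALSE: m is a matrix

housedata_df <- as.data.frame(m)
is.data.frame(housedata_df)   # TRUE
```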


5.2.2 Plot background and set the x and y axis

ggplot(housedata1,aes(x=tfarea,y=price))


5.2.3 Add a layer

ggplot(housedata1,aes(x=tfarea,y=price))+
  geom_point()


5.2.4 Color the point

In ggplot, additional aesthetics can be mapped to other variables in the dataset. Given that housedata1 records transaction prices in two local authorities, we can colour the points by local authority using the ldnm field.

ggplot(housedata1,aes(x=tfarea,y=price))+
  geom_point(aes(color=ldnm))


5.2.5 Add a linear regression line

To add a linear regression line to a scatter plot, add geom_smooth(method = "lm"), where lm stands for linear model, i.e. the line is fitted by linear regression. Setting group = ldnm fits a separate line for each local authority.

ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
  geom_point(aes(color=ldnm))+
  geom_smooth(method="lm")
## `geom_smooth()` using formula 'y ~ x'
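To see the numbers behind such a line, you can fit the same model yourself with lm(), once per group. This is only a sketch on simulated data: demo is a hypothetical stand-in for housedata1, and its values are generated, not real transactions.

```r
# Hypothetical stand-in for housedata1 (simulated values, not the seminar data)
set.seed(1)
demo <- data.frame(
  tfarea = runif(60, min = 40, max = 250),
  ldnm   = rep(c("Area A", "Area B"), each = 30)
)
demo$price <- 50000 + 2000 * demo$tfarea + rnorm(60, sd = 30000)

# Fit price ~ tfarea separately for each local authority,
# mirroring the 'y ~ x' formula reported by geom_smooth()
fits <- lapply(split(demo, demo$ldnm), function(d) {
  coef(lm(price ~ tfarea, data = d))
})
fits  # one intercept and one slope per local authority
```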


5.3 Customizing the graph

5.3.1 Change the point colour

The colour of the points can be controlled with the color aesthetic. The code below colours all the points blue and uses facet_wrap(~ ldnm) to draw each local authority in its own panel.

ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
  geom_point(color="#56b4e9")+
  geom_smooth(method="lm")+
  facet_wrap(~ ldnm)
## `geom_smooth()` using formula 'y ~ x'
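The difference between this example and the one in 5.2.4 is whether color sits inside or outside aes(). The sketch below contrasts the two on simulated data (demo is a hypothetical stand-in for housedata1).

```r
library(ggplot2)

# Hypothetical stand-in for housedata1 (simulated values, not the seminar data)
set.seed(1)
demo <- data.frame(
  tfarea = runif(60, min = 40, max = 250),
  ldnm   = rep(c("Area A", "Area B"), each = 30)
)
demo$price <- 50000 + 2000 * demo$tfarea + rnorm(60, sd = 30000)

# Mapping: colour depends on a column, so it goes inside aes() and a legend is drawn
p_mapped <- ggplot(demo, aes(x = tfarea, y = price)) +
  geom_point(aes(color = ldnm))

# Setting: one fixed colour for every point, so it goes outside aes()
p_fixed <- ggplot(demo, aes(x = tfarea, y = price)) +
  geom_point(color = "#56b4e9")

p_mapped
p_fixed
```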


5.3.2 Change the point size

The size of the points can be controlled with the size aesthetic. The default size in geom_point() is 1.5. The following code sets it to 1.2.

ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  geom_smooth(method="lm")+
  facet_wrap(~ ldnm)
## `geom_smooth()` using formula 'y ~ x'


5.3.3 Change the point shape

The shape of the points can be controlled with the shape aesthetic. The default shape is a solid circle; the other options are listed under Point Shape Options in ggplot. The following code plots the points as solid squares (shape = 15).

ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2, shape=15)+
  geom_smooth(method="lm")+
  facet_wrap(~ ldnm)
## `geom_smooth()` using formula 'y ~ x'
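If you do not have the shape table to hand, you can draw one yourself. The sketch below plots the shape codes 0 to 25 in a grid; the data frame shapes and its layout columns are made up for this illustration. The fill colour only shows on shapes 21 to 25, which have a separate outline and fill.

```r
library(ggplot2)

# One row per shape code, arranged in a 9-column grid (layout is arbitrary)
shapes <- data.frame(code = 0:25)
shapes$col <- shapes$code %% 9
shapes$row <- -(shapes$code %/% 9)

p_shapes <- ggplot(shapes, aes(x = col, y = row)) +
  geom_point(aes(shape = code), size = 4, fill = "#56b4e9") +
  scale_shape_identity() +                        # use the codes as shapes directly
  geom_text(aes(label = code), nudge_y = -0.3) +  # print each code under its shape
  theme_void()
p_shapes
```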


5.3.4 Modify fitted regression lines

The default colour of the fit line is blue. This can be changed by setting colour. The following code sets a red fit line, first with a colour name and then with the equivalent hex code.

# method 1: colour name
ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  geom_smooth(method="lm",colour = "red")+
  facet_wrap(~ ldnm)
## `geom_smooth()` using formula 'y ~ x'

# method 2: hex colour code
ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  geom_smooth(method="lm",colour = "#FF0000")+
  facet_wrap(~ ldnm)
## `geom_smooth()` using formula 'y ~ x'


The grey band around the fit line is the confidence region (a 95% confidence interval by default). You can disable it with se = FALSE.

ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  geom_smooth(method="lm",se = FALSE)+
  facet_wrap(~ ldnm)
## `geom_smooth()` using formula 'y ~ x'
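Two related arguments are worth knowing. As a sketch on simulated data (demo is a hypothetical stand-in for housedata1), level changes the width of the confidence band, and passing formula = y ~ x explicitly stops geom_smooth() from printing the message shown above.

```r
library(ggplot2)

# Hypothetical stand-in for housedata1 (simulated values, not the seminar data)
set.seed(1)
demo <- data.frame(
  tfarea = runif(60, min = 40, max = 250),
  ldnm   = rep(c("Area A", "Area B"), each = 30)
)
demo$price <- 50000 + 2000 * demo$tfarea + rnorm(60, sd = 30000)

p_ci <- ggplot(demo, aes(x = tfarea, y = price, group = ldnm)) +
  geom_point(color = "#56b4e9", size = 1.2) +
  geom_smooth(method = "lm",
              formula = y ~ x,   # stated explicitly, so no message is printed
              level = 0.99) +    # 99% confidence band instead of the default 95%
  facet_wrap(~ ldnm)
p_ci
```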


5.3.5 Change the axis titles

The labs() function can be used to change axis labels. Here are two ways to change the axis titles.

# Method 1: labs()
ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  labs(x = "Total floor area", y = "Transaction price")

# Method 2: xlab() and ylab()
ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2, shape=15)+
  facet_wrap(~ ldnm)+
  xlab("Total floor area")+
  ylab("Transaction price")


5.3.6 Add axis labels and units

5.3.6.1 Formatting y axis and labels

Below is the code for adding the unit to the y axis title in ggplot().

ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  xlab("Total floor area")+
  ylab("Transaction price (£)")


If you want to show the y axis in thousands of pounds, you can use scale_y_continuous() with a labels function that divides each value by 1000.

ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = function(y) y / 1000)+
  xlab("Total floor area")


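You can also format the y labels more readably with some common formats from the scales package. The code below formats the y labels with comma separators. Here the price is divided by 1000 inside aes(), so the labels function only has to handle the formatting.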

ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab("Total floor area")


  • to convert the y axis to a percentage scale, you can use `scale_y_continuous(labels = scales::percent)`

  • to display dollars on the y axis, you can use `scale_y_continuous(labels = scales::dollar)`

  • to display euros on the y axis, you can use `scale_y_continuous(labels = scales::dollar_format(suffix = "€", prefix = ""))`
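These label helpers are ordinary functions that turn numbers into text, so you can try them on a few values in the console before putting them in a plot. The input numbers below are made up, and euro is just a name chosen here for the formatter.

```r
library(scales)

comma(1234567)    # "1,234,567"
percent(0.25)     # "25%"
dollar(1500)      # "$1,500"

# dollar_format() returns a function; here it is configured to put a euro sign after the number
euro <- dollar_format(suffix = "€", prefix = "")
euro(1500)
```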

5.3.6.2 Add x axis unit

Below are two approaches to labelling the x axis using math notation. Math Notation for R Plot Titles: expression, bquote, & Greek Letters shows more applications of bquote in R.

# method 1: bquote()
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(bquote("Total floor area (" ~ m^2 ~ ")"))

# method 2: expression()
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))
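Both methods give the same label here. The practical difference is that bquote() can insert the value of an R variable through .(), whereas expression() would draw the variable name literally. A small sketch, in which toy and exponent are hypothetical names introduced for illustration:

```r
library(ggplot2)

toy <- data.frame(tfarea = c(60, 120, 180), price = c(150, 260, 330))  # made-up values

exponent <- 2  # hypothetical variable holding the power of the unit

# bquote() replaces .(exponent) with its value before the label is drawn
x_label <- bquote("Total floor area (" ~ m^.(exponent) ~ ")")

p <- ggplot(toy, aes(x = tfarea, y = price)) +
  geom_point() +
  xlab(x_label)
p
```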


5.3.6.3 Change x-axis breaks

You can use the breaks argument of scale_x_continuous() or scale_y_continuous() to change the x or y axis breaks.

ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  geom_smooth(method="lm",se = FALSE)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(bquote("Total floor area (" ~ m^2 ~ ")"))+
  scale_x_continuous(breaks = c(50,100,150,200,250,300))
## `geom_smooth()` using formula 'y ~ x'

# use seq() if the breaks are evenly spaced
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(bquote("Total floor area (" ~ m^2 ~ ")"))+
  scale_x_continuous(breaks = seq(50,300,50))
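The two ways of writing the breaks produce the same vector, which you can confirm in the console (the variable names are chosen here for illustration):

```r
breaks_explicit <- c(50, 100, 150, 200, 250, 300)
breaks_seq <- seq(from = 50, to = 300, by = 50)

breaks_seq                              # 50 100 150 200 250 300
all.equal(breaks_explicit, breaks_seq)  # TRUE
```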


5.3.6.4 Specify axis plot range

You can use limits to modify the axis limits. Below is an example that limits the x axis to total floor areas from 0 to 300. Observations outside the limits are dropped from the plot, which is why R prints a warning.

ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(bquote("Total floor area (" ~ m^2 ~ ")"))+
  scale_x_continuous(breaks = seq(50,300,50),limits = c(0, 300))
## Warning: Removed 2 rows containing missing values (geom_point).
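Because scale limits drop the observations outside the range, any statistic added afterwards (a regression line, for example) is computed without them. If you only want to zoom in, coord_cartesian() is an alternative that is not used in this seminar: it changes the visible window and keeps all the data. A sketch with made-up values:

```r
library(ggplot2)

# Made-up values; two of the six floor areas are above 300
toy <- data.frame(tfarea = c(60, 120, 180, 240, 350, 400),
                  price  = c(150, 260, 330, 470, 640, 900))

# Scale limits: values outside 0-300 are treated as missing
p_limits <- ggplot(toy, aes(x = tfarea, y = price)) +
  geom_point() +
  scale_x_continuous(limits = c(0, 300))

# Coordinate limits: the view is cropped, but every row is kept
p_zoom <- ggplot(toy, aes(x = tfarea, y = price)) +
  geom_point() +
  coord_cartesian(xlim = c(0, 300))

sum(!is.na(layer_data(p_limits)$x))  # points that survive the scale limits
sum(!is.na(layer_data(p_zoom)$x))    # points kept when zooming
```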


You can follow the same steps for the y axis.

ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",limits = c(0, 1200),labels = scales::comma)+
  xlab(bquote("Total floor area (" ~ m^2 ~ ")"))+
  scale_x_continuous(breaks = seq(50,300,50),limits = c(0, 300))


5.3.7 Add in title

You can use ggtitle() to add a title to the plot. Below is the code.

ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
  ggtitle("Transaction price against total floor area in local authorities, 2009")


5.3.8 Change themes

ggplot2 ships with eight complete themes that can be used directly to give the plot a customized look. theme_grey() is the default ggplot2 theme; you can use theme_bw() to replace its grey background with a white one.

ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
  theme_bw()


The remaining six themes are shown below; choose your favourite for your academic writing.

ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
  theme_linedraw()


ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
  theme_light()


ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
  theme_dark()


ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
  theme_minimal()


ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
  theme_classic()


ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
  theme_void()


The R package ggthemes provides another gallery of custom ggplot themes. You can see the details in package:ggthemes • All Your Figure Are Belong To Us.

5.3.9 Change the font size

You can manually customize the ggplot by modifying the components in theme(). Below I give a series of examples on how to change the font size in the plot. Let us do it step-by-step.

5.3.9.1 Change the font size of the x and y axis titles and colour them red

ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
  theme_bw()+
  theme(axis.title = element_text(size=15,color="red"))


5.3.9.2 Change the font size of the x and y axis tick labels and colour them red

ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
  theme_bw()+
  theme(axis.title = element_text(size=15),axis.text = element_text(size=13,color="red"))


5.3.9.3 Change the font size of facet labels in the plot and colour them red

ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
  theme_bw()+
  theme(axis.title = element_text(size=15),axis.text = element_text(size=13),strip.text = element_text(size=15,color="red"))


5.3.9.4 Change the font size of legend item labels in the plot and colour them red

ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(aes(color=ldnm),size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
  theme_bw()+
  theme(axis.title = element_text(size=15),axis.text = element_text(size=13),strip.text = element_text(size=15),legend.text = element_text(size=13,color="red"))


5.3.9.5 Change the font size of title of the legend in the plot and colour it red

ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(aes(color=ldnm),size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
  theme_bw()+
  theme(axis.title = element_text(size=15),axis.text = element_text(size=13),strip.text = element_text(size=15),legend.text = element_text(size=13),legend.title = element_text(size=15,color="red"))


You may wonder how to change the legend title in the above plot. Below is the answer:

ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
  geom_point(aes(color=ldnm),size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
  theme_bw()+
  theme(axis.title = element_text(size=15),axis.text = element_text(size=13),strip.text = element_text(size=15),legend.text = element_text(size=13),legend.title = element_text(size=15,color="red"))+
  labs(color = "Local authority")
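By this point the theme() call is long and is repeated in every plot. One way to tidy this up is to store the theme in an object and add it like any other layer. The sketch below assumes simulated data (demo is a hypothetical stand-in for housedata1), and seminar_theme is a name chosen here.

```r
library(ggplot2)

# Hypothetical stand-in for housedata1 (simulated values, not the seminar data)
set.seed(1)
demo <- data.frame(
  tfarea = runif(60, min = 40, max = 250),
  ldnm   = rep(c("Area A", "Area B"), each = 30)
)
demo$price <- 50000 + 2000 * demo$tfarea + rnorm(60, sd = 30000)

# Define the font sizes once ...
seminar_theme <- theme_bw() +
  theme(axis.title   = element_text(size = 15),
        axis.text    = element_text(size = 13),
        strip.text   = element_text(size = 15),
        legend.text  = element_text(size = 13),
        legend.title = element_text(size = 15))

# ... and reuse them in any plot
p <- ggplot(demo, aes(x = tfarea, y = price / 1000)) +
  geom_point(aes(color = ldnm), size = 1.2) +
  facet_wrap(~ ldnm) +
  seminar_theme +
  labs(color = "Local authority")
p
```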


5.3.10 Add the Pearson correlation coefficient to the plot

Since the aim of this scatter plot is to explore the relationship between transaction price and a property's total floor area, the Pearson correlation coefficient is a suitable measure to show in the plot. Here we use stat_cor() from the package ggpubr.

# add the Pearson correlation coefficient and p value to the graph
library(ggpubr)  # provides stat_cor()

ggplot(housedata1,aes(x=tfarea,y=price/1000))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
  theme_bw()+
  theme(axis.title = element_text(size=15),axis.text = element_text(size=13),strip.text = element_text(size=15))+
  stat_cor(method="pearson")
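If you want to check the values printed in each panel by hand, the same quantities can be computed in base R with cor() and cor.test(), once per local authority. This is a sketch on simulated data (demo is a hypothetical stand-in for housedata1), so the numbers will differ from those in the seminar plot.

```r
# Hypothetical stand-in for housedata1 (simulated values, not the seminar data)
set.seed(1)
demo <- data.frame(
  tfarea = runif(60, min = 40, max = 250),
  ldnm   = rep(c("Area A", "Area B"), each = 30)
)
demo$price <- 50000 + 2000 * demo$tfarea + rnorm(60, sd = 30000)

by_area <- split(demo, demo$ldnm)

# Pearson correlation coefficient for each local authority
r_by_area <- sapply(by_area, function(d) {
  cor(d$tfarea, d$price, method = "pearson")
})

# Matching p values from the correlation test
p_by_area <- sapply(by_area, function(d) {
  cor.test(d$tfarea, d$price, method = "pearson")$p.value
})

round(r_by_area, 2)
p_by_area
```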


You may wonder how to change R to rho. Here is one solution I found (Modify stat_cor function to output “rho” instead of “R”).

# run this first and modify the output.type part in the editor that opens
trace(ggpubr:::.cor_test, edit=TRUE)
## Tracing function ".cor_test" in package "ggpubr (not-exported)"
## [1] ".cor_test"
#plot
ggplot(housedata1,aes(x=tfarea,y=price/1000))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
  theme_bw()+
  theme(axis.title = element_text(size=15),axis.text = element_text(size=13),strip.text = element_text(size=15))+
  stat_cor(method="pearson")


Tip: Once you run trace(ggpubr:::.cor_test, edit=TRUE) in RStudio, you will get an edit window as shown above. You only need to change the part marked with the red square from the left version to the right version.
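If you would rather not edit a function inside the package, another option is to build the label yourself and draw it with geom_text(parse = TRUE). This is a different technique from the trace() solution, sketched here with ggplot2 and base R only; demo is simulated stand-in data, and the labels data frame and the label position are choices made for this illustration. It shows the coefficient only, without the p value.

```r
library(ggplot2)

# Hypothetical stand-in for housedata1 (simulated values, not the seminar data)
set.seed(1)
demo <- data.frame(
  tfarea = runif(60, min = 40, max = 250),
  ldnm   = rep(c("Area A", "Area B"), each = 30)
)
demo$price <- 50000 + 2000 * demo$tfarea + rnorm(60, sd = 30000)

# One label per facet, written in plotmath syntax so that rho is drawn as a Greek letter
labels <- do.call(rbind, lapply(split(demo, demo$ldnm), function(d) {
  data.frame(ldnm  = d$ldnm[1],
             label = sprintf('rho == "%.2f"', cor(d$tfarea, d$price)))
}))

p_rho <- ggplot(demo, aes(x = tfarea, y = price / 1000)) +
  geom_point(color = "#56b4e9", size = 1.2) +
  facet_wrap(~ ldnm) +
  geom_text(data = labels,
            aes(x = -Inf, y = Inf, label = label),  # top-left corner of each panel
            parse = TRUE, hjust = -0.1, vjust = 1.5,
            inherit.aes = FALSE)
p_rho
```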


You can also change the location of the label text as shown below:

ggplot(housedata1,aes(x=tfarea,y=price/1000))+
  geom_point(color="#56b4e9",size = 1.2)+
  facet_wrap(~ ldnm)+
  scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
  xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
  theme_bw()+
  theme(axis.title = element_text(size=15),axis.text = element_text(size=13),strip.text = element_text(size=15))+
  stat_cor(method="pearson",label.x = 150, label.y = 2000,size=5)


5.4 Saving graphs

Plots exported directly from RStudio typically have a resolution of only 72 dpi. A higher DPI (dots per inch) means a higher resolution. ggsave() saves the most recently displayed plot to a file at the resolution you specify.

# get your working directory
getwd()
## [1] "D:/R/CASA_seminar2"
# save the figure as a TIFF file
ggsave("Figure_A.tiff",units="in", width=12, height=6, dpi=500)
#ggsave("first.png",units="in", width=10, height=5, dpi=300)
#ggsave("example.png", units = "cm",width = 30, height = 20 )
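ggsave() also accepts the plot to save through its plot argument, which is safer in scripts than relying on whichever plot was displayed last. The sketch below saves a small made-up plot to a temporary folder (the file name and data are hypothetical) and works out the pixel size of the TIFF above: inches multiplied by dpi.

```r
library(ggplot2)

# Small made-up plot, stored in an object
p <- ggplot(data.frame(x = 1:3, y = c(2, 4, 3)), aes(x = x, y = y)) +
  geom_point()

# Save that specific plot; a temporary folder is used so nothing is left in the working directory
out_file <- file.path(tempdir(), "Figure_demo.png")
ggsave(out_file, plot = p, units = "in", width = 4, height = 2, dpi = 100)
file.exists(out_file)

# Pixel dimensions of a 12 in x 6 in figure at 500 dpi
c(width_px = 12 * 500, height_px = 6 * 500)  # 6000 x 3000 pixels
```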