5 Graphics with ggplot2
5.1 Basic components of a ggplot2 Plot
There are two ways to produce plots in ggplot2: qplot() and ggplot(). In this seminar we focus on ggplot() (qplot() has been deprecated since ggplot2 3.4.0).
Reference: Data Visualization with ggplot2 Cheat Sheet (RStudio).
ggplot2 is based on the grammar of graphics, the idea that you can build every graph from the same few components: a data set, a set of geoms—visual marks that represent data points, and a coordinate system.
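A minimal sketch of those three components follows. It uses R's built-in mtcars data set as a stand-in for the seminar's housedata1 (an assumption, since that data is not reproduced here): mtcars is the data set, geom_point() supplies the visual marks, and coord_cartesian() is the coordinate system. Cartesian coordinates are already the default, so that line is written out only for illustration.

```r
library(ggplot2)

# Data set: the built-in mtcars data frame (stand-in for housedata1)
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  # Geom: one point per row of the data
  geom_point() +
  # Coordinate system: Cartesian, which is also the default
  coord_cartesian()

print(p)
```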
5.2 Building a simple ggplot plot
Now let us start by creating a scatter plot using ggplot(). The scatter plot explores the relationship between transaction price and total floor area in two different local authorities.
5.2.1 Make sure your datasets are data frames
Since ggplot() expects its data as a data frame, we use is.data.frame() to check whether the data is a data frame. If it is not, convert it with as.data.frame().
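As a small sketch of that check, the hypothetical toy matrix below (made-up values, not the seminar data) fails is.data.frame() until it is converted. A tibble, by contrast, already counts as a data frame.

```r
# Hypothetical toy object: a matrix is not a data frame
m <- matrix(1:6, ncol = 2, dimnames = list(NULL, c("x", "y")))
is.data.frame(m)   # FALSE

# Convert it before passing it to ggplot()
df <- as.data.frame(m)
is.data.frame(df)  # TRUE
```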
5.2.4 Color the points
In ggplot, additional aesthetics can be mapped to other variables in the dataset. Given that housedata1 records transaction prices in two local authorities, we can color the points by local authority by mapping color to the ldnm field inside aes().
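Since housedata1 is not reproduced here, the sketch below assumes mtcars as a stand-in, with factor(cyl) playing the role of ldnm. Mapping color inside aes() gives each group its own colour and adds a legend.

```r
library(ggplot2)

# mtcars stands in for housedata1; factor(cyl) plays the role of ldnm
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)))

print(p)
```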
5.2.5 Add a linear regression line
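To add a linear regression line to a scatter plot, add geom_smooth() (or the equivalent stat_smooth()) with method = "lm", where lm stands for linear model, i.e. an ordinary least-squares linear regression. Because the points are grouped by ldnm, one line is fitted for each local authority.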
ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
geom_point(aes(color=ldnm))+
geom_smooth(method="lm")
## `geom_smooth()` using formula 'y ~ x'
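To see that geom_smooth(method = "lm") fits the same model as lm(), the sketch below compares the line ggplot2 draws with predictions from lm() fitted directly. It assumes mtcars as a stand-in for housedata1 and passes formula = y ~ x explicitly, which is the default and also silences the message shown above.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # formula = y ~ x is the default; stating it silences the message
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Fitted line as computed by ggplot2 (layer 2 is the smooth)
smooth <- layer_data(p, 2)

# Same model fitted directly with lm()
fit <- lm(mpg ~ wt, data = mtcars)
pred <- unname(predict(fit, newdata = data.frame(wt = smooth$x)))

all.equal(smooth$y, pred)  # TRUE
coef(fit)
```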
5.3 Customizing the graph
5.3.1 Change the point colour
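The colour of the points can be controlled with the color parameter. Setting color outside aes() gives every point the same fixed colour; the code below colours all the points blue ("#56b4e9"). It also uses facet_wrap(~ ldnm) to draw one panel per local authority.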
ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
geom_point(color="#56b4e9")+
geom_smooth(method="lm")+
facet_wrap(~ ldnm)
## `geom_smooth()` using formula 'y ~ x'
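A common pitfall is putting a fixed colour inside aes(). The sketch below, again assuming mtcars as a stand-in, contrasts setting the colour (outside aes(), as above) with mapping a constant (inside aes()), where ggplot2 treats the string as a data value and draws it with a palette colour instead.

```r
library(ggplot2)

# Setting: a fixed colour given outside aes(); no legend is drawn
p_set <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(color = "#56b4e9")

# Mapping: a constant inside aes() is treated as data, so ggplot2
# picks a colour from its default palette and adds a legend
p_map <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(aes(color = "#56b4e9"))

unique(layer_data(p_set)$colour)  # "#56b4e9"
unique(layer_data(p_map)$colour)  # a palette colour, not "#56b4e9"
```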
5.3.2 Change the point size
The size of the points can be controlled with the size parameter. The default point size in geom_point() is 1.5; the following code sets it to 1.2.
ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
geom_smooth(method="lm")+
facet_wrap(~ ldnm)
## `geom_smooth()` using formula 'y ~ x'
5.3.3 Change the point shape
The shape of the points can be controlled with the shape parameter. The default shape is a solid circle (shape 19); see "Point Shape Options in ggplot" for the other codes. The following code plots the points as solid squares (shape 15).
ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2, shape=15)+
geom_smooth(method="lm")+
facet_wrap(~ ldnm)
## `geom_smooth()` using formula 'y ~ x'
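To choose a shape code, it can help to draw all of them. The sketch below plots R's 26 built-in point shapes (0 to 25) with their codes; shapes 21 to 25 also take a fill colour. The grid layout is just an illustrative choice.

```r
library(ggplot2)

# One point for each of R's 26 built-in shape codes (0 to 25)
shapes <- data.frame(shape = 0:25, x = 0:25 %% 6, y = 0:25 %/% 6)

p <- ggplot(shapes, aes(x = x, y = -y)) +
  geom_point(aes(shape = shape), size = 4, fill = "#56b4e9") +
  geom_text(aes(label = shape), nudge_y = -0.35) +
  # Use the numbers in the data directly as shape codes
  scale_shape_identity() +
  theme_void()

print(p)
```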
5.3.4 Modify fitted regression lines
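The default colour of the fitted line is blue. It can be changed by setting colour in geom_smooth(), either with a colour name (method 1) or a hexadecimal code (method 2). Both of the following snippets draw a red fitted line.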
#method 1
ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
geom_smooth(method="lm",colour = "red")+
facet_wrap(~ ldnm)
## `geom_smooth()` using formula 'y ~ x'
# method 2
ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
geom_smooth(method="lm",colour = "#FF0000")+
facet_wrap(~ ldnm)
## `geom_smooth()` using formula 'y ~ x'
The grey area around the fitted line is the confidence band (by default a 95% confidence interval for the fitted values). You can disable it with se = FALSE.
ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
geom_smooth(method="lm",se = FALSE)+
facet_wrap(~ ldnm)
## `geom_smooth()` using formula 'y ~ x'
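Instead of removing the band, you can change its confidence level with the level argument of geom_smooth(). The sketch below (mtcars as a stand-in) checks that a 99% band is wider than the default 95% band at every point of the fitted line.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Default band: 95% confidence interval for the fitted line
p95 <- base + geom_smooth(method = "lm", formula = y ~ x)
# Wider band: 99% confidence interval
p99 <- base + geom_smooth(method = "lm", formula = y ~ x, level = 0.99)

band95 <- with(layer_data(p95, 2), ymax - ymin)
band99 <- with(layer_data(p99, 2), ymax - ymin)
all(band99 > band95)  # TRUE
```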
5.3.5 Change the axis titles
The labs() function can be used to change the axis titles. Here are two ways to do it: labs() (method 1), or xlab() and ylab() (method 2).
#Method 1
ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
labs(x = "Total floor area", y = "Transaction price")
#Method 2
ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2, shape=15)+
facet_wrap(~ ldnm)+
xlab("Total floor area")+
ylab("Transaction price")
5.3.6 Add axis labels and units
5.3.6.1 Formatting y axis and labels
Below is the code for adding the unit of the y axis to its title.
ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
xlab("Total floor area")+
ylab("Transaction price (£)")
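If you want to show the y axis in thousands of pounds, you can use scale_y_continuous() with a labelling function that divides each break by 1000. Only the labels change; the data themselves are not modified.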
ggplot(housedata1,aes(x=tfarea,y=price,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = function(y) y / 1000)+
xlab("Total floor area")
You can also format the y labels more readably with common formats from the scales package. The code below divides price by 1000 inside aes() and then formats the labels with thousands separators (scales::comma).
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab("Total floor area")
To convert the y axis to a percentage scale (for data stored as proportions), use `scale_y_continuous(labels = scales::percent)`.
To display dollars on the y axis, use `scale_y_continuous(labels = scales::dollar)`.
To display euros on the y axis, use `scale_y_continuous(labels = scales::dollar_format(suffix = "€", prefix = ""))`.
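The label functions above are ordinary R functions that turn numbers into strings, so you can try them directly on a few values before using them in a plot. The pound example is an assumption added for illustration; newer versions of scales also provide label_comma(), label_dollar() and label_percent() as equivalents.

```r
library(scales)

comma(1234567)                                        # "1,234,567"
dollar(1000)                                          # "$1,000"
percent(0.25)                                         # "25%"
# Pound sign as prefix (\u00a3 is the pound character)
dollar_format(prefix = "\u00a3")(250000)              # "£250,000"
# Euro sign as suffix (\u20ac is the euro character)
dollar_format(suffix = "\u20ac", prefix = "")(1000)   # "1,000€"
```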
5.3.6.2 Add x axis unit
Below are two approaches to labelling the x axis using math notation. The article "Math Notation for R Plot Titles: expression, bquote, & Greek Letters" describes more applications of bquote() in R.
#method 1
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(bquote("Total floor area (" ~ m^2 ~ ")"))
#method 2
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))
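The two methods above give the same label. The practical difference is that bquote() can insert the value of an R variable with .(), which expression() cannot. The sketch below assumes mtcars as a stand-in and a hypothetical variable yr.

```r
library(ggplot2)

yr <- 2009  # hypothetical variable to splice into the title
# .() inserts the current value of yr; expression() would show "yr" literally
lab <- bquote("Transaction price," ~ .(yr))
lab  # "Transaction price," ~ 2009

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  ggtitle(lab) +
  # mtcars weight is recorded in units of 1000 lb
  xlab(bquote("Weight (" * 10^3 ~ "lb)"))
print(p)
```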
5.3.6.3 Change x-axis breaks
You can use the breaks argument of scale_x_continuous() (or scale_y_continuous()) to change the x (or y) axis breaks.
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
geom_smooth(method="lm",se = FALSE)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(bquote("Total floor area (" ~ m^2 ~ ")"))+
scale_x_continuous(breaks = c(50,100,150,200,250,300))
## `geom_smooth()` using formula 'y ~ x'
# use seq() when the breaks are equally spaced
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(bquote("Total floor area (" ~ m^2 ~ ")"))+
scale_x_continuous(breaks = seq(50,300,50))
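As a quick check on the second version, seq(from, to, by) produces exactly the breaks that were typed out by hand in the first version.

```r
# seq(from, to, by) generates equally spaced breaks
b <- seq(50, 300, 50)
b  # 50 100 150 200 250 300
identical(b, c(50, 100, 150, 200, 250, 300))  # TRUE
```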
5.3.6.4 Specify axis plot range
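You can use the limits argument to set the axis range. The example below limits the x axis to 0 to 300, so only properties with a total floor area of at most 300 are plotted. Note that data outside scale limits are removed before plotting, so any statistic such as a fitted line is computed without them.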
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(bquote("Total floor area (" ~ m^2 ~ ")"))+
scale_x_continuous(breaks = seq(50,300,50),limits = c(0, 300))
## Warning: Removed 2 rows containing missing values (geom_point).
The warning means that two rows fell outside the x-axis limits and were removed. You can follow the same steps for the y axis.
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",limits = c(0, 1200),labels = scales::comma)+
xlab(bquote("Total floor area (" ~ m^2 ~ ")"))+
scale_x_continuous(breaks = seq(50,300,50),limits = c(0, 300))
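If you only want to zoom in without dropping any data, coord_cartesian(xlim = ...) is the usual alternative to scale limits. The sketch below (mtcars as a stand-in, with arbitrary limits of 2 to 4) shows that scale limits turn out-of-range values into NA, which is what later produces the "Removed ... rows" warning, while coordinate limits keep every row.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Scale limits: points outside [2, 4] become NA and are later dropped
p_limits <- base + scale_x_continuous(limits = c(2, 4))

# Coordinate limits: the view is zoomed, but no data are removed
p_zoom <- base + coord_cartesian(xlim = c(2, 4))

n_outside <- sum(mtcars$wt < 2 | mtcars$wt > 4)
sum(is.na(layer_data(p_limits)$x)) == n_outside  # TRUE
anyNA(layer_data(p_zoom)$x)                      # FALSE
```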
5.3.7 Add in title
You can use ggtitle() to add a title to the plot. Below is the code.
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
ggtitle("Transaction price against total floor area in local authorities, 2009")
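labs() can also set the title, together with an optional subtitle and caption. The sketch below uses mtcars as a stand-in; the centring via plot.title is optional, since ggplot2 left-aligns titles by default.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(
    title = "Fuel economy against weight",
    subtitle = "mtcars data set",   # second line under the title
    caption = "Source: R datasets"  # note in the bottom-right corner
  ) +
  # Titles are left-aligned by default; hjust = 0.5 centres them
  theme(plot.title = element_text(hjust = 0.5))
print(p)
```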
5.3.8 Change themes
ggplot2 provides several complete themes that give the plot a customized look; eight of them are shown here. theme_grey() is the default ggplot2 theme; you can use theme_bw() to replace its grey panel background with a white one.
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
theme_bw()
The remaining six themes are shown below; choose your favourite for your academic writing.
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
theme_linedraw()
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
theme_light()
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
theme_dark()
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
theme_minimal()
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
theme_classic()
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
theme_void()
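Rather than repeating the whole plot code for each theme as above, you can build the base plot once and add each theme to it, or set a default with theme_set(). The sketch below assumes mtcars as a stand-in for housedata1.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Build one version of the plot per complete theme
themes <- list(bw = theme_bw(), linedraw = theme_linedraw(),
               light = theme_light(), dark = theme_dark(),
               minimal = theme_minimal(), classic = theme_classic(),
               void = theme_void())
plots <- lapply(themes, function(th) base + th)

# Alternatively, make a theme the default for every later plot
old <- theme_set(theme_bw())
print(base)      # drawn with theme_bw()
theme_set(old)   # restore the previous default
```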
The R package ggthemes provides another gallery of custom ggplot themes. See the package website, "ggthemes: All Your Figure Are Belong To Us", for details.
5.3.9 Change the font size
You can customize the plot manually by modifying components in theme(). Below is a series of examples showing, step by step, how to change the font size of different parts of the plot.
5.3.9.1 Change the font size of the x and y axis titles and colour them red
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
theme_bw()+
theme(axis.title = element_text(size=15,color="red"))
5.3.9.2 Change the font size of the x and y axis tick labels and colour them red
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
theme_bw()+
theme(axis.title = element_text(size=15),axis.text = element_text(size=13,color="red"))
5.3.9.3 Change the font size of facet labels in the plot and colour them red
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
theme_bw()+
theme(axis.title = element_text(size=15),axis.text = element_text(size=13),strip.text = element_text(size=15,color="red"))
5.3.9.4 Change the font size of legend item labels in the plot and colour them red
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(aes(color=ldnm),size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
theme_bw()+
theme(axis.title = element_text(size=15),axis.text = element_text(size=13),strip.text = element_text(size=15),legend.text = element_text(size=13,color="red"))
5.3.9.5 Change the font size of title of the legend in the plot and colour it red
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(aes(color=ldnm),size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
theme_bw()+
theme(axis.title = element_text(size=15),axis.text = element_text(size=13),strip.text = element_text(size=15),legend.text = element_text(size=13),legend.title = element_text(size=15,color="red"))
You may wonder how to change the legend title (by default the variable name, ldnm) in the above plot. Use labs(color = ...), as below:
ggplot(housedata1,aes(x=tfarea,y=price/1000,group=ldnm))+
geom_point(aes(color=ldnm),size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
theme_bw()+
theme(axis.title = element_text(size=15),axis.text = element_text(size=13),strip.text = element_text(size=15),legend.text = element_text(size=13),legend.title = element_text(size=15,color="red"))+
labs(color = "Local authority")
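When you want all text to grow together rather than element by element, two shortcuts may be simpler: the base_size argument of a complete theme, or the root text element, from which the relative sizes of axis, legend and strip text are derived. The sketch below uses mtcars as a stand-in for housedata1.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
  geom_point() +
  labs(color = "Cylinders")

# Option 1: scale every text element of a complete theme at once
p1 <- base + theme_bw(base_size = 15)

# Option 2: change the root "text" element; axis, legend and strip
# text sizes are defined relative to it in the built-in themes
p2 <- base + theme_bw() + theme(text = element_text(size = 15))

print(p1)
print(p2)
```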
5.3.10 Add in the Pearson correlation coefficient result in the plot
Since the aim of this scatter plot is to explore the relationship between transaction price and total floor area, the Pearson correlation coefficient is a suitable measure to show in the plot. Here we use stat_cor() from the ggpubr package, which adds the coefficient and its p value to each panel.
library(ggpubr)  # provides stat_cor()
# add Pearson correlation coefficient and p value in the graph
ggplot(housedata1,aes(x=tfarea,y=price/1000))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
theme_bw()+
theme(axis.title = element_text(size=15),axis.text = element_text(size=13),strip.text = element_text(size=15))+
stat_cor(method="pearson")
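To check the numbers stat_cor() prints, you can compute Pearson's r and its p value per group with base R's cor.test(); up to rounding they should agree with the panel labels. The sketch assumes mtcars split by transmission type (am) as a stand-in for housedata1 split by ldnm.

```r
# Pearson correlation between weight and mpg, separately for each
# transmission type (am), analogous to one panel per local authority
groups <- split(mtcars, mtcars$am)

res <- t(sapply(groups, function(d) {
  ct <- cor.test(d$wt, d$mpg, method = "pearson")
  c(r = unname(ct$estimate), p = ct$p.value)
}))
res
```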
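You may wonder how to change R to rho. One solution (see "Modify stat_cor function to output 'rho' instead of 'R'") is to edit ggpubr's internal .cor_test function with trace(), which prints the console messages below. Note that rho conventionally denotes Spearman's rank correlation, and recent versions of ggpubr also offer a cor.coef.name argument in stat_cor() for choosing the label, so check the documentation of your installed version first.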
## Tracing function ".cor_test" in package "ggpubr (not-exported)"
## [1] ".cor_test"
#plot
ggplot(housedata1,aes(x=tfarea,y=price/1000))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
theme_bw()+
theme(axis.title = element_text(size=15),axis.text = element_text(size=13),strip.text = element_text(size=15))+
stat_cor(method="pearson")
Tip: when you run trace(ggpubr:::.cor_test, edit = TRUE) in RStudio, an edit window opens showing the function's source. Change the part of the code that builds the "R" label so that it produces "rho" instead, then save. The change lasts only for the current R session.
You can also change the position of the label with label.x and label.y (in data units) and its text size with size, as shown below:
ggplot(housedata1,aes(x=tfarea,y=price/1000))+
geom_point(color="#56b4e9",size = 1.2)+
facet_wrap(~ ldnm)+
scale_y_continuous(name = "Transaction Price (in £1000s)",labels = scales::comma)+
xlab(expression("Total floor area (" ~ m^2 ~ ")"))+
theme_bw()+
theme(axis.title = element_text(size=15),axis.text = element_text(size=13),strip.text = element_text(size=15))+
stat_cor(method="pearson",label.x = 150, label.y = 2000,size=5)
5.4 Saving graphs
Plots exported directly from the RStudio Plots pane are saved at roughly screen resolution (about 72 dpi). Higher DPI (dots per inch) means higher resolution. ggsave() saves a plot to a file at a chosen size and resolution; by default it saves the last plot displayed.
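ggsave() writes the file to the current working directory, which you can check with getwd(); in this seminar it was "D:/R/CASA_seminar2".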
#save the figure as tiff
ggsave("Figure_A.tiff",units="in", width=12, height=6, dpi=500)
#ggsave("first.png",units="in", width=10, height=5, dpi=300)
#ggsave("example.png", units = "cm",width = 30, height = 20 )
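The pixel size of a saved bitmap is simply its physical size multiplied by the dpi, so the Figure_A call above (12 × 6 inches at 500 dpi) produces a 6000 × 3000 pixel image. The sketch below, assuming mtcars as a stand-in and a temporary file path, also passes the plot explicitly with plot = so the result does not depend on which plot was drawn last.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Pixel size = physical size (inches) x dpi
width_in <- 6; height_in <- 4; dpi <- 300
c(width_px = width_in * dpi, height_px = height_in * dpi)  # 1800 1200

# Passing plot = p avoids depending on whichever plot was drawn last
out <- file.path(tempdir(), "figure_example.png")
ggsave(out, plot = p, width = width_in, height = height_in,
       units = "in", dpi = dpi)
file.exists(out)  # TRUE
```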