ggplot2
and
leaflet
This lab focuses on creating maps to display spatial data. It covers
both static maps (created using ggplot2
) and interactive
maps (created using leaflet
).
Directions (Please read before starting)
\(~\)
This lab will use both the ggplot2
and
leaflet
packages. It will also require the
dplyr
package for merging and joining, the
maps
package, which contains the spatial information needed
to create a variety commonly used maps, and maptools
, a
package used to process and read certain spatial data files.
# load the following packages
# install.packages("leaflet")
# install.packages("maps")
# install.packages("maptools")
library(leaflet)
library(ggplot2)
library(dplyr)
library(maps)
library(maptools)
Unfortunately, maptools
is scheduled to be retired this
month without a clear successor. However, we can still use it for the
moment, and the broader concepts of shapefiles and map making will
remain even if the functions and syntax involved eventually change.
\(~\)
There are three main geometric elements that are commonly used to represent spatial data:
Aesthetics like color, size, and brightness can be used to encode additional information using these geometric elements.
\(~\)
A choropleth map uses colored polygons to represent values corresponding variables measured across geographic units.
The first step in creating a choropleth map is creating the polygons that define each geographic unit:
## Get US state polygons from the "maps" package
state_polys <- map_data("state")
## Graph these polygons
ggplot(data = state_polys, aes(x = long, y = lat, group = group)) +
geom_polygon(color = "black", fill = "lightblue")
In state_polys
, each “region” is defined by a series of
ordered vertices, each with its own longitude and latitude. The “group”
and “order” of these vertices determines how the edges of that region’s
polygon are drawn. If these variables are altered during a data
processing step like merging and joining, the polygons will not be drawn
correctly.
Next, we need color these polygon according to a variable. This requires joining new data with the data frame that contains the polygons:
## USArrests data from the "datasets" package
data("USArrests")
## Create a data frame to join
state_values <- data.frame(State = tolower(rownames(USArrests)),
MurderRate = USArrests$Murder)
## Add MurderRate to the state polygons
full_data <- left_join(x = state_polys, y = state_values, by = c("region" = "State"))
## Color the polygons using a fill variable
ggplot(data = full_data, aes(x = long, y = lat, group = group, fill = MurderRate)) +
geom_polygon(color = "black") +
scale_fill_continuous(type = "gradient", high = "darkred", low = "white")
\(~\)
If you’d like to create a map displaying multiple types of spatial
features (such as polygons and points), you’ll need to locally specify
each in a different ggplot
layer (since both polygons and
lines/points use the “x” and “y” aesthetics). This is demonstrated in
the example below:
data("us.cities") ## Data on 1005 US cities
## Blank ggplot, then polygons for states, then points for cities
ggplot() +
geom_polygon(data = full_data, aes(x = long, y = lat, group = group, fill = MurderRate), color = "black") +
geom_point(data = us.cities, aes(x = long, y = lat), size = 0.1, color = "red")
Here, we add red points corresponding to various US cities to the
murder rate choropleth map we previously created (notice how different
data arguments are used within geom_polygon()
and
geom_point()
).
\(~\)
For the following question you should use the “state” and “county”
polygons found in the maps
package, as well as the
us.cities
data frame:
state_polys <- map_data("state")
county_polys <- map_data("county")
data("us.cities")
Question #1: Filter the us.cities
data
to include only state capitals (capital == 2
) in the
contiguous United States (exclude cities in AK and HI). Then create a
map displaying state polygons (defined by black borders with
fill = NA
, size = 0.3
and
alpha = 0.3
) and counties (defined by blue borders with
fill = "white"
, size = 0.1
and
alpha = 0.5
) that also includes state capitals as points
(with color = "yellow"
). To exploit the mechanics of
layering, you should add the county polygons first (so state boundaries
are drawn over them) and the cities last (so they appear “on top”). Your
end result should be similar to the map provided below:
\(~\)
The polygon boundaries given by map_data()
are easy to
understand, but they lack the flexibility that can be provided by more
complex spatial storage structures. In many applications, geographic
information is stored in a shapefile, a complex data structure
that must contain the following components:
To practice working with a shapefile, you should download and extract:
This shapefile was originally obtained from Natural Earth Data. You should notice it contains .shp, .shx, and .dbf files (along with a few other files).
To begin, we’ll load the shapefile using the
readShapeSpatial()
function within the
maptools
package:
world <- readShapeSpatial("C:/Users/millerry/Downloads/world_shape/ne_50m_admin_0_countries") ## You will need to change this file path
Notice we used a local path that specified the root names of the various files, and we omitted their .shp/.shx/.dbf file extensions.
Next, we’ll use the str()
function to print the
structure of world
. We can see it contains 5 “slots”, which
are items we can reference using @
(similar to
$
for data.frame objects):
str(world, max.level = 2) ## Only print the first 2 levels
## Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
## ..@ data :'data.frame': 242 obs. of 168 variables:
## .. ..- attr(*, "data_types")= chr [1:168] "C" "N" "N" "C" ...
## ..@ polygons :List of 242
## ..@ plotOrder : int [1:242] 240 76 203 17 196 211 226 182 134 232 ...
## ..@ bbox : num [1:2, 1:2] -180 -90 180 83.6
## .. ..- attr(*, "dimnames")=List of 2
## ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
## ..$ comment: chr "FALSE"
Unfortunately, ggplot
cannot work directly with
shapefiles, so we must use the fortify()
function to
convert them to a workable format:
world_polys <- fortify(world) ## Converts the shapefile into a data frame of polygons
ggplot() + geom_polygon(data = world_polys, aes(x = long, y = lat, group = group), fill = "white", color = "blue")
Note that fortify()
will only create a data frame
version of the polygons contained in the shape file, it will not attach
any additional variables that you might want to use. To attach a data
variable, you must create your own “id” that matches the ordering of the
“id” that was automatically generated by fortify
:
# Create a new column named "id" in the @data object
my_idx = as.character(seq(0, nrow(world@data)-1, by=1))
world_data <- mutate(world@data, id = my_idx)
# Join the fortified polygons w/ word data by this id
world_polys_data <- left_join(world_polys, world_data, by = "id")
# Test it out
BRICS <- c("Brazil", "Russia", "India", "People's Republic of China", "South Africa") ## BRICS nations
ggplot() + geom_polygon(data = world_polys_data,
aes(x = long, y = lat, group = group,
fill = ifelse(NAME_EN %in% BRICS, "BRICS", "not")),
color = "black") + guides(fill = FALSE) +
theme_void()
Note that many of you had previously questioned where
theme_void()
could be used. Maps are one example where it
makes sense to omit the axes.
Question #2: The variable “GDP_MD” (already present
in the merged data frame created above) records the gross domestic
product (GDP) in millions of US dollars for the year 2019. For this
question, create a choropleth that colors each country from white to
green according to its GDP. Hint: you should use the
scale_fill_continuous()
function.
\(~\)
Another popular tool used to create maps is leaflet, an open-source
JavaScript library used by many professional organizations. Similar to
plotly
, maps created using leaflet
are
interactive and can contain hoverable/clickable components.
The base layer of any leaflet
map is a set of map tiles.
Additional layers, such as polygons, lines, or point markers can be
added via piping:
## Just the base layer
leaflet() %>% addTiles()
## Base layer plus polygons from the world shapefile
leaflet(world) %>%
addTiles() %>%
addPolygons()
A benefit of leaflet
is that it can use shapefiles
directly, so we don’t need to worry about the pre-processing steps that
were required by ggplot
.
\(~\)
By default, addTiles()
will generate a base map using OpenStreetMap. This works
great for the vast majority of mapping applications, but if you’d like
something different Leaflet offers a list of alternative tile sets in
the providers
object. These can be used via the
addProviderTiles()
function:
## List of provider tile sets
head(providers)
## $OpenStreetMap
## [1] "OpenStreetMap"
##
## $OpenStreetMap.Mapnik
## [1] "OpenStreetMap.Mapnik"
##
## $OpenStreetMap.DE
## [1] "OpenStreetMap.DE"
##
## $OpenStreetMap.CH
## [1] "OpenStreetMap.CH"
##
## $OpenStreetMap.France
## [1] "OpenStreetMap.France"
##
## $OpenStreetMap.HOT
## [1] "OpenStreetMap.HOT"
## Example of provider tiles
leaflet() %>% addProviderTiles(providers$Esri.WorldPhysical)
\(~\)
In leaflet
, polygon attributes are controlled by the
arguments of addPolygons()
. Of particular importance is the
fillColor
argument, which requires a vector of colors with
a length equal to the number of polygons.
The code below demonstrates how to use the fillColor
to
color each country by the base-ten log of “POP_EST” using the “magma”
color palate:
## Vector of values to color by
col_vals <- log10(world@data$POP_EST)
col_vals[is.infinite(col_vals)] <- NA ## Replace infinities with NA
## Create a color palette (using the "magma" color scheme)
pal <- colorNumeric("magma", domain = col_vals)
leaflet(world) %>% addTiles() %>%
addPolygons(fillColor = pal(col_vals), weight = 1) %>%
addLegend(pal = pal, values = col_vals,
title = "Estimated population (log10)", position = "bottomleft", na.label = "Missing")
Leaflet will not generate a color legend by default, so in the
example above we created our own using the addLegend()
function.
Question #3: Use Leaflet to create a choropleth map displaying the variable “GDP_MD” using the “RdYlBu” color palette. Your map should include an appropriately formatted legend and use the “Stamen.Toner” provider tile set.
\(~\)
There are two ways to add interactive components to a Leaflet map:
Similar to plotly
, Leaflet allows the use of HTML
commands to modify the format and appearance of labels and popups:
## Set up the labels and popup
myLabels <- paste("<strong>", world@data$NAME_EN, "</strong>", "<br/>",
"Population:", prettyNum(world@data$POP_EST, big.mark = ",")) # Adds commas to pop
myPopups <- paste("Income Group", world@data$INCOME_GRP)
## Add to map (using "highlight" argument)
leaflet(world) %>% addTiles() %>%
addPolygons(fillColor = pal(col_vals),
weight = 1,
highlight = highlightOptions(
weight = 3,
color = "grey",
fillOpacity = 0.7,
bringToFront = TRUE),
label = lapply(myLabels, htmltools::HTML),
popup = myPopups) %>%
addLegend(pal = pal, values = col_vals,
title = "Estimated population (log10)", position = "bottomleft", na.label = "Missing")
The highlight
argument dictates how a selected polygon
should be highlighted (in this case it will be brought to the front with
a grey border and an opaque fill)
Additionally, for labels/popup text with HTML commands to render
properly, the HTML()
function in the htmltools
package needs to be applied to each element of the vector containing the
labels (achieved by the lapply()
function). This is one way
that leaflet
features differ from those native to
plotly
(which can interpret HTML without any added
packages/tools).
\(~\)
Often we’d like to combine multiple data sources to create a novel map. For example, we might want to incorporate happiness ratings or economic indicators from the “Happy Planet” data to create an interactive choropleth map.
When using multiple data sources, we must pay special attention to
how our second dataset is merged with the @data
component
of our shapefile. In particular, we need to be mindful of different
naming conventions used by different sources to represent the same
entity.
## Load happy planet data
hp <- read.csv("https://remiller1450.github.io/data/HappyPlanet.csv") ## contains 143 countries
### find non-matching countries using anti_join
non_matches <- anti_join(x = hp, y = world@data, by = c("Country" = "NAME_EN"))
non_matches$Country
## [1] "Burma" "China"
## [3] "Congo" "Congo, Dem. Rep. of the"
## [5] "Korea" "Macedonia"
Notice that there were 6 countries in the Happy Planet data whose
names do not match any of the polygons in world
.
However, this is not because those countries lack polygon boundaries (recognize that China has clearly been displayed in our previous maps). Instead, it is because the countries were named differently in each data source. Thus, we must manually correct the misaligned names to ensure we are able to match as many countries as possible.
## Repair names (manually)
hp$Country[hp$Country == "China"] <- "People's Republic of China"
hp$Country[hp$Country == "Congo, Dem. Rep. of the"] <- "Democratic Republic of the Congo"
hp$Country[hp$Country == "Congo"] <- "Republic of the Congo"
hp$Country[hp$Country == "Korea"] <- "South Korea"
hp$Country[hp$Country == "Burma"] <- "Myanmar"
hp$Country[hp$Country == "Macedonia"] <- "North Macedonia"
### check for non-matches (none)
anti_join(x = hp, y = world@data, by = c("Country" = "NAME_EN"))
## [1] Country Region Happiness LifeExpectancy Footprint
## [6] HLY HPI HPIRank GDPperCapita HDI
## [11] Population
## <0 rows> (or 0-length row.names)
We’re now ready to join the Happy Planet data with our shapefile and create a map:
world <- readShapeSpatial("C:/Users/millerry/Downloads/world_shape/ne_50m_admin_0_countries") ## Reload before overwriting
world@data <- left_join(x = world@data, y = hp, by = c("NAME_EN" = "Country"))
## Vector of values to color by
col_vals <- world@data$Happiness
col_vals[is.infinite(col_vals)] <- NA ## Replace infinities with NA
pal <- colorNumeric("viridis", domain = col_vals)
## Add to map (using "highlight" argument)
leaflet(world) %>% addTiles() %>%
addPolygons(fillColor = pal(col_vals),
weight = 1) %>%
addLegend(pal = pal, values = col_vals,
title = "Happiness", position = "bottomleft")
Question #4: Starting with the
world.cities
data frame from the maps
package.
Filter the data to only keep cities with populations exceeding 1
million, then perform the appropriate data manipulations to sum up the
number of people living in these large cities within each country. Next,
properly combine this information with the world
shapefile
used in earlier examples. You will need to manually repair the names of
seven countries to make this work. After doing so, color each country by
the proportion of it’s “POP_EST” that lives in cities larger than 1
million.
data("world.cities")
Shown below is an example of this map (yours can use different colors, mine used the “Oranges” color scheme):
\(~\)
The addMarkers()
function is the most common way to
display geographic points on a leaflet map. At minimum, a marker only
needs a latitude, lat
, and longitude, lng
.
However, markers are most effective when combined with labels and/or
popups:
data("us.cities")
leaflet(data = filter(us.cities, pop > 500000)) %>%
addTiles() %>%
addMarkers(lng = ~long, lat = ~lat,
label = ~paste(name), ## Hover on a marker to see the city name
popup = ~paste("Population:", pop)) ## Click on a marker to see the population
In this example, we filtered us.cities
to only include
cities with large populations so that we could avoid plotting too many
markers in a small geographic area.
However, we could sensibly display all 1005 cities using cluster markers:
leaflet(data = us.cities) %>%
addTiles() %>%
addMarkers(lng = ~long, lat = ~lat,
label = ~paste(name),
popup = ~paste("Population:", pop),
clusterOptions = markerClusterOptions())
Or by using circle markers with a low opacity to make over-plotting less of an issue:
leaflet(data = us.cities) %>%
addTiles() %>%
addCircleMarkers(lng = ~long, lat = ~lat,
label = ~paste(name),
popup = ~paste("Population:", pop),
fillOpacity = 0.20,
color = "yellow",
stroke = FALSE)
In the example above, the argument fillOpacity = 0.2
makes each marker 80% transparent, and the argument
stroke = FALSE
removes the solid outline around each
marker.
Question #5: Using the world.cities
data, create a map where each city is depicted by a circle marker that
displays the city’s name and country when you hover over it, and the
city’s population when you click on it. Additionally, your map should
color capital cities red and non-capitals yellow and you should use
clusterOptions = markerClusterOptions()
to display clusters
when zoomed out.
Shown below is an example of this map (yours should appear similar to it):