This lab focuses on creating maps to display spatial data. It covers both static maps (created using ggplot2) and interactive maps (created using leaflet).

Directions (Please read before starting)

  1. Please work together with your assigned partner. Make sure you both fully understand each concept before you move on.
  2. Please record your answers and any related code for all embedded lab questions. I encourage you to try out the embedded examples, but you shouldn’t turn them in.
  3. Please ask for help, clarification, or even just a check-in if anything seems unclear.

\(~\)

Preamble

Packages and Datasets

This lab will use both the ggplot2 and leaflet packages. It will also require the dplyr package for merging and joining, the maps package, which contains the spatial information needed to create a variety commonly used maps, and maptools, a package used to process and read certain spatial data files.

# load the following packages
# install.packages("leaflet")
# install.packages("maps")
# install.packages("maptools")
library(leaflet)
library(ggplot2)
library(dplyr)
library(maps)
library(maptools)

Unfortunately, maptools is scheduled to be retired this month without a clear successor. However, we can still use it for the moment, and the broader concepts of shapefiles and map making will remain even if the functions and syntax involved eventually change.

\(~\)

Preamble

There are three main geometric elements that are commonly used to represent spatial data:

  1. Points - simple coordinates showing the position of on an object (ie: cities, sites, etc.)
  2. Lines - ordered sets of coordinates that can be connected (ie: roadways, power lines, etc.)
  3. Polygons - ordered sets of coordinates that form an enclosed shape when connected (ie: states, districts, etc.)

Aesthetics like color, size, and brightness can be used to encode additional information using these geometric elements.

\(~\)

Choropleth Maps

A choropleth map uses colored polygons to represent values corresponding variables measured across geographic units.

The first step in creating a choropleth map is creating the polygons that define each geographic unit:

## Get US state polygons from the "maps" package
state_polys <- map_data("state")  

## Graph these polygons
ggplot(data = state_polys, aes(x = long, y = lat, group = group)) + 
  geom_polygon(color = "black", fill = "lightblue")

In state_polys, each “region” is defined by a series of ordered vertices, each with its own longitude and latitude. The “group” and “order” of these vertices determines how the edges of that region’s polygon are drawn. If these variables are altered during a data processing step like merging and joining, the polygons will not be drawn correctly.

Next, we need color these polygon according to a variable. This requires joining new data with the data frame that contains the polygons:

## USArrests data from the "datasets" package
data("USArrests")

## Create a data frame to join
state_values <- data.frame(State = tolower(rownames(USArrests)), 
                           MurderRate = USArrests$Murder)

## Add MurderRate to the state polygons
full_data <- left_join(x = state_polys, y = state_values, by = c("region" = "State"))

## Color the polygons using a fill variable
ggplot(data = full_data, aes(x = long, y = lat, group = group, fill = MurderRate)) +
                        geom_polygon(color = "black") + 
  scale_fill_continuous(type = "gradient", high = "darkred", low = "white")

\(~\)

Adding points or lines

If you’d like to create a map displaying multiple types of spatial features (such as polygons and points), you’ll need to locally specify each in a different ggplot layer (since both polygons and lines/points use the “x” and “y” aesthetics). This is demonstrated in the example below:

data("us.cities") ## Data on 1005 US cities

## Blank ggplot, then polygons for states, then points for cities
ggplot() +
   geom_polygon(data = full_data, aes(x = long, y = lat, group = group, fill = MurderRate), color = "black") +
   geom_point(data = us.cities, aes(x = long, y = lat), size = 0.1, color = "red")

Here, we add red points corresponding to various US cities to the murder rate choropleth map we previously created (notice how different data arguments are used within geom_polygon() and geom_point()).

\(~\)

Lab

Practicing with polygons and points

For the following question you should use the “state” and “county” polygons found in the maps package, as well as the us.cities data frame:

state_polys <- map_data("state")
county_polys <- map_data("county")
data("us.cities")

Question #1: Filter the us.cities data to include only state capitals (capital == 2) in the contiguous United States (exclude cities in AK and HI). Then create a map displaying state polygons (defined by black borders with fill = NA, size = 0.3 and alpha = 0.3) and counties (defined by blue borders with fill = "white", size = 0.1 and alpha = 0.5) that also includes state capitals as points (with color = "yellow"). To exploit the mechanics of layering, you should add the county polygons first (so state boundaries are drawn over them) and the cities last (so they appear “on top”). Your end result should be similar to the map provided below:

\(~\)

Shapefiles

The polygon boundaries given by map_data() are easy to understand, but they lack the flexibility that can be provided by more complex spatial storage structures. In many applications, geographic information is stored in a shapefile, a complex data structure that must contain the following components:

  • .shp - file containing geometry of all features
  • .shx - file indexing the geometry
  • .dbf - file storing certain feature attributes in a tabular format (things like population, area, etc.)

To practice working with a shapefile, you should download and extract:

This shapefile was originally obtained from Natural Earth Data. You should notice it contains .shp, .shx, and .dbf files (along with a few other files).

To begin, we’ll load the shapefile using the readShapeSpatial() function within the maptools package:

world <- readShapeSpatial("C:/Users/millerry/Downloads/world_shape/ne_50m_admin_0_countries")  ## You will need to change this file path

Notice we used a local path that specified the root names of the various files, and we omitted their .shp/.shx/.dbf file extensions.

Next, we’ll use the str() function to print the structure of world. We can see it contains 5 “slots”, which are items we can reference using @ (similar to $ for data.frame objects):

str(world, max.level = 2)  ## Only print the first 2 levels
## Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
##   ..@ data       :'data.frame':  242 obs. of  168 variables:
##   .. ..- attr(*, "data_types")= chr [1:168] "C" "N" "N" "C" ...
##   ..@ polygons   :List of 242
##   ..@ plotOrder  : int [1:242] 240 76 203 17 196 211 226 182 134 232 ...
##   ..@ bbox       : num [1:2, 1:2] -180 -90 180 83.6
##   .. ..- attr(*, "dimnames")=List of 2
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
##   ..$ comment: chr "FALSE"

Unfortunately, ggplot cannot work directly with shapefiles, so we must use the fortify() function to convert them to a workable format:

world_polys <- fortify(world)  ## Converts the shapefile into a data frame of polygons
ggplot() +  geom_polygon(data = world_polys, aes(x = long, y = lat, group = group), fill = "white", color = "blue")

Note that fortify() will only create a data frame version of the polygons contained in the shape file, it will not attach any additional variables that you might want to use. To attach a data variable, you must create your own “id” that matches the ordering of the “id” that was automatically generated by fortify:

# Create a new column named "id" in the @data object
my_idx = as.character(seq(0, nrow(world@data)-1, by=1))
world_data <- mutate(world@data, id = my_idx)

# Join the fortified polygons w/ word data by this id
world_polys_data <- left_join(world_polys, world_data, by = "id")

# Test it out
BRICS <- c("Brazil", "Russia", "India", "People's Republic of China", "South Africa") ## BRICS nations
ggplot() +  geom_polygon(data = world_polys_data, 
                         aes(x = long, y = lat, group = group, 
                             fill = ifelse(NAME_EN %in% BRICS, "BRICS", "not")), 
                           color = "black") + guides(fill = FALSE) +
            theme_void()

Note that many of you had previously questioned where theme_void() could be used. Maps are one example where it makes sense to omit the axes.

Question #2: The variable “GDP_MD” (already present in the merged data frame created above) records the gross domestic product (GDP) in millions of US dollars for the year 2019. For this question, create a choropleth that colors each country from white to green according to its GDP. Hint: you should use the scale_fill_continuous() function.

\(~\)

Leaflet

Another popular tool used to create maps is leaflet, an open-source JavaScript library used by many professional organizations. Similar to plotly, maps created using leaflet are interactive and can contain hoverable/clickable components.

The base layer of any leaflet map is a set of map tiles. Additional layers, such as polygons, lines, or point markers can be added via piping:

## Just the base layer
leaflet() %>% addTiles()
## Base layer plus polygons from the world shapefile
leaflet(world) %>% 
  addTiles() %>% 
  addPolygons()

A benefit of leaflet is that it can use shapefiles directly, so we don’t need to worry about the pre-processing steps that were required by ggplot.

\(~\)

Leaflet map tiles

By default, addTiles() will generate a base map using OpenStreetMap. This works great for the vast majority of mapping applications, but if you’d like something different Leaflet offers a list of alternative tile sets in the providers object. These can be used via the addProviderTiles() function:

## List of provider tile sets
head(providers)
## $OpenStreetMap
## [1] "OpenStreetMap"
## 
## $OpenStreetMap.Mapnik
## [1] "OpenStreetMap.Mapnik"
## 
## $OpenStreetMap.DE
## [1] "OpenStreetMap.DE"
## 
## $OpenStreetMap.CH
## [1] "OpenStreetMap.CH"
## 
## $OpenStreetMap.France
## [1] "OpenStreetMap.France"
## 
## $OpenStreetMap.HOT
## [1] "OpenStreetMap.HOT"
## Example of provider tiles
leaflet() %>% addProviderTiles(providers$Esri.WorldPhysical)

\(~\)

Leaflet polygons

In leaflet, polygon attributes are controlled by the arguments of addPolygons(). Of particular importance is the fillColor argument, which requires a vector of colors with a length equal to the number of polygons.

The code below demonstrates how to use the fillColor to color each country by the base-ten log of “POP_EST” using the “magma” color palate:

## Vector of values to color by
col_vals <- log10(world@data$POP_EST)
col_vals[is.infinite(col_vals)] <- NA   ## Replace infinities with NA

## Create a color palette (using the "magma" color scheme)
pal <- colorNumeric("magma", domain = col_vals)

leaflet(world) %>% addTiles() %>% 
      addPolygons(fillColor = pal(col_vals),  weight = 1) %>%
      addLegend(pal = pal, values = col_vals,
            title = "Estimated population (log10)", position = "bottomleft", na.label = "Missing")

Leaflet will not generate a color legend by default, so in the example above we created our own using the addLegend() function.

Question #3: Use Leaflet to create a choropleth map displaying the variable “GDP_MD” using the “RdYlBu” color palette. Your map should include an appropriately formatted legend and use the “Stamen.Toner” provider tile set.

\(~\)

Labels and popups

There are two ways to add interactive components to a Leaflet map:

  • Labels appear when the tool-tip hovers over a geometric element
  • Popups appear when a geometric element is clicked on

Similar to plotly, Leaflet allows the use of HTML commands to modify the format and appearance of labels and popups:

## Set up the labels and popup
myLabels <- paste("<strong>", world@data$NAME_EN, "</strong>", "<br/>", 
                   "Population:", prettyNum(world@data$POP_EST, big.mark = ","))  # Adds commas to pop
myPopups <- paste("Income Group", world@data$INCOME_GRP)


## Add to map (using "highlight" argument)
leaflet(world) %>% addTiles() %>% 
      addPolygons(fillColor = pal(col_vals),
                  weight = 1,
                  highlight = highlightOptions(
                     weight = 3,
                     color = "grey",
                     fillOpacity = 0.7,
                     bringToFront = TRUE),
                     label = lapply(myLabels, htmltools::HTML),
                     popup = myPopups) %>%
  addLegend(pal = pal, values = col_vals,
            title = "Estimated population (log10)", position = "bottomleft", na.label = "Missing")

The highlight argument dictates how a selected polygon should be highlighted (in this case it will be brought to the front with a grey border and an opaque fill)

Additionally, for labels/popup text with HTML commands to render properly, the HTML() function in the htmltools package needs to be applied to each element of the vector containing the labels (achieved by the lapply() function). This is one way that leaflet features differ from those native to plotly (which can interpret HTML without any added packages/tools).

\(~\)

Leaflet polygons using external data

Often we’d like to combine multiple data sources to create a novel map. For example, we might want to incorporate happiness ratings or economic indicators from the “Happy Planet” data to create an interactive choropleth map.

When using multiple data sources, we must pay special attention to how our second dataset is merged with the @data component of our shapefile. In particular, we need to be mindful of different naming conventions used by different sources to represent the same entity.

## Load happy planet data
hp <- read.csv("https://remiller1450.github.io/data/HappyPlanet.csv")  ## contains 143 countries

### find non-matching countries using anti_join
non_matches <- anti_join(x = hp, y = world@data, by = c("Country" = "NAME_EN"))
non_matches$Country 
## [1] "Burma"                   "China"                  
## [3] "Congo"                   "Congo, Dem. Rep. of the"
## [5] "Korea"                   "Macedonia"

Notice that there were 6 countries in the Happy Planet data whose names do not match any of the polygons in world.

However, this is not because those countries lack polygon boundaries (recognize that China has clearly been displayed in our previous maps). Instead, it is because the countries were named differently in each data source. Thus, we must manually correct the misaligned names to ensure we are able to match as many countries as possible.

## Repair names (manually)
hp$Country[hp$Country == "China"] <- "People's Republic of China"
hp$Country[hp$Country == "Congo, Dem. Rep. of the"] <- "Democratic Republic of the Congo"
hp$Country[hp$Country == "Congo"] <- "Republic of the Congo"
hp$Country[hp$Country == "Korea"] <- "South Korea"
hp$Country[hp$Country == "Burma"] <- "Myanmar"
hp$Country[hp$Country == "Macedonia"] <- "North Macedonia"

### check for non-matches (none)
anti_join(x = hp, y = world@data, by = c("Country" = "NAME_EN"))
##  [1] Country        Region         Happiness      LifeExpectancy Footprint     
##  [6] HLY            HPI            HPIRank        GDPperCapita   HDI           
## [11] Population    
## <0 rows> (or 0-length row.names)

We’re now ready to join the Happy Planet data with our shapefile and create a map:

world <- readShapeSpatial("C:/Users/millerry/Downloads/world_shape/ne_50m_admin_0_countries") ## Reload before overwriting
world@data <- left_join(x = world@data, y = hp, by = c("NAME_EN" = "Country"))

## Vector of values to color by
col_vals <- world@data$Happiness
col_vals[is.infinite(col_vals)] <- NA   ## Replace infinities with NA
pal <- colorNumeric("viridis", domain = col_vals)

## Add to map (using "highlight" argument)
leaflet(world) %>% addTiles() %>% 
      addPolygons(fillColor = pal(col_vals),
                  weight = 1) %>%
  addLegend(pal = pal, values = col_vals,
            title = "Happiness", position = "bottomleft")

Question #4: Starting with the world.cities data frame from the maps package. Filter the data to only keep cities with populations exceeding 1 million, then perform the appropriate data manipulations to sum up the number of people living in these large cities within each country. Next, properly combine this information with the world shapefile used in earlier examples. You will need to manually repair the names of seven countries to make this work. After doing so, color each country by the proportion of it’s “POP_EST” that lives in cities larger than 1 million.

data("world.cities")

Shown below is an example of this map (yours can use different colors, mine used the “Oranges” color scheme):

\(~\)

Leaflet markers

The addMarkers() function is the most common way to display geographic points on a leaflet map. At minimum, a marker only needs a latitude, lat, and longitude, lng. However, markers are most effective when combined with labels and/or popups:

data("us.cities")
leaflet(data = filter(us.cities, pop > 500000)) %>%
          addTiles() %>%
          addMarkers(lng = ~long, lat = ~lat,
                     label = ~paste(name),                ## Hover on a marker to see the city name
                     popup = ~paste("Population:", pop))  ## Click on a marker to see the population

In this example, we filtered us.cities to only include cities with large populations so that we could avoid plotting too many markers in a small geographic area.

However, we could sensibly display all 1005 cities using cluster markers:

leaflet(data = us.cities) %>%
          addTiles() %>%
          addMarkers(lng = ~long, lat = ~lat,
                     label = ~paste(name),
                     popup = ~paste("Population:", pop),
                   clusterOptions = markerClusterOptions())

Or by using circle markers with a low opacity to make over-plotting less of an issue:

leaflet(data = us.cities) %>%
          addTiles() %>%
          addCircleMarkers(lng = ~long, lat = ~lat,
                     label = ~paste(name),
                     popup = ~paste("Population:", pop),
                     fillOpacity = 0.20,
                     color = "yellow",
                     stroke = FALSE)

In the example above, the argument fillOpacity = 0.2 makes each marker 80% transparent, and the argument stroke = FALSE removes the solid outline around each marker.

Question #5: Using the world.cities data, create a map where each city is depicted by a circle marker that displays the city’s name and country when you hover over it, and the city’s population when you click on it. Additionally, your map should color capital cities red and non-capitals yellow and you should use clusterOptions = markerClusterOptions() to display clusters when zoomed out.

Shown below is an example of this map (yours should appear similar to it):