Making Maps with Leaflet in R

I’ve been searching for a way to interactively display some geospatial data. After trying out a few things I found {leaflet} - so easy and beautiful! I started out reading this excellent primer, then got to work with specific examples. Let’s take a look and see what it can do ->

#libraries
library(tidyverse) #for manipulating and visualizing data
library(leaflet) #for making interactive maps

#upload data from Geotracker - LA County
URL <- "https://geotracker.waterboards.ca.gov/data_download/geo_by_county/LosAngelesGeoXY.zip"
download.file(URL, destfile='LosAngelesGeoXY.zip', method='curl')
unzip('LosAngelesGeoXY.zip')

LA_XY <- read.delim("LosAngelesGeoXY.txt")
head(LA_XY)
##        COUNTY   GLOBAL_ID FIELD_PT_NAME FIELD_PT_CLASS XY_SURVEY_DATE LATITUDE
## 1 Los Angeles T0603701211      GMX-RPZ7             PZ     2001-01-25 34.02664
## 2 Los Angeles T0603701211      GMX-RPZ8             PZ     2001-07-11 34.01503
## 3 Los Angeles T0603701211      GMX-RPZ9             PZ     2001-07-11 34.01506
## 4 Los Angeles T0603701211       KMX-MW1             MW     1999-08-27 34.01560
## 5 Los Angeles T0603701211       KMX-MW2             MW     1999-08-27 34.01787
## 6 Los Angeles T0603701211       KMX-MW3             MW     1999-08-27 34.01728
##   LONGITUDE XY_METHOD XY_DATUM XY_ACC_VAL  XY_SURVEY_ORG GPS_EQUIP_TYPE
## 1 -118.4214      CGPS    NAD83          3 Calvada Survey           L530
## 2 -118.4168      CGPS    NAD83          3 Calvada Survey           L530
## 3 -118.4169      CGPS    NAD83          3 Calvada Survey           L530
## 4 -118.4266      CGPS    NAD83          3 Calvada Survey           L530
## 5 -118.4251      CGPS    NAD83          3 Calvada Survey           L530
## 6 -118.4246      CGPS    NAD83          3 Calvada Survey           L530
##   XY_SURVEY_DESC
## 1               
## 2               
## 3               
## 4               
## 5               
## 6

Similar to the well measurements data, there is documentation on the geotracker for each of these column ids. The GLOBAL_ID represents a site and matches up with the concentration data files. Let’s take a closer look at the FIELD_PT_CLASS.

LA_XY %>%
  select(FIELD_PT_CLASS) %>%
  summary()
##  FIELD_PT_CLASS
##    :  1        
##  BH: 11        
##  MW:109        
##  PZ:  6        
##  SG: 30

There are four codes in our data - BH, MW, PZ, SG. You might guess that BH is borehole and MW is monitoring well. I would guess that PZ stands for piezometer and I’m actually not sure what SG stands for. The documentation lists 33 valid values for FIELD_PT_CLASS, but PZ and SG are not on the list.

The date on the documentation is 2005, so maybe these codes have been added in the 15 years since then. A disclaimer at the top of the list states that new values are added occasionally, but the link to see the current list is broken :(

In any case, we’ll plot these points and see what we can add to make them useful.

LA_wells_map <- leaflet(LA_XY) %>%
  addProviderTiles("CartoDB.Positron") %>% #using the CartoDB.Positron tiles; there are other options!
  addCircleMarkers(lng = ~LONGITUDE,
                   lat = ~LATITUDE, 
                   )
#LA_wells_map 

Look at that! With just a few lines of code we’ve made a map of points that we can pan and zoom. I was honestly expecting a lot more sites. There are other basemaps that you can use, so take a look at what’s available and pick what’s right for you. At this point my map doesn’t display that much information though, so let’s see what we can improve.

#make a palette to add colors
pal <- colorFactor(topo.colors(5), LA_XY$FIELD_PT_CLASS)

LA_wells_map_with_color <- leaflet(LA_XY) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircleMarkers(lng = ~LONGITUDE,
                   lat = ~LATITUDE, 
                   label= ~as.character(FIELD_PT_CLASS), #add a label
                   color = ~pal(FIELD_PT_CLASS)) #add colors

#LA_wells_map_with_color

This shows which class the points are and when you hover over the class it shows the class code. If you zoom in to the site near San Gabriel, you can see three monitoring wells (the two close to each other are probably the same one) and a pattern of SG class points that is fairly uniform. The site near El Monte is made up only of boreholes, as is the one near Manhattan Beach. The others have a mix of monitoring wells and piezometers.

The label field in our call to leaflet creates a tag that shows up when we hover our mouse over it. There’s another field for popup, which will make a box that appears when you click on a point. Let’s add some data and put it in a popup box. To do this we’ll have to get the concentration data for the sites and join by the FIELD_PT_NAME

URL <- "https://geotracker.waterboards.ca.gov/data_download/edf_by_county/LosAngelesEDF.zip"
download.file(URL, destfile='LosAngelesEDF.zip', method='curl')
unzip('LosAngelesEDF.zip')

LA_EDF <- read.delim("LosAngelesEDF.txt")
str(LA_EDF)
## 'data.frame':    128666 obs. of  23 variables:
##  $ COUNTY       : Factor w/ 2 levels "\032","Los Angeles": 2 2 2 2 2 2 2 2 2 2 ...
##  $ GLOBAL_ID    : Factor w/ 8 levels "","SL603799209",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ FIELD_PT_NAME: Factor w/ 127 levels "","15200","15210",..: 50 31 40 50 45 39 39 39 32 40 ...
##  $ LOGDATE      : Factor w/ 274 levels "","2006-07-10",..: 56 46 46 56 47 46 46 46 46 46 ...
##  $ LOGTIME      : int  545 1230 945 850 1130 730 850 715 1215 900 ...
##  $ LOGCODE      : Factor w/ 10 levels "","AEII","EAIH",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ SAMPID       : Factor w/ 2537 levels "","15200","15210",..: 2062 1835 1954 2061 2033 1934 1935 1936 1852 1952 ...
##  $ MATRIX       : Factor w/ 7 levels "","AX","GS","IA",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ LABWO        : Factor w/ 45 levels "","1609-03","1803292",..: 45 45 45 45 45 45 45 45 45 45 ...
##  $ LABCODE      : Factor w/ 9 levels "","AAC","ASLL",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ LABSAMPID    : Factor w/ 2656 levels "","060703521",..: 525 437 443 527 455 439 440 435 436 442 ...
##  $ ANMCODE      : Factor w/ 24 levels "","8260+OX","8260FA",..: 23 23 23 23 23 23 23 23 23 23 ...
##  $ LABLOTCTL    : Factor w/ 652 levels "","060711B01",..: 144 123 127 144 129 123 123 123 123 123 ...
##  $ ANADATE      : Factor w/ 384 levels "","2006-07-11",..: 87 72 74 87 76 72 72 72 72 72 ...
##  $ BASIS        : Factor w/ 7 levels "","A","D","L",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ PARLABEL     : Factor w/ 150 levels "","4:2FTS","6:2FTS",..: 80 12 56 141 90 141 100 82 141 68 ...
##  $ PARVAL       : num  10 1 1 1 10 1 5 1 1 2 ...
##  $ PARVQ        : Factor w/ 5 levels "","<","=","ND",..: 2 2 2 2 3 2 2 2 2 2 ...
##  $ LABDL        : num  5.4 0.24 0.36 0.23 4.3 0.23 1.5 0.26 0.23 0.18 ...
##  $ REPDL        : num  10 1 1 1 10 1 5 1 1 2 ...
##  $ UNITS        : Factor w/ 10 levels "","MG/KG","MG/L",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ DILFAC       : num  1 1 1 1 1 1 5 1 1 1 ...
##  $ LNOTE        : Factor w/ 72 levels "","B","B,VCO",..: 1 1 1 1 68 1 1 1 1 1 ...

Ooof that’s a lot of data. The question now is what do we want to put in the popup? Most recent concentration of a small class of chemicals? Maximum concentration? Could we add a plot of the concentration over time? I haven’t been able to get the plot-within-a-plot working (yet), so let’s go with maximum concentration of Trichloroethene (TCE) at each well.

TCE_conc <- LA_EDF %>%
  filter(PARLABEL == "TCE") %>% 
  group_by(FIELD_PT_NAME) %>% 
  mutate(TCE = max(PARVAL)) %>%
  select(FIELD_PT_NAME, TCE, UNITS) %>%
  filter(UNITS %in% c("UG/L", "MG/L")) %>%
  unique() %>%
  mutate(value = case_when(UNITS == "MG/L" ~ 1000, T ~ 1)) %>% #create a col with conversion factor
  mutate(TCE = TCE*value) #convert mg/L to ug/L

LA_TCE_xy <- TCE_conc %>%
  inner_join(LA_XY, by = "FIELD_PT_NAME") %>%
  select(FIELD_PT_NAME, TCE, LATITUDE, LONGITUDE)

TCEpal <- colorNumeric(
  palette = "YlGnBu",
  domain = 0:110)

popup1 <- "TCE concentration: "  #I bet there's a better way to do this
popup2 <- " ug/L"                #Tell me in the comments!

LA_TCE_xy$pop1 <- popup1         #Like and subscribe!
LA_TCE_xy$pop2 <- popup2
LA_TCE_xy$popup <- paste0(LA_TCE_xy$pop1, LA_TCE_xy$TCE, LA_TCE_xy$pop2)

LA_wells_TCE_map <- leaflet(LA_TCE_xy) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircleMarkers(lng = ~LONGITUDE,
                   lat = ~LATITUDE, 
                   color = ~TCEpal(TCE),
                   popup = ~popup) %>% #add colors
  addLegend(pal = TCEpal, values = ~TCE, opacity = 0.7, title = NULL,
  position = "bottomright")

#LA_wells_TCE_map

You may notice some annoying things about this map; most of the values are very low so they skew the colors to almost zero and the scale starts with highest values on the bottom, which is counterintuitive to me. Also we’ve lost some points - the spatial file had 157 points but when we join it with concentration data we’re left with 69 points.

One common way to visualize data with a large range is with a log transformation. Let’s try it out!

LA_TCE_xy <- LA_TCE_xy %>%        #log transform the concentrations
  mutate(TCE = case_when(TCE == 0 ~ 0.0001, T ~ TCE)) %>%
  mutate(log_TCE = log(TCE))

logpal <- colorNumeric(
  palette = "YlGnBu",
  domain = -9.2105:14.701)

LA_wells_logTCE_map <- leaflet(LA_TCE_xy) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircleMarkers(lng = ~LONGITUDE,
                   lat = ~LATITUDE, 
                   color = ~logpal(log_TCE),
                   popup = ~popup) 

#LA_wells_logTCE_map

If we use the log of the TCE value it’s easier to see the changes in color, though it could still use some fiddling with. I took out the scale becuse the popup still says the real value but the color is now based on a log value. There seem to be some workarounds for this but I didn’t try them out. There are probably better color schemes out there but that’s like entering another world, so I’ll leave it at this.

I’m sure that there are many ways to improve this map and build on these concepts in leaflet. Take a look at what’s out there an let me know if you discover anything you love!

Liz McConnell
Liz McConnell
Graduate Student, CSU Center for Contaminant Hydrology

My research interests include contaminant fate and transport, data analysis using statistics and machine learning, R programming, and geospatial analysis.

comments powered by Disqus

Related