Spatial Analysis of Houston Power Outages

Using geospatial data to understand Houston power outages.
R
Geospatial
Author
Affiliation
Published

September 14, 2023

Introduction

The state of Texas experienced significant power loss following three severe winter storms in February 2021, affecting many families and communities in the state. More background on this can be found on Wikipedia.

I will be estimating the number of homes in Houston that lost power due to these storms, and investigating if socioeconomic factors were predictors of community recovery from a power outage. To do so, I will be working with raster and vector data, and joining spatial data.

Data

Night lights

I will be using remotely-sensed data on night lights, gathered from the Visible Infrared Imaging Radiometer Suite (VIIRS), which is on the Suomi satellite. Specifically, I will be accessing the VNP46A1 to detect the difference in night lights prior to and following the storms to identify areas that lost electricity due to the storms. I accessed the VIIRS data through NASA’s Level-1 and Atmospheric Archive & Distribution System Distributed Active Archive Center (LAADS DAAC).

I will be using data from February 7, 2021 and February 16, 2021. After exploring data on NASA Worldview, these two dates have a low amount of cloud cover compared to surrounding dates, giving two clear, but contrasting imaged to understand the areas that experienced a power outage in Texas.

Roads

I used Geofabrik’s download sites to retrieve a shapefile of all highways in Texas and prepared a Geopackage (.gpkg file) containing just the subset of roads that intersect the Houston metropolitan area This is a third party company that redistributes OSM Data, as ingesting this data into a database where it can be subsetted and processed is a large undertaking. OpenStreetMap (OSM) is a collaborative project which creates publicly available geographic data of the world and is where the original data is gathered.

Typically highways account for a large portion of the night lights observable from space (see Google’s Earth at Night). To minimize falsely identifying areas with reduced traffic as areas without power, I will be ignoring areas close to highways.

Houses

OpenStreetMap also has building data. I again downloaded from Geofabrick and prepared a file containing only houses in the Houston metropolitan area.

Socioeconomic

Since I can’t readily access all socioeconomic information for every home in the Houston area, I obtained data from the U.S. Census Bureau’s American Community Survey for census tracts in 2019.


Spatial Analysis

1. Finding locations of blackouts

Combining the data

Here, I will start with loading in the appropriate packages and reading in the night light tiles I downloaded. Then, I will combine the tiles into a single object for each date (one before the storm, one after the storm).

Show code
library(sf) #Support for simple features
library(stars) #Manipulating spatiotemporal arrays 
library(dplyr) #Data manipulation
library(tmap) #map making
library(ggplot2) #plot making
library(here) #path construction
library(terra) #spatial analysis of vector data

#read in night light tiles
nightlights_pic1 <- read_stars(here("posts/2023-09-06-houston_spatial_analysis/data/VNP46A1/VNP46A1.A2021038.h08v05.001.2021039064328.tif"))

nightlights_pic2 <- read_stars(here("posts/2023-09-06-houston_spatial_analysis/data/VNP46A1/VNP46A1.A2021038.h08v06.001.2021039064329.tif"))

nightlights_pic3 <- read_stars(here("posts/2023-09-06-houston_spatial_analysis/data/VNP46A1/VNP46A1.A2021047.h08v05.001.2021048091106.tif"))

nightlights_pic4 <- read_stars(here("posts/2023-09-06-houston_spatial_analysis/data/VNP46A1/VNP46A1.A2021047.h08v06.001.2021048091105.tif"))
Show code
#combine into one stars object for each day

#2021-02-07
night_tiles_2021_02_07 <- st_mosaic(nightlights_pic1, nightlights_pic2)

#2021-02-16
night_tiles_2021_02_16 <- st_mosaic(nightlights_pic3, nightlights_pic4)
Creating a blackout mask

Masking is a subsetting technique for spatial data. Creating a mask identifies cells with information I want to know, make that its own layer, and then can be applied to the other raster layer.

I am finding the difference in intensity of night lights between the two dates I have spatial data for. I am making an assumption that the difference is caused by the storm. I have to reclassify the difference raster to create the mask, so I make an assumption that anywhere that had a difference of 200 nW cm-2sr-1 (unit of radiance) or more experienced a blackout. Then, I will assign NA all locations that experienced a difference of less than this threshold, since the difference is not great enough to indicate a power outage.

Show code
#find the change in night lights intensity caused by storm by subtracting the Feb 7 raster from Feb 16 raster
intensity_diff <- night_tiles_2021_02_16 - night_tiles_2021_02_07

plot(intensity_diff,
     main = "Change in night lights intensity due to storm")

Figure 1. First step in creating blackout mask is to find the difference in intensity of lights from before and after the storm.
Show code
#start by reclassifying intensity diff raster to recognize >200 as a blackout
mask <- cut(intensity_diff, c(200, Inf), labels = c("Blackout"))

#reassign raster to assign NA to all locations of less than 200 n
mask[mask < 200] = NA

#blackout mask with <200 = NA and >200 = blackout
plot(mask, breaks = "equal", col = "black") 

Figure 2. Blackout mask, where <200 nW cm-2sr-1 = NA and >200 nW cm-2sr-1 = blackout
Vectorize the mask

Now, I will use st_as_sf() from the sf package to vectorize the blackout mask so I can continue working with the mask I made, ultimately being able to overlay it with other data and determine where the blackouts occurred. I also will use st_make_valid from the same package to fix any invalid geometries.

Show code
vector_mask <- st_as_sf(mask) |> #vectorize sf mask
  st_make_valid() #fix invalid geoms

plot(vector_mask)

Figure 3. Vectorized blackout mask
Cropping the vectorized map to our region of interest

I am interest in the Houston Metropolitan area, and will be defining this area by the following coordinates:

(-96.5, 29), (-96.5, 30.5), (-94.5, 30.5), (-94.5, 29)

Now that I have the coordinates of interest, I will turn them into a polygon using st_polygon function from the sf package. A polygon is a set of discrete spatial points, composed of vertices of many spatial coordinates.

Then, I will convert the polygon into a simple feature collection using st_sfc() and assign a coordinate reference system (CRS). I am assigning a CRS since the polygon’s CRS must match that of the night lights data I downloaded if I want to join them. So, I will reproject the cropped blackout data to EPSG:3083 (NAD83 / Texas Centric Albers Equal Area).

Then, I will spatially subset/crop the blackout mask I created to the Houston region I defined as a polygon.

Show code
# define coords 
ne <- c(-94.5, 30.5) 
se <- c(-94.5, 29) 
sw <- c(-96.5, 29)
nw <- c(-96.5, 30.5) 

# st_polygon input (since we can't just put these points in by themselves, input has to be a list) and we will make polygon of Houston using st_polygons
houston_boundary <- list(rbind(sw, nw, ne, se, sw))

# polygon to represent Houston
houston_points <- st_polygon(x = houston_boundary, if (length(x)) "XYZ" else "XY")

# Convert houston to a simple feature collection to feed in into st_crop
houston_sf <- st_sfc(houston_points, crs = 4326)

# Crop the blackout mask to houston area
blackout_cropped <- st_crop(vector_mask, houston_sf)

# Reproject cropped blackout mask onto crs of interest (3083)
reproj_houst_blackout <- st_transform(blackout_cropped, crs = 3083)

plot(reproj_houst_blackout)

Figure 4. Cropped blackout mask overlaid in Houston region
Excluding highways from blackout mask

The data I downloaded for the roads includes data on roads other than highways. So, to avoid reading in data I won’t be using, I will take advantage of the st_read function’s ability to subset using a SQL query.

To do so, I will start by defining a SQL query and load only the highway data using this query and st_read. I then have to reproject this data to the same CRS I used earlier (EPSG:3038).

To identify areas that are within 200 meters of all highways, I will use st_buffer that will compute a polygon that represents all points this range. This will allow me to find the areas that experienced blackouts that are further than 200 meters from a highway (to eliminate the lights produced by street lamps and other lights along highways that are detected).

Show code
#define a SQL query
query <- "SELECT * FROM gis_osm_roads_free_1 WHERE fclass='motorway'"

#load highway data from geopackage  THIS PART NEEDS TO BE FIXED!!!!!!!!!!
highways <- st_read("C:/Users/kiran/Documents/MEDS 2022-2023/kiranfavre.github.io/posts/2023-09-06-houston_spatial_analysis/data/gis_osm_roads_free_1.gpkg", query = query)

#reproject to EPSG 3083
reproj_highways <- st_transform(highways, crs = 3083)

#identify areas within 200m of all highways
highway_buffers <- st_buffer(reproj_highways, dist = 200) |> 
  st_union()

#identify areas that are further away than 200, from a highway that had a blackout
outside_hwy_buffers <- st_disjoint(x = highway_buffers, y = reproj_houst_blackout) 

2. Finding homes impacted by blackouts

Loading buildings data

To find homes impacted by blackouts, I will start by loading in the building dataset and use another SQL query to only select residential buildings. As I did with the earlier data, I will also reproject this data to the same CRS.

Show code
#make sql query for building data
bldg_query <- "SELECT * FROM gis_osm_buildings_a_free_1 WHERE type IS NULL AND name IS NULL OR type in ('residential', 'apartments', 'house', 'static_caravan', 'detached')"

#load in building data
buildings <- st_read("C:/Users/kiran/Documents/MEDS 2022-2023/kiranfavre.github.io/posts/2023-09-06-houston_spatial_analysis/data/gis_osm_buildings_a_free_1.gpkg", query = bldg_query)

#reproject building data ro crs EPSG:3083
reproj_buildings <- st_transform(buildings, crs = 3083)
Finding homes in blackout areas

Now, I will filter the data to homes within the blackout area by using the blackout mask I created and overlay that onto the data on Houston residential buildings, and count the number of impacted homes.

Show code
#use blackout mask, overlay onto houston homes
blackout_homes <- st_join(reproj_houst_blackout, reproj_buildings)

#filter houston with blackout mask
number_blackout_homes <- blackout_homes |> 
  filter(fclass == "building") |>  #filtering for homes
  summarize(count_blackout_homes = n())  #give number of homes that were impacted by blackout

#number of homes in blackout area
print(paste0("There were " ,number_blackout_homes$count_blackout_homes, " homes in Houston that were impacted by the blackout","."))

3. Investigating socioeconomic factors

Loading Census data

I will again use st_read() to load the data, and separate the layers from the geodatabase into geometries and income data. I will select the median income field from this layer, and also reproject this data.

Show code
#ACS geodatabase geomoetries
texas_geoms <- st_read(dsn = "C:/Users/kiran/Documents/MEDS 2022-2023/kiranfavre.github.io/posts/2023-09-06-houston_spatial_analysis/data/ACS_2019_5YR_TRACT_48_TEXAS.gdb",
                       layer = "ACS_2019_5YR_TRACT_48_TEXAS")

#read in income layer from geometries
income_data <- st_read(dsn = "C:/Users/kiran/Documents/MEDS 2022-2023/kiranfavre.github.io/posts/2023-09-06-houston_spatial_analysis/data/ACS_2019_5YR_TRACT_48_TEXAS.gdb",
                       layer = "X19_INCOME")

#select median income field from income layer
median_income <- income_data |> 
  select('GEOID','B19013e1') 

#reproj texas geomms
reproj_tx_geoms <- st_transform(texas_geoms, crs = 3083)
Determining which census tracts experienced blackouts

Now, I will join the income data to the census tract geometries by geometry ID, and spatially join census tract data with buildings determined to be impacted by blackouts to find which census tracts had blackouts.

Show code
#want to join all of both data sets by GEOMID, so using a left join
joined_income_census <- left_join(reproj_tx_geoms, median_income, by = c("GEOID_Data" = "GEOID")) 

#spatially join tx census data with buildings impacted by blackouts
blackout_bldgs_census <- st_join(blackout_homes,joined_income_census)

#determine which census tracts were impacted by the blackout
blackout_tracts <- blackout_bldgs_census |> 
  group_by(TRACTCE) |> 
  summarize(B19013e1 = n()) 
#just has tract id, median income, and associated geometries
Comparison of incomes of impacted tracts to unimpacted tracts
Show code
#median income by tract, 
joined_median_income_census <- joined_income_census |> group_by(TRACTCE, B19013e1)|>
  summarize()
#just has tract id, median income, and associated geometries

#map of median income homes by census tract that experienced blackouts
tm_shape(houston_sf) + tm_polygons(col = "white") +
  tm_shape(joined_median_income_census) +
  tm_polygons(col = "white") +
  tm_shape(blackout_tracts) +
  tm_polygons(col = "blue") +
  tm_layout(main.title = "Median Income Homes in Houston Impacted by Blackouts", title.size = 0.1)  ##FIX THIS!

Figure 5. Map of median income by census tract
Show code
#distribution of income in immpacted tracts - histogram w ggplot
#make histogram 
impacted_hist <- ggplot(joined_median_income_census) +
  geom_histogram(aes(x = B19013e1),
                 color = "black",
                 fill = "seagreen3") +
  labs(x = "Median Income ($)",
       y = "Count",
       title = "Distribution of income in impacted tracts") +
  theme_classic()

impacted_hist

Figure 6. Distribution of income in impacted tracts
Show code
#need to take out geom for both median income census data and blackout datato find tracts not affected
joined_income_census_nogeom <- joined_median_income_census |> 
  st_drop_geometry()

blackout_tracts_nogeom <- blackout_tracts |> 
  st_drop_geometry()

##not joined_income_census

#need to find tracts that were not impacted by blackout
non_impacted_tracts <- anti_join(x = joined_income_census_nogeom, y = blackout_tracts_nogeom, by = "TRACTCE")

#make hist of dist
non_impacted_hist <- ggplot(non_impacted_tracts) +
  geom_histogram(aes(x = B19013e1),
                 color = "black",
                 fill = "orchid") +
  labs(x = "Median Income ($)",
       y = "Count",
       title = "Distribution of income in non-impacted tracts") +
  theme_classic()

non_impacted_hist

Figure 7. Distribution of income in non-impacted tracts
Show code
library(gridExtra)
#plot both side by side to compare
both_hists <- grid.arrange(impacted_hist,non_impacted_hist)

both_hists

Figure 8. Comparison of distribution of income fpr impacted versus non-impacted tracts

This study showed that there were more houses with a lower median income impacted by the blackout than those not impacted by the blackout. However, the distribution of income in impacted areas and non impacted areas are very similar (both being skewed to the right). Limitations of this study could include non-response bias with the Census data or could have used differing sizes for the highway buffers. People could have not responded to Census data collection, or families could have impacted the intensity of light coming from their home depending if they were there or not the days the images were collected.

Citation

BibTeX citation:
@online{favre2023,
  author = {Favre, Kiran},
  title = {Spatial {Analysis} of {Houston} {Power} {Outages}},
  date = {2023-09-14},
  url = {https://kiranfavre.github.io/posts/houston_spatial_analysis/},
  langid = {en}
}
For attribution, please cite this work as:
Favre, Kiran. 2023. “Spatial Analysis of Houston Power Outages.” September 14, 2023. https://kiranfavre.github.io/posts/houston_spatial_analysis/.