Learning Objectives
Subset bands by index or name
Extract raster data by row and column number
Extract data by bounding window
Extract raster data by coordinates
Extract raster data by geometry (point, polygon)
Raster Data Extraction#
Raster data is often of little use unless we can extract and summarize the data. For instance, extracting raster to points by coordinates allows us to pass data to machine learning models for land cover classification or cloud masking.
Subsetting rasters#
We can subset sections of the data array in a number of ways. In this case we will create a slice based on row and column location to subset LandSat data using a rasterio.window.Window
.
Either a rasterio.window.Window
object or tuple can be used with geowombat.open
.
import geowombat as gw
from geowombat.data import rgbn
from rasterio.windows import Window
w = Window(row_off=0, col_off=0, height=100, width=100)
with gw.open(rgbn,
band_names=['blue', 'green', 'red'],
num_workers=8,
indexes=[1, 2, 3],
window=w,
out_dtype='float32') as src:
print(src)
We can also slice a subset of data using a tuple of bounded coordinates.
bounds = (793475.76, 2049033.03, 794222.03, 2049527.24)
with gw.open(rgbn,
band_names=['green', 'red', 'nir'],
num_workers=8,
indexes=[2, 3, 4],
bounds=bounds,
out_dtype='float32') as src:
print(src)
The configuration manager provides an alternative method to subset rasters. See tutorial-config
for more details.
with gw.config.update(ref_bounds=bounds):
with gw.open(rgbn) as src:
print(src)
By default, the subset will be returned by the upper left coordinates of the bounds, potentially shifting cell alignment with the reference raster. To subset a raster and align it to the same grid, use the ref_tar keyword. This is equivalent to a “snap raster” in ArcGIS.
with gw.config.update(ref_bounds=bounds, ref_tar=rgbn):
with gw.open(rgbn) as src:
print(src)
Extracting data by coordinates#
To extract values at a coordinate pair, translate the coordinates into array indices. For extraction by geometry, for instance with a shapefile, see extract by point geometry.
import geowombat as gw
from geowombat.data import l8_224078_20200518
# Coordinates in map projection units
y, x = -2823031.15, 761592.60
with gw.open(l8_224078_20200518) as src:
# Transform the map coordinates to data indices
j, i = gw.coords_to_indices(x, y, src)
# Subset by index
data = src[:, i, j].data.compute()
print(data.flatten())
A latitude/longitude pair can be extracted after converting to the map projection.
import geowombat as gw
from geowombat.data import l8_224078_20200518
# Coordinates in latitude/longitude
lat, lon = -25.50142964, -54.39756038
with gw.open(l8_224078_20200518) as src:
# Transform the coordinates to map units
x, y = gw.lonlat_to_xy(lon, lat, src)
# Transform the map coordinates to data indices
j, i = gw.coords_to_indices(x, y, src)
data = src[:, i, j].data.compute()
print(data.flatten())
Extracting data with point geometry#
In the example below, ‘l8_224078_20200518_points’ is a GeoPackage of point locations, and the output df
is a GeoPandas GeoDataFrame. To extract the raster values at the point locations, use geowombat.extract
.
import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_points
with gw.open(l8_224078_20200518) as src:
df = src.gw.extract(l8_224078_20200518_points)
print(df)
Note
The line df = src.gw.extract(l8_224078_20200518_points)
could also have been written as df = gw.extract(src, l8_224078_20200518_points)
.
In the previous example, the point vector had a CRS that matched the raster (i.e., EPSG=32621, or UTM zone 21N). If the CRS had not matched, the geowombat.extract
function transforms the CRS on-the-fly.
import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_points
import geopandas as gpd
point_df = gpd.read_file(l8_224078_20200518_points)
print(point_df.crs)
# Transform the CRS to WGS84 lat/lon
point_df = point_df.to_crs('epsg:4326')
print(point_df.crs)
with gw.open(l8_224078_20200518) as src:
df = src.gw.extract(point_df)
print(df)
Set the data band names using sensor = 'bgr'
, which assigns the band names blue, green, red.
import geowombat as gw
from geowombat.data import l8_224078_20200518, l8_224078_20200518_points
with gw.config.update(sensor='bgr'):
with gw.open(l8_224078_20200518) as src:
df = src.gw.extract(l8_224078_20200518_points,
band_names=src.band.values.tolist())
print(df)
Extracting time series images by point geometry#
We can also easily extract a time series of raster images. Extracted pixel values are provided in ‘wide’ format with appropriate labels, for instance the column ‘t2_blue’ would be the blue band for the second time period
from geowombat.data import l8_224078_20200518, l8_224078_20200518_points
with gw.config.update(sensor='bgr'):
with gw.open([l8_224078_20200518, l8_224078_20200518],
time_names=['t1', 't2'],
stack_dim='time') as src:
# Extract and by point geometry
df = src.gw.extract(l8_224078_20200518_points)
print(df)
Extracting data by polygon geometry#
To extract values within polygons, use the same geowombat.extract
function.
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
with gw.config.update(sensor='bgr'):
with gw.open(l8_224078_20200518) as src:
df = src.gw.extract(l8_224078_20200518_polygons,
band_names=src.band.values.tolist())
print(df)
Calculate mean pixel value by polygon#
It is simple then to calculate the mean value of pixels within each polygon by using the polygon id
column and pandas groupby function. You can easily calculate other statistics like min, max, median etc.
from geowombat.data import l8_224078_20200518, l8_224078_20200518_polygons
with gw.config.update(sensor='bgr'):
with gw.open(l8_224078_20200518) as src:
df = src.gw.extract(l8_224078_20200518_polygons,
band_names=src.band.values.tolist())
# use pandas groupby to calc pixel mean
df.drop(columns=['geometry'], inplace=True)
df = df.groupby(['id', 'name']).mean()
print(df)