Learning Objectives
Creating raster data with Rasterio
Creating raster data with Rasterio
Reading & Writing Rasters with Rasterio#
In order to work with raster data we will be using rasterio
and later geowombat
. Behind the scenes a numpy.ndarray
does all the heavy lifting. To understand how raster works it helps to construct one from scratch.
Here we create two ndarray
objects one X
spans [-90°,90°] longitude, and Y
covers [-90°,90°] latitude.
import numpy as np
x = np.linspace(-90, 90, 6)
y = np.linspace(90, -90, 6)
X, Y = np.meshgrid(x, y)
X
array([[-90., -54., -18., 18., 54., 90.],
[-90., -54., -18., 18., 54., 90.],
[-90., -54., -18., 18., 54., 90.],
[-90., -54., -18., 18., 54., 90.],
[-90., -54., -18., 18., 54., 90.],
[-90., -54., -18., 18., 54., 90.]])
Y
array([[ 90., 90., 90., 90., 90., 90.],
[ 54., 54., 54., 54., 54., 54.],
[ 18., 18., 18., 18., 18., 18.],
[-18., -18., -18., -18., -18., -18.],
[-54., -54., -54., -54., -54., -54.],
[-90., -90., -90., -90., -90., -90.]])
Let’s generate some data representing temperature and store it an array Z
import matplotlib.pyplot as plt
Z1 = np.abs(((X - 10) ** 2 + (Y - 10) ** 2) / 1 ** 2)
Z2 = np.abs(((X + 10) ** 2 + (Y + 10) ** 2) / 2.5 ** 2)
Z = (Z1 - Z2)
plt.imshow(Z)
plt.title("Temperature")
plt.show()
Creating Raster Data#
The final array Z
still lacks a number of elements that transform it from being a non-spatial numpy
array to a spatial one usable by rasterio
etc. Rasterio
requires the following elements to write out a geotif, or spatial raster dataset:
Parameters |
Description |
Argument |
---|---|---|
driver |
the name of the desired format driver |
|
width |
the number of columns of the dataset |
|
height |
the number of rows of the dataset |
|
count |
a count of the dataset bands |
|
dtype |
the data type of the dataset |
|
crs |
a coordinate reference system identifier or description |
|
transform |
an affine transformation matrix |
|
nodata |
a “nodata” value |
|
Note
transform
defines the location of the upper left hand corner of the raster on the globe, and its spatial resolution. The arguments for transform
are complex and beyond the scope of the chapter, please refer to the next chapter @ affine transforms and raster crs for more info.
Writing Rasters#
To save this array along with spatial information to a file, we call rasterio.open()
with a path to the new file to be created, and ‘w’ to specify writing mode, along with the arguments above.
import rasterio
from rasterio.transform import Affine
xres = (x[-1] - x[0]) / len(x)
yres = (y[-1] - y[0]) / len(y)
transform = Affine.translation(x[0] - xres / 2, y[0] - yres / 2) * Affine.scale(xres, yres)
with rasterio.open(
"../temp/temperature.tif",
mode="w",
driver="GTiff",
height=Z.shape[0],
width=Z.shape[1],
count=1,
dtype=Z.dtype,
crs="+proj=latlong",
transform=transform,
) as new_dataset:
new_dataset.write(Z, 1)
When we go to look again at our data we can read it back in using open
. Before plotting we raster.read(1)
the first band, which converts the data back to a numpy
array that can be plotted in matplotlib
. Notice that we also specify the band number (in this case 1), with .write(Z, 1)
.
import matplotlib.pyplot as plt
raster = rasterio.open("../temp/temperature.tif")
plt.imshow(raster.read(1), cmap="BrBG")
plt.title("Temperature")
plt.show()
So in summary, a spatial dataset is essentially just a numpy.ndarray
with information about the location and resolution of the array, the coordinate reference system, and number of bands. This information is typically accessed and updated via the .profile
.
print(raster.profile)
{'driver': 'GTiff', 'dtype': 'float64', 'nodata': None, 'width': 6, 'height': 6, 'count': 1, 'crs': CRS.from_epsg(4326), 'transform': Affine(30.0, 0.0, -105.0,
0.0, -30.0, 105.0), 'blockysize': 6, 'tiled': False, 'interleave': 'band'}
Update Raster Metadata#
Notice the the .profile
above is missing a meaningful nodata
value and was uncompressed. Let’s learn how to update these values. For this we can update the .profile
dictionary.
# start with the original profile
profile = raster.profile
# update values for nodata and compression type
profile.update(nodata=0, compress="lzw")
print(profile)
{'driver': 'GTiff', 'dtype': 'float64', 'nodata': 0, 'width': 6, 'height': 6, 'count': 1, 'crs': CRS.from_epsg(4326), 'transform': Affine(30.0, 0.0, -105.0,
0.0, -30.0, 105.0), 'blockysize': 6, 'tiled': False, 'interleave': 'band', 'compress': 'lzw'}
Now we just need to write the array of data (obtained from raster.read(1)
) out with the updated profile info. We can unpack all the dictionary values from the profile using **profile
.
with rasterio.open("../temp/temperature.tif",
mode="w",
**profile,) as update_dataset:
update_dataset.write(raster.read(1), 1)
Rasterio Multiband Rasters#
Working with multiband imagery starts to get a bit tricky, especially with rasterio
alone.
Let’s start with a problematic raster file, a landsat image that stores its red, green, and blue bands in reverse order (blue, green, red), that is scaled by 100 (multiplied by 100 to store data as integers rather than float), and that is missing a meaningful nodata value.
with rasterio.open(
"../data/LC08_L1TP_224078_20200518_20200518_01_RT.TIF", mode="r", nodata=0) as src:
# read in the array, band 3 first, then band 2, then band 1
arr = src.read([3, 2, 1])
# the array has three bands
print("Array shape:", arr.shape)
# look at the profile, despite setting nodata=0, there still isn't a nodata value
# this is because we need to update the profile and write out a new image with
# nodata set
profile = src.profile
print(profile)
Array shape: (3, 1860, 2041)
{'driver': 'GTiff', 'dtype': 'uint16', 'nodata': None, 'width': 2041, 'height': 1860, 'count': 3, 'crs': CRS.from_epsg(32621), 'transform': Affine(30.0, 0.0, 717345.0,
0.0, -30.0, -2776995.0), 'blockxsize': 256, 'blockysize': 256, 'tiled': True, 'compress': 'lzw', 'interleave': 'pixel'}
Note
We use src.read([3, 2, 1])
in order to reverse the read order of our bands. With this the original blue, green, red order is read in reverse as red, green, blue, which is required for true color images.
To see what it looks like, let’s use rasterio’s show
function which sort of helps when viewing multiband imagery.
from rasterio.plot import show
print(arr)
print('--------------')
show(arr)
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers). Got range [0..24147].
[[[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
...
[6672 6168 6100 ... 6028 6015 6008]
[6291 6457 6423 ... 6021 6001 6017]
[6181 6727 6747 ... 6098 6023 5991]]
[[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
...
[7083 6794 6731 ... 6760 6724 6685]
[6951 7079 7016 ... 6699 6687 6715]
[6864 7286 7338 ... 6817 6732 6659]]
[[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
...
[7692 7518 7513 ... 7440 7432 7415]
[7586 7590 7610 ... 7440 7411 7425]
[7576 7743 7770 ... 7464 7443 7406]]]
--------------
<Axes: >
Arg… it looks terrible. A few things, the primary problem is that the zeros aren’t being treated as missing data, and therefor messing everything up. Second, there is a message at the top of the image saying its clipping... [0..255] for integers)
.
Let’s check what the range of values are in our raster. To do this we will use scipy
stats
, and .ravel()
to convert our n
dimension array to a 1d
array.
from scipy import stats
print(stats.describe(arr.ravel()))
DescribeResult(nobs=11388780, minmax=(0, 24147), mean=6095.312502656123, variance=7725568.744020711, skewness=-1.6071105370358574, kurtosis=0.9303298625510847)
Since we currently have integer data that ranges from 0 to 24147, we should try scaling it, if we go to the documentation we would see that we need to divide all values by 100.
scaled_arr = arr / 100
print(scaled_arr.dtype)
float64
Then let’s write the data back out again with an updated profile. Notice that the datatype (dtype
) has changed since we divided by 100.
# update the profile for the new raster
profile = src.profile
profile.update(nodata=0, compress="lzw", dtype=scaled_arr.dtype)
# write out raster as RGB
with rasterio.open(
"../temp/LS_scaled_20200518.tif",
mode="w",
**profile, # unpack the profile arguments set above
) as new_dataset:
new_dataset.write(scaled_arr, [1, 2, 3])
ERROR 10: Pointer 'hObject' is NULL in 'GDALGetMetadata'.
ERROR 10: Pointer 'hObject' is NULL in 'GDALGetMetadata'.
ERROR 10: Pointer 'hObject' is NULL in 'GDALGetMetadata'.
Note
Since the bands have already been put in the right order in the array (ie. RGB), we can just write it out as follows .write(scaled_arr, [1, 2, 3])
.
There you go! A fixed raster. Try opening it in Qgis to make sure. We can look at it now using rasterio show
, but again it isn’t great, but better.
with rasterio.open("../temp/LS_scaled_20200518.tif", mode="r") as src:
show(src,adjust='linear')
To help make all of this easier and more intuitive we will be presenting the use of geowombat
for remote sensing applications later, start here.
Just as a preview, here’s how to do this in geowombat
.
import geowombat as gw
import matplotlib.pyplot as plt
fig, ax = plt.subplots(dpi=200)
# tell gw to read a blue green red
with gw.config.update(sensor="bgr"):
with gw.open("../data/LC08_L1TP_224078_20200518_20200518_01_RT.TIF") as src:
# see that bands names, blue green red are assigned
print(src.band)
# remove 0 value, rearrange band order
temp = src.where(src != 0).sel(band=["red", "green", "blue"])
# plot
temp.gw.imshow(robust=True, ax=ax)
#save to file
temp.gw.to_raster(
"../temp/LS_scaled_20200518.tif", verbose=0, n_workers=4, overwrite=True
)
/home/mmann1123/miniconda3/envs/pygis/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
<xarray.DataArray 'band' (band: 3)> Size: 60B
array(['blue', 'green', 'red'], dtype='<U5')
Coordinates:
* band (band) <U5 60B 'blue' 'green' 'red'
0%| | 0/50 [00:00<?, ?it/s]
22%|██████████████████████████████████▎ | 11/50 [00:00<00:00, 98.55it/s]
42%|█████████████████████████████████████████████████████████████████▌ | 21/50 [00:00<00:00, 88.16it/s]
60%|█████████████████████████████████████████████████████████████████████████████████████████████▌ | 30/50 [00:00<00:00, 78.79it/s]
76%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 38/50 [00:00<00:00, 74.27it/s]
92%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 46/50 [00:00<00:00, 71.27it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 75.17it/s]
0%| | 0/14 [00:00<?, ?it/s]
64%|████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 9/14 [00:00<00:00, 88.46it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 107.97it/s]
Credits: rasterio quickstart