National Water ModelΒΆ
Compute and visualize the mean annual river discharge from the National Water Model v2 from 2.7 million rivers in minutes using Pangeo
[1]:
import xarray as xr
import fsspec
import numpy as np
[2]:
import hvplot.pandas
import hvplot.xarray
import geoviews as gv
from holoviews.operation.datashader import rasterize
import cartopy.crs as ccrs
[3]:
from dask.distributed import Client, progress
from dask_gateway import Gateway
gateway = Gateway()
[4]:
cluster = gateway.new_cluster()
[5]:
cluster.adapt(minimum=4, maximum=20);
[6]:
client = Client(cluster)
client
[6]:
Client
|
Cluster
|
[7]:
ds = xr.open_zarr(fsspec.get_mapper('s3://pangeo-data-uswest2/esip/NWM2/2017', anon=False, requester_pays=True))
[8]:
ds
[8]:
<xarray.Dataset> Dimensions: (feature_id: 2729077, time: 8760) Coordinates: * feature_id (feature_id) int32 101 179 181 ... 1180001803 1180001804 latitude (feature_id) float32 dask.array<chunksize=(209929,), meta=np.ndarray> longitude (feature_id) float32 dask.array<chunksize=(209929,), meta=np.ndarray> * time (time) datetime64[ns] 2017-01-01 ... 2017-12-31T23:00:00 Data variables: crs (time) |S1 dask.array<chunksize=(72,), meta=np.ndarray> elevation (time, feature_id) float32 dask.array<chunksize=(72, 209929), meta=np.ndarray> order (time, feature_id) int32 dask.array<chunksize=(72, 209929), meta=np.ndarray> qBtmVertRunoff (time, feature_id) float64 dask.array<chunksize=(72, 209929), meta=np.ndarray> qBucket (time, feature_id) float64 dask.array<chunksize=(72, 209929), meta=np.ndarray> qSfcLatRunoff (time, feature_id) float64 dask.array<chunksize=(72, 209929), meta=np.ndarray> q_lateral (time, feature_id) float64 dask.array<chunksize=(72, 209929), meta=np.ndarray> streamflow (time, feature_id) float64 dask.array<chunksize=(72, 209929), meta=np.ndarray> velocity (time, feature_id) float64 dask.array<chunksize=(72, 209929), meta=np.ndarray> Attributes: Conventions: CF-1.6 cdm_datatype: Station code_version: v5.1.0-alpha11 dev: dev_ prefix indicates development/internal me... dev_NOAH_TIMESTEP: 3600 dev_OVRTSWCRT: 1 dev_channelBucket_only: 0 dev_channel_only: 0 featureType: timeSeries model_configuration: retrospective model_initialization_time: 2016-10-01_00:00:00 model_output_type: channel_rt model_output_valid_time: 2017-01-01_00:00:00 model_total_valid_times: 2208 proj4: +proj=lcc +units=m +a=6370000.0 +b=6370000.0 ... station_dimension: feature_id stream_order_output: 1
xarray.Dataset
- feature_id: 2729077
- time: 8760
- feature_id(feature_id)int32101 179 ... 1180001803 1180001804
- cf_role :
- timeseries_id
- comment :
- Gage Points Specified by User in Routelink file
- long_name :
- Reach ID
array([ 101, 179, 181, ..., 1180001802, 1180001803, 1180001804], dtype=int32)
- latitude(feature_id)float32dask.array<chunksize=(209929,), meta=np.ndarray>
- long_name :
- Feature latitude
- standard_name :
- latitude
- units :
- degrees_north
Array Chunk Bytes 10.92 MB 839.72 kB Shape (2729077,) (209929,) Count 14 Tasks 13 Chunks Type float32 numpy.ndarray - longitude(feature_id)float32dask.array<chunksize=(209929,), meta=np.ndarray>
- long_name :
- Feature longitude
- standard_name :
- longitude
- units :
- degrees_east
Array Chunk Bytes 10.92 MB 839.72 kB Shape (2729077,) (209929,) Count 14 Tasks 13 Chunks Type float32 numpy.ndarray - time(time)datetime64[ns]2017-01-01 ... 2017-12-31T23:00:00
- long_name :
- valid output time
- standard_name :
- time
array(['2017-01-01T00:00:00.000000000', '2017-01-01T01:00:00.000000000', '2017-01-01T02:00:00.000000000', ..., '2017-12-31T21:00:00.000000000', '2017-12-31T22:00:00.000000000', '2017-12-31T23:00:00.000000000'], dtype='datetime64[ns]')
- crs(time)|S1dask.array<chunksize=(72,), meta=np.ndarray>
- _CoordinateAxes :
- latitude longitude
- esri_pe_string :
- GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]];-400 -400 1000000000;-100000 10000;-100000 10000;8.98315284119521E-09;0.001;0.001;IsHighPrecision
- grid_mapping_name :
- latitude longitude
- inverse_flattening :
- 298.2572326660156
- long_name :
- CRS definition
- longitude_of_prime_meridian :
- 0.0
- semi_major_axis :
- 6378137.0
- semi_minor_axis :
- 6356752.5
- spatial_ref :
- GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]];-400 -400 1000000000;-100000 10000;-100000 10000;8.98315284119521E-09;0.001;0.001;IsHighPrecision
- transform_name :
- latitude longitude
Array Chunk Bytes 8.76 kB 72 B Shape (8760,) (72,) Count 123 Tasks 122 Chunks Type |S1 numpy.ndarray - elevation(time, feature_id)float32dask.array<chunksize=(72, 209929), meta=np.ndarray>
- long_name :
- Feature Elevation
- standard_name :
- Elevation
- units :
- meters
Array Chunk Bytes 95.63 GB 60.46 MB Shape (8760, 2729077) (72, 209929) Count 1587 Tasks 1586 Chunks Type float32 numpy.ndarray - order(time, feature_id)int32dask.array<chunksize=(72, 209929), meta=np.ndarray>
- long_name :
- Streamflow Order
- standard_name :
- order
Array Chunk Bytes 95.63 GB 60.46 MB Shape (8760, 2729077) (72, 209929) Count 1587 Tasks 1586 Chunks Type int32 numpy.ndarray - qBtmVertRunoff(time, feature_id)float64dask.array<chunksize=(72, 209929), meta=np.ndarray>
- grid_mapping :
- crs
- long_name :
- Runoff from bottom of soil to bucket
- units :
- m3
- valid_range :
- [0, 499999968]
Array Chunk Bytes 191.25 GB 120.92 MB Shape (8760, 2729077) (72, 209929) Count 1587 Tasks 1586 Chunks Type float64 numpy.ndarray - qBucket(time, feature_id)float64dask.array<chunksize=(72, 209929), meta=np.ndarray>
- grid_mapping :
- crs
- long_name :
- Flux from gw bucket
- units :
- m3 s-1
- valid_range :
- [0, -2147483648]
Array Chunk Bytes 191.25 GB 120.92 MB Shape (8760, 2729077) (72, 209929) Count 1587 Tasks 1586 Chunks Type float64 numpy.ndarray - qSfcLatRunoff(time, feature_id)float64dask.array<chunksize=(72, 209929), meta=np.ndarray>
- grid_mapping :
- crs
- long_name :
- Runoff from terrain routing
- units :
- m3 s-1
- valid_range :
- [0, -2147483648]
Array Chunk Bytes 191.25 GB 120.92 MB Shape (8760, 2729077) (72, 209929) Count 1587 Tasks 1586 Chunks Type float64 numpy.ndarray - q_lateral(time, feature_id)float64dask.array<chunksize=(72, 209929), meta=np.ndarray>
- grid_mapping :
- crs
- long_name :
- Runoff into channel reach
- units :
- m3 s-1
- valid_range :
- [0, 5000000]
Array Chunk Bytes 191.25 GB 120.92 MB Shape (8760, 2729077) (72, 209929) Count 1587 Tasks 1586 Chunks Type float64 numpy.ndarray - streamflow(time, feature_id)float64dask.array<chunksize=(72, 209929), meta=np.ndarray>
- grid_mapping :
- crs
- long_name :
- River Flow
- units :
- m3 s-1
- valid_range :
- [0, 50000000]
Array Chunk Bytes 191.25 GB 120.92 MB Shape (8760, 2729077) (72, 209929) Count 1587 Tasks 1586 Chunks Type float64 numpy.ndarray - velocity(time, feature_id)float64dask.array<chunksize=(72, 209929), meta=np.ndarray>
- grid_mapping :
- crs
- long_name :
- River Velocity
- units :
- ms-1
- valid_range :
- [0, 50000000]
Array Chunk Bytes 191.25 GB 120.92 MB Shape (8760, 2729077) (72, 209929) Count 1587 Tasks 1586 Chunks Type float64 numpy.ndarray
- Conventions :
- CF-1.6
- cdm_datatype :
- Station
- code_version :
- v5.1.0-alpha11
- dev :
- dev_ prefix indicates development/internal meta data
- dev_NOAH_TIMESTEP :
- 3600
- dev_OVRTSWCRT :
- 1
- dev_channelBucket_only :
- 0
- dev_channel_only :
- 0
- featureType :
- timeSeries
- model_configuration :
- retrospective
- model_initialization_time :
- 2016-10-01_00:00:00
- model_output_type :
- channel_rt
- model_output_valid_time :
- 2017-01-01_00:00:00
- model_total_valid_times :
- 2208
- proj4 :
- +proj=lcc +units=m +a=6370000.0 +b=6370000.0 +lat_1=30.0 +lat_2=60.0 +lat_0=40.0 +lon_0=-97.0 +x_0=0 +y_0=0 +k_0=1.0 +nadgrids=@
- station_dimension :
- feature_id
- stream_order_output :
- 1
[9]:
with fsspec.open('s3://pangeo-data-uswest2/esip/NWM2/nwm-v1.2-channel_spatial_index.nc', anon=False, requester_pays=True) as f:
ds_lonlat = xr.open_dataset(f)
lat = ds_lonlat['latitude'].values
lon = ds_lonlat['longitude'].values
[10]:
print(lat.max(), lon.max())
52.86352 -66.99203
Letβs find the site with the largest streamflow on June 1
[11]:
imax = ds.streamflow.sel(time='2017-06-01 00:00:00').argmax().values
Letβs plot the whole year-long time series at that location
[12]:
%%time
ds.streamflow[:,imax].hvplot()
CPU times: user 84 ms, sys: 12.6 ms, total: 96.6 ms
Wall time: 8.95 s
[12]:
[13]:
var='streamflow'
[14]:
ds[var].nbytes/1e9
[14]:
191.25371616
[15]:
var_mean = ds[var].mean(dim='time').persist()
progress(var_mean)
[16]:
df = var_mean.to_pandas().to_frame()
[17]:
df = df.assign(latitude=lat)
df = df.assign(longitude=lon)
df.rename(columns={0: "transport"}, inplace=True)
[18]:
p = df.hvplot.points('longitude', 'latitude', crs=ccrs.PlateCarree(),
c='transport', colorbar=True, size=14)
[19]:
g = rasterize(p, aggregator='mean', x_sampling=0.02, y_sampling=0.02, width=500).opts(tools=['hover'],
aspect='equal', logz=True, clim=(1e-2, np.nan))
[20]:
g * gv.tile_sources.OSM
[20]: