Xarray Tutorial¶

Overview¶

teaching: 30 minutes
exercises: 0
questions:
- What is xarray designed to do?
- How do I create an xarray Dataset?
- How does indexing work with xarray?
- How do I run computations on xarray datasets?
- What are ways to ploy xarray datasets?

Table of contents¶

Xarray primer
Creating data
Loading data
Selecting data
Basic computations
Advanced computations
ENSO excercise

Xarray primer¶

We’ve seen that Pandas and Geopandas are excellent libraries for analyzing tabular “labeled data”. Xarray is designed to make it easier to work with with labeled multidimensional data. By multidimensional data (also often called N-dimensional), we mean data with many independent dimensions or axes. For example, we might represent Earth’s surface temperature \(T\) as a three dimensional variable

\[T(x, y, t)\]

where \(x\) and \(y\) are spatial dimensions and and \(t\) is time. By labeled, we mean data that has metadata associated with it describing the names and relationships between the variables. The cartoon below shows a “data cube” schematic dataset with temperature and preciptation sharing the same three dimensions, plus longitude and latitude as auxilliary coordinates.

xarray data model

Basic Computations¶

When we perform mathematical manipulations of xarray DataArrays, the coordinates come along for the ride. Imagine we want to calcuate

\[g = f^2 + 1\]

We can apply familiar numpy operations to xarray objects.

[12]:

da_g = da_f**2 + 1
da_g

[12]:

xarray.DataArray

x: 19

1.0 1.117 1.413 1.75 1.97 1.97 1.75 ... 1.97 1.97 1.75 1.413 1.117 1.0

array([1.        , 1.11697778, 1.41317591, 1.75      , 1.96984631,
       1.96984631, 1.75      , 1.41317591, 1.11697778, 1.        ,
       1.11697778, 1.41317591, 1.75      , 1.96984631, 1.96984631,
       1.75      , 1.41317591, 1.11697778, 1.        ])

Coordinates: (1)

(x)

float64

-3.142 -2.793 ... 2.793 3.142

array([-3.141593, -2.792527, -2.443461, -2.094395, -1.745329, -1.396263,
       -1.047198, -0.698132, -0.349066,  0.      ,  0.349066,  0.698132,
        1.047198,  1.396263,  1.745329,  2.094395,  2.443461,  2.792527,
        3.141593])

Attributes: (0)

[13]:

da_g.plot()

[13]:

[<matplotlib.lines.Line2D at 0x7fc69bde8cd0>]

../../../_images/repos_pangeo-data_pangeo-tutorial-gallery_xarray_24_1.png

Multidimensional Indexing¶

In this example, we take advantage of the fact that xarray understands time to select a particular date

[19]:

sst.sel(time='1960-06-15').plot(vmin=-2, vmax=30)

[19]:

<matplotlib.collections.QuadMesh at 0x7fc69b0ffd10>

../../../_images/repos_pangeo-data_pangeo-tutorial-gallery_xarray_33_1.png

But we can select along any axis

[20]:

sst.sel(lon=180).transpose().plot()

[20]:

<matplotlib.collections.QuadMesh at 0x7fc69b0c9ed0>

../../../_images/repos_pangeo-data_pangeo-tutorial-gallery_xarray_35_1.png

[21]:

sst.sel(lon=180, lat=40).plot()

[21]:

[<matplotlib.lines.Line2D at 0x7fc69af6c250>]

../../../_images/repos_pangeo-data_pangeo-tutorial-gallery_xarray_36_1.png

Label-Based Reduction Operations¶

Usually the process of data analysis involves going from a big, multidimensional dataset to a few concise figures. Inevitably, the data must be “reduced” somehow. Examples of simple reduction operations include:

Mean
Standard Deviation
Minimum
Maximum

etc. Xarray supports all of these and more, via a familiar numpy-like syntax. But with xarray, you can specify the reductions by dimension.

First we start with the default, reduction over all dimensions:

[22]:

sst.mean()

[22]:

xarray.DataArray

'sst'

13.626506
```
array(13.626506, dtype=float32)
```
Coordinates: (0)
Attributes: (0)

[23]:

sst_time_mean = sst.mean(dim='time')
sst_time_mean.plot(vmin=-2, vmax=30)

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/core/nanops.py:142: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis=axis, dtype=dtype)

[23]:

<matplotlib.collections.QuadMesh at 0x7fc69ae94f10>

../../../_images/repos_pangeo-data_pangeo-tutorial-gallery_xarray_39_2.png

[24]:

sst_zonal_mean = sst.mean(dim='lon')
sst_zonal_mean.transpose().plot()

[24]:

<matplotlib.collections.QuadMesh at 0x7fc69aee1050>

../../../_images/repos_pangeo-data_pangeo-tutorial-gallery_xarray_40_1.png

[25]:

sst_time_and_zonal_mean = sst.mean(dim=('time', 'lon'))
sst_time_and_zonal_mean.plot()

[25]:

[<matplotlib.lines.Line2D at 0x7fc69be2a150>]

../../../_images/repos_pangeo-data_pangeo-tutorial-gallery_xarray_41_1.png

[26]:

# some might prefer to have lat on the y axis
sst_time_and_zonal_mean.plot(y='lat')

[26]:

[<matplotlib.lines.Line2D at 0x7fc69aee1fd0>]

../../../_images/repos_pangeo-data_pangeo-tutorial-gallery_xarray_42_1.png

More Complicated Example: Weighted Mean¶

The means we calculated above were “naive”; they were straightforward numerical means over the different dimensions of the dataset. They did not account, for example, for spherical geometry of the globe and the necessary weighting factors. Although xarray is very useful for geospatial analysis, it has no built-in understanding of geography.

Below we show how to create a proper weighted mean by using the formula for the area element in spherical coordinates. This is a good illustration of several xarray concepts.

The area element for lat-lon coordinates is

\[\delta A = R^2 \delta \phi \delta \lambda \cos(\phi)\]

where \(\phi\) is latitude, \(\delta \phi\) is the spacing of the points in latitude, \(\delta \lambda\) is the spacing of the points in longitude, and \(R\) is Earth’s radius. (In this formula, \(\phi\) and \(\lambda\) are measured in radians.) Let’s use xarray to create the weight factor.

[27]:

R = 6.37e6
# we know already that the spacing of the points is one degree latitude
dϕ = np.deg2rad(1.)
dλ = np.deg2rad(1.)
dA = R**2 * dϕ * dλ * np.cos(np.deg2rad(ds.lat))
dA.plot()

[27]:

[<matplotlib.lines.Line2D at 0x7fc69ac60cd0>]

../../../_images/repos_pangeo-data_pangeo-tutorial-gallery_xarray_44_1.png

[28]:

dA.where(sst[0].notnull())

[28]:

xarray.DataArray

'lat'

lat: 89
lon: 180

nan nan nan nan ... 431372740.0 431372740.0 431372740.0 431372740.0

array([[          nan,           nan,           nan, ...,           nan,
                  nan,           nan],
       [          nan,           nan,           nan, ...,           nan,
                  nan,           nan],
       [          nan,           nan,           nan, ...,           nan,
                  nan,           nan],
       ...,
       [1.2920163e+09, 1.2920163e+09, 1.2920163e+09, ..., 1.2920163e+09,
        1.2920163e+09, 1.2920163e+09],
       [8.6222048e+08, 8.6222048e+08, 8.6222048e+08, ..., 8.6222048e+08,
        8.6222048e+08, 8.6222048e+08],
       [4.3137274e+08, 4.3137274e+08, 4.3137274e+08, ..., 4.3137274e+08,
        4.3137274e+08, 4.3137274e+08]], dtype=float32)

Coordinates: (3)

lat

(lat)

float32

-88.0 -86.0 -84.0 ... 86.0 88.0

standard_name :: latitude
pointwidth :: 2.0
gridtype :: 0
units :: degree_north

array([-88., -86., -84., -82., -80., -78., -76., -74., -72., -70., -68., -66.,
       -64., -62., -60., -58., -56., -54., -52., -50., -48., -46., -44., -42.,
       -40., -38., -36., -34., -32., -30., -28., -26., -24., -22., -20., -18.,
       -16., -14., -12., -10.,  -8.,  -6.,  -4.,  -2.,   0.,   2.,   4.,   6.,
         8.,  10.,  12.,  14.,  16.,  18.,  20.,  22.,  24.,  26.,  28.,  30.,
        32.,  34.,  36.,  38.,  40.,  42.,  44.,  46.,  48.,  50.,  52.,  54.,
        56.,  58.,  60.,  62.,  64.,  66.,  68.,  70.,  72.,  74.,  76.,  78.,
        80.,  82.,  84.,  86.,  88.], dtype=float32)

lon

(lon)

float32

0.0 2.0 4.0 ... 354.0 356.0 358.0

standard_name :: longitude
pointwidth :: 2.0
gridtype :: 1
units :: degree_east

array([  0.,   2.,   4.,   6.,   8.,  10.,  12.,  14.,  16.,  18.,  20.,  22.,
        24.,  26.,  28.,  30.,  32.,  34.,  36.,  38.,  40.,  42.,  44.,  46.,
        48.,  50.,  52.,  54.,  56.,  58.,  60.,  62.,  64.,  66.,  68.,  70.,
        72.,  74.,  76.,  78.,  80.,  82.,  84.,  86.,  88.,  90.,  92.,  94.,
        96.,  98., 100., 102., 104., 106., 108., 110., 112., 114., 116., 118.,
       120., 122., 124., 126., 128., 130., 132., 134., 136., 138., 140., 142.,
       144., 146., 148., 150., 152., 154., 156., 158., 160., 162., 164., 166.,
       168., 170., 172., 174., 176., 178., 180., 182., 184., 186., 188., 190.,
       192., 194., 196., 198., 200., 202., 204., 206., 208., 210., 212., 214.,
       216., 218., 220., 222., 224., 226., 228., 230., 232., 234., 236., 238.,
       240., 242., 244., 246., 248., 250., 252., 254., 256., 258., 260., 262.,
       264., 266., 268., 270., 272., 274., 276., 278., 280., 282., 284., 286.,
       288., 290., 292., 294., 296., 298., 300., 302., 304., 306., 308., 310.,
       312., 314., 316., 318., 320., 322., 324., 326., 328., 330., 332., 334.,
       336., 338., 340., 342., 344., 346., 348., 350., 352., 354., 356., 358.],
      dtype=float32)

time

()

datetime64[ns]

1960-01-15

array('1960-01-15T00:00:00.000000000', dtype='datetime64[ns]')

Attributes: (0)

[29]:

pixel_area = dA.where(sst[0].notnull())
pixel_area.plot()

[29]:

<matplotlib.collections.QuadMesh at 0x7fc69ac23190>

../../../_images/repos_pangeo-data_pangeo-tutorial-gallery_xarray_46_1.png

[30]:

total_ocean_area = pixel_area.sum(dim=('lon', 'lat'))
sst_weighted_mean = (sst * pixel_area).sum(dim=('lon', 'lat')) / total_ocean_area
sst_weighted_mean.plot()

[30]:

[<matplotlib.lines.Line2D at 0x7fc69ad11150>]

../../../_images/repos_pangeo-data_pangeo-tutorial-gallery_xarray_47_1.png

Maps¶

Xarray integrates with cartopy to enable you to plot your data on a map

[31]:

import matplotlib.pyplot as plt
import cartopy.crs as ccrs

plt.figure(figsize=(12, 8))
ax = plt.axes(projection=ccrs.InterruptedGoodeHomolosine())
ax.coastlines()

sst[0].plot(transform=ccrs.PlateCarree(), vmin=-2, vmax=30,
            cbar_kwargs={'shrink': 0.4})

[31]:

<matplotlib.collections.QuadMesh at 0x7fc6949a3f90>

/srv/conda/envs/notebook/lib/python3.7/site-packages/cartopy/io/__init__.py:260: DownloadWarning: Downloading: http://naciscdn.org/naturalearth/110m/physical/ne_110m_coastline.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)

../../../_images/repos_pangeo-data_pangeo-tutorial-gallery_xarray_49_2.png

IV. Opening Many Files¶

One of the killer features of xarray is its ability to open many files into a single dataset. We do this with the open_mfdataset function.

[32]:

ds_all = xr.open_mfdataset('./tutorial-data/sst/*nc', combine='by_coords')
ds_all

[32]:

xarray.Dataset

Dimensions:
- lat: 89
- lon: 180
- time: 684

Coordinates: (3)

lon

(lon)

float32

0.0 2.0 4.0 ... 354.0 356.0 358.0

standard_name :: longitude
pointwidth :: 2.0
gridtype :: 1
units :: degree_east

array([  0.,   2.,   4.,   6.,   8.,  10.,  12.,  14.,  16.,  18.,  20.,  22.,
        24.,  26.,  28.,  30.,  32.,  34.,  36.,  38.,  40.,  42.,  44.,  46.,
        48.,  50.,  52.,  54.,  56.,  58.,  60.,  62.,  64.,  66.,  68.,  70.,
        72.,  74.,  76.,  78.,  80.,  82.,  84.,  86.,  88.,  90.,  92.,  94.,
        96.,  98., 100., 102., 104., 106., 108., 110., 112., 114., 116., 118.,
       120., 122., 124., 126., 128., 130., 132., 134., 136., 138., 140., 142.,
       144., 146., 148., 150., 152., 154., 156., 158., 160., 162., 164., 166.,
       168., 170., 172., 174., 176., 178., 180., 182., 184., 186., 188., 190.,
       192., 194., 196., 198., 200., 202., 204., 206., 208., 210., 212., 214.,
       216., 218., 220., 222., 224., 226., 228., 230., 232., 234., 236., 238.,
       240., 242., 244., 246., 248., 250., 252., 254., 256., 258., 260., 262.,
       264., 266., 268., 270., 272., 274., 276., 278., 280., 282., 284., 286.,
       288., 290., 292., 294., 296., 298., 300., 302., 304., 306., 308., 310.,
       312., 314., 316., 318., 320., 322., 324., 326., 328., 330., 332., 334.,
       336., 338., 340., 342., 344., 346., 348., 350., 352., 354., 356., 358.],
      dtype=float32)

lat

(lat)

float32

-88.0 -86.0 -84.0 ... 86.0 88.0

standard_name :: latitude
pointwidth :: 2.0
gridtype :: 0
units :: degree_north

array([-88., -86., -84., -82., -80., -78., -76., -74., -72., -70., -68., -66.,
       -64., -62., -60., -58., -56., -54., -52., -50., -48., -46., -44., -42.,
       -40., -38., -36., -34., -32., -30., -28., -26., -24., -22., -20., -18.,
       -16., -14., -12., -10.,  -8.,  -6.,  -4.,  -2.,   0.,   2.,   4.,   6.,
         8.,  10.,  12.,  14.,  16.,  18.,  20.,  22.,  24.,  26.,  28.,  30.,
        32.,  34.,  36.,  38.,  40.,  42.,  44.,  46.,  48.,  50.,  52.,  54.,
        56.,  58.,  60.,  62.,  64.,  66.,  68.,  70.,  72.,  74.,  76.,  78.,
        80.,  82.,  84.,  86.,  88.], dtype=float32)

time

(time)

datetime64[ns]

1960-01-15 ... 2016-12-15

array(['1960-01-15T00:00:00.000000000', '1960-02-15T00:00:00.000000000',
       '1960-03-15T00:00:00.000000000', ..., '2016-10-15T00:00:00.000000000',
       '2016-11-15T00:00:00.000000000', '2016-12-15T00:00:00.000000000'],
      dtype='datetime64[ns]')

Data variables: (1)

sst

(time, lat, lon)

float32

dask.array<chunksize=(12, 89, 180), meta=np.ndarray>

pointwidth :: 1.0
valid_min :: -3.0
valid_max :: 45.0
units :: degree_Celsius
long_name :: Extended reconstructed sea surface temperature
standard_name :: sea_surface_temperature
iridl:hasSemantics :: iridl:SeaSurfaceTemperature





  
      Array  Chunk 
  
  
     Bytes  43.83 MB   768.96 kB 
     Shape  (684, 89, 180)   (12, 89, 180) 
     Count  171 Tasks  57 Chunks 
     Type  float32  numpy.ndarray

Attributes: (3)
Conventions :
IRIDL
source :
https://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCDC/.ERSST/.version3b/.sst/
history :
extracted and cleaned by Ryan Abernathey for Research Computing in Earth Science

Now we have 57 years of data instead of one!

V. Groupby¶

Now that we have a bigger dataset, this is a good time to check out xarray’s groupby capabilities.

[33]:

sst_clim = ds_all.sst.groupby('time.month').mean(dim='time')
sst_clim

[33]:

xarray.DataArray

'sst'

month: 12
lat: 89
lon: 180

dask.array<chunksize=(1, 89, 180), meta=np.ndarray>





  
      Array  Chunk 
  
  
     Bytes  768.96 kB   64.08 kB 
     Shape  (12, 89, 180)   (1, 89, 180) 
     Count  1791 Tasks  12 Chunks 
     Type  float32  numpy.ndarray

Coordinates: (3)

lon

(lon)

float32

0.0 2.0 4.0 ... 354.0 356.0 358.0

standard_name :: longitude
pointwidth :: 2.0
gridtype :: 1
units :: degree_east

array([  0.,   2.,   4.,   6.,   8.,  10.,  12.,  14.,  16.,  18.,  20.,  22.,
        24.,  26.,  28.,  30.,  32.,  34.,  36.,  38.,  40.,  42.,  44.,  46.,
        48.,  50.,  52.,  54.,  56.,  58.,  60.,  62.,  64.,  66.,  68.,  70.,
        72.,  74.,  76.,  78.,  80.,  82.,  84.,  86.,  88.,  90.,  92.,  94.,
        96.,  98., 100., 102., 104., 106., 108., 110., 112., 114., 116., 118.,
       120., 122., 124., 126., 128., 130., 132., 134., 136., 138., 140., 142.,
       144., 146., 148., 150., 152., 154., 156., 158., 160., 162., 164., 166.,
       168., 170., 172., 174., 176., 178., 180., 182., 184., 186., 188., 190.,
       192., 194., 196., 198., 200., 202., 204., 206., 208., 210., 212., 214.,
       216., 218., 220., 222., 224., 226., 228., 230., 232., 234., 236., 238.,
       240., 242., 244., 246., 248., 250., 252., 254., 256., 258., 260., 262.,
       264., 266., 268., 270., 272., 274., 276., 278., 280., 282., 284., 286.,
       288., 290., 292., 294., 296., 298., 300., 302., 304., 306., 308., 310.,
       312., 314., 316., 318., 320., 322., 324., 326., 328., 330., 332., 334.,
       336., 338., 340., 342., 344., 346., 348., 350., 352., 354., 356., 358.],
      dtype=float32)

lat

(lat)

float32

-88.0 -86.0 -84.0 ... 86.0 88.0

standard_name :: latitude
pointwidth :: 2.0
gridtype :: 0
units :: degree_north

array([-88., -86., -84., -82., -80., -78., -76., -74., -72., -70., -68., -66.,
       -64., -62., -60., -58., -56., -54., -52., -50., -48., -46., -44., -42.,
       -40., -38., -36., -34., -32., -30., -28., -26., -24., -22., -20., -18.,
       -16., -14., -12., -10.,  -8.,  -6.,  -4.,  -2.,   0.,   2.,   4.,   6.,
         8.,  10.,  12.,  14.,  16.,  18.,  20.,  22.,  24.,  26.,  28.,  30.,
        32.,  34.,  36.,  38.,  40.,  42.,  44.,  46.,  48.,  50.,  52.,  54.,
        56.,  58.,  60.,  62.,  64.,  66.,  68.,  70.,  72.,  74.,  76.,  78.,
        80.,  82.,  84.,  86.,  88.], dtype=float32)

month

(month)

int64

1 2 3 4 5 6 7 8 9 10 11 12

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

Attributes: (0)

Now the data has dimension month instead of time! Each value represents the average among all of the Januaries, Februaries, etc. in the dataset.

[34]:

(sst_clim[6] - sst_clim[0]).plot()
plt.title('June minus July SST Climatology')

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/array/numpy_compat.py:40: RuntimeWarning: invalid value encountered in true_divide
  x = np.divide(x1, x2, out)

[34]:

Text(0.5, 1.0, 'June minus July SST Climatology')

../../../_images/repos_pangeo-data_pangeo-tutorial-gallery_xarray_56_2.png

VI. Resample and Rolling¶

Resample is meant specifically to work with time data (data with a datetime64 variable as a dimension). It allows you to change the time-sampling frequency of your data.

Let’s illustrate by selecting a single point.

[35]:

sst_ts = ds_all.sst.sel(lon=300, lat=10)
sst_ts_annual = sst_ts.resample(time='A').mean(dim='time')
sst_ts_annual

[35]:

xarray.DataArray

'sst'

time: 57

dask.array<chunksize=(1,), meta=np.ndarray>





  
      Array  Chunk 
  
  
     Bytes  228 B   4 B 
     Shape  (57,)   (1,) 
     Count  456 Tasks  57 Chunks 
     Type  float32  numpy.ndarray

Coordinates: (3)

time

(time)

datetime64[ns]

1960-12-31 ... 2016-12-31

array(['1960-12-31T00:00:00.000000000', '1961-12-31T00:00:00.000000000',
       '1962-12-31T00:00:00.000000000', '1963-12-31T00:00:00.000000000',
       '1964-12-31T00:00:00.000000000', '1965-12-31T00:00:00.000000000',
       '1966-12-31T00:00:00.000000000', '1967-12-31T00:00:00.000000000',
       '1968-12-31T00:00:00.000000000', '1969-12-31T00:00:00.000000000',
       '1970-12-31T00:00:00.000000000', '1971-12-31T00:00:00.000000000',
       '1972-12-31T00:00:00.000000000', '1973-12-31T00:00:00.000000000',
       '1974-12-31T00:00:00.000000000', '1975-12-31T00:00:00.000000000',
       '1976-12-31T00:00:00.000000000', '1977-12-31T00:00:00.000000000',
       '1978-12-31T00:00:00.000000000', '1979-12-31T00:00:00.000000000',
       '1980-12-31T00:00:00.000000000', '1981-12-31T00:00:00.000000000',
       '1982-12-31T00:00:00.000000000', '1983-12-31T00:00:00.000000000',
       '1984-12-31T00:00:00.000000000', '1985-12-31T00:00:00.000000000',
       '1986-12-31T00:00:00.000000000', '1987-12-31T00:00:00.000000000',
       '1988-12-31T00:00:00.000000000', '1989-12-31T00:00:00.000000000',
       '1990-12-31T00:00:00.000000000', '1991-12-31T00:00:00.000000000',
       '1992-12-31T00:00:00.000000000', '1993-12-31T00:00:00.000000000',
       '1994-12-31T00:00:00.000000000', '1995-12-31T00:00:00.000000000',
       '1996-12-31T00:00:00.000000000', '1997-12-31T00:00:00.000000000',
       '1998-12-31T00:00:00.000000000', '1999-12-31T00:00:00.000000000',
       '2000-12-31T00:00:00.000000000', '2001-12-31T00:00:00.000000000',
       '2002-12-31T00:00:00.000000000', '2003-12-31T00:00:00.000000000',
       '2004-12-31T00:00:00.000000000', '2005-12-31T00:00:00.000000000',
       '2006-12-31T00:00:00.000000000', '2007-12-31T00:00:00.000000000',
       '2008-12-31T00:00:00.000000000', '2009-12-31T00:00:00.000000000',
       '2010-12-31T00:00:00.000000000', '2011-12-31T00:00:00.000000000',
       '2012-12-31T00:00:00.000000000', '2013-12-31T00:00:00.000000000',
       '2014-12-31T00:00:00.000000000', '2015-12-31T00:00:00.000000000',
       '2016-12-31T00:00:00.000000000'], dtype='datetime64[ns]')

lon
()
float32
300.0
standard_name :
longitude
pointwidth :
2.0
gridtype :
1
units :
degree_east
```
array(300., dtype=float32)
```
lat
()
float32
10.0
standard_name :
latitude
pointwidth :
2.0
gridtype :
0
units :
degree_north
```
array(10., dtype=float32)
```

Attributes: (0)

[36]:

sst_ts.plot()
sst_ts_annual.plot()

[36]:

[<matplotlib.lines.Line2D at 0x7fc69021b210>]

../../../_images/repos_pangeo-data_pangeo-tutorial-gallery_xarray_59_1.png

An alternative approach is a “running mean” over the time dimension. This can be accomplished with xarray’s .rolling operation.

[37]:

sst_ts_rolling = sst_ts.rolling(time=24, center=True).mean()
sst_ts_annual.plot(marker='o')
sst_ts_rolling.plot()

[37]:

[<matplotlib.lines.Line2D at 0x7fc690109d10>]

../../../_images/repos_pangeo-data_pangeo-tutorial-gallery_xarray_61_1.png

Finale: Calculate the ENSO Index¶

This page from NOAA explains how the El Niño Southern Oscillation index is calculated.

The Nino 3.4 region is defined as the region between +/- 5 deg. lat, 170 W - 120 W lon.
Warm or cold phases of the Oceanic Nino Index are defined by a five consecutive 3-month running mean of sea surface temperature (SST) anomalies in the Niño 3.4 region that is above (below) the threshold of +0.5°C (-0.5°C). This is known as the Oceanic Niño Index (ONI).

(Note that “anomaly” means that the seasonal cycle is removed.)

Try working on this on your own for 5 minutes.

[ ]:

Once you’re done, try comparing the ENSO Index you calculated with the NINO3.4 index published by NOAA. The pandas snippet below will load the official time series for comparison.

[38]:

import pandas as pd
noaa_nino34 = pd.read_csv('https://www.cpc.ncep.noaa.gov/data/indices/sstoi.indices',
                          sep=r" ", skipinitialspace=True,
                          parse_dates={'time': ['YR','MON']},
                          index_col='time')['NINO3.4']
noaa_nino34.head()

[38]:

time
1982-01-01    26.72
1982-02-01    26.70
1982-03-01    27.20
1982-04-01    28.02
1982-05-01    28.54
Name: NINO3.4, dtype: float64

Getting Help with Xarray¶

Here are some important resources for learning more about xarray and getting help.

Xarray Tutorial¶

Overview¶

Table of contents¶

Xarray primer¶

Xarray data structures¶

Creating data¶

Selecting Data¶

Basic Computations¶

Exercise¶

Multidimensional Data¶

Multidimensional Indexing¶

Label-Based Reduction Operations¶

More Complicated Example: Weighted Mean¶

Maps¶

IV. Opening Many Files¶

V. Groupby¶

VI. Resample and Rolling¶

Finale: Calculate the ENSO Index¶

Getting Help with Xarray¶