Load CMIP6 Data with Intake ESM

Intake ESM is an experimental new package that aims to provide a higher-level interface for searching and loading Earth System Model data archives, such as CMIP6. The package is under very active development, and features may be unstable. Please report any issues or suggestions on GitHub.

[1]:
import xarray as xr
xr.set_options(display_style='html')
import intake
%matplotlib inline

Intake ESM works by parsing an ESM Collection Spec and converting it to an intake catalog. The collection spec is stored in a .json file. Here we open it using intake.

[2]:
cat_url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(cat_url)
col
/srv/conda/envs/notebook/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3417: DtypeWarning: Columns (10) have mixed types.Specify dtype option on import or set low_memory=False.
  exec(code_obj, self.user_global_ns, self.user_ns)

pangeo-cmip6 catalog with 4749 dataset(s) from 294376 asset(s):

unique
activity_id 15
institution_id 34
source_id 79
experiment_id 107
member_id 213
table_id 30
variable_id 392
grid_label 10
zstore 294376
dcpp_init_year 60
version 529

We can now use intake methods to search the collection and, if desired, export the results as a pandas DataFrame.

[3]:
cat = col.search(experiment_id=['historical', 'ssp585'], table_id='Oyr', variable_id='o2',
                 grid_label='gn')
cat.df
[3]:
activity_id institution_id source_id experiment_id member_id table_id variable_id grid_label zstore dcpp_init_year version
0 CMIP CCCma CanESM5-CanOE historical r1i1p2f1 Oyr o2 gn gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical... NaN 20190429
1 CMIP CCCma CanESM5-CanOE historical r2i1p2f1 Oyr o2 gn gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical... NaN 20190429
2 CMIP CCCma CanESM5-CanOE historical r3i1p2f1 Oyr o2 gn gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical... NaN 20190429
3 CMIP CCCma CanESM5 historical r10i1p1f1 Oyr o2 gn gs://cmip6/CMIP/CCCma/CanESM5/historical/r10i1... NaN 20190429
4 CMIP CCCma CanESM5 historical r10i1p2f1 Oyr o2 gn gs://cmip6/CMIP/CCCma/CanESM5/historical/r10i1... NaN 20190429
... ... ... ... ... ... ... ... ... ... ... ...
133 ScenarioMIP IPSL IPSL-CM6A-LR ssp585 r4i1p1f1 Oyr o2 gn gs://cmip6/ScenarioMIP/IPSL/IPSL-CM6A-LR/ssp58... NaN 20191122
134 ScenarioMIP IPSL IPSL-CM6A-LR ssp585 r6i1p1f1 Oyr o2 gn gs://cmip6/ScenarioMIP/IPSL/IPSL-CM6A-LR/ssp58... NaN 20191121
135 ScenarioMIP MIROC MIROC-ES2L ssp585 r1i1p1f2 Oyr o2 gn gs://cmip6/ScenarioMIP/MIROC/MIROC-ES2L/ssp585... NaN 20190823
136 ScenarioMIP MPI-M MPI-ESM1-2-LR ssp585 r10i1p1f1 Oyr o2 gn gs://cmip6/ScenarioMIP/MPI-M/MPI-ESM1-2-LR/ssp... NaN 20190710
137 ScenarioMIP MPI-M MPI-ESM1-2-LR ssp585 r1i1p1f1 Oyr o2 gn gs://cmip6/ScenarioMIP/MPI-M/MPI-ESM1-2-LR/ssp... NaN 20190710

138 rows × 11 columns
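The search above combines its arguments in a particular way: a list of values for one column matches any of them, and the different columns must all match at once. The following is a minimal pure-Python sketch of that behaviour, not Intake ESM's actual implementation. The `search` helper and the toy `rows` table are hypothetical and exist only for illustration.

```python
# Toy stand-in for the catalog table. These rows are made up for illustration
# and only carry the columns used in the query.
rows = [
    {"source_id": "CanESM5", "experiment_id": "historical",
     "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"source_id": "CanESM5", "experiment_id": "ssp585",
     "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"source_id": "CanESM5", "experiment_id": "piControl",
     "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"source_id": "CanESM5", "experiment_id": "historical",
     "table_id": "Omon", "variable_id": "thetao", "grid_label": "gn"},
]


def search(rows, **query):
    """Hypothetical helper: keep rows where every queried column matches.

    A scalar value must match exactly; a list matches any of its entries.
    """
    def matches(row):
        for column, wanted in query.items():
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if row[column] not in allowed:
                return False
        return True

    return [row for row in rows if matches(row)]


hits = search(rows, experiment_id=["historical", "ssp585"],
              table_id="Oyr", variable_id="o2", grid_label="gn")
print([(r["source_id"], r["experiment_id"]) for r in hits])
```

Only the `historical` and `ssp585` rows from the `Oyr` table survive; the `piControl` row and the `Omon` row are filtered out.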

Intake knows how to automatically open the datasets using xarray. Furthermore, Intake ESM contains special logic to concatenate and merge the individual results of our query into larger, higher-level aggregated xarray datasets.

[4]:
dset_dict = cat.to_dataset_dict(zarr_kwargs={'consolidated': True})
list(dset_dict.keys())

--> The keys in the returned dictionary of datasets are constructed as follows:
        'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
100.00% [18/18 00:05<00:00]
[4]:
['ScenarioMIP.MPI-M.MPI-ESM1-2-LR.ssp585.Oyr.gn',
 'ScenarioMIP.CCCma.CanESM5-CanOE.ssp585.Oyr.gn',
 'CMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.historical.Oyr.gn',
 'CMIP.MPI-M.MPI-ESM1-2-LR.historical.Oyr.gn',
 'CMIP.CSIRO.ACCESS-ESM1-5.historical.Oyr.gn',
 'ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.Oyr.gn',
 'ScenarioMIP.DWD.MPI-ESM1-2-HR.ssp585.Oyr.gn',
 'CMIP.NCC.NorESM2-MM.historical.Oyr.gn',
 'ScenarioMIP.MIROC.MIROC-ES2L.ssp585.Oyr.gn',
 'CMIP.CCCma.CanESM5-CanOE.historical.Oyr.gn',
 'CMIP.MIROC.MIROC-ES2L.historical.Oyr.gn',
 'ScenarioMIP.IPSL.IPSL-CM6A-LR.ssp585.Oyr.gn',
 'CMIP.NCC.NorESM2-LM.historical.Oyr.gn',
 'ScenarioMIP.DKRZ.MPI-ESM1-2-HR.ssp585.Oyr.gn',
 'CMIP.CCCma.CanESM5.historical.Oyr.gn',
 'ScenarioMIP.CCCma.CanESM5.ssp585.Oyr.gn',
 'CMIP.MPI-M.MPI-ESM1-2-HR.historical.Oyr.gn',
 'CMIP.IPSL.IPSL-CM6A-LR.historical.Oyr.gn']
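The 138 catalog rows collapse into 18 datasets because rows that share the same key columns are combined. The sketch below illustrates that grouping in plain Python, using three rows copied from the search results above. The `dataset_key` helper and `KEY_COLUMNS` name are hypothetical; the real aggregation logic lives inside Intake ESM and may differ in detail.

```python
from collections import defaultdict

# Columns that make up the dictionary key, in the order printed by
# to_dataset_dict() above.
KEY_COLUMNS = ["activity_id", "institution_id", "source_id",
               "experiment_id", "table_id", "grid_label"]

# Three rows taken from the search results shown earlier (rows 0, 3 and 4).
rows = [
    {"activity_id": "CMIP", "institution_id": "CCCma", "source_id": "CanESM5-CanOE",
     "experiment_id": "historical", "member_id": "r1i1p2f1",
     "table_id": "Oyr", "grid_label": "gn"},
    {"activity_id": "CMIP", "institution_id": "CCCma", "source_id": "CanESM5",
     "experiment_id": "historical", "member_id": "r10i1p1f1",
     "table_id": "Oyr", "grid_label": "gn"},
    {"activity_id": "CMIP", "institution_id": "CCCma", "source_id": "CanESM5",
     "experiment_id": "historical", "member_id": "r10i1p2f1",
     "table_id": "Oyr", "grid_label": "gn"},
]


def dataset_key(row):
    """Hypothetical helper: join the key columns with dots."""
    return ".".join(row[column] for column in KEY_COLUMNS)


# Rows with the same key end up in the same group; within a group they
# differ by member_id, which becomes a dimension of the merged dataset.
groups = defaultdict(list)
for row in rows:
    groups[dataset_key(row)].append(row["member_id"])

for key, members in groups.items():
    print(key, members)
```

The two CanESM5 rows share the key `CMIP.CCCma.CanESM5.historical.Oyr.gn`, which is why the corresponding dataset has a `member_id` dimension.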
[5]:
ds = dset_dict['CMIP.CCCma.CanESM5.historical.Oyr.gn']
ds
[5]:
<xarray.Dataset>
Dimensions:             (bnds: 2, i: 360, j: 291, lev: 45, member_id: 35, time: 165, vertices: 4)
Coordinates:
  * i                   (i) int32 0 1 2 3 4 5 6 ... 353 354 355 356 357 358 359
  * j                   (j) int32 0 1 2 3 4 5 6 ... 284 285 286 287 288 289 290
    latitude            (j, i) float64 dask.array<chunksize=(291, 360), meta=np.ndarray>
  * lev                 (lev) float64 3.047 9.454 16.36 ... 5.375e+03 5.625e+03
    lev_bnds            (lev, bnds) float64 dask.array<chunksize=(45, 2), meta=np.ndarray>
    longitude           (j, i) float64 dask.array<chunksize=(291, 360), meta=np.ndarray>
  * time                (time) object 1850-07-02 12:00:00 ... 2014-07-02 12:0...
    time_bnds           (time, bnds) object dask.array<chunksize=(165, 2), meta=np.ndarray>
  * member_id           (member_id) <U9 'r10i1p1f1' 'r10i1p2f1' ... 'r9i1p2f1'
Dimensions without coordinates: bnds, vertices
Data variables:
    o2                  (member_id, time, lev, j, i) float32 dask.array<chunksize=(1, 12, 45, 291, 360), meta=np.ndarray>
    vertices_latitude   (j, i, vertices) float64 dask.array<chunksize=(291, 360, 4), meta=np.ndarray>
    vertices_longitude  (j, i, vertices) float64 dask.array<chunksize=(291, 360, 4), meta=np.ndarray>
Attributes:
    variant_label:               r9i1p2f1
    branch_method:               Spin-up documentation
    source:                      CanESM5 (2019): \naerosol: interactive\natmo...
    sub_experiment_id:           none
    cmor_version:                3.4.0
    institution_id:              CCCma
    experiment:                  all-forcing simulation of the recent past
    mip_era:                     CMIP6
    parent_source_id:            CanESM5
    parent_activity_id:          CMIP
    nominal_resolution:          100 km
    parent_time_units:           days since 1850-01-01 0:0:0.0
    source_type:                 AOGCM
    branch_time_in_child:        0.0
    activity_id:                 CMIP
    grid_label:                  gn
    experiment_id:               historical
    grid:                        ORCA1 tripolar grid, 1 deg with refinement t...
    forcing_index:               1
    CCCma_model_hash:            Unknown
    source_id:                   CanESM5
    YMDH_branch_time_in_child:   1850:01:01:00
    external_variables:          areacello volcello
    references:                  Geophysical Model Development Special issue ...
    CCCma_parent_runid:          p2-pictrl
    realm:                       ocnBgchem
    product:                     model-output
    institution:                 Canadian Centre for Climate Modelling and An...
    table_id:                    Oyr
    realization_index:           9
    YMDH_branch_time_in_parent:  5950:01:01:00
    frequency:                   yr
    creation_date:               2019-05-30T08:58:45Z
    title:                       CanESM5 output prepared for CMIP6
    Conventions:                 CF-1.7 CMIP-6.2
    status:                      2019-10-25;created;by nhn2@columbia.edu
    CCCma_runid:                 p2-his09
    parent_mip_era:              CMIP6
    data_specs_version:          01.00.29
    parent_experiment_id:        piControl
    version:                     v20190429
    license:                     CMIP6 model data produced by The Government ...
    variable_id:                 o2
    further_info_url:            https://furtherinfo.es-doc.org/CMIP6.CCCma.C...
    history:                     2019-05-02T13:53:53Z ;rewrote data to be con...
    sub_experiment:              none
    tracking_id:                 hdl:21.14100/41426118-701c-482b-ae16-82932e4...
    contact:                     ec.cccma.info-info.ccmac.ec@canada.ca
    branch_time_in_parent:       1496500.0
    initialization_index:        1
    intake_esm_varname:          ['o2']
    table_info:                  Creation Date:(20 February 2019) MD5:374fbe5...
    intake_esm_dataset_key:      CMIP.CCCma.CanESM5.historical.Oyr.gn
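The variables in this dataset are lazy dask arrays, so nothing has been loaded yet. Before computing on `o2`, it helps to estimate how large it is. The sketch below does the arithmetic from the dimensions and chunk shape printed above; it assumes the chunking is uniform along `time`, which the repr does not guarantee.

```python
import math

# Dimension sizes and chunk shape of `o2`, copied from the dataset repr above.
dims = {"member_id": 35, "time": 165, "lev": 45, "j": 291, "i": 360}
chunk_shape = {"member_id": 1, "time": 12, "lev": 45, "j": 291, "i": 360}
bytes_per_value = 4  # float32

# Uncompressed size of the full array and of a single chunk.
total_bytes = math.prod(dims.values()) * bytes_per_value
chunk_bytes = math.prod(chunk_shape.values()) * bytes_per_value

# Number of chunks, assuming every chunk except possibly the last along
# each dimension has the shape shown in the repr.
n_chunks = math.prod(math.ceil(dims[name] / chunk_shape[name]) for name in dims)

print(f"o2 total: {total_bytes / 1e9:.1f} GB")
print(f"one chunk: {chunk_bytes / 1e6:.1f} MB")
print(f"chunks: {n_chunks}")
```

On the real dataset, `ds.o2.nbytes` should give the same total without reading any data. At roughly 109 GB uncompressed, it is usually worth subsetting (for example a single level or member) before calling `.load()`.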