Along Track Altimetry Analysis

[1]:
import fsspec
import xarray as xr
import numpy as np
import hvplot
import hvplot.dask
import hvplot.pandas
import hvplot.xarray

Load Data

The analysis-ready along-track altimetry data were prepared by CNES. They are cataloged in the Pangeo Cloud Data Catalog: https://catalog.pangeo.io/browse/master/ocean/altimetry/

We work with the Jason-3 data (catalog entry j3).

[2]:
from intake import open_catalog
cat = open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean/altimetry.yaml")
print(list(cat))
ds = cat['j3'].to_dask()
ds
['al', 'alg', 'c2', 'e1', 'e1g', 'e2', 'en', 'enn', 'g2', 'h2', 'j1', 'j1g', 'j1n', 'j2', 'j2g', 'j2n', 'j3', 's3a', 's3b', 'tp', 'tpn']
[2]:
<xarray.Dataset>
Dimensions:         (time: 53154815)
Coordinates:
    latitude        (time) float64 dask.array<chunksize=(53154815,), meta=np.ndarray>
    longitude       (time) float64 dask.array<chunksize=(53154815,), meta=np.ndarray>
  * time            (time) datetime64[ns] 2016-05-26T14:14:03.917554 ... 2019...
Data variables:
    cycle           (time) int16 dask.array<chunksize=(53154815,), meta=np.ndarray>
    dac             (time) float64 dask.array<chunksize=(53154815,), meta=np.ndarray>
    lwe             (time) float64 dask.array<chunksize=(53154815,), meta=np.ndarray>
    mdt             (time) float64 dask.array<chunksize=(53154815,), meta=np.ndarray>
    ocean_tide      (time) float64 dask.array<chunksize=(53154815,), meta=np.ndarray>
    sla_filtered    (time) float64 dask.array<chunksize=(53154815,), meta=np.ndarray>
    sla_unfiltered  (time) float64 dask.array<chunksize=(53154815,), meta=np.ndarray>
    track           (time) int16 dask.array<chunksize=(53154815,), meta=np.ndarray>
Attributes: (12/24)
    Conventions:               CF-1.6
    Metadata_Conventions:      Unidata Dataset Discovery v1.0
    cdm_data_type:             Swath
    comment:                   Sea surface height measured by altimeters refe...
    contact:                   servicedesk.cmems@mercator-ocean.eu
    creator_email:             servicedesk.cmems@mercator-ocean.eu
    ...                        ...
    software_version:          6.2_DUACS_DT2018_baseline
    source:                    Jason-3 measurements
    ssalto_duacs_comment:      The reference mission used for the altimeter i...
    standard_name_vocabulary:  NetCDF Climate and Forecast (CF) Metadata Conv...
    summary:                   SSALTO/DUACS Delayed-Time Level-3 sea surface ...
    title:                     DT Jason-3 Global Ocean Along track SSALTO/DUA...

Load some data into memory:

[3]:
ds_ll = ds[['latitude', 'longitude', 'sla_filtered']].reset_coords().astype('f4').load()
ds_ll
[3]:
<xarray.Dataset>
Dimensions:       (time: 53154815)
Coordinates:
  * time          (time) datetime64[ns] 2016-05-26T14:14:03.917554 ... 2019-0...
Data variables:
    latitude      (time) float32 -66.15 -66.15 -66.15 ... 66.14 66.14 66.14
    longitude     (time) float32 17.22 17.36 17.5 17.64 ... 335.4 335.5 335.7
    sla_filtered  (time) float32 0.026 0.028 0.03 0.031 ... 0.129 0.134 0.138
Attributes: (12/24)
    Conventions:               CF-1.6
    Metadata_Conventions:      Unidata Dataset Discovery v1.0
    cdm_data_type:             Swath
    comment:                   Sea surface height measured by altimeters refe...
    contact:                   servicedesk.cmems@mercator-ocean.eu
    creator_email:             servicedesk.cmems@mercator-ocean.eu
    ...                        ...
    software_version:          6.2_DUACS_DT2018_baseline
    source:                    Jason-3 measurements
    ssalto_duacs_comment:      The reference mission used for the altimeter i...
    standard_name_vocabulary:  NetCDF Climate and Forecast (CF) Metadata Conv...
    summary:                   SSALTO/DUACS Delayed-Time Level-3 sea surface ...
    title:                     DT Jason-3 Global Ocean Along track SSALTO/DUA...
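
Casting to float32 before loading matters at this size. As a rough back-of-the-envelope check (counting only the three data variables, not the datetime64 time index or any pandas overhead), the footprint can be estimated from the length of the time dimension shown above:

```python
# Rough memory estimate for the three variables loaded above.
n_points = 53_154_815   # length of the time dimension
n_vars = 3              # latitude, longitude, sla_filtered

bytes_f4 = n_points * n_vars * 4   # float32: 4 bytes per value
bytes_f8 = n_points * n_vars * 8   # float64: 8 bytes per value

print(f"float32: {bytes_f4 / 1e9:.2f} GB")   # float32: 0.64 GB
print(f"float64: {bytes_f8 / 1e9:.2f} GB")   # float64: 1.28 GB
```

The actual in-memory size will be somewhat larger once the time coordinate is included.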

Convert to pandas dataframe:

[4]:
df = ds_ll.to_dataframe()
df
[4]:
latitude longitude sla_filtered
time
2016-05-26 14:14:03.917554 -66.147430 17.224194 0.026
2016-05-26 14:14:04.996134 -66.147163 17.361219 0.028
2016-05-26 14:14:06.074715 -66.146767 17.498243 0.030
2016-05-26 14:14:07.153295 -66.146240 17.635262 0.031
2016-05-26 14:14:08.231875 -66.145592 17.772272 0.032
... ... ... ...
2019-05-13 23:58:09.837942 66.142929 335.137817 0.120
2019-05-13 23:58:10.916522 66.143593 335.274872 0.125
2019-05-13 23:58:11.995103 66.144127 335.411926 0.129
2019-05-13 23:58:13.073683 66.144539 335.548981 0.134
2019-05-13 23:58:14.152263 66.144814 335.686066 0.138

53154815 rows × 3 columns

Visualize with hvplot

[5]:
df.hvplot.scatter(x='longitude', y='latitude', datashade=True)
[5]:

Bin using xhistogram

https://xhistogram.readthedocs.io/

[6]:
from xhistogram.xarray import histogram

lon_bins = np.arange(0, 361, 2)
lat_bins = np.arange(-70, 71, 2)

# chunking helps with memory management
ds_ll_chunked = ds_ll.chunk({'time': '5MB'})

# sum of squared SLA in each lon/lat box (missing values contribute zero)
sla_variance = histogram(ds_ll_chunked.longitude, ds_ll_chunked.latitude,
                         bins=[lon_bins, lat_bins],
                         weights=ds_ll_chunked.sla_filtered.fillna(0.)**2)

# number of points in each lon/lat box
norm = histogram(ds_ll_chunked.longitude, ds_ll_chunked.latitude,
                 bins=[lon_bins, lat_bins])

# require more than 200 points in a box for it to be unmasked
thresh = 200
sla_variance = sla_variance / norm.where(norm > thresh)
sla_variance
[6]:
<xarray.DataArray 'histogram_longitude_latitude' (longitude_bin: 180, latitude_bin: 70)>
dask.array<truediv, shape=(180, 70), dtype=float64, chunksize=(180, 70), chunktype=numpy.ndarray>
Coordinates:
  * longitude_bin  (longitude_bin) float64 1.0 3.0 5.0 7.0 ... 355.0 357.0 359.0
  * latitude_bin   (latitude_bin) float64 -69.0 -67.0 -65.0 ... 65.0 67.0 69.0
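
The same weighted-histogram idea can be sketched with plain NumPy on a small synthetic sample. This is only an illustration of the computation (sum of squared values per box, divided by the masked count per box), not the xhistogram implementation. The random points, the seed, and the lower threshold of 5 are assumptions chosen for the example and are not Jason-3 data.

```python
import numpy as np

# Synthetic stand-in for the along-track points (hypothetical values).
rng = np.random.default_rng(0)
n = 100_000
lon = rng.uniform(0.0, 360.0, n)
lat = rng.uniform(-66.0, 66.0, n)
sla = rng.normal(0.0, 0.1, n)

lon_bins = np.arange(0, 361, 2)   # 181 edges -> 180 bins
lat_bins = np.arange(-70, 71, 2)  # 71 edges -> 70 bins

# Sum of squared SLA per box, and number of points per box.
sum_sq, _, _ = np.histogram2d(lon, lat, bins=[lon_bins, lat_bins], weights=sla**2)
count, _, _ = np.histogram2d(lon, lat, bins=[lon_bins, lat_bins])

# Mask sparsely sampled boxes. The notebook uses 200; this small
# synthetic sample needs a much lower (assumed) threshold.
thresh = 5
count_masked = np.where(count > thresh, count, np.nan)
mean_sq = sum_sq / count_masked

print(mean_sq.shape)
```

Two things are worth noticing. The result is a mean of squares, which equals the variance only where the mean SLA in a box is close to zero. And because the count is built from positions alone, any point whose SLA was missing and filled with zero still counts in the denominator.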
[7]:
sla_variance.load()
[7]:
<xarray.DataArray 'histogram_longitude_latitude' (longitude_bin: 180, latitude_bin: 70)>
array([[       nan, 0.00620336, 0.00644503, ..., 0.01241829, 0.01244138,
               nan],
       [       nan, 0.00597437, 0.00593017, ..., 0.01349369, 0.01050862,
               nan],
       [       nan, 0.00590221, 0.00598762, ..., 0.01173668, 0.00958628,
               nan],
       ...,
       [       nan, 0.00657664, 0.00591848, ..., 0.00922974, 0.00990859,
               nan],
       [       nan, 0.00629435, 0.00607831, ..., 0.01104878, 0.0128521 ,
               nan],
       [       nan, 0.00647509, 0.00636302, ..., 0.01209499, 0.01273962,
               nan]])
Coordinates:
  * longitude_bin  (longitude_bin) float64 1.0 3.0 5.0 7.0 ... 355.0 357.0 359.0
  * latitude_bin   (latitude_bin) float64 -69.0 -67.0 -65.0 ... 65.0 67.0 69.0
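
To get a feel for the magnitudes, a mean-square value can be converted to a root-mean-square amplitude. The sketch below takes one value from the array above and assumes sla_filtered is expressed in metres (an assumption; check the variable's units attribute).

```python
import math

# One mean-square value from the output above, assumed to be in m^2.
mean_sq = 0.00620336

# Root-mean-square amplitude, in metres under the same assumption.
rms = math.sqrt(mean_sq)

print(f"RMS: {rms * 100:.1f} cm")
```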
[8]:
sla_variance.plot(x='longitude_bin', figsize=(12, 6), vmax=0.2)
[8]:
<matplotlib.collections.QuadMesh at 0x7f78955daf10>
[Figure: sla_variance plotted on the longitude_bin / latitude_bin grid]