stats.py

This module provides utilities for statistical calculations used in SMYLE analysis. Its functions remove lead-time-dependent drift from hindcasts, perform linear detrending along a given dimension, compute skill metrics from model and observation DataArrays, generate distributions of skill scores using a smaller ensemble size, and estimate bootstrap confidence intervals for correlation scores.

Authors

  • Steve Yeager

  • Elizabeth Maroon

Use

To use these tools, import the functions you need, for example:

from esp_tools.utils.stat_utils import cor_ci_bootyears

Dependencies

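The user must have an activated conda environment that includes xarray, numpy, cftime, and xskillscore. The module also uses sys, which is part of the Python standard library and needs no separate installation.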

stats.compute_resampskill_annual(mod_da, mod_time, obs_da, nleadavg=1, nleads=1, detrend=False, resamp=0, mean=True)

Computes a suite of deterministic skill metrics given two DataArrays corresponding to model and observations, which must share the same lat/lon coordinates (if any). Assumes time coordinates are compatible (can be aligned). Both DataArrays should contain annual fields.

Unlike compute_skill_annual(), this version operates on a mod_da input that has already been resampled across the member dimension (M) such that it has an ‘iteration’ dimension. Returns the resampled skill score distribution (or the mean of the skill score distribution if mean==True).

Parameters
  • mod_da (DataArray) – an annually-averaged (de-drifted) hindcast DataArray dimensioned (Y,L,M,…); must also include an ‘iteration’ dimension.

  • mod_time (DataArray) – a hindcast time DataArray dimensioned (Y,L). Assumes year values as int or float.

  • obs_da (DataArray) – an annually-averaged OBS DataArray dimensioned (time,…)

  • nleadavg (int (optional)) – sets temporal smoothing (e.g., nleadavg=3 to verify 3-year average fields).

  • nleads (int (optional)) – number of leads to include in skill computation (e.g., nleadavg=3, nleads=2 returns metrics for FY1-3 and FY2-4)

  • resamp (int (optional)) – number of resamplings of individual-member timeseries for computing forecast variance; defaults to 0.

  • detrend (bool (optional)) – defaults to False; if set to True, skill scores will be computed after detrending

  • mean (bool (optional)) – defaults to True; set to False to return the full resampled skill score distribution

Returns

dsout (DataArray) – set of skill score metrics
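The nleadavg and nleads parameters combine temporal smoothing with a choice of how many lead windows to verify. The sketch below illustrates that windowing on a toy hindcast. It assumes that position 0 along L corresponds to forecast year 1 (FY1); the helper lead_windows is hypothetical and is not part of stats.py.

```python
import numpy as np
import xarray as xr


def lead_windows(mod_da: xr.DataArray, nleadavg: int = 1, nleads: int = 1) -> xr.DataArray:
    """Average over nleadavg consecutive leads and keep the first nleads windows.

    Hypothetical helper; assumes position 0 along L is forecast year 1 (FY1).
    """
    if nleadavg - 1 + nleads > mod_da.sizes["L"]:
        raise ValueError("not enough leads for the requested windows")
    # Trailing running mean: position i averages leads i-nleadavg+1 .. i
    smoothed = mod_da.rolling(L=nleadavg).mean()
    # The first complete window ends at position nleadavg-1; keep nleads windows from there
    windows = smoothed.isel(L=slice(nleadavg - 1, nleadavg - 1 + nleads))
    # Label windows by the forecast years they span, e.g. "FY1-3", "FY2-4"
    if nleadavg == 1:
        labels = [f"FY{i + 1}" for i in range(nleads)]
    else:
        labels = [f"FY{i + 1}-{i + nleadavg}" for i in range(nleads)]
    return windows.assign_coords(L=labels)


# Toy hindcast: 2 start years (Y) x 5 leads (L); each value equals its forecast year
toy = xr.DataArray(
    np.tile(np.arange(1.0, 6.0), (2, 1)),
    dims=("Y", "L"),
    coords={"Y": [2000, 2001]},
)
fy = lead_windows(toy, nleadavg=3, nleads=2)
print(fy.sel(Y=2000).values)  # [2. 3.]
print(fy.L.values.tolist())   # ['FY1-3', 'FY2-4']
```

With nleadavg=3 and nleads=2, the two windows average forecast years 1 to 3 and 2 to 4, matching the FY1-3 and FY2-4 example in the parameter description.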

stats.compute_resampskill_seasonal(mod_da, mod_time, obs_da, climy0, climy1, nleadavg=1, nleads=1, detrend=False, resamp=0, mean=True, monthly=False)

Computes a suite of deterministic skill metrics given two DataArrays corresponding to model and observations, which must share the same lat/lon coordinates (if any). Assumes time coordinates are compatible (can be aligned). Both DataArrays should contain either monthly or 3-month seasonal-average fields.

Unlike compute_skill_seasonal(), this version operates on a mod_da input that has already been resampled across the member dimension (M) such that it has an ‘iteration’ dimension. Returns the resampled skill score distribution (or the mean of the skill score distribution if mean=True).

Parameters
  • mod_da (DataArray) – a monthly or seasonally-averaged (de-drifted) hindcast DataArray dimensioned (Y,L,M,…); must also include an ‘iteration’ dimension.

  • mod_time (DataArray) – a hindcast time DataArray dimensioned (Y,L). Assumes mod_time.dt.month & mod_time.dt.year exist.

  • obs_da (DataArray) – a monthly or seasonally-averaged OBS DataArray dimensioned (time,…)

  • climy0 (int) – start year of climatology for computing anomalies

  • climy1 (int) – end year of climatology for computing anomalies

  • nleadavg (int (optional)) – sets temporal smoothing (e.g., nleadavg=3 to verify 3-year average fields).

  • nleads (int (optional)) – number of leads to include in skill computation (e.g., nleadavg=3, nleads=2 returns metrics for FY1-3 and FY2-4)

  • resamp (int (optional)) – number of resamplings of individual-member timeseries for computing forecast variance; defaults to 0.

  • detrend (bool (optional)) – defaults to False; if set to True, skill scores will be computed after detrending

  • mean (bool (optional)) – defaults to True; set to False to return the full resampled skill score distribution

  • monthly (bool (optional)) – set to True if mod_da and obs_da are monthly means (skill will be computed for each lead month instead of each lead season)

Returns

dsout (DataArray) – set of skill score metrics
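Both compute_resampskill_* functions expect mod_da to already carry an ‘iteration’ dimension. One way to build such an input is sketched below, assuming each iteration draws nsub distinct members (without replacement) from the full M-member ensemble. The helper resample_members and its arguments nsub and niter are hypothetical; this document does not specify how the original workflow performs the resampling.

```python
import numpy as np
import xarray as xr


def resample_members(mod_da: xr.DataArray, nsub: int, niter: int, seed=None) -> xr.DataArray:
    """Stack niter random nsub-member subsets of mod_da along a new 'iteration' dimension.

    Hypothetical helper: members are drawn without replacement within each iteration.
    """
    rng = np.random.default_rng(seed)
    nmem = mod_da.sizes["M"]
    if nsub > nmem:
        raise ValueError("nsub cannot exceed the ensemble size M")
    subsets = []
    for _ in range(niter):
        picks = rng.choice(nmem, size=nsub, replace=False)  # distinct member indices
        # Relabel M as 0..nsub-1 so every subset aligns when concatenated
        subsets.append(mod_da.isel(M=picks).assign_coords(M=np.arange(nsub)))
    return xr.concat(subsets, dim="iteration")


# Toy ensemble: 4 start years, 3 leads, 10 members
ens = xr.DataArray(
    np.random.default_rng(0).standard_normal((4, 3, 10)),
    dims=("Y", "L", "M"),
)
sub = resample_members(ens, nsub=5, niter=20, seed=1)
print(dict(sub.sizes))
```

Skill computed separately for each iteration can then be collapsed with .mean("iteration"), which corresponds to the mean=True behavior described above.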

stats.compute_skill_annual(mod_da, mod_time, obs_da, nleadavg=1, nleads=1, resamp=0, detrend=False)

Computes a suite of deterministic skill metrics given two DataArrays corresponding to model and observations, which must share the same lat/lon coordinates (if any). Assumes time coordinates are compatible (can be aligned). Both DataArrays should contain annual-average fields.

Parameters
  • mod_da (DataArray) – an annually-averaged hindcast DataArray dimensioned (Y,L,M,…)

  • mod_time (DataArray) – a hindcast time DataArray dimensioned (Y,L). Assumes year values as int or float.

  • obs_da (DataArray) – an annually-averaged OBS DataArray dimensioned (time,…)

  • nleadavg (int (optional)) – permits additional temporal smoothing (e.g., nleadavg=3 to verify 3-year average hindcasts).

  • nleads (int (optional)) – number of leads to include in skill computation (e.g., nleadavg=3, nleads=2 returns metrics for FY1-3 and FY2-4)

  • resamp (int (optional)) – number of resamplings of individual-member timeseries for computing forecast variance; defaults to 0.

  • detrend (bool (optional)) – defaults to False; if set to True, skill scores will be computed after detrending

Returns

dsout (DataArray) – set of skill score metrics
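To illustrate the kind of deterministic metrics involved, the sketch below computes, for each lead, the correlation and RMSE between the ensemble mean and observations after relabeling each hindcast by its verification year from mod_time. The helper name annual_skill_sketch, the choice of these two metrics, and the assumption that mod_time and the obs_da time coordinate hold integer years are all illustrative; the actual metric suite returned by compute_skill_annual is not listed here.

```python
import numpy as np
import xarray as xr


def annual_skill_sketch(mod_da, mod_time, obs_da):
    """Per-lead correlation and RMSE of the ensemble mean against observations.

    Hypothetical helper; assumes mod_time (Y, L) holds integer verification years
    and obs_da has a 'time' coordinate of integer years.
    """
    ensmean = mod_da.mean("M")
    results = []
    for lead in range(ensmean.sizes["L"]):
        fcst = ensmean.isel(L=lead)
        vyears = mod_time.isel(L=lead).values
        # Index forecasts by verification year so they line up with obs
        fcst = fcst.assign_coords(time=("Y", vyears)).swap_dims({"Y": "time"})
        fcst, obs = xr.align(fcst, obs_da, join="inner")
        corr = xr.corr(fcst, obs, dim="time")
        rmse = np.sqrt(((fcst - obs) ** 2).mean("time"))
        results.append(xr.Dataset({"corr": corr, "rmse": rmse}))
    return xr.concat(results, dim="L")


# Toy data: 10 start years, 2 leads (verifying 1 and 2 years after start), 3 members
years = np.arange(2000, 2011)
obs = xr.DataArray(np.sin(years / 2.0), dims="time", coords={"time": years})
starts = np.arange(1999, 2009)
mod_time = xr.DataArray(starts[:, None] + np.arange(1, 3)[None, :], dims=("Y", "L"))
truth = xr.DataArray(np.sin(mod_time.values / 2.0), dims=("Y", "L"))
mod_da = truth + xr.DataArray([-1.0, 0.0, 1.0], dims="M")  # ensemble mean equals truth
skill = annual_skill_sketch(mod_da, mod_time, obs)
print(skill["corr"].values, skill["rmse"].values)
```

Because the toy ensemble mean reproduces the observations, the correlation is 1 and the RMSE is 0 at both leads.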

stats.compute_skill_seasonal(mod_da, mod_time, obs_da, climy0, climy1, nleadavg=1, nleads=1, resamp=0, detrend=False, monthly=False)

Computes a suite of deterministic skill metrics given two DataArrays corresponding to model and observations, which must share the same lat/lon coordinates (if any). Assumes time coordinates are compatible (can be aligned). Both DataArrays should contain either monthly or 3-month seasonal-average fields.

Parameters
  • mod_da (DataArray) – a monthly or seasonally-averaged (de-drifted) hindcast DataArray dimensioned (Y,L,M,…)

  • mod_time (DataArray) – a hindcast time DataArray dimensioned (Y,L). Assumes mod_time.dt.month & mod_time.dt.year exist.

  • obs_da (DataArray) – a monthly or seasonally-averaged OBS DataArray dimensioned (time,…)

  • climy0 (int) – start year of climatology for computing anomalies

  • climy1 (int) – end year of climatology for computing anomalies

  • nleadavg (int (optional)) – sets temporal smoothing (e.g., nleadavg=3 to verify 3-year average fields).

  • nleads (int (optional)) – number of leads to include in skill computation (e.g., nleadavg=3, nleads=2 returns metrics for FY1-3 and FY2-4)

  • resamp (int (optional)) – number of resamplings of individual-member timeseries for computing forecast variance; defaults to 0.

  • detrend (bool (optional)) – defaults to False; if set to True, skill scores will be computed after detrending

  • monthly (bool (optional)) – set to True if mod_da and obs_da are monthly means (skill will be computed for each lead month instead of each lead season)

Returns

dsout (DataArray) – set of skill score metrics
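The climy0 and climy1 arguments set the base period for anomalies. For monthly observations, a common xarray pattern is to subtract a calendar-month climatology computed over that period, as sketched below. The helper name monthly_anomalies is hypothetical, and compute_skill_seasonal may build its climatologies differently (for example, separately by lead for mod_da).

```python
import numpy as np
import pandas as pd
import xarray as xr


def monthly_anomalies(obs_da: xr.DataArray, climy0: int, climy1: int) -> xr.DataArray:
    """Subtract a calendar-month climatology computed over climy0..climy1 (inclusive)."""
    base = obs_da.sel(time=slice(str(climy0), str(climy1)))
    climo = base.groupby("time.month").mean("time")  # one value per calendar month
    return obs_da.groupby("time.month") - climo


# Toy monthly series: a repeating seasonal cycle plus a yearly step of +1
time = pd.date_range("2000-01-01", periods=48, freq="MS")
values = np.tile(np.arange(12.0), 4) + np.repeat([0.0, 1.0, 2.0, 3.0], 12)
obs = xr.DataArray(values, dims="time", coords={"time": time})
anom = monthly_anomalies(obs, 2001, 2002)
print(anom.sel(time="2003").values[:3])  # [1.5 1.5 1.5]
```

The seasonal cycle cancels exactly, leaving only each year's offset relative to the 2001 to 2002 base period.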

stats.cor_ci_bootyears(ts1, ts2, seed=None, nboots=1000, conf=95)

Determine a bootstrap confidence interval for the correlation between two time series by resampling years.

Parameters
  • ts1 (array) – first time series

  • ts2 (array) – second time series, paired year by year with ts1

  • seed (int (optional)) – seed for random number generation, default None

  • nboots (int (optional)) – number of bootstrap samples; defaults to 1000

  • conf (float (optional)) – confidence level, in percent; defaults to 95

Returns

  • minci (float) – lower bound of the confidence interval

  • maxci (float) – upper bound of the confidence interval
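A minimal sketch of a percentile bootstrap over years is shown below. It assumes paired resampling with replacement (the same year indices are drawn for both series) and takes the (100 - conf)/2 and 100 - (100 - conf)/2 percentiles of the resampled correlations. The name cor_ci_bootyears_sketch is hypothetical, and the module's own implementation may differ, for example in its random-number API.

```python
import numpy as np


def cor_ci_bootyears_sketch(ts1, ts2, seed=None, nboots=1000, conf=95):
    """Percentile bootstrap confidence interval for the Pearson correlation of ts1 and ts2.

    Years are resampled with replacement; the same indices are used for both series.
    """
    ts1 = np.asarray(ts1, dtype=float)
    ts2 = np.asarray(ts2, dtype=float)
    if ts1.shape != ts2.shape:
        raise ValueError("ts1 and ts2 must have the same length")
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, ts1.size, size=(nboots, ts1.size))  # one row of year indices per sample
    s1, s2 = ts1[idx], ts2[idx]
    a = s1 - s1.mean(axis=1, keepdims=True)
    b = s2 - s2.mean(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        # Degenerate samples (zero variance) yield NaN and are ignored below
        r = (a * b).sum(axis=1) / np.sqrt((a**2).sum(axis=1) * (b**2).sum(axis=1))
    tail = (100.0 - conf) / 2.0
    minci, maxci = np.nanpercentile(r, [tail, 100.0 - tail])
    return minci, maxci


rng = np.random.default_rng(42)
x = rng.standard_normal(40)
y = 0.6 * x + 0.8 * rng.standard_normal(40)
lo, hi = cor_ci_bootyears_sketch(x, y, seed=0)
print(lo, hi)
```

Resampling whole years keeps each (ts1, ts2) pair together, so the bootstrap preserves the relationship being measured.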

stats.detrend_linear(dat, dim)

Remove a linear trend from dat along dimension dim.

Parameters
  • dat (array) – data to be detrended

  • dim (str) – dimension along which linear detrending is performed

Returns

dat (DataArray) – detrended DataArray
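A least-squares linear detrend along a named dimension can be written with xarray's DataArray.polyfit and xr.polyval, as sketched below. This is one possible approach, not necessarily how detrend_linear is implemented; the name detrend_linear_sketch is hypothetical, and the fit uses the coordinate values along dim as the x variable.

```python
import numpy as np
import xarray as xr


def detrend_linear_sketch(dat: xr.DataArray, dim: str) -> xr.DataArray:
    """Remove a least-squares linear fit along dim (x = coordinate values of dim)."""
    coeffs = dat.polyfit(dim=dim, deg=1).polyfit_coefficients  # slope and intercept per grid point
    trend = xr.polyval(dat[dim], coeffs)  # fitted line evaluated at each coordinate value
    return dat - trend


# Toy annual series: linear trend plus an oscillation
years = np.arange(1980, 2020)
dat = xr.DataArray(
    0.05 * (years - 1980) + np.sin(years / 3.0),
    dims="time",
    coords={"time": years},
)
resid = detrend_linear_sketch(dat, "time")
print(float(resid.mean()))
```

Because polyfit broadcasts over the remaining dimensions, the same call detrends every grid point of a (time, lat, lon) array at once.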

stats.leadtime_skill_seas(mod_da, mod_time, obs_da, detrend=False)

Computes a suite of deterministic skill metrics given two DataArrays corresponding to model and observations, which must share the same lat/lon coordinates (if any). Assumes time coordinates are compatible (can be aligned). Both DataArrays should represent 3-month seasonal averages (DJF, MAM, JJA, SON).

Parameters
  • mod_da (DataArray) – a seasonally-averaged hindcast DataArray dimensioned (Y,L,M,…)

  • mod_time (DataArray) – a hindcast time DataArray dimensioned (Y,L). Assumes mod_time.dt.month exists.

  • obs_da (DataArray) – an OBS DataArray dimensioned (season,year,…)

  • detrend (bool (optional)) – defaults to False; if set to True, skill scores will be computed after detrending

Returns

xr_dataset (DataArray) – set of skill score metrics

stats.leadtime_skill_seas_resamp(mod_da, mod_time, obs_da, sampsize, N, detrend=False)

Computes a suite of deterministic skill metrics given two DataArrays corresponding to model and observations, which must share the same lat/lon coordinates (if any). Assumes time coordinates are compatible (can be aligned). Both DataArrays should represent 3-month seasonal averages (DJF, MAM, JJA, SON).

Unlike leadtime_skill_seas(), this version resamples the mod_da member dimension (M) to generate a distribution of skill scores using a smaller ensemble size (N, where N<M). Returns the mean of the resampled skill score distribution.

Parameters
  • mod_da (DataArray) – a seasonally-averaged hindcast DataArray dimensioned (Y,L,M,…)

  • mod_time (DataArray) – a hindcast time DataArray dimensioned (Y,L). Assumes mod_time.dt.month exists.

  • obs_da (DataArray) – an OBS DataArray dimensioned (season,year,…)

  • sampsize (int) – sample size

  • N (int) – maximum dimension for resampling

  • detrend (bool (optional)) – defaults to False; if set to True, skill scores will be computed after detrending

Returns

dsout (xarray) – mean of resampled skill score metrics

stats.remove_drift(da, da_time, y1, y2)

Converts a raw DP DataArray into an anomaly DP DataArray by removing the lead-time-dependent climatology (drift).

Parameters
  • da (DP DataArray) – Raw DP DataArray with dimensions (Y,L,M,…)

  • da_time (DP DataArray) – Verification time of the DP DataArray, dimensioned (Y,L)

  • y1 (int) – Start year of climatology

  • y2 (int) – End year of climatology

Returns

  • da_anom (DP DataArray) – De-drifted DP DataArray

  • da_climo (DP DataArray) – Leadtime-dependent climatology
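The de-drifting step can be sketched as follows. This sketch assumes that, for each lead, the climatology averages over members (M) and over start years (Y) whose verification year falls within y1..y2, and it simplifies da_time to integer verification years (the documented function takes a verification-time DataArray, and the module depends on cftime). The name remove_drift_sketch and the integer-year argument are hypothetical.

```python
import numpy as np
import xarray as xr


def remove_drift_sketch(da, da_time_year, y1, y2):
    """Subtract a lead-dependent climatology from a (Y, L, M, ...) hindcast array.

    Assumption: each lead's climatology averages over members (M) and over start
    years (Y) whose verification year lies in y1..y2 (inclusive).
    """
    in_period = (da_time_year >= y1) & (da_time_year <= y2)  # (Y, L) mask
    da_climo = da.where(in_period).mean(["Y", "M"])  # NaNs outside the period are skipped
    da_anom = da - da_climo  # broadcasts over Y and M
    return da_anom, da_climo


# Toy hindcast: 5 start years, 3 leads, 2 members; values depend only on lead
starts = np.arange(2000, 2005)
da_time_year = xr.DataArray(starts[:, None] + np.arange(3)[None, :], dims=("Y", "L"))
da = xr.DataArray(
    np.arange(3.0)[None, :, None] * np.ones((5, 3, 2)),
    dims=("Y", "L", "M"),
)
da_anom, da_climo = remove_drift_sketch(da, da_time_year, 2001, 2004)
print(da_climo.values)  # [0. 1. 2.]
```

Here the drift is purely a function of lead, so the climatology recovers it exactly and every anomaly is zero.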