stats.py#

This module provides utilities for statistical calculations used in SMYLE analysis. Its functions perform linear detrending along a given dimension, estimate confidence intervals for correlation scores, compute skill metrics from model and observation DataArrays, generate a distribution of skill scores from a resampled, smaller ensemble, and remove lead-time-dependent drift from hindcasts.

Authors#

  • Steve Yeager

  • E. Maroon

Use#

To use these tools, import the functions you need, for example:

from esp-tools.utils.stat_utils import cor_ci_bootyears

Dependencies#

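The user must have an activated conda environment that includes xarray, numpy, cftime, and xskillscore. The sys module is also used, but it is part of the Python standard library and needs no separate installation.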

stats.cor_ci_bootyears(ts1, ts2, seed=None, nboots=1000, conf=95)[source]#

Determine a bootstrap confidence interval for the correlation between two time series.

Parameters
  • ts1 (array) – first time series

  • ts2 (array) – second time series

  • seed (int (optional)) – seed for random number generation; defaults to None

  • nboots (int (optional)) – number of bootstrap resamples; defaults to 1000

  • conf (float (optional)) – confidence level in percent; defaults to 95

Returns

  • minci (float) – lower bound of the confidence interval

  • maxci (float) – upper bound of the confidence interval
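
As an illustration of the idea behind this function, the following is a minimal sketch of a percentile bootstrap for a correlation coefficient, written with plain numpy. The function name bootstrap_corr_ci and the choice of a percentile interval with resampling of paired time steps are assumptions made for this example; the module's own implementation may differ in its resampling scheme and in how it handles missing values.

```python
import numpy as np


def bootstrap_corr_ci(ts1, ts2, seed=None, nboots=1000, conf=95):
    """Percentile-bootstrap confidence interval for a Pearson correlation.

    Hypothetical helper for illustration only; it is not the module's code.
    """
    ts1 = np.asarray(ts1, dtype=float)
    ts2 = np.asarray(ts2, dtype=float)
    if ts1.ndim != 1 or ts1.shape != ts2.shape:
        raise ValueError("ts1 and ts2 must be 1-D arrays of equal length")

    rng = np.random.default_rng(seed)
    n = ts1.size

    # Each row of idx is one bootstrap sample: n time steps (e.g. years)
    # drawn with replacement, applied to both series so pairs stay together.
    idx = rng.integers(0, n, size=(nboots, n))
    a = ts1[idx]
    b = ts2[idx]

    # Pearson correlation of every bootstrap sample (row-wise).
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (a * b).sum(axis=1) / np.sqrt((a**2).sum(axis=1) * (b**2).sum(axis=1))

    # A two-sided interval leaves (100 - conf) / 2 percent in each tail.
    tail = (100.0 - conf) / 2.0
    minci, maxci = np.nanpercentile(r, [tail, 100.0 - tail])
    return float(minci), float(maxci)
```

Passing a fixed seed makes the interval reproducible, which is useful when comparing confidence bounds across variables or lead times.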

stats.detrend_linear(dat, dim)[source]#

Linearly detrend dat along the dimension dim.

Parameters
  • dat (array) – data which is to be detrended

  • dim (str) – dimension along which linear detrending is performed

Returns

dat (array) – detrended array
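
To make the operation concrete, here is a minimal sketch of linear detrending along one axis using numpy only. It is an assumption-laden stand-in, not the module's code: the name detrend_along_axis is hypothetical, it takes an integer axis rather than a named dimension, it removes the full fitted line (slope and intercept), and it does not handle NaN values.

```python
import numpy as np


def detrend_along_axis(dat, axis):
    """Remove a least-squares straight line from dat along the given axis.

    Hypothetical helper for illustration only; it is not the module's code.
    """
    dat = np.asarray(dat, dtype=float)
    n = dat.shape[axis]
    x = np.arange(n, dtype=float)

    # Put the target axis first and flatten the rest, so that a single
    # polyfit call fits every series (column) at once.
    moved = np.moveaxis(dat, axis, 0)
    flat = moved.reshape(n, -1)

    # For 2-D input, polyfit returns one (slope, intercept) pair per column.
    slope, intercept = np.polyfit(x, flat, deg=1)
    fit = np.outer(x, slope) + intercept

    # Subtract the fitted line and restore the original axis order.
    resid = (flat - fit).reshape(moved.shape)
    return np.moveaxis(resid, 0, axis)
```

Because the intercept is removed together with the slope, the output of this sketch has zero mean along the detrended axis.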

stats.leadtime_skill_seas(mod_da, mod_time, obs_da, detrend=False)[source]#

Computes a suite of deterministic skill metrics given two DataArrays corresponding to model and observations, which must share the same lat/lon coordinates (if any). Assumes time coordinates are compatible (can be aligned). Both DataArrays should represent 3-month seasonal averages (DJF, MAM, JJA, SON).

Parameters
  • mod_da (DataArray) – a seasonally-averaged hindcast DataArray dimensioned (Y,L,M,…)

  • mod_time (DataArray) – a hindcast time DataArray dimensioned (Y,L). Note: assumes mod_time.dt.month returns the mid-month of a 3-month seasonal average (e.g., mon=1 ==> “DJF”).

  • obs_da (DataArray) – an OBS DataArray dimensioned (season,year,…)

  • detrend (bool (optional)) – defaults to False; if True, skill scores are computed after detrending

Returns

xr_dataset (DataArray) – the computed skill metrics
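
The sketch below shows one way per-lead-time skill can be computed from an ensemble hindcast, using numpy arrays instead of DataArrays. It is illustrative only: the function name leadtime_skill is hypothetical, the observations are assumed to be already matched to each hindcast's verification time with shape (Y, L), and the two metrics shown (correlation and RMSE of the ensemble mean) are an assumed example of a skill suite, not a statement of which metrics the module returns.

```python
import numpy as np


def leadtime_skill(mod, obs):
    """Correlation and RMSE of the ensemble mean for every lead time.

    mod : array (Y, L, M) of hindcasts (start year, lead time, member)
    obs : array (Y, L) of observations aligned to the verification time
    Hypothetical helper for illustration only; it is not the module's code.
    """
    mod = np.asarray(mod, dtype=float)
    obs = np.asarray(obs, dtype=float)

    # Deterministic skill is evaluated on the ensemble mean.
    ens_mean = mod.mean(axis=2)

    # Anomalies relative to the mean over start years, per lead time.
    ma = ens_mean - ens_mean.mean(axis=0)
    oa = obs - obs.mean(axis=0)

    # Pearson correlation across start years, one value per lead time.
    corr = (ma * oa).sum(axis=0) / np.sqrt((ma**2).sum(axis=0) * (oa**2).sum(axis=0))

    # RMSE of the values as passed in (no bias correction is applied here).
    rmse = np.sqrt(((ens_mean - obs) ** 2).mean(axis=0))
    return {"corr": corr, "rmse": rmse}
```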

stats.leadtime_skill_seas_resamp(mod_da, mod_time, obs_da, sampsize, N, detrend=False)[source]#

Computes a suite of deterministic skill metrics given two DataArrays corresponding to model and observations, which must share the same lat/lon coordinates (if any). Assumes time coordinates are compatible (can be aligned). Both DataArrays should represent 3-month seasonal averages (DJF, MAM, JJA, SON).

Unlike leadtime_skill_seas(), this version resamples the mod_da member dimension (M) to generate a distribution of skill scores using a smaller ensemble size (N, where N<M). Returns the mean of the resampled skill score distribution.

Parameters
  • mod_da (DataArray) – a seasonally-averaged hindcast DataArray dimensioned (Y,L,M,…)

  • mod_time (DataArray) – a hindcast time DataArray dimensioned (Y,L). Note: assumes mod_time.dt.month returns the mid-month of a 3-month seasonal average (e.g., mon=1 ==> “DJF”).

  • obs_da (DataArray) – an OBS DataArray dimensioned (season,year,…)

  • sampsize (int) – sample size

  • N (int) – maximum dimension for resampling

  • detrend (bool (optional)) – defaults to False; if set to True, skill scores will be computed after detrending

Returns

dsout (xarray) – mean of resampled skill score distribution
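
The resampling idea can be sketched as follows with numpy arrays. Everything here is an assumption made for illustration: the function name resampled_skill_mean, the argument names n_sub (members per draw) and n_draws (number of draws), drawing members without replacement, and the use of correlation as the only skill score. The sketch deliberately does not map n_sub and n_draws onto the sampsize and N arguments documented above.

```python
import numpy as np


def resampled_skill_mean(mod, obs, n_sub, n_draws, seed=None):
    """Mean correlation skill of sub-ensembles drawn from the member axis.

    mod : array (Y, L, M) of hindcasts (start year, lead time, member)
    obs : array (Y, L) of observations aligned to the verification time
    Hypothetical helper for illustration only; it is not the module's code.
    """
    mod = np.asarray(mod, dtype=float)
    obs = np.asarray(obs, dtype=float)
    n_members = mod.shape[2]
    if not 0 < n_sub <= n_members:
        raise ValueError("n_sub must be between 1 and the ensemble size")

    rng = np.random.default_rng(seed)
    oa = obs - obs.mean(axis=0)
    scores = np.empty((n_draws, mod.shape[1]))

    for i in range(n_draws):
        # Pick a random sub-ensemble (without replacement) and average it.
        members = rng.choice(n_members, size=n_sub, replace=False)
        ens_mean = mod[:, :, members].mean(axis=2)

        # Correlation across start years, one value per lead time.
        ma = ens_mean - ens_mean.mean(axis=0)
        scores[i] = (ma * oa).sum(axis=0) / np.sqrt(
            (ma**2).sum(axis=0) * (oa**2).sum(axis=0)
        )

    # Return the mean of the distribution and the distribution itself.
    return scores.mean(axis=0), scores
```

Returning the full set of scores alongside the mean makes it possible to inspect the spread of skill for a given sub-ensemble size.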

stats.remove_drift(da, da_time, y1, y2)[source]#

Convert a raw DP DataArray into an anomaly DP DataArray by removing the leadtime-dependent climatology.

E. Maroon (modified by S. Yeager)

Parameters
  • da (DP DataArray) – Raw DP DataArray with dimensions (Y,L,M,…)

  • da_time (DP DataArray) – Verification time of DP DataArray (Y,L)

  • y1 (int) – Start year of climatology

  • y2 (int) – End year of climatology

Returns

  • da_anom (DP DataArray) – De-drifted DP DataArray

  • da_climo (DP DataArray) – Leadtime-dependent climatology
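
A minimal numpy sketch of leadtime-dependent drift removal is given below. It rests on several assumptions that are not confirmed by the documentation above: the name remove_drift_sketch is hypothetical, verification times are passed as integer years with shape (Y, L), the climatology for each lead time averages over all members and over the start years whose verification year lies in [y1, y2], and any extra dimensions beyond (Y, L, M) are omitted.

```python
import numpy as np


def remove_drift_sketch(da, verif_year, y1, y2):
    """Subtract a lead-time-dependent climatology from raw hindcasts.

    da         : array (Y, L, M) of raw hindcasts
    verif_year : integer array (Y, L) with the verification year of each
                 start year and lead time
    y1, y2     : first and last year of the climatology period (inclusive)
    Hypothetical helper for illustration only; it is not the module's code.
    """
    da = np.asarray(da, dtype=float)
    verif_year = np.asarray(verif_year)

    # True where a hindcast verifies inside the climatology period.
    in_clim = (verif_year >= y1) & (verif_year <= y2)

    # Mask everything outside the period, then average over start years
    # and members to get one climatological value per lead time.
    # A lead time with no years in the period yields NaN.
    masked = np.where(in_clim[:, :, None], da, np.nan)
    climo = np.nanmean(masked, axis=(0, 2))

    # Broadcast the (L,) climatology back over start years and members.
    anom = da - climo[None, :, None]
    return anom, climo
```

Selecting by verification year rather than start year means each lead time uses a slightly different set of start years, so that all lead times share the same climatological period.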