stats.py
This module provides statistics utilities for SMYLE analysis. Its functions perform linear detrending along a given dimension, compute skill metrics from model and observation DataArrays, generate a distribution of skill scores from smaller ensemble subsets, estimate bootstrap confidence intervals for correlations, and remove leadtime-dependent drift from hindcasts.
Use
Users can import individual functions from the module, for example (with stats.py on the Python path):
from stats import cor_ci_bootyears
Dependencies
The user must have an activated conda environment that includes xarray, numpy, cftime, and xskillscore. The module also uses sys, which is part of the Python standard library.
- stats.cor_ci_bootyears(ts1, ts2, seed=None, nboots=1000, conf=95)
Determine a confidence interval for the correlation between two time series by bootstrap resampling of years.
- Parameters
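ts1 (array) – first time series
ts2 (array) – second time series, with the same length as ts1
seed (int (optional)) – seed for random number generation; defaults to None
nboots (int (optional)) – number of bootstrap samples; defaults to 1000
conf (float (optional)) – confidence level in percent; defaults to 95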
- Returns
minci (float) – lower bound of the confidence interval
maxci (float) – upper bound of the confidence interval
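A minimal sketch of a percentile-bootstrap confidence interval for a correlation, in plain NumPy. It assumes that cor_ci_bootyears resamples paired years with replacement and takes percentiles of the resulting correlations; the name boot_corr_ci is hypothetical and the actual implementation may differ.

```python
import numpy as np


def boot_corr_ci(ts1, ts2, seed=None, nboots=1000, conf=95):
    """Percentile-bootstrap confidence interval for a Pearson correlation.

    Hypothetical stand-in for stats.cor_ci_bootyears (assumed method): years
    are resampled with replacement, keeping each (ts1, ts2) pair together.
    """
    ts1 = np.asarray(ts1, dtype=float)
    ts2 = np.asarray(ts2, dtype=float)
    if ts1.ndim != 1 or ts1.shape != ts2.shape:
        raise ValueError("ts1 and ts2 must be 1-D arrays of equal length")
    rng = np.random.default_rng(seed)
    n = ts1.size
    # Each row holds the year indices drawn for one bootstrap sample.
    idx = rng.integers(0, n, size=(nboots, n))
    a = ts1[idx] - ts1[idx].mean(axis=1, keepdims=True)
    b = ts2[idx] - ts2[idx].mean(axis=1, keepdims=True)
    # Row-wise Pearson correlation; a degenerate sample (all years identical)
    # gives 0/0, which becomes NaN and is skipped by nanpercentile.
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (a * b).sum(axis=1) / np.sqrt((a**2).sum(axis=1) * (b**2).sum(axis=1))
    tail = (100 - conf) / 2
    minci, maxci = np.nanpercentile(r, [tail, 100 - tail])
    return minci, maxci
```

With conf=95 the sketch returns the 2.5th and 97.5th percentiles of the bootstrapped correlations. Keeping each ts1/ts2 pair together preserves their co-variability within a sample; resampling the two series independently would destroy the correlation being estimated.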
- stats.detrend_linear(dat, dim)
Remove a linear trend from dat along the dimension dim.
- Parameters
dat (array) – data which is to be detrended
dim (str) – dimension along which linear detrending is performed
- Returns
dat (array) – detrended array
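A NumPy sketch of linear detrending along one axis, to show conceptually what detrend_linear does. It works on a positional axis rather than a named dimension and assumes equally spaced samples with no missing values; detrend_linear_np is a hypothetical name. For labeled data, xarray's DataArray.polyfit combined with xarray.polyval is one way to do the same thing by dimension name.

```python
import numpy as np


def detrend_linear_np(dat, axis=0):
    """Subtract a least-squares straight line from dat along `axis`.

    Assumes equally spaced samples and no NaNs (hypothetical helper).
    """
    dat = np.asarray(dat, dtype=float)
    moved = np.moveaxis(dat, axis, 0)          # put the detrend axis first
    n = moved.shape[0]
    t = np.arange(n, dtype=float)              # sample index as the time coordinate
    flat = moved.reshape(n, -1)                # one column per series
    coeffs = np.polyfit(t, flat, deg=1)        # rows: slope, intercept
    trend = np.outer(t, coeffs[0]) + coeffs[1]
    resid = (flat - trend).reshape(moved.shape)
    return np.moveaxis(resid, 0, axis)         # restore the original axis order
```

Fitting all series in a single np.polyfit call avoids a Python loop over grid points, which matters for large lat/lon fields.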
- stats.leadtime_skill_seas(mod_da, mod_time, obs_da, detrend=False)
Computes a suite of deterministic skill metrics from two DataArrays, one for the model and one for observations, which must share the same lat/lon coordinates (if any). The time coordinates are assumed to be compatible (i.e., they can be aligned). Both DataArrays should represent 3-month seasonal averages (DJF, MAM, JJA, SON).
- Parameters
mod_da (DataArray) – a seasonally-averaged hindcast DataArray dimensioned (Y,L,M,…)
mod_time (DataArray) – a hindcast time DataArray dimensioned (Y,L). Note: assumes mod_time.dt.month is the mid-month of a 3-month seasonal average (e.g., mon=1 ==> “DJF”).
obs_da (DataArray) – an OBS DataArray dimensioned (season,year,…)
detrend (bool (optional)) – defaults to False; if True, skill scores are computed after detrending
- Returns
xr_dataset (DataArray) – the computed suite of skill metrics
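The sketch below illustrates two common deterministic metrics, ensemble-mean correlation and RMSE, for a single lead time and season, along with the mid-month convention for season labels. Which metrics leadtime_skill_seas actually returns is not listed here (the module relies on xskillscore), so the metric choice, the (Y, M) array layout, and the name ensmean_skill are assumptions.

```python
import numpy as np

# Mid-month of each 3-month season, following the docstring convention
# (e.g., month 1 -> "DJF").
MIDMONTH_TO_SEASON = {1: "DJF", 4: "MAM", 7: "JJA", 10: "SON"}


def ensmean_skill(mod, obs):
    """Correlation and RMSE of the ensemble mean against observations.

    mod: array shaped (Y, M) for one lead time and season (assumed layout)
    obs: array shaped (Y,), aligned with mod along Y
    """
    mod = np.asarray(mod, dtype=float)
    obs = np.asarray(obs, dtype=float)
    ens_mean = mod.mean(axis=1)                       # average over members M
    corr = np.corrcoef(ens_mean, obs)[0, 1]           # Pearson correlation
    rmse = np.sqrt(np.mean((ens_mean - obs) ** 2))    # root-mean-square error
    return {"corr": corr, "rmse": rmse}
```

Scoring the ensemble mean rather than individual members filters out much of the unpredictable member-to-member noise, which is why deterministic skill is usually reported this way.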
- stats.leadtime_skill_seas_resamp(mod_da, mod_time, obs_da, sampsize, N, detrend=False)
Computes a suite of deterministic skill metrics from two DataArrays, one for the model and one for observations, which must share the same lat/lon coordinates (if any). The time coordinates are assumed to be compatible (i.e., they can be aligned). Both DataArrays should represent 3-month seasonal averages (DJF, MAM, JJA, SON).
Unlike leadtime_skill_seas(), this version resamples the mod_da member dimension (M) to generate a distribution of skill scores using a smaller ensemble size (N, where N<M). Returns the mean of the resampled skill score distribution.
- Parameters
mod_da (DataArray) – a seasonally-averaged hindcast DataArray dimensioned (Y,L,M,…)
mod_time (DataArray) – a hindcast time DataArray dimensioned (Y,L). Assumes mod_time.dt.month is the mid-month of a 3-month seasonal average (e.g., mon=1 ==> “DJF”).
obs_da (DataArray) – an OBS DataArray dimensioned (season,year,…)
sampsize (int) – sample size
N (int) – maximum dimension for resampling
detrend (bool (optional)) – defaults to False; if set to True, skill scores will be computed after detrending
- Returns
dsout (xarray) – mean of resampled skill score distribution
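A minimal sketch of the member-resampling idea: draw random subsets of members without replacement, score each subset's ensemble mean, and average the scores. It uses correlation only, and the parameter names subset_size and n_resamples are hypothetical; how they correspond to sampsize and N in leadtime_skill_seas_resamp is an assumption, not something the docstring states.

```python
import numpy as np


def resampled_ensmean_corr(mod, obs, subset_size, n_resamples, seed=None):
    """Mean ensemble-mean correlation over random member subsets.

    mod: (Y, M) hindcasts for one lead/season; obs: (Y,) observations.
    subset_size / n_resamples are hypothetical names (mapping onto
    sampsize and N is an assumption).
    """
    mod = np.asarray(mod, dtype=float)
    obs = np.asarray(obs, dtype=float)
    n_members = mod.shape[1]
    if not 0 < subset_size < n_members:
        raise ValueError("subset_size must satisfy 0 < subset_size < M")
    rng = np.random.default_rng(seed)
    scores = np.empty(n_resamples)
    for i in range(n_resamples):
        # Draw members without replacement, then score the subset mean.
        members = rng.choice(n_members, size=subset_size, replace=False)
        ens_mean = mod[:, members].mean(axis=1)
        scores[i] = np.corrcoef(ens_mean, obs)[0, 1]
    return scores.mean()
```

Averaging over many random subsets can be used to estimate the skill a smaller ensemble would have, for example when comparing hindcast sets with different numbers of members.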
- stats.remove_drift(da, da_time, y1, y2)
Convert a raw DP DataArray into an anomaly DP DataArray by removing the leadtime-dependent climatology.
Maroon (modified by S. Yeager)
- Parameters
da (DP DataArray) – Raw DP DataArray with dimensions (Y,L,M,…)
da_time (DP DataArray) – Verification time of DP DataArray (Y,L)
y1 (int) – Start year of climatology
y2 (int) – End year of climatology
- Returns
da_anom (DP DataArray) – De-drifted DP DataArray
da_climo (DP DataArray) – Leadtime-dependent climatology
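A NumPy sketch of leadtime-dependent drift removal, using plain arrays in place of DataArrays and integer verification years in place of the da_time time values. It assumes the climatology for each lead is averaged over members and over the start dates whose verification year lies in [y1, y2]; remove_drift_np is a hypothetical name, and the real function's selection rule may differ.

```python
import numpy as np


def remove_drift_np(da, ver_year, y1, y2):
    """Subtract a leadtime-dependent climatology from hindcasts.

    da:       (Y, L, M, ...) raw hindcasts
    ver_year: (Y, L) integer verification year of each start/lead pair
    Assumed rule: climatology at lead l averages over all members and the
    starts whose verification year falls in [y1, y2].
    """
    da = np.asarray(da, dtype=float)
    ver_year = np.asarray(ver_year)
    n_lead = da.shape[1]
    climo = np.empty((n_lead,) + da.shape[3:])
    for lead in range(n_lead):
        in_window = (ver_year[:, lead] >= y1) & (ver_year[:, lead] <= y2)
        if not in_window.any():
            raise ValueError(f"no verification years in [{y1}, {y2}] at lead {lead}")
        # Mean over the selected start years (axis 0) and members (axis 1).
        climo[lead] = da[in_window, lead].mean(axis=(0, 1))
    # Broadcast climo (L, ...) against (Y, L, M, ...).
    anom = da - climo[np.newaxis, :, np.newaxis]
    return anom, climo
```

Because the climatology is computed separately for each lead, any systematic drift of the model away from the observed state as the forecast progresses is removed along with the mean.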