data_access.py

This module provides utilities that support input/output. Its functions return dictionaries of filepaths keyed by initialization year, nested lists of files for particular start years and ensemble members, and dask arrays containing particular hindcast ensembles. It also provides a preprocessing function that helps when using intake-esm together with the other data_access functions.

Authors

  • Steve Yeager

  • Elizabeth Maroon

Use

To use these tools, import the functions you need, for example:

from esp-tools.utils.io_utils import file_dict

Dependencies

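The user must have an activated conda environment that includes xarray and numpy. The glob and functools modules are also used, but they are part of the Python standard library and need no separate installation.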

data_access.file_dict(filetempl, filetype, mem, stmon)

Returns a dictionary of filepaths keyed by initialization year for a given experiment, field, ensemble member, and initialization month.

Parameters
  • filetempl (str) – file template

  • filetype (str) – file ending

  • mem (int) – ensemble member

  • stmon (int) – month

Returns

filepaths (dict) – dictionary containing filepaths keyed by initialization year
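
The year-keyed lookup can be sketched as follows. This is a minimal illustration, not the module's implementation: it assumes a hypothetical filename layout of `<filetempl><YYYY>-<MM>.<NNN><filetype>` (the real template format is not documented here), and the helper name `file_dict_sketch` is invented for the example.

```python
import glob
import os
import re
import tempfile


def file_dict_sketch(filetempl, filetype, mem, stmon):
    """Map initialization year -> filepath for one member and start month.

    Assumption: filenames look like <filetempl><YYYY>-<MM>.<NNN><filetype>.
    """
    # Escape the literal parts so only the four year characters are wildcards.
    pattern = (
        glob.escape(filetempl)
        + f"????-{stmon:02d}.{mem:03d}"
        + glob.escape(filetype)
    )
    year_re = re.compile(r"(\d{4})-\d{2}\.\d{3}" + re.escape(filetype) + r"$")
    filepaths = {}
    for path in sorted(glob.glob(pattern)):
        match = year_re.search(path)
        if match:
            filepaths[int(match.group(1))] = path
    return filepaths


# Demonstration on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    templ = os.path.join(tmp, "run.")
    for year in (1970, 1971):
        open(f"{templ}{year}-11.001.nc", "w").close()
    open(f"{templ}1970-02.001.nc", "w").close()  # different start month, skipped
    demo = file_dict_sketch(templ, ".nc", mem=1, stmon=11)
    demo_years = sorted(demo)
```

Keying by integer year makes it straightforward to select a subset of start years before opening any files.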

data_access.get_monthly_data(filetemplate, filetype, ens, nlead, field, start_years, stmon, preproc, chunks={})

Returns a dask array containing the requested hindcast ensemble.

Parameters
  • filetemplate (str) – file template

  • filetype (str) – file ending

  • ens (int) – ensemble member

  • nlead (int) – number of months over which data is read; allows for a partial read of the data and controls the time dimension of returned dask array

  • field (str) – variable to be examined, e.g. ‘TREFHT’

  • start_years (list) – list of integer start years

  • stmon (str) – month

  • preproc (func) – preprocessing function

  • chunks (dict) – chunks for dask array, defaults to {}

Returns

ds0 (dask array) – dask array containing requested hindcast ensemble

data_access.nested_file_list_by_year(filetemplate, filetype, ens, start_years, stmon)

Retrieves a nested list of files for the given start years and ensemble members.

Parameters
  • filetemplate (str) – file template

  • filetype (str) – file ending

  • ens (int) – ensemble member

  • start_years (list) – list of integer start years

  • stmon (str) – month

Returns

nested_files (list) – nested list of files
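
The shape of such a nested list can be illustrated without touching the filesystem. In this sketch the outer list runs over start years and the inner list over ensemble members. That ordering, the hypothetical `<YYYY>-<MM>.<NNN>` filename pattern, the reading of `ens` as a member count, and the helper name `nested_list_sketch` are all assumptions made for illustration, not documented behavior of `nested_file_list_by_year`.

```python
def nested_list_sketch(filetemplate, filetype, ens, start_years, stmon):
    """Build a [start year][member] nested list of hypothetical filenames."""
    nested_files = []
    for year in start_years:
        # One inner list per start year, holding one path per member.
        per_year = [
            f"{filetemplate}{year}-{int(stmon):02d}.{member:03d}{filetype}"
            for member in range(1, ens + 1)
        ]
        nested_files.append(per_year)
    return nested_files


nested = nested_list_sketch(
    "run.", ".nc", ens=2, start_years=[1970, 1971], stmon="11"
)
```

A list nested like this is the shape that `xarray.open_mfdataset` accepts with `combine="nested"` and one `concat_dim` entry per nesting level.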

data_access.preprocessor(ds0, nlead, field)

This preprocessor is applied to each timeseries file individually. It returns a monthly mean CAM field with a centered time coordinate. Edit it as appropriate for your analysis to speed up processing.

Parameters
  • ds0 (xarray) – timeseries xarray dataset that requires preprocessing

  • nlead (int) – number of months over which data is read; allows for a partial read of the data and controls the time dimension of returned dask array

  • field (str) – variable to be examined, e.g. ‘TREFHT’

Returns

d0 (xarray) – xarray dataset of monthly mean CAM field with centered time coordinate

data_access.time_set_midmonth(ds, time_name)

Returns a copy of ds with the values of ds[time_name] replaced by mid-month values (day=15) rather than end-month values.

Parameters
  • ds (xarray) – xarray dataset which currently has end month values that will be replaced with mid month values

  • time_name (str) – name of the time coordinate, e.g. ‘time’

Returns

ds (xarray) – xarray dataset with end month values replaced with mid month values
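
The date arithmetic behind a mid-month shift can be sketched on plain `datetime` values. The sketch assumes that an end-month value is stamped at the very start of the following month (so a January mean carries a 1 February timestamp), which is why it steps back one day before setting the day to 15. That convention and the helper name `to_midmonth` are assumptions; the function documented above works on an xarray time coordinate rather than a Python list.

```python
from datetime import datetime, timedelta


def to_midmonth(stamp):
    """Return day 15 of the month that an end-month timestamp summarizes."""
    # Step back one day so a stamp at 00:00 on the 1st lands in the prior month.
    inside = stamp - timedelta(days=1)
    return inside.replace(day=15, hour=0, minute=0, second=0, microsecond=0)


end_month = [datetime(2000, 2, 1), datetime(2000, 3, 1), datetime(2001, 1, 1)]
mid_month = [to_midmonth(t) for t in end_month]
```

Stepping back a day first also handles the year boundary: the 1 January 2001 stamp maps to 15 December 2000.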