data_access.py#
This module provides input/output utilities. Its functions return dictionaries of filepaths keyed by initialization year, nested lists of files for particular start years and ensemble members, and dask arrays containing particular hindcast ensembles. It also provides a preprocessing function that helps when using intake-esm together with the other data_access functions.
Use#
To use these tools, import the functions you need, for example:
from esp_tools.utils.io_utils import file_dict
Dependencies#
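The user must have an activated conda environment that includes xarray and numpy. The module also uses glob and functools, which are part of the Python standard library and need no separate installation.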
- data_access.file_dict(filetempl, filetype, mem, stmon)[source]#
Returns a dictionary of filepaths keyed by initialization year for a given experiment, field, ensemble member, and initialization month.
- Parameters
filetempl (str) – file template
filetype (str) – file ending
mem (int) – ensemble member
stmon (int) – initialization month
- Returns
filepaths (dict) – dictionary containing filepaths keyed by initialization year
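To illustrate the idea, the following minimal sketch builds such a dictionary with glob. The helper name `sketch_file_dict`, the extra `years` argument, and the filename layout (`<template><year>-<month>.<member>*<ending>`) are assumptions made for this example, not the module's actual naming scheme.

```python
import glob
import os
import tempfile


def sketch_file_dict(filetempl, filetype, mem, stmon, years):
    """Hypothetical stand-in: map each initialization year to its matching files."""
    filepaths = {}
    for year in years:
        # Assumed layout: <template><YYYY>-<MM>.<MMM>*<filetype>
        pattern = f"{filetempl}{year:04d}-{stmon:02d}.{mem:03d}*{filetype}"
        matches = sorted(glob.glob(pattern))
        if matches:  # skip years with no file on disk
            filepaths[year] = matches
    return filepaths


# Build a throwaway directory with two empty fake files to exercise the sketch.
tmpdir = tempfile.mkdtemp()
for year in (1970, 1971):
    open(os.path.join(tmpdir, f"run.{year}-11.001.TREFHT.nc"), "w").close()

result = sketch_file_dict(os.path.join(tmpdir, "run."), ".nc", 1, 11, range(1970, 1973))
print(sorted(result))
```

Years with no matching file (1972 here) are simply left out of the dictionary, so the keys double as a list of available initialization years.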
- data_access.get_monthly_data(filetemplate, filetype, ens, nlead, field, start_years, stmon, preproc, chunks={})[source]#
Returns a dask array containing the requested hindcast ensemble.
- Parameters
filetemplate (str) – file template
filetype (str) – file ending
ens (int) – ensemble member
nlead (int) – number of months over which data is read; allows for a partial read of the data and controls the time dimension of the returned dask array
field (str) – variable to be examined, e.g. ‘TREFHT’
start_years (list) – list of integer start years
stmon (str) – initialization month
preproc (func) – preprocessing function
chunks (dict) – chunks for dask array, defaults to {}
- Returns
ds0 (dask array) – dask array containing requested hindcast ensemble
- data_access.nested_file_list_by_year(filetemplate, filetype, ens, start_years, stmon)[source]#
Retrieves a nested list of files for the given start years and ensemble members.
- Parameters
filetemplate (str) – file template
filetype (str) – file ending
ens (int) – ensemble member
start_years (list) – list of integer start years
stmon (str) – initialization month
- Returns
nested_files (list) – nested list of files
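The shape of such a nested list can be sketched as follows. This is an illustration only: the helper name `sketch_nested_file_list`, the filename layout, the reading of `ens` as a member count (members 1 through `ens`), and the choice of one inner list per start year are all assumptions, not the module's confirmed behavior.

```python
def sketch_nested_file_list(filetemplate, filetype, ens, start_years, stmon):
    """Hypothetical stand-in: one inner list of member files per start year."""
    nested_files = []
    for year in start_years:
        # Assumed layout and assumed meaning of `ens` (members numbered 1..ens).
        files_for_year = [
            f"{filetemplate}{year:04d}-{int(stmon):02d}.{member:03d}{filetype}"
            for member in range(1, ens + 1)
        ]
        nested_files.append(files_for_year)
    return nested_files


nested = sketch_nested_file_list("run.", ".TREFHT.nc", 2, [1970, 1971], "11")
for row in nested:
    print(row)
```

A list of lists like this is the input form that `xarray.open_mfdataset` accepts when called with `combine="nested"`, which is one plausible way such a result could be consumed.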
- data_access.preprocessor(ds0, nlead, field)[source]#
This preprocessor is applied to each timeseries file individually. It returns a monthly mean CAM field with a centered time coordinate. Edit it as appropriate for your analysis to speed up processing.
- Parameters
ds0 (xarray) – timeseries xarray dataset that requires preprocessing
nlead (int) – number of months over which data is read; allows for a partial read of the data and controls the time dimension of the returned dask array
field (str) – variable to be examined, e.g. ‘TREFHT’
- Returns
d0 (xarray) – xarray dataset of monthly mean CAM field with centered time coordinate
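Because the preprocessor takes `nlead` and `field` in addition to the dataset, it has to be reduced to a one-argument callable before it can serve as a per-file hook (the `preprocess` argument of `xarray.open_mfdataset`, for instance, is called with the dataset only). The sketch below shows that binding with `functools.partial`. The function `toy_preprocessor` and the dict-of-lists record are hypothetical stand-ins chosen so the example runs without model output; they are not the module's implementation, and the real function additionally centers the time coordinate.

```python
from functools import partial


def toy_preprocessor(record, nlead, field):
    """Hypothetical stand-in: keep one field and only the first nlead time steps."""
    return {
        "time": record["time"][:nlead],
        field: record[field][:nlead],
    }


# A fake "file" with two variables and four monthly time steps.
record = {
    "time": [1, 2, 3, 4],
    "TREFHT": [280.0, 281.5, 283.0, 284.5],
    "PS": [1000.0, 1001.0, 1002.0, 1003.0],
}

# Bind nlead and field so the result takes the per-file data as its only argument.
preproc = partial(toy_preprocessor, nlead=2, field="TREFHT")
reduced = preproc(record)
print(reduced)
```

Dropping unused variables and trailing time steps per file, before anything is combined, is what keeps the later concatenation small.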
- data_access.time_set_midmonth(ds, time_name)[source]#
Returns a copy of ds with the values of ds[time_name] replaced by mid-month values (day=15) rather than end-of-month values.
- Parameters
ds (xarray) – xarray dataset whose end-of-month time values will be replaced with mid-month values
time_name (str) – name of the time coordinate, e.g. ‘time’
- Returns
ds (xarray) – xarray dataset with end-of-month time values replaced by mid-month values
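The date arithmetic behind such a shift can be sketched with the standard library alone. This is an assumption-laden illustration: it uses plain `datetime` objects in a list rather than an xarray time coordinate, the helper name `sketch_midmonth` is hypothetical, and it assumes each stamp sits at or just after the end of the month it describes (for example 00:00 on the first day of the following month).

```python
from datetime import datetime, timedelta


def sketch_midmonth(stamps):
    """Hypothetical stand-in: shift end-of-month stamps to day 15 of the averaged month."""
    shifted = []
    for stamp in stamps:
        # Step back one day so a stamp at 00:00 on the 1st falls in the month it closes.
        inside = stamp - timedelta(days=1)
        shifted.append(datetime(inside.year, inside.month, 15))
    return shifted


end_of_month = [datetime(1970, 12, 1), datetime(1971, 1, 1), datetime(1971, 2, 1)]
mid_month = sketch_midmonth(end_of_month)
print([d.strftime("%Y-%m-%d") for d in mid_month])
```

The sketch builds a new list and leaves its input untouched, mirroring the documented behavior of returning a copy. Model output on a non-standard calendar would need calendar-aware date objects in place of `datetime`.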