NEXT package

Submodules

NEXT.coef_est module

Created on Wed Sep 18 11:03:40 2024

@author: dphilippus

This file handles data preprocessing and coefficient estimation.

NEXT.coef_est.build_model_from_data(tr_data)

Prepares a coefficient estimation model from the provided training data. Training data is assumed to have coefficients listed in col_order, which will be converted through PCA.

NEXT.coef_est.build_training_data(data)

Prepare a training dataset by fitting watershed models.

NEXT.coef_est.predict_all_coefficients(model, data, draw=False)

Predicts model coefficients for all sites.

NEXT.coef_est.predict_site_coefficients(model, data, draw=False, noise_factor=0.9)

Predicts model coefficients for a specific site using the provided (pre-processed) data, then inverts the PCA to produce NEWT coefficients. If draw is True, generates a random draw.

NEXT.coef_est.preprocess(data, allow_no_id=True)

Convert raw input data into appropriate format, with all required covariates.

NEXT.coef_est.ssn_df(col)

NEXT.data module

Created on Tue Sep 24 10:47:37 2024

@author: dphilippus

This file contains automatic utilities for data retrieval to easily set up a NEXT model. The idea is that, ultimately, you provide a watershed and NEXT does the rest.

Organization: there is a set of low-level utilities that pull required data from specific sources, and a high-level data retrieval function that pulls all required data from specified or default sources.

All retrieval functions provide the required column names, plus a date column, except for the geometry and topo functions, which return a dictionary.

Training requirements: [‘tmax’, ‘prcp’, ‘vp’, ‘area’, ‘elev_min’, ‘elev’, ‘slope’, ‘forest’, ‘wetland’, ‘developed’, ‘ice_snow’, ‘water’, ‘lat’, ‘lon’, ‘id’, ‘temperature’]

Prediction requirements: everything except ‘temperature’.
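
As a minimal sketch of how the column requirements above could be checked before training or prediction: the helper name missing_columns is hypothetical and is not part of NEXT; only the column names come from the list above.

```python
# Column names are taken from the requirements listed above.
TRAINING_COLUMNS = [
    "tmax", "prcp", "vp", "area", "elev_min", "elev", "slope", "forest",
    "wetland", "developed", "ice_snow", "water", "lat", "lon", "id",
    "temperature",
]
# Prediction needs everything except the observed temperature.
PREDICTION_COLUMNS = [c for c in TRAINING_COLUMNS if c != "temperature"]


def missing_columns(columns, training=True):
    """Hypothetical helper: return required columns absent from `columns`."""
    required = TRAINING_COLUMNS if training else PREDICTION_COLUMNS
    present = set(columns)
    return [c for c in required if c not in present]
```

With a pandas DataFrame, one would pass its column names, e.g. missing_columns(df.columns, training=False).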

NEXT.data.all_data_gpkg(path, start, end, weather='daymet', lc='nlcd', topo='3dep', obs=None, cumulative=False, handler=<function <lambda>>)

Wraps full_data to get everything for each site in a geopackage at path.

NEXT.data.all_data_reaches(coords, dist, buff, start, end, weather='daymet', lc='nlcd', topo='3dep', as_df=False)
NEXT.data.buffer(data, buffer)

Apply a buffer (in meters) to data that is in decimal degrees. Buffer is a straightforward Shapely/GeoPandas method in a projected CRS (with meters, etc), but it’s inappropriate with lat/lon. This function simply projects into meters, buffers, and reprojects. Assumes original CRS is 4326 (WGS84).
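
To illustrate why a buffer in meters cannot be applied directly to lat/lon data, the sketch below estimates the ground length of one degree. It assumes a spherical Earth with a mean radius of 6,371 km; the function names are illustrative and are not part of NEXT.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; a spherical approximation


def meters_per_degree_lat():
    """Approximate ground length of one degree of latitude on a sphere."""
    return math.radians(1.0) * EARTH_RADIUS_M


def meters_per_degree_lon(lat_deg):
    """Approximate ground length of one degree of longitude at a latitude."""
    return meters_per_degree_lat() * math.cos(math.radians(lat_deg))
```

Because a degree of longitude shrinks with latitude while a degree of latitude does not, a buffer expressed in degrees would be distorted, which is why projecting to a CRS in meters first is the safer route.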

NEXT.data.centroid(geom)

Smart centroid function that works for GeoPandas rows or Shapely shapes.

Parameters:

geom (Series or Polygon) – Geometry to be analyzed, either as a series (with .geometry) or a Shapely object.

Return type:

Centroid as a Shapely point.

NEXT.data.combined_areas(coordinates, dist=1)
NEXT.data.full_data(site, start, end, site_type='usgs', weather='daymet', lc='nlcd', topo='3dep', obs=None, **kwargs)

Retrieves all required data for a given site, from start to end. This high-level function allows the user to simply specify sources by name and handles the rest.

Parameters:
  • site (str) – Site identifier, format depending on site type. Examples are a USGS gage ID or a coordinate string of the form “lon:lat”.

  • start (str) – Start of the retrieval period, either a full date “YYYY-MM-DD” or just a year, “YYYY”. If start and end are just years, then each year will be run individually for weather retrieval. Running one year at a time uses much less memory, so providing years is a good solution if you are running out of memory. Must be in the same format as end.

  • end (str) – End of the retrieval period, either a full date “YYYY-MM-DD” or just a year, “YYYY”. Must be in the same format as start.

  • site_type (str, optional) – Type of site identifier. See Notes for supported values.

  • weather (str, optional) – Data source for weather. See Notes for supported values.

  • lc (str, optional) – Data source for land cover. See Notes for supported values.

  • topo (str, optional) – Data source for topography. See Notes for supported values.

  • **kwargs (passed to geom_full_data)

Returns:

All required TempEst-NEXT prediction data in a DataFrame for the site.

Return type:

DataFrame

Notes

Currently supported sources are as follows.

  • site_type: “usgs” (USGS gage ID) or “coordinates” (“lon:lat”).

  • weather: “daymet”, “nldas”, “gfs” (forecasting only), “hrrr”.

  • lc: “nlcd”.

  • topo: “3dep”.
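
The year-versus-date behavior of start and end can be sketched as follows. This is only an illustration of the idea, not the package's implementation; retrieval_windows is a hypothetical helper name.

```python
def retrieval_windows(start, end):
    """Hypothetical helper: split a request into (start, end) date windows.

    Year strings ("YYYY") give one window per calendar year; full dates
    ("YYYY-MM-DD") give a single window.
    """
    if len(start) != len(end):
        raise ValueError("start and end must use the same format")
    if len(start) == 4:  # "YYYY": one window per year, inclusive
        return [(f"{y}-01-01", f"{y}-12-31")
                for y in range(int(start), int(end) + 1)]
    return [(start, end)]
```

Processing one window at a time is what keeps peak memory low when years are supplied.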

NEXT.data.gage_geom(usgs_id: str)

Get geometry associated with USGS gage ID.

Parameters:

usgs_id (str) – USGS gage ID (no “USGS-”, just the ID part).

Returns:

Watershed boundary, lat, lon, area in m2.

Return type:

(GeoDataFrame, float, float, float)

NEXT.data.geom_full_data(site, site_type, geom, lat, lon, area, start, end, weather='daymet', lc='nlcd', topo='3dep', obs=None, buffer_fallback=True)
NEXT.data.geom_static_data(site, site_type, geom, lat, lon, area, lc='nlcd', topo='3dep')
NEXT.data.get_area(geom)
NEXT.data.get_canopy(geom, date)

Get mean canopy cover for the specified geometry and date. Date can be anything parseable by numpy. It will be moved into the nearest date in the range (2011, 2021), which is supported by NLCD.
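
A minimal sketch of moving a date into the supported range, assuming the range endpoints are treated as whole calendar years (2011-01-01 through 2021-12-31). The helper name is hypothetical, and the package may resolve the nearest date differently.

```python
from datetime import date

NLCD_FIRST_YEAR, NLCD_LAST_YEAR = 2011, 2021  # range stated above


def clamp_to_nlcd_range(d):
    """Hypothetical helper: move a date to the nearest date within the range."""
    first = date(NLCD_FIRST_YEAR, 1, 1)
    last = date(NLCD_LAST_YEAR, 12, 31)
    return min(max(d, first), last)
```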

NEXT.data.get_endpoint(lstr)
NEXT.data.get_mean_direction(site, site_type)

Retrieve the mean direction of the last ~kilometer of the river. Site may be a tuple of (lon, lat) (“coordinates”), a USGS ID (“usgs”), or a COMID (“nhd”).

NEXT.data.get_river(site, site_type, dist)

Get river geometry upstream of a site.

Parameters:
  • site ((float, float) | str) – Site may be a tuple of (lon, lat) (“coordinates”), a USGS ID (“usgs”), or a COMID (“nhd”) with corresponding site_type.

  • site_type (str) – What kind of site identifier is it? (coordinates, usgs, nhd)

  • dist (number) – How far upstream should it look, in km?

Returns:

Upstream river shape

Return type:

gpd.GeoDataFrame

NEXT.data.get_upstream(coordinates, dist=1)

Retrieve the upstream flow network from the given coordinates.

coordinates: (lon, lat) in WGS84 decimal degrees. Alternatively, a string specifying a USGS gage ID or NHD+ COMID, of the form “usgs:12345” or “comid:12345”.

dist: Range in km. Set to the length of the reach of interest. Tributaries are used to construct subwatershed models. The mainstem is passed along as-is for use in extracting a riparian buffer, etc.
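
The two accepted input forms could be distinguished along these lines. This is a sketch under the stated input conventions only; parse_upstream_site is a hypothetical name and does not reflect how NEXT dispatches internally.

```python
def parse_upstream_site(coordinates):
    """Hypothetical helper: classify the input as (kind, value).

    Tuples are treated as (lon, lat); strings must look like
    "usgs:12345" or "comid:12345".
    """
    if isinstance(coordinates, str):
        kind, sep, value = coordinates.partition(":")
        if not sep or kind not in ("usgs", "comid") or not value:
            raise ValueError(f"unrecognized site string: {coordinates!r}")
        return kind, value
    lon, lat = coordinates
    return "coordinates", (float(lon), float(lat))
```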

NEXT.data.get_upstream_buffer(site, site_type, dist, buffer, geom=None, original=4326)

Retrieve a buffer around a specified distance upstream (of the nearest NHD+ point). Site may be a tuple of (lon, lat) (“coordinates”), a USGS ID (“usgs”), or a COMID (“nhd”). Distance and buffer should be in km.

NEXT.data.get_watershed(coordinates: tuple)

Retrieve watershed shape for a given set of coordinates.

Parameters:

coordinates ((float, float)) – lon, lat (in E, N)

Returns:

Watershed boundary, lat, lon, area in m2.

Return type:

(GeoDataFrame, float, float, float)

NEXT.data.gpkg_geoms(path, cumulative=False)

Parse geometries from a geopackage, e.g. for reverse-engineering geometries for an ngen setup.

Returns a dictionary of {id: (geometry, lat, lon, area)}.

Assumes columns: id, areasqkm (or tot_drainage_areasqkm), geometry.

Uses watershed area if not cumulative, otherwise total area.

NEXT.data.lcov_nlcd(geom, start, end)
NEXT.data.merit_geom(merit_id)
NEXT.data.nhd_geom(nhd_id)
NEXT.data.nldi()
NEXT.data.obs_usgs(usgs_id, start, end)
NEXT.data.topo_3dep(geom, area)
NEXT.data.topo_merit(geom)
NEXT.data.unroll_coords(reaches: GeoDataFrame)

Extract all coordinate pairs in linestrings as a list.

Parameters:

reaches (gpd.GeoDataFrame) – A GeoDataFrame containing linestring geometries (i.e., reaches).

Returns:

List of all coordinate pairs.

Return type:

list[(float, float)]
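
The flattening step can be sketched without any geospatial dependencies. Here each linestring is represented as a plain list of (x, y) tuples standing in for a geometry's coordinate sequence, and repeated points are kept; whether NEXT removes duplicates is not stated, so that is an assumption.

```python
def unroll(linestrings):
    """Flatten a sequence of linestrings into one list of (x, y) pairs.

    Each linestring is a plain list of (x, y) tuples, standing in for the
    coordinate sequence of a linestring geometry. Duplicates are kept.
    """
    return [tuple(pt) for line in linestrings for pt in line]
```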

NEXT.data.watershed_geom(site: str)

Watershed retriever with appropriate syntax. Uses a coordinate string which is ‘lon:lat’.

Parameters:

site (str) – ‘lon:lat’ in E/N, e.g. ‘-104.123:39.1235’.

Returns:

Watershed boundary, lat, lon, area in m2.

Return type:

(GeoDataFrame, float, float, float)
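
A small sketch of parsing the ‘lon:lat’ site string into numbers. The helper name and the range check are illustrative additions, not part of NEXT.

```python
def parse_lon_lat(site):
    """Hypothetical helper: split 'lon:lat' into a (lon, lat) float tuple."""
    lon_text, sep, lat_text = site.partition(":")
    if not sep:
        raise ValueError(f"expected 'lon:lat', got {site!r}")
    lon, lat = float(lon_text), float(lat_text)
    # Sanity check: catches swapped or malformed coordinates early.
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError(f"coordinates out of range: {site!r}")
    return lon, lat
```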

NEXT.data.weather_daymet(geom, start, end)
NEXT.data.weather_gfs(geom, start, end)

The “downloaded” version works better, but may not work on Windows. Only retrieves tmax, since GFS is not suitable for the full time series.

NEXT.data.weather_hrrr(geom, start, end)
NEXT.data.weather_nldas(geom, start, end)

NEXT.datatools module

This file covers automated data retrieval tools for ease-of-use.

NEXT.datatools.forecast_areal_summary(clipped_fcst, new_name, operator=<function <lambda>>)

Generate areal summary of a selected forecast, grouped by time.

NEXT.datatools.forecast_watershed_clip(forecast, basin)

Clip forecast output to a watershed.

NEXT.datatools.get_daily_forecasts(date, operator=<function <lambda>>, var='TMP')

Retrieves a full forecast run, then returns daily summaries.

Only the first 47 hours are used (:47), because the last hour is two days out.

NEXT.datatools.get_full_forecast(date, var='TMP')

Retrieve a full forecast run as an xarray.

date: YYYYMMDD, using the 06z run (which should roughly correspond to day-of in US timezones).

var: TMP, etc. Variable to retrieve.

Returns an xarray dataset for full CONUS, every hour for 48 hours.

See https://mesowest.utah.edu/html/hrrr/zarr_documentation/html/zarr_HowToDownload.html

NEXT.datatools.get_nldas(shape, name, new_name, start, end, operator=<function <lambda>>)

Retrieve and process NLDAS data, by date, for a given basin (shape).

shape: basin geometry.

name: NLDAS variable name, e.g. “temp”, “prcp”, “humidity”.

new_name: Output variable name for consistency, e.g. “tmax”.

start: start date, YYYY-MM-DD.

end: end date.

operator: how to summarize hours-to-days, e.g. max for temp -> tmax, sum for prcp.

Returns a data frame of date,new_name.
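
The hours-to-days summary controlled by operator can be sketched in plain Python. The function name and the ISO-string timestamp format are assumptions for illustration; the package works with data frames rather than lists of pairs.

```python
from collections import defaultdict


def summarize_daily(hourly, operator):
    """Group (timestamp, value) pairs by calendar date and apply `operator`.

    Timestamps are ISO strings such as "2020-07-01T13:00"; the first ten
    characters are the date.
    """
    by_day = defaultdict(list)
    for stamp, value in hourly:
        by_day[stamp[:10]].append(value)
    return {day: operator(values) for day, values in sorted(by_day.items())}
```

Passing max would suit temperature (giving a daily maximum), while sum would suit precipitation.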

NEXT.datatools.get_shape_usgs(gid)

NEXT.model_manager module

Created on Wed Sep 18 11:50:08 2024

@author: dphilippus

This file contains a model-manager class that actually handles model building, etc. Mostly a wrapper around coef_est.

class NEXT.model_manager.NEXT(model, anomgam, anomnoise, drywet=None)

Bases: object

from_data(train_drywet=False)
from_pickle()
from_preproc_data(coef_anomaly, history, anomgam, anomnoise, train_drywet)
get_newt()
make_components(data, lookback=0, draw=False, quantiles=None, internal=False)
make_config(outfile, data=None)
make_newt(data, start_date='2020-01-01', reset=False, use_climate=False, climyears=0, draw=False, quantiles=None, use_drywet=False, **kwargs)
run(data, reset=False, **args)
to_pickle(file)
NEXT.model_manager.fit_anomgam(data, N=500000)

NEXT.reach_prep module

Created on Thu Nov 7 13:13:41 2024

@author: dphilippus

This file covers data preparation for the reach model. It is initially written for design of the reach model, not production implementation, but may prove useful for the latter.

At first pass, this simply reimplements the components designed for preliminary reach analysis in fully-automated form.

NEXT.reach_prep.compute_noreach(inputs, nxmod)

Compute ‘no-reach’ (area-weighted mean of contributing watersheds) temperature. It is assumed that the reach is <1 day long, so timing is ignored.

Parameters:
  • inputs (pandas.DataFrame) – Input dataframe as generated by get_all_data.

  • nxmod (NEXT.NEXT) – NEXT model to use for lumped watersheds.

Return type:

Data frame of date, tmod, where tmod is no-reach modeled temperature.
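
The area-weighted mean behind the ‘no-reach’ temperature can be sketched for a single day as follows. The function name is illustrative, and the inputs are plain lists rather than the data frames NEXT uses.

```python
def area_weighted_mean(temps, areas):
    """Area-weighted mean of per-watershed temperatures for a single day."""
    total_area = sum(areas)
    if total_area <= 0:
        raise ValueError("total area must be positive")
    return sum(t * a for t, a in zip(temps, areas)) / total_area
```

For example, watersheds at 10 and 20 degrees with areas in a 3:1 ratio give 12.5 degrees.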

NEXT.reach_prep.get_all_data(raw_id, dist, buffer, start, end, cache_base=None, plot_ws=False)

Retrieve all required data for the specified location and tributaries, caching if desired.

Parameters:
  • raw_id (str) – USGS gage id.

  • dist (float) – Main reach length in km. Above that distance will be a lumped model.

  • buffer (float) – Reach buffer width in m for data retrieval. At least 100 m is recommended to avoid no-data errors.

  • start (str) – Start date in yyyy-mm-dd.

  • end (str) – End date in yyyy-mm-dd.

  • cache_base (str, optional) – Directory in which to cache results (recommended). If specified, results will be stored in cache_base/inputs_<raw_id>_<dist>km_<buffer>m.csv. Data retrieval is quite slow, so caching results is recommended.

  • plot_ws (bool, optional) – If true and there is not a cached input, plot the retrieved watershed geometry.

Returns:

A data frame covering the mainstem, upstream watershed, and tributary watersheds with columns id, id_type (e.g., tributary, mainstem) plus all TempEst-NEXT inputs.
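
The cache file naming pattern given for cache_base can be sketched as below. The helper is hypothetical, and how NEXT formats dist and buffer (for instance, whether 10 is written as 10 or 10.0) is an assumption.

```python
from pathlib import Path


def cache_path(cache_base, raw_id, dist, buffer):
    """Hypothetical helper: build the cache file path from the documented pattern."""
    return Path(cache_base) / f"inputs_{raw_id}_{dist}km_{buffer}m.csv"
```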

NEXT.reach_prep.mk_range(low, high, N=5)

Generate an array covering N increments across the specified values, inclusive.

Parameters:
  • low (float) – Low value.

  • high (float) – High value.

  • N (int, optional) – Number of increments (>= 2). The default is 5.

Returns:

The desired range.

Return type:

np.array
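
A dependency-free sketch of an inclusive range, assuming N counts the points returned (as numpy.linspace does) rather than the intervals between them. The function name is illustrative and it returns a list rather than an array.

```python
def linear_range(low, high, n=5):
    """Return n evenly spaced values from low to high, both included."""
    if n < 2:
        raise ValueError("n must be at least 2")
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]
```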

NEXT.reach_prep.monoreach_maker(alpha, r, k, q)

Prepare a monoreach (uniform equilibration) prediction function using a simple equilibrium solution:

DeltaT = (Tair-T) * (1-exp(-alpha/Q^q)) + (r*srad + k)/Q^q

This implementation assumes that stream temperature equilibration occurs through linear heat exchange with the air (thus, exponential decay of the difference), a constant solar radiation temperature input, and a constant “other flux” term (e.g., groundwater input), all of which has sensitivity inversely proportional to some power of discharge (which approximates depth).

All four arguments are calibrated.

The equation is not dimensionally consistent; each coefficient lumps several intermediate coefficients and would come out with bizarre and variable units (varying with q). Therefore, the inputs must be in consistent units (Celsius, W/m2, and m3/s).

Parameters:
  • alpha (float) – Air sensitivity coefficient.

  • r (float) – Radiation sensitivity coefficient.

  • k (float) – Constant flux coefficient.

  • q (float) – Flow sensitivity coefficient.

Returns:

function – Function which, given an input dataframe with tmax (air temperature), tmod, Q, srad, returns an array of temperature change predictions.

Return type:

pandas.DataFrame –> np.array
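
The equilibrium solution can be sketched as a closure over the four calibrated coefficients. This version takes scalar inputs for one day instead of a data frame, reads the radiation coefficient r as the multiplier on srad, and takes T in the equation to be tmod; those are assumptions, and the names are illustrative.

```python
import math


def make_monoreach(alpha, r, k, q):
    """Return a function predicting DeltaT for one day of inputs."""
    def predict(tair, tmod, flow, srad):
        scale = flow ** q  # Q^q, the flow-sensitivity term
        # Linear heat exchange with the air: exponential decay of the difference.
        exchange = (tair - tmod) * (1.0 - math.exp(-alpha / scale))
        # Solar radiation plus the constant "other flux" term.
        flux = (r * srad + k) / scale
        return exchange + flux
    return predict
```

With q > 0, larger discharge shrinks both terms, which matches the stated inverse sensitivity to a power of discharge.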

NEXT.reach_prep.noreach_withobs(inputs, nxmod, raw_id, start, end)

Compute ‘no-reach’ temperature and add observed temperature and discharge columns.

Parameters:
  • inputs (pandas.DataFrame) – Input dataframe as generated by get_all_data.

  • nxmod (NEXT.NEXT) – NEXT model to use for lumped watersheds.

  • raw_id (str) – USGS gage id.

  • start (str) – Start date in yyyy-mm-dd.

  • end (str) – End date in yyyy-mm-dd.

Returns:

Inputs data frame, with added columns tmod (no-reach temp), Q (observed flow), temperature (observed), and delta (observed minus tmod).

NEXT.reach_prep.prepare_full_data(nxmod, raw_id, dist, buffer, start, end, cache_base=None, plot_ws=False)

Fully prepare data for further use, combining noreach_withobs and get_all_data. See documentation for those two. nxmod may be a str, in which case it is interpreted as a pickle file.

NEXT.reach_prep.reachperf(alpha, r, k, q, indat)

Computes performance for the given set of coefficients.

Parameters:
  • alpha (float) – Air sensitivity coefficient.

  • r (float) – Radiation sensitivity coefficient.

  • k (float) – Constant flux coefficient.

  • q (float) – Flow sensitivity coefficient.

  • indat (pandas.DataFrame) – Input data frame containing tmax, tmod, Q, srad, delta (observed).

  • perf ((array, array) -> float) – Function of (sim, obs) computing performance.

Returns:

Computed performance metric using the provided function.

Return type:

float
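
One performance function of the form (sim, obs) -> float is the Nash-Sutcliffe efficiency (NSE). The sketch below is a generic implementation; whether reachperf uses this exact metric by default is an assumption.

```python
def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of obs."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / spread
```

A perfect simulation scores 1, and predicting the observed mean everywhere scores 0.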

NEXT.reach_prep.search_reach_coefficients(indat, arange, rrange, krange, qrange, tolerance=0.0001, bias_tolerance=0.1, validate=True, log=False, maxit=100)

Searches the parameter space to identify optimal coefficients. Returns optimal coefficients, and their performance as R. If validate is true, return performance for a separate validation set.

Parameters:
  • indat (pandas.DataFrame) – Input data frame containing tmax, tmod, Q, srad, delta.

  • arange ((float, float)) – Minimum and maximum permitted values for alpha.

  • rrange ((float, float)) – Minimum and maximum permitted values for r.

  • krange ((float, float)) – Minimum and maximum permitted values for k.

  • qrange ((float, float)) – Minimum and maximum permitted values for q.

  • tolerance (float, optional) – Performance threshold for convergence. The default is 0.0001.

  • validate (bool, optional) – Whether to report performance for a separate validation set.

  • log (bool, optional) – Whether to also return a data frame of iterations.

  • maxit (int, optional) – Maximum number of iterations.

Returns:

Dictionary of optimal alpha, r, k, q, and NSE

Return type:

dictionary
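
For intuition, a search over the four coefficient ranges can be sketched as a plain exhaustive grid search scored by NSE. This is a swapped-in, simpler technique: it does not reproduce the package's iterative search, its tolerance or bias_tolerance handling, or its validation split. Rows are dictionaries standing in for data frame rows, and all function names are hypothetical.

```python
import itertools
import math


def linspace(low, high, n):
    """n evenly spaced values from low to high, inclusive."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def nse(sim, obs):
    """Nash-Sutcliffe efficiency of simulated versus observed values."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    return 1.0 - sse / sum((o - mean_obs) ** 2 for o in obs)


def predict(alpha, r, k, q, row):
    """Monoreach temperature change for one row (tmax, tmod, Q, srad)."""
    scale = row["Q"] ** q
    return ((row["tmax"] - row["tmod"]) * (1.0 - math.exp(-alpha / scale))
            + (r * row["srad"] + k) / scale)


def grid_search(rows, arange, rrange, krange, qrange, n=5):
    """Exhaustive search over an n-point grid in each coefficient range."""
    obs = [row["delta"] for row in rows]
    grids = [linspace(lo, hi, n) for lo, hi in (arange, rrange, krange, qrange)]
    best = None
    for alpha, r, k, q in itertools.product(*grids):
        sim = [predict(alpha, r, k, q, row) for row in rows]
        score = nse(sim, obs)
        if best is None or score > best["NSE"]:
            best = {"alpha": alpha, "r": r, "k": k, "q": q, "NSE": score}
    return best
```

An exhaustive grid costs n^4 model evaluations, so an iterative refinement with a convergence tolerance and an iteration cap, as the parameters above suggest, scales better for fine resolution.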

NEXT.wforecast module

Created on Thu Nov 21 11:59:00 2024

@author: dphilippus

This file contains specialized tools for weather forecast retrieval, since that can be a bit more involved than pulling daymet etc. and I don’t want to clutter up data.py.

NEXT.wforecast.download_gfs_gribs(start, basepath, time='06', until=384, res='0p25')
NEXT.wforecast.get_gfs(basin, date, varbs=['tmp2m', 'spfh2m', 'dswrfsfc', 'pratesfc'], new_names=['tmax', 'vp', 'srad', 'prcp'], operators={'prcp': <function <lambda>>, 'srad': <function <lambda>>, 'tmax': <function <lambda>>, 'vp': <function <lambda>>})

Retrieve GFS forecast for the basin and date. Summarize by day using operator. For now, since GFS is quite coarse-resolution, we just use the centroid of the watershed. Note that the GFS archive doesn’t go far back in time, so this is for true forecasts only.

Note precip rate is kg/m2/s = mm/s. Needs conversion to mm/3hr and sum.

This works on Windows, but it’s relatively unreliable.
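
The precipitation unit conversion noted above can be sketched as follows, assuming each rate is the average over a 3-hour forecast step. The function name is illustrative.

```python
SECONDS_PER_3H = 3 * 60 * 60  # 10800 s in one 3-hour forecast step


def daily_precip_mm(rates_mm_per_s):
    """Convert 3-hourly precipitation rates (mm/s) to a daily total in mm."""
    # Each rate times the step length gives mm per 3 hours; sum over the day.
    return sum(rate * SECONDS_PER_3H for rate in rates_mm_per_s)
```

Since 1 kg of water over 1 m2 is a 1 mm layer, kg/m2/s and mm/s are interchangeable here.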

NEXT.wforecast.get_gfs_downloaded(basin, start, basepath, var='t2m', new_name='tmax', op=<function <lambda>>, step_type='instant')

This works better than get_gfs, but it requires ecCodes and therefore won’t run on Windows.

NEXT.wforecast.get_gfs_timestep(fcst, time, lat, lon, varbs)
NEXT.wforecast.get_hrrr(date, var='TMP', operator=<function <lambda>>)

Retrieve a full forecast run as an xarray.

date: YYYYMMDD, using the 06z run (which should roughly correspond to day-of in US timezones).

var: TMP, etc. Variable to retrieve.

Returns an xarray dataset for full CONUS, every hour for 48 hours.

See https://mesowest.utah.edu/html/hrrr/zarr_documentation/html/zarr_HowToDownload.html

NEXT.wforecast.hrrr_areal_summary(basin, date, var, new_name, time_operator, areal_operator=<function <lambda>>)

Generate areal summary of a selected forecast, grouped by time.

NEXT.wforecast.hrrr_series(basin, dates, var, new_name, operator)
NEXT.wforecast.hrrr_watershed_clip(forecast, basin)

Clip forecast output to a watershed.

NEXT.wforecast.sphum_to_vp(q, p=100000.0)
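
This function is undocumented. For orientation, a standard relation between specific humidity and vapor pressure is e = q*p / (0.622 + 0.378*q). The sketch below implements that relation with pressure in Pa; whether NEXT uses this exact formula or these units is an assumption.

```python
EPSILON = 0.622  # ratio of molar masses, water vapor to dry air


def specific_humidity_to_vapor_pressure(q, p=100000.0):
    """Vapor pressure (Pa) from specific humidity q (kg/kg) and pressure p (Pa)."""
    return q * p / (EPSILON + (1.0 - EPSILON) * q)
```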
