TempEst2

TempEst 2/SCHEMA is “TEMPerature ESTimation, version 2, using Seasonal Conditions Historical Estimation with Modeled daily Anomaly”. It estimates stream water temperature at a point, for streams of any size, using a data-driven model based on satellite remote sensing data.

This repository contains the model source code (model.R), Google Earth Engine data retrieval script (eeretrieval.py) and example data collection points (datapts.py), and R Notebooks demonstrating model usage (demo.Rmd) and a large set of model performance tests (validation.Rmd), used to make the performance claims reproducible. In addition, the HydroShare data (see below) includes knitted output of both Notebooks (demo.pdf, validation.pdf) and the full training/testing dataset (AllData.csv, ExtData.csv including maximum temperatures).

In addition to the GEE data retrieval script, there is a utility script in Python, points_above.py, for generating lists of coordinates in the correct format, evenly-spaced above given target site(s) - this is useful for quickly generating study points over rivers of interest.

Quick Guide

Dependencies in R: tidyverse, fields. Additional dependencies for Google Earth Engine data retrieval: Python with the GEE API configured.

Using a Trained Model

A pre-trained model is available in Releases as model.rda. load("model.rda") will load a function called model. The argument to model is a data frame with columns:

  • id (site ID; character)

  • lon (longitude; decimal degrees East)

  • lat (latitude; decimal degrees North)

  • day (Julian day; integer)

  • date (as an R Date)

  • elevation (mean in a 500-m radius about point of interest; meters)

  • water, shrubland, grassland, barren (land cover, abundance in a 500-m radius about point of interest; unitless, m2/m2)

  • lst (land surface temperature, daily daytime mean in a 500-m radius about point of interest; Celsius)

  • humidity (specific humidity, daily mean in a 500-m radius about point of interest; unitless, kg/kg)

Using this data frame as the argument to model, the function will return the same data frame with three or five new columns:

  • temp.mod (modeled daily mean temperature; Celsius)

  • temp.doy (modeled day-of-year mean temperature; Celsius)

  • temp.anom (modeled temperature anomaly relative to day-of-year mean; Celsius)

  • temp.plus (modeled maximum temperature relative to day-of-year mean; Celsius)

  • temp.max (modeled maximum daily temperature; Celsius)

temp.mod = temp.doy + temp.anom.

These three (five) columns are the final output, and can be used to assess actual estimated temperature (temp.mod); typical seasonal conditions (temp.doy), such as to assess general long-term behavior; and departure from typical seasonal conditions (temp.anom), such as to assess heat waves or other extremes.

Retrieving Prediction Data

Suitable prediction data can be retrieved in any way, but a script for retrieving the data from Google Earth Engine is included. In eeretrieval.py, the function getAllTimeseries will retrieve data for specified points and dates to Google Drive. Points are specified according to the example format in datapts.py: a list of [[[longitude, latitude], "point id"], ...].

Generating Evenly-Spaced Points

You’ll need a list of points of interest (USGS gage IDs, COMIDs, or coordinates - all in the same format), your desired maximum number of points per river, and a corresponding list of base IDs (and can optionally specify the resolution; the default is 1,000).

ID format:

  • USGS gage: "USGS-<ID>" with site_type="usgs"

  • COMID: just the COMID string with site_type="comid"

  • Coordinates: (lon, lat) in decimal degrees with site_type="coordinates"

Then, just run points_above.points_above_all([<site ids>], <site type>, <max points per river>, <resolution>, [<base ids>]) to get a list in the suitable format.

Example for Clear Creek and Sagehen Creek:

points_above_all(["USGS-10343500", "USGS-06719505"], "usgs", 50, 1000, ["Sagehen", "ClearCreek"])

Training a Model

Using all default options, full.schema() returns a function for training a model. This function’s argument is a data frame similar to above, but with one additional column: temperature, the observed temperature (daily mean, Celsius) for training. full.schema()(training data) then returns a function like model above. A typical approach to model training is for id to be set to training ga(u)ge IDs, such as the USGS stream temperature gage network, so that spatial observations can readily be linked to observed temperatures. Prediction data and observations are retrieved independently of the training observations, but an example for USGS gages is included in demo.Rmd.

full.schema provides some additional options.

First, by default the model is trained with the two kriging models included, but the full.schema function itself is a generic framework for linking a seasonality (sche) and anomaly (ma) component, and these can be specified separately, e.g. to test a different approach without changing any of the infrastructure.

The third argument to full.schema is rtn.model, which is FALSE by default. If it is set to TRUE, then, instead of returning a prediction function, full.schema(rtn.model=TRUE)(training data) will return a list of model components. This is useful to examine model behaviors directly, such as extracting coefficients from the kriging models. This is shown in demo.Rmd.

If the fourth argument to full.schema, use.max, is TRUE (default FALSE), then the model will be trained to predict both mean and maximum temperature, requiring a temperature.max column in the training data.

Citation and More Information

If you use this model in your research, please cite Philippus, Corona, Schneider, Rust, and Hogue, (2025), “Satellite-Based Spatial-Statistical Modeling of Daily Stream Water Temperatures at the CONUS Scale”, Journal of Hydrology, https://doi.org/10.1016/j.jhydrol.2025.133321. This paper also contains a detailed description of model design and performance characteristics.

The seasonality component is based on the “three-sine” stream annual temperature cycle function described in: Philippus, Corona, and Hogue, (2024), “Improved annual temperature cycle function for stream seasonal thermal regimes”, JAWRA, https://doi.org/10.1111/1752-1688.13228.

Full training datasets, pre-trained models, and a knitted model validation notebook PDF are available in CUAHSI HydroShare: https://doi.org/10.4211/hs.a8b243957f7946e388d10ab206990675.