Runoff Modeling in Drainage Networks Using Probabilistic Graphs
By Emiel Verstegen
Conventional hydrological models use a deterministic approach. Such a model can be thought of as a black box with an input, parameters, relations and an output. The parameters are calibrated by comparing the model output with observations of the system response (for example, river runoff). When the uncertainty in a model is assessed, the focus is often on parameter uncertainty, and all other variables are assumed to be exactly known. Failing to account for all sources of uncertainty results in unreliable knowledge about the parameter uncertainty.
One way to include more sources of uncertainty is to apply a probabilistic model, in which all variables are described by probability distributions. In a spatially distributed model this allows for a spatial estimation of variables and their uncertainty in all model components. Because a distributed model contains a large number of variables, computing the exact solution of the probabilistic model becomes increasingly complex. To calculate an approximate solution efficiently, the probabilistic model is factorized and structured in a factor graph, a type of probabilistic graph. A factor graph contains factors and variables: the factors represent the relations between variables and physical knowledge, while the variables represent the belief about the data. A factor graph is bipartite, which means that factors are connected only to variables and variables only to factors.
Information propagates through the graph by message passing. In this process a factor updates each connected variable, based on a function of the other connected variables. If the graph has a tree structure, message passing starts at the highest level of the tree and progresses downward. This ensures that all information reaches the root of the tree. The process is then reversed in an upward sweep of message passing, which also propagates all gained knowledge upstream. When approximations are used or when the graph contains cycles, multiple iterations (downward and upward sweeps) are needed for the variables to converge to their final values. The result is a posterior distribution of every variable in every cell.
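The two sweeps can be illustrated with a minimal sketch. It assumes a toy tree that is not taken from the thesis: two upstream variables a and b with Gaussian priors, a sum factor q = a + b, and one Gaussian observation of q. All names and numbers are hypothetical, and the messages are exact only because everything here is linear and Gaussian.

```python
from typing import Tuple

Gaussian = Tuple[float, float]  # (mean, variance)


def multiply(g1: Gaussian, g2: Gaussian) -> Gaussian:
    """Combine two Gaussian messages about the same variable."""
    precision = 1.0 / g1[1] + 1.0 / g2[1]
    mean = (g1[0] / g1[1] + g2[0] / g2[1]) / precision
    return mean, 1.0 / precision


def sum_to_total(a: Gaussian, b: Gaussian) -> Gaussian:
    """Message from the factor q = a + b to q (downward sweep)."""
    return a[0] + b[0], a[1] + b[1]


def sum_to_term(total: Gaussian, other: Gaussian) -> Gaussian:
    """Message from the factor q = a + b to one term (upward sweep)."""
    return total[0] - other[0], total[1] + other[1]


# Hypothetical priors and observation, chosen only for illustration.
prior_a: Gaussian = (2.0, 1.0)
prior_b: Gaussian = (3.0, 1.0)
observation: Gaussian = (7.0, 2.0)

# Downward sweep: the upstream beliefs reach the root, where the
# observation is combined with them.
post_q = multiply(sum_to_total(prior_a, prior_b), observation)

# Upward sweep: the observation is propagated back to each upstream
# variable, using the belief of the other variable.
post_a = multiply(prior_a, sum_to_term(observation, prior_b))
post_b = multiply(prior_b, sum_to_term(observation, prior_a))
```

In this toy case one downward and one upward sweep are enough: the posterior of a has mean 2.5 and variance 0.75, and the posterior means of a and b add up to the posterior mean of q. With cycles or approximate messages, the sweeps would have to be repeated until the beliefs stop changing.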
In this research, a probabilistic graph is applied to a distributed runoff accumulation model. In each cell of the model a local runoff is calculated from the precipitation, the evaporation and an unknown bias term (initialized by bias parameters), each of which is represented by a Gaussian distribution. Using flow paths derived from a Digital Elevation Model, the local runoff is accumulated into accumulated runoff. A physical positivity constraint is added to the accumulated runoff and the forcing data to prevent negative values. Multiple runoff observations (Gaussian distributed) are added, resulting in spatial estimations of accumulated runoff, local runoff and bias, and an updated belief about the precipitation and evaporation. The bias parameters that initialize the bias in each cell contain uncertainty as well, which allows the parameters to be updated given the data received from the model. After the solution has converged, each data set has been updated using the model structure (physical knowledge and constraints) and the prior knowledge from all other data sets.
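The prior (forward) part of such an accumulation model can be sketched as follows. The sketch makes several assumptions that are not stated in the abstract: the local water balance is taken as precipitation minus evaporation plus bias, all terms are treated as independent so that variances add, each cell drains to at most one downstream cell, and the positivity constraint and the observations are left out. The function names and the four-cell network are hypothetical.

```python
from typing import List, Tuple


def local_runoff(p: Tuple[float, float], e: Tuple[float, float],
                 b: Tuple[float, float]) -> Tuple[float, float]:
    """Gaussian local runoff (mean, variance), assuming r = P - E + bias."""
    return p[0] - e[0] + b[0], p[1] + e[1] + b[1]


def accumulate(local: List[Tuple[float, float]],
               downstream: List[int]) -> List[Tuple[float, float]]:
    """Accumulate local runoff along flow paths.

    downstream[i] is the index of the cell that cell i drains to,
    or -1 for an outlet.
    """
    n = len(local)
    mean = [m for m, _ in local]
    var = [v for _, v in local]
    # Count upstream neighbours so cells are processed from headwaters down.
    n_up = [0] * n
    for d in downstream:
        if d >= 0:
            n_up[d] += 1
    ready = [i for i in range(n) if n_up[i] == 0]
    while ready:
        i = ready.pop()
        d = downstream[i]
        if d >= 0:
            mean[d] += mean[i]
            var[d] += var[i]  # independence assumption: variances add
            n_up[d] -= 1
            if n_up[d] == 0:
                ready.append(d)
    return list(zip(mean, var))


# Hypothetical network: cells 0 and 1 drain to 2, which drains to outlet 3.
downstream = [2, 2, 3, -1]
precip = [(5.0, 1.0), (4.0, 1.0), (3.0, 1.0), (2.0, 1.0)]
evap = [(2.0, 0.5), (2.0, 0.5), (2.0, 0.5), (3.0, 0.5)]
bias = [(0.0, 0.25)] * 4

local = [local_runoff(p, e, b) for p, e, b in zip(precip, evap, bias)]
accumulated = accumulate(local, downstream)
```

In this example the outlet cell has a negative prior local runoff (mean -1.0) while its accumulated runoff stays positive (mean 5.0, variance 7.0). A cell where the prior accumulated runoff itself turned negative would be one where a positivity constraint comes into play.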
By looking at the spatial distribution of the bias, conclusions can be drawn about the quality of the data and about the water balance as a representation of the hydrological processes. Areas with a high posterior bias indicate one of three things: a mismatch between forcing data and runoff observations, a water balance that does not represent reality well, or data that conflict with the positivity constraints.
The model is applied to the Volta basins, where three areas are identified in which the bias is higher than in other areas of the basin, mainly influenced by the positivity constraints. These regions lie next to a large river, next to Lake Volta, and in the delta area at the mouth of the river. They are characterized by a negative prior local and accumulated runoff, indicating net evaporation while no water is available. It is very likely that the forcing data in these areas are not faulty, but that an important hydrological process has not been incorporated. Given the large water bodies in the vicinity, groundwater flow from the water body to the constrained cells is most probably the important hydrological process that is missing from the model. Applying probabilistic graphs to other (more complex) hydrological models allows for a better spatial estimation of both the variable values and their uncertainty, as well as a spatial evaluation of the performance of the model structure.
Student:
Emiel Verstegen
Contact:
e.verstegen@student.tudelft.nl
Graduation committee:
- Dr. ir. G. H. W. Schoups, TU Delft
- Prof. dr. ir. N. C. van de Giesen, TU Delft
- Prof. dr. ir. A. W. Heemink, TU Delft