Tournament - Model comparison
Solvi Rognvaldsson, Rafael Daniel Vias, Birgir Hrafnkelsson and Axel Orn Jansson
This vignette explores the ways you can compare the fit of the different discharge rating curve models provided in the bdrc package. The package includes four models of different complexity for fitting a discharge rating curve. These are:
- plm0() - Power-law model with constant log-error variance.
- plm() - Power-law model with stage-dependent log-error variance.
- gplm0() - Generalized power-law model with constant log-error variance.
- gplm() - Generalized power-law model with stage-dependent log-error variance.
The ‘0’ indicates that the variance of the log-transformed errors is
modeled as a constant. The plm0
model is simply a Bayesian
hierarchical implementation of the classical power-law model, widely
used in hydrological practice. For more details on these models, refer
to Hrafnkelsson et al. (2022). For a brief overview of the underlying theory, see our Background vignette. For guidance on applying these models to your data, refer to the Introduction vignette.
The tournament method
The tournament method uses the Widely Applicable Information Criterion (WAIC) (Watanabe, 2010) to select the most appropriate model for a given dataset. The WAIC balances two factors:
Goodness-of-fit: How well the model fits the observed data.
Model complexity: A penalty term accounting for the effective number of parameters.
A lower WAIC indicates a better balance between model fit and complexity, making it the preferred model.
The tournament process occurs in two rounds:
- First round:
  - gplm vs gplm0
  - plm vs plm0
- Final round:
  - Winners of the first round face off
In each comparison, we calculate the WAIC difference between the less and the more complex model, Delta_WAIC = WAIC_less - WAIC_more. A positive Delta_WAIC suggests the more complex model is more appropriate. However, to advance to the next round, Delta_WAIC must exceed a pre-specified “winning criteria” (default value = 2). This threshold helps ensure that a more complex model is only chosen if it offers a substantial improvement.
The default winning criteria of 2 is based on statistical guidelines
for model selection (Burnham & Anderson, 2002), balancing the
trade-off between model fit and complexity. It helps prevent overfitting
by requiring evidence to justify a more complex model. In the following
sections, we’ll provide code examples demonstrating how to conduct the
power-law tournament with your own data using the bdrc
package.
Introducing the tournament function
To demonstrate the tournament function, we’ll use a real-world
dataset from the Krokfors stream gauging station in Sweden, which is
included in the bdrc
package:
> library(bdrc)
> data(krokfors)
> head(krokfors)
#> W Q
#> 1 9.478000 10.8211700
#> 2 8.698000 1.5010000
#> 3 9.009000 3.3190000
#> 4 8.097000 0.1595700
#> 5 9.104000 4.5462500
#> 6 8.133774 0.2121178
The Krokfors dataset contains 27 observations of discharge measurements from a stream gauging station. Each observation consists of two variables:
- W: Water level (stage), measured in meters above a fixed reference point.
- Q: Discharge, measured in cubic meters per second (m³/s).
The Krokfors data is particularly suitable for demonstrating the tournament function as it provides a real-world example of the type of data typically used in discharge rating curve analysis. The dataset will help illustrate how different models perform across the range of observed water levels and discharges.
Running a tournament
The tournament function is straightforward to use, requiring two main arguments:
- formula: Specifies the relationship between discharge and water elevation (e.g., Q ~ W).
- data: A data.frame with column names corresponding to the variables specified in the formula.
It is very important that the discharge is represented in cubic meters per second (m³/s) and the water elevation in meters (m).
Let’s run our first tournament:
> set.seed(1) # set seed for reproducibility
> t_obj <- tournament(Q ~ W, krokfors, parallel = TRUE, num_cores = 2)
#> Running tournament [ ] 0%
#>
#> Progress:
#> Initializing Metropolis MCMC algorithm...
#> Multiprocess sampling (4 chains in 2 jobs) ...
#>
#> MCMC sampling completed!
#>
#> Diagnostics:
#> Acceptance rate: 25.33%.
#> ✔ All chains have mixed well (Rhat < 1.1).
#> ✔ Effective sample sizes sufficient (eff_n_samples > 400).
#>
#> ✔ gplm finished [============ ] 25%
#>
#> Progress:
#> Initializing Metropolis MCMC algorithm...
#> Multiprocess sampling (4 chains in 2 jobs) ...
#>
#> MCMC sampling completed!
#>
#> Diagnostics:
#> Acceptance rate: 31.14%.
#> ✔ All chains have mixed well (Rhat < 1.1).
#> ✔ Effective sample sizes sufficient (eff_n_samples > 400).
#>
#> ✔ gplm0 finished [======================== ] 50%
#>
#> Progress:
#> Initializing Metropolis MCMC algorithm...
#> Multiprocess sampling (4 chains in 2 jobs) ...
#>
#> MCMC sampling completed!
#>
#> Diagnostics:
#> Acceptance rate: 25.66%.
#> ✔ All chains have mixed well (Rhat < 1.1).
#> ✔ Effective sample sizes sufficient (eff_n_samples > 400).
#>
#> ✔ plm finished [==================================== ] 75%
#>
#> Progress:
#> Initializing Metropolis MCMC algorithm...
#> Multiprocess sampling (4 chains in 2 jobs) ...
#>
#> MCMC sampling completed!
#>
#> Diagnostics:
#> Acceptance rate: 36.04%.
#> ✔ All chains have mixed well (Rhat < 1.1).
#> ✔ Effective sample sizes sufficient (eff_n_samples > 400).
#>
#> ✔ plm0 finished [================================================] 100%
This output provides valuable information about the MCMC sampling process for each model:
- Progress:
  - Shows the overall progress of the tournament as each of the four models is fitted sequentially.
  - Indicates the start of the Markov chain Monte Carlo (MCMC) process for each model.
  - Provides information on how the sampling process is distributed across your computer's CPU cores. In this example, the 4-chain sampling process is distributed across 2 CPU cores (4 chains in 2 jobs).
- Diagnostics:
  - Acceptance rate: The proportion of proposed parameter values that were accepted in the MCMC sampling scheme.
  - Chain mixing: A message reporting whether the different MCMC chains have mixed well.
  - Effective sample size: A message reporting whether enough independent samples have been drawn from the posterior distribution.
If some parameters do not meet the criteria for chain mixing or effective sample size (Gelman-Rubin statistic (Rhat) < 1.1 for mixing and effective number of samples (eff_n_samples) > 400 for sample size), the function will print out a warning. This helps you identify potential issues with the MCMC sampling for specific parameters.
These diagnostics help ensure the reliability of our results. Good
chain mixing and sufficient effective sample sizes indicate that our
MCMC sampling has likely converged to the true posterior distribution.
The function can be made to run silently by setting
verbose = FALSE
.
Note: By default, parallel=TRUE
and the function
automatically detects the number of available cores. Parallel processing
is used to run multiple MCMC chains simultaneously, significantly
reducing computation time for complex models or large datasets.
To see the tournament winner:
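A minimal way to do this is to inspect the winner element of the tournament object (documented below); the exact printout depends on the model's print method:
> # The winning model object is stored in the winner element
> t_obj$winner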
If you’ve already fitted the four models separately and stored them in objects (e.g., plm0.fit, plm.fit, gplm0.fit, gplm.fit), you can run the tournament more efficiently:
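Below is a sketch of one plausible call, assuming the pre-fitted models can be passed as a list (check the tournament documentation for the exact argument in your bdrc version):
> # Hypothetical sketch: reuse existing fits instead of refitting, e.g.
> # plm0.fit <- plm0(Q ~ W, krokfors), and similarly for the other models
> t_obj_prefit <- tournament(list(plm0.fit, plm.fit, gplm0.fit, gplm.fit))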
The tournament function creates a tournament object which contains
four elements: summary
, contestants
,
winner
, and info
. These components are
structured as follows:
- summary: A data frame with detailed information on the results of the different comparisons in the power-law tournament. The contents of this data frame depend on the method used ("WAIC", "DIC", or "PMP") and include:
  - Common columns for all methods:
    - round: Indicates the tournament round (1 or 2). Round 1 compares the generalized and classical power-law models separately (gplm vs gplm0 and plm vs plm0), whereas Round 2 compares the winners from Round 1.
    - comparison: A numeric identifier for each model comparison (1, 2, or 3). There are two comparisons in Round 1 and one in Round 2.
    - complexity: Indicates whether a model is the “more” or “less” complex model in a given comparison. The “less” complex model is given the advantage defined by the winning_criteria.
    - model: The model type (plm0, plm, gplm0, or gplm).
    - winner: A logical value (TRUE/FALSE) indicating whether the model was selected as the winner.
  - Method-specific columns:
    - For WAIC (Widely Applicable Information Criterion):
      - lppd: Log-pointwise predictive density, measuring how well the model predicts the observed data.
      - eff_num_param: Effective number of parameters, estimating model complexity.
      - WAIC: The computed WAIC value, balancing model fit and complexity.
      - SE_WAIC: Standard error of the WAIC estimate.
      - Delta_WAIC: Difference in WAIC between compared models.
      - SE_Delta_WAIC: Standard error of the difference in WAIC.
    - For DIC (Deviance Information Criterion):
      - D_hat: Minus two times the log-likelihood evaluated at the median of the posterior samples.
      - eff_num_param: Effective number of parameters, estimating model complexity.
      - DIC: The computed DIC value, balancing model fit and complexity.
      - Delta_DIC: Difference in DIC between compared models.
    - For PMP (Posterior Model Probability):
      - log_marg_lik: Logarithm of the marginal likelihood, estimated using the harmonic-mean estimator.
      - PMP: Posterior model probability computed with Bayes factors, indicating the probability of each model given the data.
- contestants: A list containing all of the fitted model objects compared in the tournament: plm0, plm, gplm0, and gplm.
- winner: The model object of the tournament winner, representing the most appropriate model given the data.
- info: A list containing specific information about the tournament:
  - The overall winner
  - The method used for comparison ("WAIC", "DIC", or "PMP")
  - The winning criteria applied
This structure allows users to easily access both the final results (winner) and the detailed process (summary) of the model comparison, as well as the full model objects for further analysis if needed.
The method-specific columns provide insights into the model selection process:
- For WAIC, lower values indicate better models, with Delta_WAIC showing the magnitude of difference between models.
- For DIC, like WAIC, lower values are preferred, with Delta_DIC showing the difference between models.
- For PMP, higher probabilities indicate stronger evidence for a model, with values closer to 1 suggesting stronger support.
These metrics help users understand not just which model was selected, but also the strength of evidence for that selection and the relative performance of all compared models.
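To access these elements directly, you can index the tournament object by the component names described above:
> t_obj$summary            # data frame of comparison results
> names(t_obj$contestants) # the four fitted model objects
> t_obj$info               # method, winning criteria and overall winner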
For a comprehensive summary of the tournament results:
> summary(t_obj)
#>
#> === Tournament Model Comparison Summary ===
#>
#> Method: WAIC
#> Winning Criteria: Delta_WAIC > 2
#> Overall Winner: gplm0
#>
#> Comparison 1 Results:
#> --------------------------------------------------------------------------------------------------
#> complexity model winner lppd eff_num_param WAIC SE_WAIC Delta_WAIC SE_Delta_WAIC
#> more gplm 20.7794 6.8706 -27.8176 11.8918 0.5570 0.2416
#> less gplm0 <--- 20.3710 6.7406 -27.2606 12.0360
#>
#> Comparison 2 Results:
#> --------------------------------------------------------------------------------------------------
#> complexity model winner lppd eff_num_param WAIC SE_WAIC Delta_WAIC SE_Delta_WAIC
#> more plm 5.5842 4.2574 -2.6536 6.6635 -0.4066 0.1904
#> less plm0 <--- 5.6284 4.0984 -3.0601 6.6931
#>
#> Comparison 3 Results:
#> --------------------------------------------------------------------------------------------------
#> complexity model winner lppd eff_num_param WAIC SE_WAIC Delta_WAIC SE_Delta_WAIC
#> more gplm0 <--- 20.3710 6.7406 -27.2606 12.0360 24.2005 9.1834
#> less plm0 5.6284 4.0984 -3.0601 6.6931
#>
#> === End of Summary ===
The first round of the tournament favors gplm0 over gplm, and plm0 over plm, based on their respective Delta_WAIC values. In the final round, gplm0 emerges as the overall winner, with a substantial positive Delta_WAIC of ~24 when compared to plm0. This indicates that gplm0 provides the best balance between model complexity and fit to the data.
The standard error of the estimated WAIC difference, denoted as SE_Delta_WAIC, is rather large at ~9. This suggests some uncertainty in the magnitude of this difference. However, it's important to note that despite the small sample size, the SE_Delta_WAIC is less than half the Delta_WAIC estimate. This provides some confidence in the preference for gplm0, as the difference in WAIC is more than two standard errors away from zero. Let's further investigate this comparison.
Visualizing the model comparisons
To gain deeper insights into the model comparison results, we can use various visualization tools:
1. Tournament results diagram:
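One way to produce this diagram is sketched below, assuming that plotting the tournament object directly defaults to the results view (see the package's plot method documentation for the available type options):
> # Assumed default: plotting a tournament object draws the results diagram
> plot(t_obj)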
This diagram illustrates the tournament progression from the initial four models to the final winner. It shows each model’s residual plot at the bottom, with corresponding WAIC values above. Green WAIC values indicate the selected model for each comparison.
2. Log-likelihood comparison:
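A sketch of the corresponding call, assuming the plot type is named "deviance" in the tournament plot method:
> # Assumed type name; posterior log-likelihoods on the deviance scale
> plot(t_obj, type = "deviance")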
This boxplot compares the posterior log-likelihood values for each model, with the corresponding WAIC values shown as red diamonds. The log-likelihood values are presented on the deviance scale (-2 * log-likelihood) for a direct comparison with WAIC. Lower values indicate better fit.
3. Rating curve comparison:
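A sketch, assuming the plot type is named "rating_curve":
> # Assumed type name; rating curves of all four models with 95% intervals
> plot(t_obj, type = "rating_curve")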
This plot compares the estimated rating curves from all four models. It shows water level (stage) vs. discharge, with solid lines representing the estimated curves and dotted lines showing 95% prediction intervals. This visualization helps assess model fit across different water levels. Optimal model fit is indicated by the solid line following the mean of observations and 95% of observations falling within the prediction interval.
4. Residual plot comparison:
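A sketch, assuming the plot type is named "residuals":
> # Assumed type name; residual plots for all four models
> plot(t_obj, type = "residuals")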
The residual plots also help assess how well each model fits the data by showing differences between observed and predicted values across water levels. This can reveal patterns in model performance and highlight any systematic under- or over-predictions. Optimal model fit is indicated by the rating curve mean (solid horizontal line) following the mean of the observations, and 95% of observations falling within the 95% prediction interval (dotted lines).
5. Model component comparisons:
The differences between the four models lie in the treatment of the power-law exponent, f(h), and the log-error standard deviation, σ_ε(h), as either constants or stage-dependent functions. Therefore, it is insightful to examine the posterior distributions of these components across the different models to determine whether stage dependence exists or if the assumption of constancy is appropriate.
Power-law exponent comparison:
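A sketch, assuming the exponent component is plotted with type "f":
> # Assumed type name for the power-law exponent component
> plot(t_obj, type = "f")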
The plots show that the exponent is clearly not constant over the
water elevations. This observation supports the use of the generalized
power-law models (gplm
and gplm0
) over the
classical power-law models (plm
and plm0
), as
they allow for this non-constant behavior of the power-law exponent.
Log-error standard deviation comparison:
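A sketch, assuming the log-error standard deviation component is plotted with type "sigma_eps":
> # Assumed type name for the log-error standard deviation component
> plot(t_obj, type = "sigma_eps")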
In contrast to the exponent, the log-error standard deviation does
not show a clear non-constant trend over the water elevations. This
observation suggests that models with constant error variance
(plm0
and gplm0
) might be sufficient for this
dataset, aligning with the tournament results that selected
gplm0
.
6. Winning model panel:
Finally, the panel
option is useful to gain insight into
all different model components of the winning model, which in this case
is gplm0
:
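A sketch of the call, assuming the winning model object accepts a "panel" plot type:
> # Panel plot of the winning model on the log (modeling) scale
> plot(t_obj$winner, type = "panel", transformed = TRUE)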
This panel plot provides a comprehensive view of the winning model’s
components, including the rating curve, residuals, power-law exponent,
and the log-error standard deviation. This visualization is particularly
useful for understanding how the different elements of the winning model
work together and how they relate to the observed data. The
transformed = TRUE
argument transforms the rating curve and
observations onto the modeling scale (log-scale).
Customizing tournaments
There are several ways to customize the tournament further:
1. Fixing the c parameter:
If the parameter of zero discharge, c (in meters), is known, you might want to fix that parameter to its known value in the model. You can then directly run a tournament with c fixed in all of the models:
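A sketch, assuming c_param is the relevant argument (as in the individual model-fitting functions) and using a purely hypothetical known value of 7.65 m:
> # Hypothetical known stage of zero discharge (7.65 m) used for illustration
> t_obj_known_c <- tournament(Q ~ W, krokfors, c_param = 7.65)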
2. Adjusting the winning criteria:
You can change the winning criteria (default value = 2), which sets the threshold that the more complex model in each comparison must exceed. For example, increasing the value to winning_criteria = 5 raises the threshold, thus favoring model simplicity more. To re-evaluate a previously run tournament with a different winning criteria, you can simply use the summary function:
> summary(t_obj, winning_criteria = 5)
#>
#> === Tournament Model Comparison Summary ===
#>
#> Method: WAIC
#> Winning Criteria: Delta_WAIC > 5
#> Overall Winner: gplm0
#>
#> Comparison 1 Results:
#> --------------------------------------------------------------------------------------------------
#> complexity model winner lppd eff_num_param WAIC SE_WAIC Delta_WAIC SE_Delta_WAIC
#> more gplm 20.7794 6.8706 -27.8176 11.8918 0.5570 0.2416
#> less gplm0 <--- 20.3710 6.7406 -27.2606 12.0360
#>
#> Comparison 2 Results:
#> --------------------------------------------------------------------------------------------------
#> complexity model winner lppd eff_num_param WAIC SE_WAIC Delta_WAIC SE_Delta_WAIC
#> more plm 5.5842 4.2574 -2.6536 6.6635 -0.4066 0.1904
#> less plm0 <--- 5.6284 4.0984 -3.0601 6.6931
#>
#> Comparison 3 Results:
#> --------------------------------------------------------------------------------------------------
#> complexity model winner lppd eff_num_param WAIC SE_WAIC Delta_WAIC SE_Delta_WAIC
#> more gplm0 <--- 20.3710 6.7406 -27.2606 12.0360 24.2005 9.1834
#> less plm0 5.6284 4.0984 -3.0601 6.6931
#>
#> === End of Summary ===
For the method "WAIC"
, you can also specify a more
complex winning criteria as a string expression:
> summary(t_obj, winning_criteria = "Delta_WAIC > 2 & Delta_WAIC - SE_Delta_WAIC > 0")
#>
#> === Tournament Model Comparison Summary ===
#>
#> Method: WAIC
#> Winning Criteria: Delta_WAIC > 2 & Delta_WAIC - SE_Delta_WAIC > 0
#> Overall Winner: gplm0
#>
#> Comparison 1 Results:
#> --------------------------------------------------------------------------------------------------
#> complexity model winner lppd eff_num_param WAIC SE_WAIC Delta_WAIC SE_Delta_WAIC
#> more gplm 20.7794 6.8706 -27.8176 11.8918 0.5570 0.2416
#> less gplm0 <--- 20.3710 6.7406 -27.2606 12.0360
#>
#> Comparison 2 Results:
#> --------------------------------------------------------------------------------------------------
#> complexity model winner lppd eff_num_param WAIC SE_WAIC Delta_WAIC SE_Delta_WAIC
#> more plm 5.5842 4.2574 -2.6536 6.6635 -0.4066 0.1904
#> less plm0 <--- 5.6284 4.0984 -3.0601 6.6931
#>
#> Comparison 3 Results:
#> --------------------------------------------------------------------------------------------------
#> complexity model winner lppd eff_num_param WAIC SE_WAIC Delta_WAIC SE_Delta_WAIC
#> more gplm0 <--- 20.3710 6.7406 -27.2606 12.0360 24.2005 9.1834
#> less plm0 5.6284 4.0984 -3.0601 6.6931
#>
#> === End of Summary ===
This criterion requires that the difference in WAIC is greater than 2 and that this difference minus its standard error is greater than 0, providing a more conservative approach to model selection.
Note: Only Delta_WAIC and SE_Delta_WAIC are allowed as variables in winning_criteria expressions.
3. Changing the model comparison method:
The default method is "WAIC"
, but you can also use
"DIC"
or "PMP"
:
- "WAIC" (default): WAIC (Watanabe, 2010) is a fully Bayesian method that uses the full set of posterior draws to calculate the best possible estimate of the expected log pointwise predictive density.
- "DIC": DIC (Spiegelhalter et al., 2002) is similar to WAIC but uses a point estimate of the posterior distribution instead of the full set of posterior draws.
> summary(t_obj, method = "DIC")
#>
#> === Tournament Model Comparison Summary ===
#>
#> Method: DIC
#> Winning Criteria: Delta_DIC > 2
#> Overall Winner: gplm0
#>
#> Comparison 1 Results:
#> --------------------------------------------------------------------------
#> complexity model winner D_hat eff_num_param DIC Delta_DIC
#> more gplm -42.7732 6.0631 -30.6470 0.8498
#> less gplm0 <--- -42.4880 6.3454 -29.7972
#>
#> Comparison 2 Results:
#> --------------------------------------------------------------------------
#> complexity model winner D_hat eff_num_param DIC Delta_DIC
#> more plm -11.3400 2.9739 -5.3923 -0.3058
#> less plm0 <--- -11.4712 2.8865 -5.6982
#>
#> Comparison 3 Results:
#> --------------------------------------------------------------------------
#> complexity model winner D_hat eff_num_param DIC Delta_DIC
#> more gplm0 <--- -42.4880 6.3454 -29.7972 24.0991
#> less plm0 -11.4712 2.8865 -5.6982
#>
#> === End of Summary ===
- "PMP" (posterior model probabilities): The posterior model probabilities are computed with Bayes factors (Jeffreys, 1961; Kass and Raftery, 1995), assuming all models are equally likely a priori. The winning criteria for method "PMP" should be between 0 and 1, representing the threshold for the posterior probability of the more complex model. The default is 0.75, which should give results similar to those of the other two methods with their respective default values of 2. The method "PMP" is not chosen as the default because the Bayes factor calculations can be quite unstable. Let's now use this method, but raise the winning criteria from 0.75 to 0.9:
> summary(t_obj, method = "PMP", winning_criteria = 0.9)
#>
#> === Tournament Model Comparison Summary ===
#>
#> Method: PMP
#> Winning Criteria: PMP of more complex model > 0.9
#> Overall Winner: gplm0
#>
#> Comparison 1 Results:
#> -------------------------------------------------------
#> complexity model winner log_marg_lik PMP
#> more gplm 10.1504 0.4538
#> less gplm0 <--- 10.3357 0.5462
#>
#> Comparison 2 Results:
#> -------------------------------------------------------
#> complexity model winner log_marg_lik PMP
#> more plm <--- 0.9781 0.9260
#> less plm0 -1.5486 0.0740
#>
#> Comparison 3 Results:
#> -------------------------------------------------------
#> complexity model winner log_marg_lik PMP
#> more gplm0 <--- 10.3357 0.9999
#> less plm 0.9781 0.0001
#>
#> === End of Summary ===
#>
#> ⚠ Warning: The Harmonic Mean Estimator (HME) is used to estimate the Bayes Factor for the posterior model probability (PMP), which is known to be unstable and potentially unreliable. We recommend using method "WAIC" (Widely Applicable Information Criterion) for model comparison instead.
Note that when using the summary function to change the method or winning criteria, the tournament is re-run with the new parameters using the original contestant models. This allows for efficient comparison of different methods or criteria without having to refit the models.
These examples demonstrate how different methods and winning criteria can be applied to model comparison. While it’s valuable to explore these options, we generally recommend using the WAIC method, as it provides a robust and theoretically sound approach to model selection. WAIC uses the full posterior distribution and is particularly well-suited for hierarchical models.
In the case of the Krokfors dataset, we see that the results of the
tournament do not change despite these customizations, and the winner of
the third and final comparison is still gplm0
. This
consistency across different methods and criteria suggests that
gplm0
is indeed the most appropriate model for this
particular dataset.
References
Burnham, K. P. and Anderson, D. R. (2002). Information and Likelihood Theory: A Basis for Model Selection and Inference. In: Model Selection and Multimodel Inference. Springer, New York, NY, pp. 49–97. doi: https://doi.org/10.1007/978-0-387-22456-5_2
Hrafnkelsson, B., Sigurdarson, H., Rögnvaldsson, S., Jansson, A. Ö., Vias, R. D., and Gardarsson, S. M. (2022). Generalization of the power-law rating curve using hydrodynamic theory and Bayesian hierarchical modeling, Environmetrics, 33(2):e2711. doi: https://doi.org/10.1002/env.2711
Jeffreys, H. (1961). Theory of Probability, Third Edition. Oxford University Press.
Kass, R. E., and Raftery, A. E. (1995). Bayes Factors. Journal of the American Statistical Association, 90, 773–795. doi: https://doi.org/10.1080/01621459.1995.10476572
Spiegelhalter, D., Best, N., Carlin, B., Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64(4), 583–639. doi: https://doi.org/10.1111/1467-9868.00353
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11, 3571–3594.