---
title: "Single-arm trials"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Single-arm trials}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
set.seed(3081)
```

```{r setup, message=FALSE}
library(goldilocks)
```

The other vignettes describe two-arm randomised designs. Single-arm trials -- in which every subject receives the experimental therapy and the comparator is an external benchmark -- are common in early-phase oncology, rare-disease, and proof-of-concept studies. This vignette shows how to set up a Goldilocks single-arm design with `survival_adapt()`.

Two practical points about single-arm designs in this package:

- A single-arm trial is signalled by setting `hazard_control = NULL`.
- Only `method = "bayes"` is supported for single-arm trials. The frequentist tests (`logrank`, `cox`, `chisq`) require two arms and will raise an error if used in this mode.

## The decision rule

In a single-arm trial there is no concurrent control, so the "treatment effect" is replaced by the cumulative event probability on the treatment arm itself:

$\text{effect} \;=\; p_{\text{treatment}} \;=\; \Pr(\text{event by end\_of\_study} \mid \text{data}).$

The argument `h0` plays the role of a benchmark on this scale: a target failure probability (equivalently, $1 - h_0$ is the target survival probability) drawn from external evidence such as a published rate, registry, or historical cohort.

With `alternative = "less"` and a threshold `prob_ha`, the trial declares success when

$$\Pr(p_{\text{treatment}} < h_0 \mid \text{data}) \;>\; \texttt{prob\_ha},$$

i.e. when the posterior assigns enough mass to "the experimental therapy has a lower failure rate than the benchmark". Choosing `alternative = "greater"` reverses the direction; `alternative = "two.sided"` is not allowed for `method = "bayes"`.

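
As a rough illustration of this rule, the sketch below evaluates $\Pr(p_{\text{treatment}} < h_0 \mid \text{data})$ in closed form for a constant-hazard model. It assumes a conjugate Gamma prior on the hazard in shape-rate form, and the event count and exposure are made-up numbers rather than package output. `survival_adapt()` may compute this quantity differently (for example by sampling), so treat it as a sketch of the idea, not the package's implementation. All `sketch_*` names are hypothetical.

```{r posterior-sketch}
# Hypothetical inputs (illustrative only; none of these come from the package)
sketch_end    <- 24            # follow-up horizon in months
sketch_h0     <- 0.30          # benchmark failure probability
sketch_prior  <- c(0.1, 0.1)   # assumed Gamma(shape, rate) prior on the hazard
sketch_events <- 12            # made-up number of observed events
sketch_expo   <- 1500          # made-up total exposure in person-months

# Under a constant hazard, p = 1 - exp(-hazard * t) is increasing in the hazard,
# so "p < h0" is the same event as "hazard < -log(1 - h0) / t".
sketch_haz0 <- -log(1 - sketch_h0) / sketch_end

# Conjugate update for exponential data: Gamma(shape + events, rate + exposure)
sketch_post_prob <- pgamma(
  sketch_haz0,
  shape = sketch_prior[1] + sketch_events,
  rate  = sketch_prior[2] + sketch_expo
)

# With alternative = "less", success is declared if this exceeds prob_ha
sketch_post_prob
sketch_post_prob > 0.95
```

Only base R is used here, and no random numbers are drawn, so the chunk does not disturb the seed set at the top of the vignette.
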
The same posterior is used at each interim look to compute the predictive probability of eventual success, which drives the futility (`Fn`) and expected-success (`Sn`) stopping rules. Predictive probabilities are obtained by imputing remaining follow-up from the posterior predictive distribution of the (piecewise-)exponential model and re-evaluating the success criterion on each completed dataset.

## Setting up the design

Suppose the existing standard of care has a 30% event probability by 24 months, and we are testing a new agent that we hope will reduce this to 20%. We use an interim look at 50 of 80 enrolled subjects:

```{r design}
end_of_study <- 24
benchmark <- 0.30   # external standard-of-care failure rate
target <- 0.20      # rate we hope the new therapy achieves

# Convert the target failure rate into a constant hazard (so we can simulate)
ht <- prop_to_haz(probs = target, endtime = end_of_study)
ht
```

Now we run `survival_adapt()`:

```{r run, cache=TRUE}
out <- survival_adapt(
  hazard_treatment = ht,
  hazard_control   = NULL,        # single-arm
  cutpoints        = 0,
  N_total          = 80,
  lambda           = 5,           # enrolments per month (constant)
  lambda_time      = 0,
  interim_look     = 50,
  end_of_study     = end_of_study,
  prior            = c(0.1, 0.1), # Gamma(0.1, 0.1) on the hazard
  block            = 2,           # default; inert in single-arm mode
  rand_ratio       = c(1, 1),     # default; inert in single-arm mode
  prop_loss        = 0.05,
  alternative      = "less",
  h0               = benchmark,   # benchmark failure probability
  Fn               = 0.05,
  Sn               = 0.95,
  prob_ha          = 0.95,
  N_impute         = 50,
  N_mcmc           = 2000,
  method           = "bayes")
out
```

A few points to highlight in the output:

- `N_control = 0`: no concurrent control was simulated.
- `margin = 0.30`: this is the value of `h0` that the trial is testing against. Note that it is on the cumulative-failure scale, not the survival scale.
- `est_final` is the posterior mean of $p_{\text{treatment}}$ at `end_of_study`, *not* a treatment effect relative to control.
- `post_prob_ha` is the posterior probability that $p_{\text{treatment}} < h_0$.

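
Because `h0`, `margin`, and `est_final` are all on the cumulative-failure scale, it can help to translate between failure probability, survival probability, and hazard when comparing against published benchmarks. The sketch below does this by hand under a constant-hazard assumption, using the standard exponential relationship $p = 1 - e^{-\lambda t}$. The `sketch_*` names are illustrative and not part of the package.

```{r scale-sketch}
sketch_t      <- 24                                   # horizon in months
sketch_p_fail <- c(benchmark = 0.30, target = 0.20)   # failure probabilities

# Survival probability is the complement of the failure probability
sketch_p_surv <- 1 - sketch_p_fail

# Constant hazard implied by each failure probability: lambda = -log(1 - p) / t
sketch_haz <- -log(1 - sketch_p_fail) / sketch_t

# Round trip back to the failure scale should recover the inputs
sketch_back <- 1 - exp(-sketch_haz * sketch_t)

data.frame(
  failure  = sketch_p_fail,
  survival = sketch_p_surv,
  hazard   = sketch_haz,
  check    = sketch_back
)
```

Under this assumption the hazard in the `target` row would be expected to agree with the `ht` value computed by `prop_to_haz()` above, which is worth confirming against the package output rather than taking on trust.
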
## Why `block` and `rand_ratio` still appear

`survival_adapt()` shares its trial-data simulator with the two-arm case. In single-arm mode the simulator skips `randomization()` entirely and assigns every subject to the treatment arm; `block` and `rand_ratio` are therefore inert and can be left at their defaults. The minimum-`interim_look` rule (`interim_look >= max(block)`) only applies to two-arm designs, so a single-arm trial can use any `interim_look` strictly less than `N_total`.

## Operating characteristics

A single trial does not tell you whether the design is well-calibrated. To estimate power and type I error, we run the design under each scenario using `sim_trials()`. The chunks below are not run when knitting (each takes a few minutes) but illustrate the workflow:

```{r oc, eval=FALSE}
# Power: simulate under the alternative (true rate = 0.20)
out_power <- sim_trials(
  N_trials         = 1000,
  hazard_treatment = ht,
  hazard_control   = NULL,
  cutpoints        = 0,
  N_total          = 80,
  lambda           = 5,
  lambda_time      = 0,
  interim_look     = 50,
  end_of_study     = end_of_study,
  prior            = c(0.1, 0.1),
  block            = 2,
  rand_ratio       = c(1, 1),
  prop_loss        = 0.05,
  alternative      = "less",
  h0               = benchmark,
  Fn               = 0.05,
  Sn               = 0.95,
  prob_ha          = 0.95,
  N_impute         = 50,
  N_mcmc           = 2000,
  method           = "bayes")

# Type I error: simulate under the null (true rate = benchmark = 0.30)
ht_null <- prop_to_haz(probs = benchmark, endtime = end_of_study)
out_t1error <- sim_trials(
  N_trials         = 1000,
  hazard_treatment = ht_null,
  hazard_control   = NULL,
  cutpoints        = 0,
  N_total          = 80,
  lambda           = 5,
  lambda_time      = 0,
  interim_look     = 50,
  end_of_study     = end_of_study,
  prior            = c(0.1, 0.1),
  block            = 2,
  rand_ratio       = c(1, 1),
  prop_loss        = 0.05,
  alternative      = "less",
  h0               = benchmark,
  Fn               = 0.05,
  Sn               = 0.95,
  prob_ha          = 0.95,
  N_impute         = 50,
  N_mcmc           = 2000,
  method           = "bayes")

summarise_sims(list(out_power$sims, out_t1error$sims))
```

Calibration proceeds the same way as for two-arm designs: if the type I error under the null (where the true rate equals the benchmark) is above the
desired level, raise `prob_ha`; if power is too low, increase `N_total` or relax the `Fn`/`Sn` thresholds.

## A practical caveat on benchmarks

The validity of a single-arm Goldilocks trial rests entirely on the benchmark `h0` being a fair representation of the population the trial is enrolling. Drift in standard of care, differences in patient mix, and unmeasured confounding all bias the comparison in a way that randomisation would otherwise neutralise. A Bayesian framework can incorporate uncertainty about the benchmark itself -- e.g. by replacing a fixed `h0` with a prior distribution informed by historical data -- but this is outside the scope of the simple `h0` scalar that `survival_adapt()` exposes, and would require a custom analysis. When in doubt, simulating the design under several plausible values of the true rate (including ones near the benchmark) is a useful way to characterise its sensitivity.

## See also

- The "Example: Two-armed RCT" vignette covers the corresponding two-arm randomised design with a log-rank decision rule.
- The "Bayesian decisions with piecewise-exponential hazards" vignette covers the same decision rule used here, but in a two-arm setting and with non-constant hazards. The piecewise machinery applies directly to single-arm trials too (just keep `hazard_control = NULL` and pass a per-interval `hazard_treatment` vector).
- `?survival_adapt` documents all arguments.