# Simulating Multi-Arm Designs with a Continuous Endpoint using rpact

# Introduction

This document provides examples for simulating multi-arm multi-stage (MAMS) designs for testing means in many-to-one comparisons. For designs with multiple arms, rpact enables the simulation of designs that use the **closed combination testing principle**. For a description of the methodology, please refer to Part III of the book “Group Sequential and Confirmatory Adaptive Designs in Clinical Trials” by Gernot Wassmer & Werner Brannath. In this vignette, we show how to reproduce part of the simulation results provided in the paper “On Sample Size Determination in Multi-Arm Confirmatory Adaptive Designs” by Gernot Wassmer (Journal of Biopharmaceutical Statistics, 2011).

**First, load the rpact package**

```
library(rpact)
packageVersion("rpact") # version should be 3.0.1 or later
```

`[1] '3.4.0'`

# Sample size calculation for the adaptive multi-arm situation

rpact enables the assessment of sample sizes in multi-arm designs, including the selection of treatment arms. We first consider the simple case of a two-stage design with O’Brien & Fleming boundaries and three active treatment arms that are tested against control. Let the three treatment arms refer to three increasing doses, “low”, “medium”, and “high”, say. We assume that the highest dose has a response difference of 10 as compared to control, and that there is a linear dose-response relationship. The standard deviation is assumed to be \(\sigma = 15\). At interim, the treatment arm with the highest observed response as compared to placebo is selected for testing at the second stage.

One way to adjust for the multiple comparison situation is to use the Bonferroni correction for the intersection tests in the closed system of hypotheses. It turns out that using \(\alpha/3\) instead of \(\alpha\) in the sample size calculation for the highest dose in the two-arm fixed sample size case can serve as a reasonable first guess for the sample size in the multi-arm case. That is, for \(\alpha = 0.025\) and power \(1 - \beta = 90\%\) we calculate the sample size using the commands

```
nsFixed <- getSampleSizeMeans(alpha = 0.025 / 3, beta = 0.1, alternative = 10, stDev = 15)
kable(summary(nsFixed))
```

**Sample size calculation for a continuous endpoint**

Fixed sample analysis, significance level 0.83% (one-sided). The sample size was calculated for a two-sample t-test, H0: mu(1) - mu(2) = 0, H1: effect = 10, standard deviation = 15, power 90%.

Stage | Fixed |
---|---|
Efficacy boundary (z-value scale) | 2.394 |
Number of subjects | 124.5 |
One-sided local significance level | 0.0083 |
Efficacy boundary (t) | 6.526 |

Legend:

*(t)*: treatment effect scale

**Design plan parameters and output for means**

**Design parameters**

- *Critical values*: 2.394
- *Significance level*: 0.008333
- *Type II error rate*: 0.1000
- *Test*: one-sided

**User defined parameters**

- *Alternatives*: 10
- *Standard deviation*: 15

**Default parameters**

- *Mean ratio*: FALSE
- *Theta H0*: 0
- *Normal approximation*: FALSE
- *Treatment groups*: 2
- *Planned allocation ratio*: 1

**Sample size and output**

- *Number of subjects fixed*: 124.5
- *Number of subjects fixed (1)*: 62.2
- *Number of subjects fixed (2)*: 62.2
- *Critical values (treatment effect scale)*: 6.526

**Legend**

- *(i)*: values of treatment arm i
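As a rough cross-check of this output, the classical normal-approximation formula for a two-sample comparison of means can be evaluated in base R. This is only a sketch with illustrative variable names: rpact uses the t-distribution here (the output lists *Normal approximation*: FALSE), so its result is slightly larger.

```r
# Normal-approximation sample size for a one-sided two-sample comparison of means
alphaAdjusted <- 0.025 / 3  # Bonferroni-adjusted one-sided significance level
beta <- 0.1                 # type II error rate (power 90%)
delta <- 10                 # assumed effect of the highest dose vs. control
sigma <- 15                 # assumed standard deviation

zAlpha <- qnorm(1 - alphaAdjusted)  # about 2.394, the efficacy boundary on the z-value scale
zBeta <- qnorm(1 - beta)

# Per-arm sample size for 1:1 allocation (illustrative approximation, not rpact's t-test result)
nPerArmApprox <- 2 * (zAlpha + zBeta)^2 * sigma^2 / delta^2
nPerArmApprox
```

The approximation gives roughly 61 subjects per arm, a little below the 62.2 per arm that rpact reports for the t-test.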

The rpact calculation yields 125 as the total number of subjects and hence n = 63 subjects per treatment arm in order to achieve the desired power. As a first guess for the multi-arm two-stage case, we choose 30 subjects per stage and treatment arm and use the following commands to evaluate the MAMS design. Note that `plannedSubjects` refers to the **cumulative sample sizes over the stages per selected active arm**:

```
designIN <- getDesignInverseNormal(kMax = 2, alpha = 0.025, typeOfDesign = "OF")
maxNumberOfIterations <- 1000
simBonfMAMS <- getSimulationMultiArmMeans(
    design = designIN,
    activeArms = 3,
    muMaxVector = c(10),
    stDev = 15,
    plannedSubjects = c(30, 60),
    intersectionTest = "Bonferroni",
    typeOfShape = "linear",
    typeOfSelection = "best",
    successCriterion = "all",
    maxNumberOfIterations = maxNumberOfIterations,
    seed = 1234
)
kable(summary(simBonfMAMS))
```

**Simulation of a continuous endpoint (multi-arm design)**

Sequential analysis with a maximum of 2 looks (inverse normal combination test design), overall significance level 2.5% (one-sided). The results were simulated for a multi-arm comparisons for means (3 treatments vs. control), H0: mu(i) - mu(control) = 0, power directed towards larger values, H1: mu_max = 10, standard deviation = 15, planned cumulative sample size = c(30, 60), effect shape = linear, intersection test = Bonferroni, selection = best, effect measure based on effect estimate, success criterion: all, simulation runs = 1000, seed = 1234.

Stage | 1 | 2 |
---|---|---|
Fixed weight | 0.707 | 0.707 |
Efficacy boundary (z-value scale) | 2.797 | 1.977 |
Stage levels (one-sided) | 0.0026 | 0.0240 |
Reject at least one | 0.8850 | |
Rejected arms per stage | | |
Treatment arm 1 | 0.0130 | 0.0080 |
Treatment arm 2 | 0.1120 | 0.0890 |
Treatment arm 3 | 0.3030 | 0.4510 |
Success per stage | 0.0060 | 0.8790 |
Exit probability for futility | 0.0040 | |
Expected number of subjects | 179.4 | |
Overall exit probability | 0.0100 | |
Stagewise number of subjects | | |
Treatment arm 1 | 30.0 | 0.7 |
Treatment arm 2 | 30.0 | 5.6 |
Treatment arm 3 | 30.0 | 23.7 |
Control arm | 30.0 | 30.0 |
Selected arms | | |
Treatment arm 1 | 1.0000 | 0.0220 |
Treatment arm 2 | 1.0000 | 0.1850 |
Treatment arm 3 | 1.0000 | 0.7830 |
Number of active arms | 3.000 | 1.000 |
Conditional power (achieved) | | 0.5785 |

Legend:

*(i)*: treatment arm i

**Simulation of multi-arm means (inverse normal combination test design)**

**Design parameters**

- *Information rates*: 0.500, 1.000
- *Critical values*: 2.797, 1.977
- *Futility bounds (non-binding)*: -Inf
- *Cumulative alpha spending*: 0.002583, 0.025000
- *Local one-sided significance levels*: 0.002583, 0.023996
- *Significance level*: 0.0250
- *Test*: one-sided

**User defined parameters**

- *Seed*: 1234
- *Standard deviation*: 15
- *Planned cumulative subjects*: 30, 60
- *mu_max*: 10
- *Intersection test*: Bonferroni

**Default parameters**

- *Maximum number of iterations*: 1000
- *Planned allocation ratio*: 1
- *Calculate subjects function*: default
- *Active arms*: 3
- *Effect matrix (1)*: 3.333
- *Effect matrix (2)*: 6.667
- *Effect matrix (3)*: 10.000
- *Type of shape*: linear
- *Slope*: 1
- *Adaptations*: TRUE
- *Type of selection*: best
- *Effect measure*: effectEstimate
- *Success criterion*: all
- *Epsilon value*: NA
- *r value*: NA
- *Threshold*: -Inf

**Results**

- *Iterations [1]*: 1000
- *Iterations [2]*: 990
- *Reject at least one*: 0.8850
- *Rejected arms per stage (1) [1]*: 0.0130
- *Rejected arms per stage (1) [2]*: 0.0080
- *Rejected arms per stage (2) [1]*: 0.1120
- *Rejected arms per stage (2) [2]*: 0.0890
- *Rejected arms per stage (3) [1]*: 0.3030
- *Rejected arms per stage (3) [2]*: 0.4510
- *Futility stop per stage*: 0.0040
- *Early stop*: 0.0100
- *Success per stage [1]*: 0.0060
- *Success per stage [2]*: 0.8790
- *Selected arms (1) [1]*: 1.0000
- *Selected arms (1) [2]*: 0.0220
- *Selected arms (2) [1]*: 1.0000
- *Selected arms (2) [2]*: 0.1850
- *Selected arms (3) [1]*: 1.0000
- *Selected arms (3) [2]*: 0.7830
- *Selected arms (4) [1]*: 1.0000
- *Selected arms (4) [2]*: 0.9900
- *Number of active arms [1]*: 3.000
- *Number of active arms [2]*: 1.000
- *Expected number of subjects*: 179.4
- *Sample sizes (1) [1]*: 30
- *Sample sizes (1) [2]*: 0.7
- *Sample sizes (2) [1]*: 30
- *Sample sizes (2) [2]*: 5.6
- *Sample sizes (3) [1]*: 30
- *Sample sizes (3) [2]*: 23.7
- *Sample sizes (4) [1]*: 30
- *Sample sizes (4) [2]*: 30
- *Conditional power (achieved) [1]*: NA
- *Conditional power (achieved) [2]*: 0.5785

**Legend**

- *(i)*: values of treatment arm i
- *[k]*: values at stage k

We see that the power, which is the probability of rejecting at least one of the three corresponding hypotheses, is about 88% if a linear dose-response relationship is assumed. Note that there is a small probability of stopping the trial for futility. This is due to the Bonferroni correction, which can yield adjusted \(p\)-values equal to 1 at interim and thereby makes a rejection at stage 2 impossible.
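To see why a Bonferroni-adjusted \(p\)-value of 1 at interim rules out a later rejection, consider the inverse normal combination of the two stage-wise \(p\)-values. The following base R sketch uses hypothetical stage-wise \(p\)-values and illustrative helper names; it is not rpact code.

```r
# Bonferroni-adjusted p-value for an intersection hypothesis
bonferroniAdjust <- function(pValues) {
    min(1, length(pValues) * min(pValues))
}

# Inverse normal combination of two stage-wise p-values (equal weights by default)
inverseNormalCombination <- function(p1, p2, w1 = sqrt(0.5), w2 = sqrt(0.5)) {
    w1 * qnorm(1 - p1) + w2 * qnorm(1 - p2)
}

# Hypothetical first-stage p-values for the three comparisons with control
pStage1 <- c(0.40, 0.55, 0.70)
pAdjusted <- bonferroniAdjust(pStage1) # 3 * 0.40 exceeds 1, so the adjusted p-value is 1

# Even an extremely small second-stage p-value cannot rescue the test:
# qnorm(0) is -Inf, so the combined statistic is -Inf
zCombined <- inverseNormalCombination(pAdjusted, 1e-10)
zCombined
zCombined >= 1.977 # FALSE: the final critical value cannot be reached
```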

Using the Dunnett test for testing the intersection hypotheses increases the power to about 90%. It is obtained by selecting `intersectionTest = "Dunnett"`:

```
simDunnettMAMS <- getSimulationMultiArmMeans(
    design = designIN,
    activeArms = 3,
    typeOfShape = "linear",
    muMaxVector = c(10),
    stDev = 15,
    plannedSubjects = c(30, 60),
    intersectionTest = "Dunnett",
    typeOfSelection = "best",
    successCriterion = "all",
    maxNumberOfIterations = maxNumberOfIterations,
    seed = 1234
)
kable(summary(simDunnettMAMS))
```

**Simulation of a continuous endpoint (multi-arm design)**

Sequential analysis with a maximum of 2 looks (inverse normal combination test design), overall significance level 2.5% (one-sided). The results were simulated for a multi-arm comparisons for means (3 treatments vs. control), H0: mu(i) - mu(control) = 0, power directed towards larger values, H1: mu_max = 10, standard deviation = 15, planned cumulative sample size = c(30, 60), effect shape = linear, intersection test = Dunnett, selection = best, effect measure based on effect estimate, success criterion: all, simulation runs = 1000, seed = 1234.

Stage | 1 | 2 |
---|---|---|
Fixed weight | 0.707 | 0.707 |
Efficacy boundary (z-value scale) | 2.797 | 1.977 |
Stage levels (one-sided) | 0.0026 | 0.0240 |
Reject at least one | 0.8990 | |
Rejected arms per stage | | |
Treatment arm 1 | 0.0130 | 0.0080 |
Treatment arm 2 | 0.1140 | 0.0910 |
Treatment arm 3 | 0.3080 | 0.4580 |
Success per stage | 0.0060 | 0.8930 |
Expected number of subjects | 179.6 | |
Overall exit probability | 0.0060 | |
Stagewise number of subjects | | |
Treatment arm 1 | 30.0 | 0.7 |
Treatment arm 2 | 30.0 | 5.6 |
Treatment arm 3 | 30.0 | 23.7 |
Control arm | 30.0 | 30.0 |
Selected arms | | |
Treatment arm 1 | 1.0000 | 0.0220 |
Treatment arm 2 | 1.0000 | 0.1870 |
Treatment arm 3 | 1.0000 | 0.7850 |
Number of active arms | 3.000 | 1.000 |
Conditional power (achieved) | | 0.5761 |

Legend:

*(i)*: treatment arm i

**Simulation of multi-arm means (inverse normal combination test design)**

**Design parameters**

- *Information rates*: 0.500, 1.000
- *Critical values*: 2.797, 1.977
- *Futility bounds (non-binding)*: -Inf
- *Cumulative alpha spending*: 0.002583, 0.025000
- *Local one-sided significance levels*: 0.002583, 0.023996
- *Significance level*: 0.0250
- *Test*: one-sided

**User defined parameters**

- *Seed*: 1234
- *Standard deviation*: 15
- *Planned cumulative subjects*: 30, 60
- *mu_max*: 10

**Default parameters**

- *Maximum number of iterations*: 1000
- *Planned allocation ratio*: 1
- *Calculate subjects function*: default
- *Active arms*: 3
- *Effect matrix (1)*: 3.333
- *Effect matrix (2)*: 6.667
- *Effect matrix (3)*: 10.000
- *Type of shape*: linear
- *Slope*: 1
- *Intersection test*: Dunnett
- *Adaptations*: TRUE
- *Type of selection*: best
- *Effect measure*: effectEstimate
- *Success criterion*: all
- *Epsilon value*: NA
- *r value*: NA
- *Threshold*: -Inf

**Results**

- *Iterations [1]*: 1000
- *Iterations [2]*: 994
- *Reject at least one*: 0.8990
- *Rejected arms per stage (1) [1]*: 0.0130
- *Rejected arms per stage (1) [2]*: 0.0080
- *Rejected arms per stage (2) [1]*: 0.1140
- *Rejected arms per stage (2) [2]*: 0.0910
- *Rejected arms per stage (3) [1]*: 0.3080
- *Rejected arms per stage (3) [2]*: 0.4580
- *Futility stop per stage*: 0.0000
- *Early stop*: 0.0060
- *Success per stage [1]*: 0.0060
- *Success per stage [2]*: 0.8930
- *Selected arms (1) [1]*: 1.0000
- *Selected arms (1) [2]*: 0.0220
- *Selected arms (2) [1]*: 1.0000
- *Selected arms (2) [2]*: 0.1870
- *Selected arms (3) [1]*: 1.0000
- *Selected arms (3) [2]*: 0.7850
- *Selected arms (4) [1]*: 1.0000
- *Selected arms (4) [2]*: 0.9940
- *Number of active arms [1]*: 3.000
- *Number of active arms [2]*: 1.000
- *Expected number of subjects*: 179.6
- *Sample sizes (1) [1]*: 30
- *Sample sizes (1) [2]*: 0.7
- *Sample sizes (2) [1]*: 30
- *Sample sizes (2) [2]*: 5.6
- *Sample sizes (3) [1]*: 30
- *Sample sizes (3) [2]*: 23.7
- *Sample sizes (4) [1]*: 30
- *Sample sizes (4) [2]*: 30
- *Conditional power (achieved) [1]*: NA
- *Conditional power (achieved) [2]*: 0.5761

**Legend**

- *(i)*: values of treatment arm i
- *[k]*: values at stage k
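Both power estimates (0.8850 for Bonferroni, 0.8990 for Dunnett) are based on 1000 simulation runs and therefore carry Monte Carlo error. The following sketch gauges its size with the binomial standard error; the variable names are illustrative.

```r
nSim <- 1000
powerEstimates <- c(Bonferroni = 0.8850, Dunnett = 0.8990)

# Binomial standard error of a simulated rejection probability
standardErrors <- sqrt(powerEstimates * (1 - powerEstimates) / nSim)
round(standardErrors, 4)

# Approximate 95% intervals for each estimate
round(cbind(
    lower = powerEstimates - qnorm(0.975) * standardErrors,
    upper = powerEstimates + qnorm(0.975) * standardErrors
), 3)
```

With standard errors of about 0.01, a larger `maxNumberOfIterations` would be needed to quantify the difference between the two intersection tests more precisely.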

Changing `successCriterion = "all"` to `successCriterion = "atLeastOne"` reduces the expected number of subjects considerably because the trial is stopped at interim in many more cases:

```
simDunnettMAMSatLeastOne <- getSimulationMultiArmMeans(
    design = designIN,
    activeArms = 3,
    typeOfShape = "linear",
    muMaxVector = c(10),
    stDev = 15,
    plannedSubjects = c(30, 60),
    intersectionTest = "Dunnett",
    typeOfSelection = "best",
    successCriterion = "atLeastOne",
    maxNumberOfIterations = maxNumberOfIterations,
    seed = 1234
)
kable(summary(simDunnettMAMSatLeastOne))
```

**Simulation of a continuous endpoint (multi-arm design)**

Sequential analysis with a maximum of 2 looks (inverse normal combination test design), overall significance level 2.5% (one-sided). The results were simulated for a multi-arm comparisons for means (3 treatments vs. control), H0: mu(i) - mu(control) = 0, power directed towards larger values, H1: mu_max = 10, standard deviation = 15, planned cumulative sample size = c(30, 60), effect shape = linear, intersection test = Dunnett, selection = best, effect measure based on effect estimate, success criterion: at least one, simulation runs = 1000, seed = 1234.

Stage | 1 | 2 |
---|---|---|
Fixed weight | 0.707 | 0.707 |
Efficacy boundary (z-value scale) | 2.797 | 1.977 |
Stage levels (one-sided) | 0.0026 | 0.0240 |
Reject at least one | 0.8990 | |
Rejected arms per stage | | |
Treatment arm 1 | 0.0130 | 0.0080 |
Treatment arm 2 | 0.1140 | 0.0910 |
Treatment arm 3 | 0.3080 | 0.4580 |
Success per stage | 0.3420 | 0.5570 |
Expected number of subjects | 159.5 | |
Overall exit probability | 0.3420 | |
Stagewise number of subjects | | |
Treatment arm 1 | 30.0 | 0.8 |
Treatment arm 2 | 30.0 | 6.3 |
Treatment arm 3 | 30.0 | 22.9 |
Control arm | 30.0 | 30.0 |
Selected arms | | |
Treatment arm 1 | 1.0000 | 0.0180 |
Treatment arm 2 | 1.0000 | 0.1380 |
Treatment arm 3 | 1.0000 | 0.5020 |
Number of active arms | 3.000 | 1.000 |
Conditional power (achieved) | | 0.3989 |

Legend:

*(i)*: treatment arm i

**Simulation of multi-arm means (inverse normal combination test design)**

**Design parameters**

- *Information rates*: 0.500, 1.000
- *Critical values*: 2.797, 1.977
- *Futility bounds (non-binding)*: -Inf
- *Cumulative alpha spending*: 0.002583, 0.025000
- *Local one-sided significance levels*: 0.002583, 0.023996
- *Significance level*: 0.0250
- *Test*: one-sided

**User defined parameters**

- *Seed*: 1234
- *Standard deviation*: 15
- *Planned cumulative subjects*: 30, 60
- *mu_max*: 10
- *Success criterion*: atLeastOne

**Default parameters**

- *Maximum number of iterations*: 1000
- *Planned allocation ratio*: 1
- *Calculate subjects function*: default
- *Active arms*: 3
- *Effect matrix (1)*: 3.333
- *Effect matrix (2)*: 6.667
- *Effect matrix (3)*: 10.000
- *Type of shape*: linear
- *Slope*: 1
- *Intersection test*: Dunnett
- *Adaptations*: TRUE
- *Type of selection*: best
- *Effect measure*: effectEstimate
- *Epsilon value*: NA
- *r value*: NA
- *Threshold*: -Inf

**Results**

- *Iterations [1]*: 1000
- *Iterations [2]*: 658
- *Reject at least one*: 0.8990
- *Rejected arms per stage (1) [1]*: 0.0130
- *Rejected arms per stage (1) [2]*: 0.0080
- *Rejected arms per stage (2) [1]*: 0.1140
- *Rejected arms per stage (2) [2]*: 0.0910
- *Rejected arms per stage (3) [1]*: 0.3080
- *Rejected arms per stage (3) [2]*: 0.4580
- *Futility stop per stage*: 0.0000
- *Early stop*: 0.3420
- *Success per stage [1]*: 0.3420
- *Success per stage [2]*: 0.5570
- *Selected arms (1) [1]*: 1.0000
- *Selected arms (1) [2]*: 0.0180
- *Selected arms (2) [1]*: 1.0000
- *Selected arms (2) [2]*: 0.1380
- *Selected arms (3) [1]*: 1.0000
- *Selected arms (3) [2]*: 0.5020
- *Selected arms (4) [1]*: 1.0000
- *Selected arms (4) [2]*: 0.6580
- *Number of active arms [1]*: 3.000
- *Number of active arms [2]*: 1.000
- *Expected number of subjects*: 159.5
- *Sample sizes (1) [1]*: 30
- *Sample sizes (1) [2]*: 0.8
- *Sample sizes (2) [1]*: 30
- *Sample sizes (2) [2]*: 6.3
- *Sample sizes (3) [1]*: 30
- *Sample sizes (3) [2]*: 22.9
- *Sample sizes (4) [1]*: 30
- *Sample sizes (4) [2]*: 30
- *Conditional power (achieved) [1]*: NA
- *Conditional power (achieved) [2]*: 0.3989

**Legend**

- *(i)*: values of treatment arm i
- *[k]*: values at stage k
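The reduction can be traced back to the reported stopping probabilities. As a sketch (not rpact code), the expected number of subjects equals the first-stage sample size plus the second-stage sample size weighted by the probability of reaching stage 2. The probabilities are taken from the two Dunnett outputs above; the helper name `expectedSubjects` is hypothetical.

```r
# Expected total sample size of a two-stage design with one selected arm at stage 2
expectedSubjects <- function(probStage2, nPerArmAndStage = 30, activeArms = 3) {
    stage1 <- nPerArmAndStage * (activeArms + 1) # all active arms plus control
    stage2 <- nPerArmAndStage * 2                # selected arm plus control
    stage1 + probStage2 * stage2
}

# Proportion of iterations reaching stage 2: 994/1000 ("all") and 658/1000 ("atLeastOne")
round(expectedSubjects(probStage2 = 0.994), 1) # success criterion "all": 179.6
round(expectedSubjects(probStage2 = 0.658), 1) # success criterion "atLeastOne": 159.5
```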

For this example, we might conclude that choosing 30 subjects per treatment arm and stage is a reasonable choice. If, however, the effect sizes are smaller for the low and medium dose, the power might decrease and the sample size should therefore be increased. For example, assuming effect sizes of only 1 and 2 in the low and medium dose group, respectively, the test characteristics can be obtained by using the `typeOfShape = "userDefined"` option. The effect sizes of interest are specified through `effectMatrix` (which needs to be a matrix because, in general, you can consider more than one parameter configuration per simulation run):

```
simDunnettMAMS <- getSimulationMultiArmMeans(
    design = designIN,
    activeArms = 3,
    typeOfShape = "userDefined",
    effectMatrix = matrix(c(1, 2, 10), ncol = 3),
    stDev = 15,
    plannedSubjects = c(30, 60),
    intersectionTest = "Dunnett",
    typeOfSelection = "best",
    successCriterion = "atLeastOne",
    maxNumberOfIterations = maxNumberOfIterations,
    seed = 1234
)
kable(summary(simDunnettMAMS))
```

**Simulation of a continuous endpoint (multi-arm design)**

Sequential analysis with a maximum of 2 looks (inverse normal combination test design), overall significance level 2.5% (one-sided). The results were simulated for a multi-arm comparisons for means (3 treatments vs. control), H0: mu(i) - mu(control) = 0, power directed towards larger values, H1: mu_max = 10, standard deviation = 15, planned cumulative sample size = c(30, 60), effect shape = user defined, intersection test = Dunnett, selection = best, effect measure based on effect estimate, success criterion: at least one, simulation runs = 1000, seed = 1234.

Stage | 1 | 2 |
---|---|---|
Fixed weight | 0.707 | 0.707 |
Efficacy boundary (z-value scale) | 2.797 | 1.977 |
Stage levels (one-sided) | 0.0026 | 0.0240 |
Reject at least one | 0.8970 | |
Rejected arms per stage | | |
Treatment arm 1 | 0 | 0.0010 |
Treatment arm 2 | 0.0050 | 0.0040 |
Treatment arm 3 | 0.3060 | 0.5860 |
Success per stage | 0.3060 | 0.5910 |
Expected number of subjects | 161.6 | |
Overall exit probability | 0.3060 | |
Stagewise number of subjects | | |
Treatment arm 1 | 30.0 | 0.3 |
Treatment arm 2 | 30.0 | 0.8 |
Treatment arm 3 | 30.0 | 29.0 |
Control arm | 30.0 | 30.0 |
Selected arms | | |
Treatment arm 1 | 1.0000 | 0.0060 |
Treatment arm 2 | 1.0000 | 0.0180 |
Treatment arm 3 | 1.0000 | 0.6700 |
Number of active arms | 3.000 | 1.000 |
Conditional power (achieved) | | 0.2592 |

Legend:

*(i)*: treatment arm i

**Simulation of multi-arm means (inverse normal combination test design)**

**Design parameters**

- *Information rates*: 0.500, 1.000
- *Critical values*: 2.797, 1.977
- *Futility bounds (non-binding)*: -Inf
- *Cumulative alpha spending*: 0.002583, 0.025000
- *Local one-sided significance levels*: 0.002583, 0.023996
- *Significance level*: 0.0250
- *Test*: one-sided

**User defined parameters**

- *Seed*: 1234
- *Standard deviation*: 15
- *Planned cumulative subjects*: 30, 60
- *Effect matrix (1)*: 1
- *Effect matrix (2)*: 2
- *Effect matrix (3)*: 10
- *Type of shape*: userDefined
- *Success criterion*: atLeastOne

**Derived from user defined parameters**

- *mu_max*: 10

**Default parameters**

- *Maximum number of iterations*: 1000
- *Planned allocation ratio*: 1
- *Calculate subjects function*: default
- *Active arms*: 3
- *Slope*: 1
- *Intersection test*: Dunnett
- *Adaptations*: TRUE
- *Type of selection*: best
- *Effect measure*: effectEstimate
- *Epsilon value*: NA
- *r value*: NA
- *Threshold*: -Inf

**Results**

- *Iterations [1]*: 1000
- *Iterations [2]*: 694
- *Reject at least one*: 0.8970
- *Rejected arms per stage (1) [1]*: 0.0000
- *Rejected arms per stage (1) [2]*: 0.0010
- *Rejected arms per stage (2) [1]*: 0.0050
- *Rejected arms per stage (2) [2]*: 0.0040
- *Rejected arms per stage (3) [1]*: 0.3060
- *Rejected arms per stage (3) [2]*: 0.5860
- *Futility stop per stage*: 0.0000
- *Early stop*: 0.3060
- *Success per stage [1]*: 0.3060
- *Success per stage [2]*: 0.5910
- *Selected arms (1) [1]*: 1.0000
- *Selected arms (1) [2]*: 0.0060
- *Selected arms (2) [1]*: 1.0000
- *Selected arms (2) [2]*: 0.0180
- *Selected arms (3) [1]*: 1.0000
- *Selected arms (3) [2]*: 0.6700
- *Selected arms (4) [1]*: 1.0000
- *Selected arms (4) [2]*: 0.6940
- *Number of active arms [1]*: 3.000
- *Number of active arms [2]*: 1.000
- *Expected number of subjects*: 161.6
- *Sample sizes (1) [1]*: 30
- *Sample sizes (1) [2]*: 0.3
- *Sample sizes (2) [1]*: 30
- *Sample sizes (2) [2]*: 0.8
- *Sample sizes (3) [1]*: 30
- *Sample sizes (3) [2]*: 29
- *Sample sizes (4) [1]*: 30
- *Sample sizes (4) [2]*: 30
- *Conditional power (achieved) [1]*: NA
- *Conditional power (achieved) [2]*: 0.2592

**Legend**

- *(i)*: values of treatment arm i
- *[k]*: values at stage k

It is interesting (though actually to be expected) that the power and other test characteristics are quite similar, so the chosen sample size might be considered robust against deviations from the originally assumed linear dose-response relationship. Note that `Stagewise number of subjects` denotes the **conditional expected sample size in treatment arm i**, i.e., it accounts for the probability that the treatment arm is selected, given that the second stage was reached.
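The relation between the unconditional selection probabilities and the reported `Stagewise number of subjects` can be checked by hand. The sketch below uses the values of the last output (user-defined effects 1, 2, and 10); the variable names are illustrative.

```r
probSelected <- c(arm1 = 0.0060, arm2 = 0.0180, arm3 = 0.6700) # "Selected arms", stage 2
probStage2 <- 694 / 1000 # proportion of iterations reaching stage 2
nStage2 <- 30            # planned stage-2 subjects per selected arm

# Conditional probability of being selected, given that stage 2 is performed
probSelectedGivenStage2 <- probSelected / probStage2

# Conditional expected stage-2 sample size per arm
conditionalSampleSize <- nStage2 * probSelectedGivenStage2
round(conditionalSampleSize, 1)
```

The rounded results, 0.3, 0.8, and 29.0, agree with the stage-2 column of the summary table.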

# Choosing the weight

Since treatment arms are discontinued over the two stages, the sample size and hence the information is not the same at both stages. Despite this, we used the **unweighted inverse normal method** (with weight = \(1/\sqrt{2}\)) for combining the two stages. The pre-fixed weight, however, does not have a substantial impact on the power of the procedure, as shown in the following plot. The simulated power values show that the power does not change substantially over a medium range of weights, and hence it is reasonable to choose equal weights for the two stages:

```
powerValues <- c()
weights <- seq(0.05, 0.95, 0.05)
for (w in weights) {
    designIN <- getDesignInverseNormal(
        kMax = 2, alpha = 0.025,
        informationRates = c(w, 1), typeOfDesign = "OF"
    )
    powerValues <- c(
        powerValues,
        getSimulationMultiArmMeans(
            design = designIN,
            activeArms = 3,
            typeOfShape = "linear",
            muMaxVector = c(10),
            stDev = 15,
            plannedSubjects = c(30, 60),
            intersectionTest = "Dunnett",
            typeOfSelection = "best",
            successCriterion = "atLeastOne",
            maxNumberOfIterations = maxNumberOfIterations,
            seed = 12345
        )$rejectAtLeastOne
    )
}
plot(weights, powerValues,
    type = "l", lwd = 3, ylim = c(0.7, 1),
    xlab = "weight", ylab = "Power"
)
lines(weights, rep(0.9, length(weights)), lty = 2)
```
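For orientation, the sketch below shows how the stage weights of the inverse normal method relate to the information rates passed to the design. It assumes that the weights are the square roots of the information increments, which is consistent with the fixed weights of 0.707 reported above for information rates 0.5 and 1; the helper name `inverseNormalWeights` is hypothetical.

```r
# Stage weights derived from cumulative information rates (assumed relation)
inverseNormalWeights <- function(informationRates) {
    sqrt(diff(c(0, informationRates)))
}

round(inverseNormalWeights(c(0.5, 1)), 3) # equal weights for the two stages
round(inverseNormalWeights(c(0.2, 1)), 3) # more weight on the second stage

# The squared weights sum to one
sum(inverseNormalWeights(c(0.2, 1))^2)
```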

# Assessing different selection rules

You can assess different selection rules by using the parameter `typeOfSelection`. Five options are available: `best`, `rBest`, `epsilon`, `all`, and `userDefined`. For `rBest` (select the r best treatment arms), the parameter `rValue` has to be specified; for `epsilon` (select all treatment arms not worse than epsilon compared to the best), the parameter `epsilonValue` has to be specified.
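To make the rules concrete, the following base R sketch mimics `best`, `rBest`, and `epsilon` on a vector of hypothetical interim effect estimates. The helper functions are illustrative and not part of rpact; the handling of ties and boundary cases is an assumption, and the `threshold` argument is ignored.

```r
# Select only the arm(s) with the largest effect
selectBest <- function(effectVector) {
    effectVector == max(effectVector)
}

# Select the rValue arms with the largest effects
selectRBest <- function(effectVector, rValue) {
    rank(-effectVector, ties.method = "first") <= rValue
}

# Select all arms that are at most epsilonValue worse than the best arm
selectEpsilon <- function(effectVector, epsilonValue) {
    effectVector >= max(effectVector) - epsilonValue
}

effects <- c(3.1, 7.4, 8.9) # hypothetical interim estimates for arms 1 to 3

selectBest(effects)                      # FALSE FALSE  TRUE
selectRBest(effects, rValue = 2)         # FALSE  TRUE  TRUE
selectEpsilon(effects, epsilonValue = 2) # FALSE  TRUE  TRUE
```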

## User defined selection rule

If `userDefined` is selected, a `selectArmsFunction` that depends on `effectVector` needs to be specified. Note that `effectVector` is either the test statistic or the effect difference (in absolute terms), which can be selected through the parameter `effectMeasure`. For example, the function

```
designIN <- getDesignInverseNormal(kMax = 2, alpha = 0.025, typeOfDesign = "OF")
mySelectionFunction <- function(effectVector) {
    selectedArms <- (effectVector >= c(5, 5, 5))
    return(selectedArms)
}
```

defines a selection rule where all treatment arms with effect sizes of at least 5 (with the default `effectMeasure = "effectEstimate"`) are selected. Running

```
simSelectionMAMS <- getSimulationMultiArmMeans(
    design = designIN,
    activeArms = 3,
    typeOfShape = "linear",
    muMaxVector = c(10),
    stDev = 15,
    plannedSubjects = c(30, 60),
    intersectionTest = "Dunnett",
    typeOfSelection = "userDefined",
    selectArmsFunction = mySelectionFunction,
    successCriterion = "atLeastOne",
    maxNumberOfIterations = maxNumberOfIterations,
    seed = 1234
)
kable(summary(simSelectionMAMS))
```

shows that for the second stage the expected number of selected treatment arms is 1.803 indicating that there are cases where more than one arm is selected for the second stage:

**Simulation of multi-arm means (inverse normal combination test design)**

**Design parameters**

- *Information rates*: 0.500, 1.000
- *Critical values*: 2.797, 1.977
- *Futility bounds (non-binding)*: -Inf
- *Cumulative alpha spending*: 0.002583, 0.025000
- *Local one-sided significance levels*: 0.002583, 0.023996
- *Significance level*: 0.0250
- *Test*: one-sided

**User defined parameters**

- *Seed*: 1234
- *Standard deviation*: 15
- *Planned cumulative subjects*: 30, 60
- *mu_max*: 10
- *Type of selection*: userDefined
- *Success criterion*: atLeastOne

**Default parameters**

- *Maximum number of iterations*: 1000
- *Planned allocation ratio*: 1
- *Calculate subjects function*: default
- *Active arms*: 3
- *Effect matrix (1)*: 3.333
- *Effect matrix (2)*: 6.667
- *Effect matrix (3)*: 10.000
- *Type of shape*: linear
- *Slope*: 1
- *Intersection test*: Dunnett
- *Adaptations*: TRUE
- *Effect measure*: effectEstimate

**Results**

- *Iterations [1]*: 1000
- *Iterations [2]*: 604
- *Reject at least one*: 0.8840
- *Rejected arms per stage (1) [1]*: 0.0140
- *Rejected arms per stage (1) [2]*: 0.0470
- *Rejected arms per stage (2) [1]*: 0.0870
- *Rejected arms per stage (2) [2]*: 0.2640
- *Rejected arms per stage (3) [1]*: 0.3050
- *Rejected arms per stage (3) [2]*: 0.5330
- *Futility stop per stage*: 0.0700
- *Early stop*: 0.3960
- *Success per stage [1]*: 0.3260
- *Success per stage [2]*: 0.5580
- *Selected arms (1) [1]*: 1.0000
- *Selected arms (1) [2]*: 0.1360
- *Selected arms (2) [1]*: 1.0000
- *Selected arms (2) [2]*: 0.3840
- *Selected arms (3) [1]*: 1.0000
- *Selected arms (3) [2]*: 0.5690
- *Selected arms (4) [1]*: 1.0000
- *Selected arms (4) [2]*: 0.6040
- *Number of active arms [1]*: 3.000
- *Number of active arms [2]*: 1.803
- *Expected number of subjects*: 170.8
- *Sample sizes (1) [1]*: 30
- *Sample sizes (1) [2]*: 6.8
- *Sample sizes (2) [1]*: 30
- *Sample sizes (2) [2]*: 19.1
- *Sample sizes (3) [1]*: 30
- *Sample sizes (3) [2]*: 28.3
- *Sample sizes (4) [1]*: 30
- *Sample sizes (4) [2]*: 30
- *Conditional power (achieved) [1]*: NA
- *Conditional power (achieved) [2]*: 0.4021

**Legend**

- *(i)*: values of treatment arm i
- *[k]*: values at stage k

`getData()` makes it possible to show how often this is the case. The following lines of code calculate how often 1, 2, and 3 treatment arms were selected for the second stage:

```
dat <- getData(simSelectionMAMS)
tab <- as.matrix(table(dat[dat$stageNumber == 2, ]$iterationNumber))
round(table(tab[, 1]) / nrow(tab), 5)
```

Number of selected arms | 1 | 2 | 3 |
---|---|---|---|
Proportion | 0.36258 | 0.47185 | 0.16556 |

Note that these probabilities are **conditional probabilities** (conditional on performing the second stage) and sum to one whereas the probabilities for selecting arm 1, 2, or 3 provided in the summary are **unconditional**, i.e., not conditioned on reaching the second stage. In particular, they may become small if the study often stops at interim.
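The two kinds of probabilities can be reconciled numerically. In the sketch below (values copied from the outputs above, variable names illustrative), dividing the sum of the unconditional selection probabilities by the probability of reaching stage 2 gives the expected number of active arms at stage 2. The same value follows from the conditional distribution of the number of selected arms.

```r
probSelected <- c(0.1360, 0.3840, 0.5690) # "Selected arms (i) [2]", unconditional
probStage2 <- 604 / 1000                  # "Iterations [2]" divided by "Iterations [1]"

# Expected number of selected arms, given that stage 2 is performed
expectedArmsFromSummary <- sum(probSelected) / probStage2
expectedArmsFromSummary

# Same quantity from the conditional distribution of the number of selected arms
probNumberOfArms <- c(0.36258, 0.47185, 0.16556)
expectedArmsFromData <- sum(1:3 * probNumberOfArms)
expectedArmsFromData
```

Both values round to 1.803, the reported number of active arms at stage 2.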

## epsilon selection rule

We now consider a three-stage inverse normal combination test design where no early stops for efficacy are foreseen. At the end, the full significance level of \(\alpha = 0.025\) should be used. This is achieved by the definition of the design through

```
designIN3Stages <- getDesignInverseNormal(
    typeOfDesign = "asUser",
    userAlphaSpending = c(0, 0, 0.025)
)
```

`Changed type of design to 'noEarlyEfficacy'`
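Since no alpha is spent at the first two stages, the final critical value should coincide with the one-sided fixed-sample normal quantile. A quick base R check of this reasoning (a sketch with illustrative variable names, not rpact code):

```r
alpha <- 0.025
userAlphaSpending <- c(0, 0, alpha)

# No rejection is possible before the final stage, so the whole level is used there
# and the last boundary is the plain standard normal quantile
finalCriticalValue <- qnorm(1 - userAlphaSpending[3])
round(finalCriticalValue, 2) # 1.96
```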

As above, we plan a design with three active treatment arms to be tested against control and assume a linear dose-response relationship. We want to consider a range of maximum values for the effect and therefore specify `muMaxVector = seq(0, 12, 2)`, i.e., including the null hypothesis case. For the selection of treatment arms, we define the epsilon selection rule with \(\epsilon = 2\), i.e., for the subsequent stages, the treatment arm with the highest response and all treatment arms that differ by less than 2 from the highest response are selected. To exclude treatment arms with non-positive response, we additionally specify `threshold = 0`. In order to simulate a situation with a maximum of 60 subjects per treatment arm, we set `plannedSubjects = c(20, 40, 60)`:

```
simSelectionEpsilonMAMS <- getSimulationMultiArmMeans(
    design = designIN3Stages,
    activeArms = 3,
    typeOfShape = "linear",
    muMaxVector = seq(0, 12, 2),
    stDev = 15,
    plannedSubjects = c(20, 40, 60),
    intersectionTest = "Dunnett",
    typeOfSelection = "epsilon",
    epsilonValue = 2,
    threshold = 0,
    successCriterion = "atLeastOne",
    maxNumberOfIterations = maxNumberOfIterations,
    seed = 1234
)
options("rpact.summary.output.size" = "medium")
# kable(summary(simSelectionEpsilonMAMS))
kable(simSelectionEpsilonMAMS)
```

**Simulation of multi-arm means (inverse normal combination test design)**

**Design parameters**

*Information rates*: 0.333, 0.667, 1.000*Critical values*: Inf, Inf, 1.96*Futility bounds (non-binding)*: -Inf, -Inf*Cumulative alpha spending*: 0.0000, 0.0000, 0.0250*Local one-sided significance levels*: 0.0000, 0.0000, 0.0250*Significance level*: 0.0250*Test*: one-sided

**User defined parameters**

*Seed*: 1234*Standard deviation*: 15*Planned cumulative subjects*: 20, 40, 60*mu_max*: 0, 2, 4, 6, 8, 10, 12*Type of selection*: epsilon*Success criterion*: atLeastOne*Epsilon value*: 2*Threshold*: 0

**Default parameters**

*Maximum number of iterations*: 1000*Planned allocation ratio*: 1*Calculate subjects function*: default*Active arms*: 3*Effect matrix (1)*: 0.0000, 0.6667, 1.3333, 2.0000, 2.6667, 3.3333, 4.0000*Effect matrix (2)*: 0.0000, 1.3333, 2.6667, 4.0000, 5.3333, 6.6667, 8.0000*Effect matrix (3)*: 0.0000, 2.0000, 4.0000, 6.0000, 8.0000, 10.0000, 12.0000*Type of shape*: linear*Slope*: 1*Intersection test*: Dunnett*Adaptations*: TRUE, TRUE*Effect measure*: effectEstimate*r value*: NA

**Results**

- *Iterations [1]*: 1000, 1000, 1000, 1000, 1000, 1000, 1000
- *Iterations [2]*: 759, 835, 925, 947, 979, 994, 996
- *Iterations [3]*: 645, 765, 892, 934, 976, 992, 996
- *Reject at least one*: 0.0230, 0.0790, 0.2160, 0.4750, 0.7200, 0.8990, 0.9680
- *Rejected arms per stage (1) [1]*: 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000
- *Rejected arms per stage (1) [2]*: 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000
- *Rejected arms per stage (1) [3]*: 0.0070, 0.0120, 0.0230, 0.0280, 0.0260, 0.0160, 0.0070
- *Rejected arms per stage (2) [1]*: 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000
- *Rejected arms per stage (2) [2]*: 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000
- *Rejected arms per stage (2) [3]*: 0.0060, 0.0170, 0.0680, 0.1020, 0.1620, 0.1730, 0.1920
- *Rejected arms per stage (3) [1]*: 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000
- *Rejected arms per stage (3) [2]*: 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000
- *Rejected arms per stage (3) [3]*: 0.0110, 0.0510, 0.1450, 0.3980, 0.6210, 0.7920, 0.8560
- *Overall futility stop*: 0.3550, 0.2350, 0.1080, 0.0660, 0.0240, 0.0080, 0.0040
- *Futility stop per stage [1]*: 0.2410, 0.1650, 0.0750, 0.0530, 0.0210, 0.0060, 0.0040
- *Futility stop per stage [2]*: 0.1140, 0.0700, 0.0330, 0.0130, 0.0030, 0.0020, 0.0000
- *Early stop [1]*: 0.2410, 0.1650, 0.0750, 0.0530, 0.0210, 0.0060, 0.0040
- *Early stop [2]*: 0.1140, 0.0700, 0.0330, 0.0130, 0.0030, 0.0020, 0.0000
- *Success per stage [1]*: 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000
- *Success per stage [2]*: 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000
- *Success per stage [3]*: 0.0230, 0.0790, 0.2160, 0.4750, 0.7200, 0.8990, 0.9680
- *Selected arms (1) [1]*: 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000
- *Selected arms (1) [2]*: 0.3500, 0.3280, 0.2980, 0.2300, 0.1590, 0.1000, 0.0630
- *Selected arms (1) [3]*: 0.2450, 0.2410, 0.2060, 0.1380, 0.0730, 0.0410, 0.0160
- *Selected arms (2) [1]*: 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000
- *Selected arms (2) [2]*: 0.3480, 0.3930, 0.4280, 0.4060, 0.4020, 0.3480, 0.3370
- *Selected arms (2) [3]*: 0.2600, 0.2920, 0.3440, 0.3160, 0.3080, 0.2260, 0.2160
- *Selected arms (3) [1]*: 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000
- *Selected arms (3) [2]*: 0.3630, 0.4770, 0.6060, 0.7030, 0.7840, 0.8530, 0.8830
- *Selected arms (3) [3]*: 0.2720, 0.4050, 0.5570, 0.6810, 0.7670, 0.8430, 0.8690
- *Selected arms (4) [1]*: 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000
- *Selected arms (4) [2]*: 0.7590, 0.8350, 0.9250, 0.9470, 0.9790, 0.9940, 0.9960
- *Selected arms (4) [3]*: 0.6450, 0.7650, 0.8920, 0.9340, 0.9760, 0.9920, 0.9960
- *Number of active arms [1]*: 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000
- *Number of active arms [2]*: 1.398, 1.435, 1.440, 1.414, 1.374, 1.309, 1.288
- *Number of active arms [3]*: 1.205, 1.226, 1.241, 1.215, 1.176, 1.119, 1.105
- *Expected number of subjects*: 144.8, 154.7, 165.1, 167.1, 169, 167.9, 167.5
- *Sample sizes (1) [1]*: 20, 20, 20, 20, 20, 20, 20
- *Sample sizes (1) [2]*: 9.2, 7.9, 6.4, 4.9, 3.2, 2, 1.3
- *Sample sizes (1) [3]*: 7.6, 6.3, 4.6, 3, 1.5, 0.8, 0.3
- *Sample sizes (2) [1]*: 20, 20, 20, 20, 20, 20, 20
- *Sample sizes (2) [2]*: 9.2, 9.4, 9.3, 8.6, 8.2, 7, 6.8
- *Sample sizes (2) [3]*: 8.1, 7.6, 7.7, 6.8, 6.3, 4.6, 4.3
- *Sample sizes (3) [1]*: 20, 20, 20, 20, 20, 20, 20
- *Sample sizes (3) [2]*: 9.6, 11.4, 13.1, 14.8, 16, 17.2, 17.7
- *Sample sizes (3) [3]*: 8.4, 10.6, 12.5, 14.6, 15.7, 17, 17.4
- *Sample sizes (4) [1]*: 20, 20, 20, 20, 20, 20, 20
- *Sample sizes (4) [2]*: 20, 20, 20, 20, 20, 20, 20
- *Sample sizes (4) [3]*: 20, 20, 20, 20, 20, 20, 20
- *Conditional power (achieved) [1]*: NA, NA, NA, NA, NA, NA, NA
- *Conditional power (achieved) [2]*: 0, 0, 0, 0, 0, 0, 0
- *Conditional power (achieved) [3]*: 0.1018, 0.1637, 0.2969, 0.4578, 0.6416, 0.8353, 0.9184

**Legend**

- *(i)*: values of treatment arm i
- *[k]*: values at stage k
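Several of the reported quantities appear to be linked by simple arithmetic. The following base R sketch uses only the values printed in the first column of the summary (the first simulated scenario) and recomputes the early stopping probabilities and the expected number of subjects. It assumes that the per-stage sample sizes are averages over the simulation runs that reached the respective stage; this reading is inferred from the printed numbers and is not taken from the rpact documentation. The variable names are chosen for illustration only.

```r
# Values copied from the first column of the summary above (first scenario).
iterations <- c(1000, 759, 645) # simulation runs reaching stages 1, 2, 3

# Average sample sizes; rows: arms (1) to (4), columns: stages 1 to 3.
sampleSizes <- rbind(
    c(20, 9.2, 7.6),
    c(20, 9.2, 8.1),
    c(20, 9.6, 8.4),
    c(20, 20.0, 20.0)
)

# Proportion of runs that reach each stage.
reachStage <- iterations / iterations[1]

# Probability of stopping after stage 1 and after stage 2.
earlyStop <- -diff(reachStage)
print(earlyStop) # approximately 0.241 and 0.114

# Expected total number of subjects (assumption: the stage-wise sample sizes
# are conditional on reaching the stage, so they are weighted by reachStage).
expectedSubjects <- sum(colSums(sampleSizes) * reachStage)
print(expectedSubjects) # approximately 144.88
```

The recomputed value differs from the reported 144.8 only in the first decimal place, which is consistent with the rounding of the printed sample sizes.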

Note that we explicitly set `options("rpact.summary.output.size" = "medium")` because the output would otherwise be too long. You can also illustrate the results with the generic `plot` command. For example, the Overall Power/Early Stopping and Selected Arms per Stage plots are generated by

`plot(simSelectionEpsilonMAMS, type = c(5, 3), grid = 0)`
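The summary option and the plot call can be tried out on a small stand-in object. The sketch below is not the simulation behind `simSelectionEpsilonMAMS`: the object `simSketch`, the effect sizes, the epsilon value, the seed, and the reduced number of iterations are assumptions chosen for illustration so that the example runs quickly.

```r
library(rpact)

# Hypothetical three-stage inverse normal design with O'Brien & Fleming boundaries.
designSketch <- getDesignInverseNormal(kMax = 3, alpha = 0.025, typeOfDesign = "OF")

# Small illustrative simulation (assumed settings, not those of the vignette).
simSketch <- getSimulationMultiArmMeans(
    design = designSketch,
    activeArms = 3,
    typeOfShape = "linear",
    muMaxVector = c(0, 5, 10), # assumed maximum effects, one scenario each
    stDev = 15,
    intersectionTest = "Bonferroni",
    typeOfSelection = "epsilon",
    epsilonValue = 2, # assumed selection margin
    plannedSubjects = c(20, 40, 60), # cumulative subjects per arm
    maxNumberOfIterations = 100, # kept small for speed
    seed = 1234
)

# Limit the length of the summary output, then print it.
options("rpact.summary.output.size" = "medium")
summary(simSketch)

# Overall Power/Early Stopping (type 5) and Selected Arms per Stage (type 3).
plot(simSketch, type = c(5, 3), grid = 0)
```

With only 100 iterations the estimates are noisy; a realistic assessment would use a much larger value of `maxNumberOfIterations`.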

# Closing remarks

This vignette gives only a brief introduction to the configurations that the simulation tool for multi-arm designs supports. Real trial applications typically require considerably more to be taken into account than described here. For example, it might be of interest to additionally assess a sample size reassessment strategy. This can be done in the same way as when simulating a single hypothesis situation (for example, see the vignette “Simulation of a trial with a binary endpoint and unblinded sample size re-calculation”).

For testing rates, the function `getSimulationMultiArmRates()` is available, and for survival designs the function `getSimulationMultiArmSurvival()`, both with options very similar to those of the case considered here. For survival designs, note that, unlike in the single hypothesis case, the function does not generate survival times at the subject level but normally distributed log-rank test statistics. As a consequence, no estimates of analysis times, study duration, or expected number of subjects can be obtained in this case.
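A minimal sketch of how the two functions might be called is given below. All numeric settings (response rates, hazard ratios, planned subjects and events, number of iterations, seed) and the object names are assumptions chosen for illustration and are not taken from this vignette. The sketch passes effects through `piMaxVector` and `piControl` for rates and through `omegaMaxVector` for survival, and it specifies `plannedEvents` instead of `plannedSubjects` in the survival case.

```r
library(rpact)

# Hypothetical two-stage inverse normal design with O'Brien & Fleming boundaries.
designSketch <- getDesignInverseNormal(kMax = 2, alpha = 0.025, typeOfDesign = "OF")

# Binary endpoint: three active arms against control, best arm selected at interim.
simRatesSketch <- getSimulationMultiArmRates(
    design = designSketch,
    activeArms = 3,
    typeOfShape = "linear",
    piMaxVector = seq(0.2, 0.5, 0.1), # assumed maximum response rates
    piControl = 0.2, # assumed control rate
    intersectionTest = "Bonferroni",
    typeOfSelection = "best",
    plannedSubjects = c(30, 60), # cumulative subjects per arm
    maxNumberOfIterations = 100, # kept small for speed
    seed = 1234
)
simRatesSketch$rejectAtLeastOne

# Survival endpoint: events rather than subjects are planned per stage.
simSurvivalSketch <- getSimulationMultiArmSurvival(
    design = designSketch,
    activeArms = 3,
    typeOfShape = "linear",
    omegaMaxVector = seq(1, 2.6, 0.4), # assumed maximum hazard ratios
    intersectionTest = "Bonferroni",
    typeOfSelection = "best",
    plannedEvents = c(40, 80), # cumulative events
    maxNumberOfIterations = 100,
    seed = 1234
)
simSurvivalSketch$rejectAtLeastOne
```

In both cases `rejectAtLeastOne` holds one simulated rejection probability per effect scenario, which makes the results directly comparable with the means case.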

System: rpact 3.4.0, R version 4.2.2 (2022-10-31 ucrt), platform: x86_64-w64-mingw32

To cite R in publications use:

R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

To cite package ‘rpact’ in publications use:

Wassmer G, Pahlke F (2023). *rpact: Confirmatory Adaptive Clinical Trial Design and Analysis*. https://www.rpact.org, https://www.rpact.com, https://github.com/rpact-com/rpact, https://rpact-com.github.io/rpact/.