```r
# remotes::install_github("jtextor/dagitty/r")
library(dagitty)
library(ggdag)
library(tidyverse)
library(DT)
library(gt)
library(gtsummary)
library(knitr)
library(kableExtra)
# usethis::use_github_pages("main") # Tell GitHub to publish on GitHub Pages
```
Construction Zone
This is a work in progress and subject to change nearly daily. At this stage there are numerous inconsistencies as I work to create a single simulated dataset that demonstrates the issues at hand when analyzing change from baseline.
The goal of this notebook is to demonstrate how to estimate treatment effects of a dichotomous (binary) treatment on change from a continuous baseline measure in both experimental and observational designs. For example, suppose we want to estimate the effect of a new drug vs. placebo for reducing blood pressure after 6 months of use. Here, the blood pressure at treatment assignment is the baseline value, which we will denote as \(Y_0\), and blood pressure measured 6 months after treatment assignment is the follow-up measure, denoted \(Y_1\).
There are two common approaches to defining the outcome: (1) Compare the difference in the follow-up measures between the experimental and control group; or (2) Compare the difference in change scores between the two groups, where a change score, \(\Delta Y\), for an individual is the difference between the follow-up blood pressure \(Y_1\) and the baseline blood pressure \(Y_0\): \(\Delta Y = Y_1 - Y_0\)
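For example, if a participant's baseline blood pressure is \(Y_0 = 150\) and their 6-month follow-up value is \(Y_1 = 135\), their change score is \(\Delta Y = 135 - 150 = -15\).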
Technical Note
There is considerable controversy over whether and when to use change scores vs follow-up measures as the effect estimate. See Tennant et al. (2021) and Glymour (2022) for a discussion of some of the issues. A number of papers, going back decades, discuss this topic. However, my understanding is that much of the controversy stems from the fact that few researchers use proper causal thinking and methods when designing their studies and estimating effects. The controversy is further amplified because there are many different dimensions to the problem with different researchers focusing on a different subset of those dimensions. Finally, many older papers were written without the benefit of modern causal inference theory and methods, so they lack the theoretical and mathematical foundation to properly address the issues.
If you read Tennant et al. (2021), you will see that despite the title's warning that "change scores do not estimate causal effects," the authors later correctly state that change scores do indeed estimate causal effects when the baseline measure is a competing exposure of the follow-up measure, meaning that the baseline measure neither causes the treatment nor is caused by it. This is true of the example that I use throughout this paper. Tennant et al. also use examples in which the "baseline measure" is partly caused by the treatment or is a cause of treatment; in my opinion, such measures are not truly baseline measures. In addition, if you suspect that the baseline measure causes, or is caused by, the treatment, then the resulting causal DAG will directly guide how you select the effect to estimate and what is needed to do the estimation.
In this notebook I use the notation from the potential outcomes approach to causal inference to refer to the variables of interest. Don’t worry, I will repeat these definitions as they are used below.
| Variable | Description |
|----------|-------------|
| \(Y_0\) | Baseline measure (continuous) |
| \(Y_1\) | Follow-up measure (continuous) |
| \(X\) | Treatment (binary): \(x = 1\) when treated, otherwise \(x = 0\) |
| \(\Delta Y\) | Change score: \(\Delta Y = Y_1 - Y_0\) |
| \(Y_1^x,\ \Delta Y^x\) | Follow-up measure and change score when treated (\(x = 1\)) |
| \(Y_1^{x'},\ \Delta Y^{x'}\) | Follow-up measure and change score when untreated (\(x = 0\)) |
We will use five increasingly complex data generating models to explore this problem:
1. A model where the potential outcomes (the outcome measure under both treatment and control) are known for all participants.
2. A Randomized Controlled Trial (RCT) with no loss to follow-up.
3. An RCT with differential loss to follow-up.
4. An observational study with treatment assignment confounding.
5. An observational study with treatment assignment confounding and loss to follow-up.
We will also consider four causal estimands1 based on the two approaches discussed above:
1 A causal estimand is the value we wish to estimate, such as the difference in follow-up measures between the experimental and the control group.
The average treatment effect (ATE) on the outcome at follow-up, expressed as the difference between the follow-up measure under treatment vs. control: \[\mathrm{E}(Y_1^x) - \mathrm{E}(Y_1^{x'}) \tag{1}\]
Here, \(Y_1\) is the follow-up outcome, \(x\) is the treatment, and \(x'\) is the control. \(Y_1^x\) represents the value that a participant would have if treated, whereas \(Y_1^{x'}\) represents the value that a participant would have if untreated. Hence this effect estimate is the difference between the expected outcome if all patients in the study were treated and the expected outcome if all patients were not treated.
The average treatment effect on the change score \(\Delta Y\):
\[\mathrm{E}(\Delta Y^x) - \mathrm{E}(\Delta Y^{x'})\]
This is the difference between the expected value of the change score if all patients had been treated, \(\Delta Y^x\), and the expected value if all patients had been untreated, \(\Delta Y^{x'}\). We can expand this equation to: \[
\mathrm{E}(Y_1^x - Y_0^x) - \mathrm{E}(Y_1^{x'} - Y_0^{x'})
\tag{2}\]
The conditional treatment effect on the follow-up outcome, conditioned on the observed baseline value \(Y_0\): \[
\mathrm{E}[Y_1^x - Y_1^{x'} \mid Y_0 = y_0]
\tag{3}\]
This estimand represents the effect of treatment on the part of the follow-up outcome that hasn’t already been determined by the baseline value.
The conditional treatment effect on the change score, conditioned on the observed baseline value \(Y_0\): \[
\mathrm{E}[\Delta Y^x - \Delta Y^{x'} \mid Y_0 = y_0]
\tag{4}\]
This estimand represents the effect of treatment on the part of the change score that hasn’t already been determined by the baseline value. Note that because the change score includes the baseline value \(Y_0\), this estimand is equivalent to Equation 3.
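One line of algebra makes the equivalence explicit: since the baseline is not affected by treatment, \(Y_0^x = Y_0^{x'} = y_0\) once we condition on it, so
\[
\mathrm{E}[\Delta Y^x - \Delta Y^{x'} \mid Y_0 = y_0] = \mathrm{E}[(Y_1^x - y_0) - (Y_1^{x'} - y_0) \mid Y_0 = y_0] = \mathrm{E}[Y_1^x - Y_1^{x'} \mid Y_0 = y_0]
\]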
Estimands vs Estimators
A causal estimand expresses the value that we wish to estimate. A statistical estimator is how we estimate that value. For any given estimand there are often multiple possible statistical estimators, each with possible strengths and weaknesses. For example, with an RCT we can estimate ATE by subtracting the differences in the means of the outcome variable across the groups or by using linear regression or a number of other approaches.
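As a quick illustration (a toy sketch, not part of the simulation used below), for a binary treatment the difference in means and the coefficient from a simple linear regression are two estimators of the same ATE estimand, and here they return the same number:

```r
# Toy data (hypothetical; not the dataset used in this notebook)
set.seed(1)
d <- data.frame(x = rbinom(100, 1, 0.5))
d$y <- 5 + 2 * d$x + rnorm(100)

mean(d$y[d$x == 1]) - mean(d$y[d$x == 0])  # estimator 1: difference in means
coef(lm(y ~ x, data = d))["x"]             # estimator 2: regression coefficient (identical)
```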
There is considerable controversy in the literature over whether and when to use the difference in outcome at follow-up (Equation 1) vs. the difference in the change score (Equation 2), as well as whether to adjust for baseline values. We will explore some of these issues below.
Model 1: Additive Treatment Effect, Potential Outcomes Known
The Causal DAG for the first and simplest model looks like this:
Code
```r
#, fig.width = 12, fig.height = 8
# Define coordinates for a more complex DAG
coords <- list(
  x = c(Y0 = 0, L = 0, W = 2, X = 3, Y1 = 3, DY = 1.5) * .5,
  y = c(Y0 = 1, L = 2, W = 2.5, X = 2.5, Y1 = 1, DY = 0) * .5
)

# Create the DAG using dagify
dagMod1 <- dagify(
  Y1 ~ Y0 + X + L,
  W ~ L + X,  # Side effect influenced by both L and treatment
  DY ~ Y0 + Y1,
  exposure = "X",
  outcome = "Y1",
  coords = coords
)

# Convert the dagitty graph to a ggdag object for prettier printing
tidy_dagMod1 <- tidy_dagitty(dagMod1)

# Add labels and status
tidy_dagMod1$data$math_label <- with(tidy_dagMod1$data, case_when(
  name == "X" ~ "X",
  name == "L" ~ "L",
  name == "W" ~ "W",
  name == "Y0" ~ "Y[0]",
  name == "Y1" ~ "Y[1]",
  name == "DY" ~ "Delta*Y"
))
tidy_dagMod1$data$status <- with(tidy_dagMod1$data, case_when(
  name == "Y1" ~ "Outcome",
  name == "DY" ~ "Outcome",
  name == "X" ~ "Treatment",
  name %in% c("Y0", "L", "W") ~ "Observed"
))

ggdag(tidy_dagMod1) +
  geom_dag_point(aes(color = status), size = 15) +
  geom_dag_text(aes(label = math_label), parse = TRUE, size = 3.5, color = "white") +
  geom_dag_edges(aes(xend = xend, yend = yend)) +
  scale_color_manual(
    name = "Status",
    values = c(
      "Outcome" = "#E31A1C",
      "Treatment" = "#1F78B4",
      "Observed" = "#33A02C"
      # "Unobserved" = "#FF7F00"
    )
  ) +
  theme_dag_blank()
```
Causal DAG for Model 1, where \(X\) (group) is the treatment variable and either \(Y_1\) (time1) or \(\Delta Y\) (change1) is the outcome.
The DAG uses the potential outcomes notation introduced above; the simulated dataset below uses common sense names for the same variables:

- \(Y_0\): Baseline value (baseline)
- \(Y_1\): Follow-up value (time1)
- \(\Delta Y\): Change score (change1)
- \(X\): Treatment indicator (group), where \(X = 0\) is control and \(X = 1\) is treatment
- L: A binary prognostic factor that increases time1 by 10 when \(L = 1\). Depending on what we wish to demonstrate, we may treat L as measured or unmeasured.
- W: A binary side effect resulting from a combination of L and the treatment. W is more likely when \(L = 1\) and when treated (group = Treatment). We will use W later to model loss to follow-up.
Now let's simulate potential outcomes for the causal DAG shown above. In this simulation, the causal effect of treatment on time1 is 10, meaning that treated patients will see their baseline score increase by 10 plus a random amount of normally distributed noise with mean 0 and standard deviation 3 (\(\mu = 0\), \(\sigma = 3\)), whereas control patients will see only the random change. The simulation calculates the potential outcomes time1_x0 and time1_x1 for all participants, where x1 denotes the outcome under treatment and x0 the outcome under control. Likewise, the simulation calculates the potential outcomes of the change score: change1_x0 and change1_x1.
Finally, we use the potential outcomes for each participant to calculate the individual treatment effect for both time1 and change1. An individual treatment effect for a participant is the difference between that person’s outcome under treatment and outcome under control.
```r
# Set seed for reproducibility
set.seed(124)

# 300 participants
n <- 300

# Generate baseline scores
baseline <- rnorm(n, mean = 50, sd = 10)

# Set additive effect
additive_effect <- 10  # improvement of +10 to time1 for treatment group

# Generate a prognostic factor (L) - binary (0: favorable, 1: unfavorable)
L <- prognostic_factor <- rbinom(n, 1, 0.4)
baseline_outcome_effect <- 10  # Effect of prognostic factor on outcome: time1

# Calculate potential outcomes for time1 for all participants.
# time1 under control adds a small amount of random noise, so on average
# there is no effect of control on the baseline value
time1_x0 <- baseline + baseline_outcome_effect * L + rnorm(n, mean = 0, sd = 3)
# time1 under treatment is on average 10 units higher than baseline
time1_x1 <- baseline + additive_effect + baseline_outcome_effect * L + rnorm(n, mean = 0, sd = 3)

# Calculate potential outcomes for the change score
change1_x0 <- time1_x0 - baseline
change1_x1 <- time1_x1 - baseline

# Calculate individual treatment effects (ITE) for time1 and change1 for each participant
ite_time1 <- time1_x1 - time1_x0
ite_change1 <- change1_x1 - change1_x0

# Generate potential outcomes for W (side effect).
# W is more likely when L=1 (unfavorable prognosis) and when treated.
# Adjust these to create higher censoring in treatment group with L=1
prob_W_L0_control <- 0.05    # 5% side effects for L=0 control
prob_W_L1_control <- 0.15    # 15% side effects for L=1 control
prob_W_L0_treatment <- 0.20  # 20% side effects for L=0 treatment
prob_W_L1_treatment <- 0.70  # 70% side effects for L=1 treatment (much higher!)

W_x0 <- rbinom(n, 1, prob_W_L0_control + prob_W_L0_treatment * L)
W_x1 <- rbinom(n, 1, prob_W_L1_control + prob_W_L1_treatment * L)

dataMod1 <- data.frame(
  id = 1:n,
  baseline = baseline,
  L = L,
  time1_x0 = time1_x0,
  time1_x1 = time1_x1,
  change1_x0 = change1_x0,
  change1_x1 = change1_x1,
  W_x0 = W_x0,
  W_x1 = W_x1,
  ite_time1 = ite_time1,
  ite_change1 = ite_change1
)

# Create the kable table with first 10 rows
dataMod1 %>%
  head(10) %>%
  kable(format = "html", digits = 2) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "left"
  )
```
| id | baseline | L | time1_x0 | time1_x1 | change1_x0 | change1_x1 | W_x0 | W_x1 | ite_time1 | ite_change1 |
|----|----------|---|----------|----------|------------|------------|------|------|-----------|-------------|
| 1 | 36.15 | 0 | 40.38 | 49.00 | 4.23 | 12.85 | 1 | 0 | 8.62 | 8.62 |
| 2 | 50.38 | 0 | 49.84 | 55.03 | -0.54 | 4.65 | 0 | 0 | 5.19 | 5.19 |
| 3 | 42.37 | 1 | 50.51 | 54.88 | 8.14 | 12.51 | 0 | 1 | 4.38 | 4.38 |
| 4 | 52.12 | 1 | 57.69 | 70.04 | 5.56 | 17.92 | 0 | 1 | 12.35 | 12.35 |
| 5 | 64.26 | 1 | 71.98 | 83.12 | 7.72 | 18.87 | 0 | 1 | 11.15 | 11.15 |
| 6 | 57.44 | 0 | 57.00 | 73.46 | -0.45 | 16.02 | 0 | 1 | 16.46 | 16.46 |
| 7 | 57.00 | 1 | 68.11 | 77.16 | 11.11 | 20.16 | 0 | 1 | 9.05 | 9.05 |
| 8 | 47.71 | 1 | 55.23 | 65.35 | 7.52 | 17.64 | 0 | 1 | 10.12 | 10.12 |
| 9 | 51.97 | 0 | 47.60 | 62.51 | -4.37 | 10.54 | 0 | 1 | 14.91 | 14.91 |
| 10 | 62.07 | 0 | 56.82 | 67.17 | -5.25 | 5.10 | 0 | 0 | 10.35 | 10.35 |
In this data there is no variable for group, because we generated the potential outcomes for all participants under both control and treatment. We will use group in the next model to simulate treatment assignment for an RCT.
Since we have the generative model we can calculate the precise ATE (Average treatment effect) on time1 as follows.
Under control time1 is given by the equation
\[
\text{time1} = \text{baseline} + 10L
\]
and under treatment time1 is given by the equation
\[
\text{time1} = \text{baseline} + 10L + 10
\]
The average treatment effect (ATE) is the difference between these two equations:

\[
(\text{baseline} + 10L + 10) - (\text{baseline} + 10L) = 10
\]

Summarizing the simulated potential outcomes confirms this: as expected, under treatment time1_x1 and change1_x1 are close to 10 points higher than their control counterparts (time1_x0 and change1_x0).
ATE on Follow-Up (Mean difference)
Since we have the potential outcomes for all patients, we can calculate the Average Treatment Effect of the treatment on time1 (Equation 1) as the difference between the mean follow-up outcome when all patients are treated vs. when all patients are untreated:

Mean difference time1 (Treatment - Control): 9.825

ATE on Change Score (Mean difference)

We can calculate the ATE on the change score (Equation 2) in the same way:

Mean difference change1 (Treatment - Control): 9.825
This is identical to the difference in follow-up outcomes. This is because, with potential outcomes, the baseline value under treatment and under control is the same for each patient and hence cancels out of the two means. To see this, we start with the expanded estimand for the change score (Equation 2):

\[
\mathrm{E}(Y_1^x) - \mathrm{E}(Y_0^x) - \mathrm{E}(Y_1^{x'}) + \mathrm{E}(Y_0^{x'})
\]

When the baseline values under treatment and control are identical for each patient, the second and fourth terms are identical and therefore \(- \mathrm{E}(Y_0^{x}) + \mathrm{E}(Y_0^{x'}) = 0\), leaving only
\[
\mathrm{E}(Y_1^x) - \mathrm{E}(Y_1^{x'})
\]
which is just the difference between the follow-up outcomes under treatment and control.
So why analyze change scores? First, Glymour (2022) notes that in many clinical and epidemiological studies change is more meaningful as an effect than the difference between the follow-up outcomes. Second, if baseline scores are imbalanced between treatment and control, as can happen in an observational study or an insufficiently randomized RCT, analyzing the difference in follow-up scores can bias the effect estimate.
To calculate a p-value and confidence interval, we can do a paired t-test (to reflect that we have two correlated measures per participant):
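The code chunk for this test is not shown in the rendered page; it was presumably something like this sketch, using the potential-outcome columns generated above:

```r
# Paired t-test of the change-score potential outcomes (a reconstruction;
# the original chunk is not shown)
t.test(dataMod1$change1_x1, dataMod1$change1_x0, paired = TRUE)
```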
Mean difference change1 (Treatment - Control): 9.825
95% Confidence Interval: 9.348 to 10.303
P-value: 0
For comparability with the regression results used later, we can use regression instead of t-tests. However, since we have two measures for each participant (both potential outcomes), we need to use a linear mixed-effects model to account for within-participant correlation.

The following table summarizes our analyses thus far. Here, Beta is the effect estimate.
Code
```r
library(lme4)
library(lmerTest)  # for p-values
library(merDeriv)

# Effect on time1: reshape data to long format (2 rows per participant)
dataMod1_Long <- dataMod1 |>
  pivot_longer(
    cols = c(time1_x0, time1_x1),
    names_to = "treatment",
    values_to = "time1_outcome"
  ) |>
  mutate(group = ifelse(treatment == "time1_x1", 1, 0)) |>
  mutate(group = factor(group, labels = c("Control", "Treatment")))

# Mixed effects model accounting for within-unit correlation
res_time1_po <- lmer(time1_outcome ~ group + (1 | id), data = dataMod1_Long)
tbl_time1_po <- tbl_regression(res_time1_po, show_single_row = group)

# Effect on change1: reshape data to long format (2 rows per participant)
dataMod1_Long <- dataMod1 |>
  pivot_longer(
    cols = c(change1_x0, change1_x1),
    names_to = "treatment",
    values_to = "change1_outcome"
  ) |>
  mutate(group = ifelse(treatment == "change1_x1", 1, 0)) |>
  mutate(group = factor(group, labels = c("Control", "Treatment")))

# Mixed effects model accounting for within-unit correlation
res_change1_po <- lmer(change1_outcome ~ group + (1 | id), data = dataMod1_Long)
tbl_change1_po <- tbl_regression(res_change1_po, show_single_row = group)

tbl_stack(
  list(
    tbl_time1_po |> modify_table_body(~ .x %>% mutate(label = "time1 (PO)")),
    tbl_change1_po |> modify_table_body(~ .x %>% mutate(label = "change1 (PO)"))
  )
) |>
  modify_caption("**Treatment Effects (Beta) Across Different Models and Estimators**") |>
  modify_footnote(label ~ "PO = Potential Outcomes")
```
Treatment Effects (Beta) Across Different Models and Estimators

| Characteristic1 | Beta | 95% CI | p-value |
|-----------------|------|--------|---------|
| time1 (PO) | 9.8 | 9.3, 10 | <0.001 |
| change1 (PO) | 9.8 | 9.3, 10 | <0.001 |

Abbreviation: CI = Confidence Interval
1 PO = Potential Outcomes
Given the small sample size and random noise, the potential outcomes analysis gives us the gold standard causal effect estimate by which we can judge all other effect estimates that use the same data.
We can now consider an RCT using the same data, but with randomization of treatment assignment.
Model 2: RCT with Additive Treatment Effect
This model uses the same causal DAG and potential outcomes generated for Model 1. We copy the dataset from Model 1, then add a group variable that randomly assigns each patient to either the control or treatment group. Based on the assigned treatment, we then copy the appropriate potential outcomes to the observed time1 and change1. Finally, we save all of the data and create a new data frame that includes only the variables we would see in an RCT.
Technical Note
In contrast to Model 1, where we knew the outcome under treatment and control for all participants, in Model 2 we can observe only one of those outcomes, based on the treatment group a participant was assigned to. Recall that the average treatment effect is defined as the difference between the mean outcome if all participants were treated and the mean outcome if all participants were untreated. In most real-world data, including RCTs, we can observe the outcome under only one treatment condition. This is why the potential outcomes framework treats causal inference as a missing data problem. However, in RCTs we randomize treatment assignment, which means we can often assume (given a large enough number of participants) that the participants in each treatment group are exchangeable, meaning that if we swapped which group was treated and which untreated it would not affect the outcome. As a result, we can assume that the outcomes of the two groups are reasonable estimates of the outcomes if all participants had been treated or untreated.
Data Generation
```r
dataMod2 <- dataMod1

# Assign participants to treatment vs control
group <- rep(c(0, 1), each = n / 2)  # 0 = control, 1 = treatment

# Copy the potential outcome for time1 and change1 based on treatment assignment
time1 <- ifelse(group == 0, time1_x0, time1_x1)
change1 <- ifelse(group == 0, change1_x0, change1_x1)
W <- ifelse(group == 0, W_x0, W_x1)  # Observed side effect based on treatment

# Add the new variables to the Model 2 dataset.
# Convert group to a factor with Control as the reference level
dataMod2$group <- factor(group, levels = c(0, 1), labels = c("Control", "Treatment"))
dataMod2$time1 <- time1
dataMod2$change1 <- change1
dataMod2$W <- W

# dataMod2 contains all of the potential outcomes; copy it to preserve it and
# keep only the data we would see in an RCT
dataMod2_all <- dataMod2
dataMod2 <- dataMod2 |> select(id, group, baseline, L, W, time1, change1)

real_cols <- which(sapply(dataMod2, function(x) {
  is.numeric(x) && any(x %% 1 != 0, na.rm = TRUE)
}))

datatable(dataMod2, options = list(pageLength = 10), class = "display compact stripe") |>
  formatRound(columns = real_cols, digits = 2)
```
[Summary table comparing baseline, L, W, time1, and change1 between the Control and Treatment groups. P-values: Wilcoxon rank sum test; Pearson's chi-squared test.]
We can see that both time1 and change1 differ significantly between the two groups. Even though patients were randomized to the treatment and control groups, the baseline values between control and treatment groups are slightly different, though the difference is not significant. We can explore this more using a covariate balance plot:
[Covariate balance plot of baseline, L, and W by treatment group]
```r
# Load required libraries
library(ggplot2)
library(plotly)

mean_baseline <- mean(dataMod2$baseline)

# Calculate predictions at mean baseline for each group
pred1 <- predict(res_time1_group,
                 newdata = data.frame(group = levels(dataMod2$group)[1], baseline = mean_baseline))
pred2 <- predict(res_time1_group,
                 newdata = data.frame(group = levels(dataMod2$group)[2], baseline = mean_baseline))

# Create the plot
p <- ggplot(dataMod2, aes(x = baseline, y = time1, color = group)) +
  # Add points
  geom_point(alpha = 0.6, size = 2) +
  # Add regression lines for each group
  geom_smooth(method = "lm", se = TRUE, alpha = 0.2) +
  # Add vertical line at mean baseline
  geom_vline(xintercept = mean_baseline, linetype = "dashed", color = "black", size = 0.8) +
  # Add points for the endpoints of the vertical line (used only for tooltips)
  geom_point(aes(x = mean_baseline, y = pred1,
                 text = paste("Group:", levels(dataMod2$group)[1],
                              "<br>Baseline:", round(mean_baseline, 2),
                              "<br>Predicted Time1:", round(pred1, 2))),
             color = "red", size = 3, alpha = 0.8, inherit.aes = FALSE) +
  geom_point(aes(x = mean_baseline, y = pred2,
                 text = paste("Group:", levels(dataMod2$group)[2],
                              "<br>Baseline:", round(mean_baseline, 2),
                              "<br>Predicted Time1:", round(pred2, 2))),
             color = "red", size = 3, alpha = 0.8, inherit.aes = FALSE) +
  # Add the vertical segment showing the difference between groups at mean baseline
  geom_segment(x = mean_baseline, xend = mean_baseline,
               y = pred1, yend = pred2,
               color = "red", size = 2,
               arrow = arrow(ends = "both", angle = 90, length = unit(0.1, "inches"))) +
  # Customize the plot
  labs(
    title = "Time1 vs Baseline by Group",
    subtitle = paste("Vertical line shows difference between groups at mean baseline (",
                     round(mean_baseline, 2), ")", sep = ""),
    x = "Baseline",
    y = "Time1",
    color = "Group"
  ) +
  # Clean theme
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 11, color = "gray40"),
    legend.position = "bottom"
  )

# Convert to plotly for interactive tooltips
ggplotly(p, tooltip = "text") %>%
  layout(title = list(text = "Time1 vs Baseline by Group<br><sub>Vertical line shows difference between groups at mean baseline</sub>"))
```
A general rule of thumb is that the standardized mean differences for covariates should fall within \(\pm 0.1\). As seen above, both L and baseline fall within that threshold. (Ignore W for now.) If the baseline values of the treatment and control groups are balanced, then an unbiased average treatment effect is just the mean of time1 for those in the treatment group minus the mean for those in the control group. Let's calculate that here and see if the slight baseline imbalance creates a biased estimate:
```r
means <- dataMod2 |>
  group_by(group) |>
  summarize(mean = mean(time1))

rct_time1_mean_diff <- filter(means, group == "Treatment")$mean -
  filter(means, group == "Control")$mean

cat("Mean difference in time1 from RCT (Treatment - Control):",
    round(rct_time1_mean_diff, 3), "\n")
```
Mean difference in time1 from RCT (Treatment - Control): 9.146
Recall that the true causal effect is 10 and the estimate from this data based on potential outcomes is 9.8250863. The estimate from the RCT is lower than both. To get a p-value and confidence interval, we can use regression to estimate the difference in means for time1 between the treatment and control groups:
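The regression chunk is not shown in the rendered page; judging from the objects referenced later (res_time1_group and tbl_time1_group), it was presumably something like:

```r
# Regression of time1 on group (a reconstruction; the original chunk is not shown)
res_time1_group <- lm(time1 ~ group, data = dataMod2)
tbl_time1_group <- tbl_regression(res_time1_group, show_single_row = "group")
tbl_time1_group
```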
The treatment group showed significantly higher time1 scores compared to the control group; however, the effect estimate is biased as a result of the slight baseline imbalance. Going back to the data, the mean baseline for the control group is higher than the mean in the treatment group:
```r
baseline_means <- dataMod2 |>
  group_by(group) |>
  summarize(mean = mean(baseline))

rct_mean_baseline_diff <- filter(baseline_means, group == "Treatment")$mean -
  filter(baseline_means, group == "Control")$mean

cat("Mean difference in baseline from RCT (Treatment - Control):",
    round(rct_mean_baseline_diff, 3), "\n")
```
Mean difference in baseline from RCT (Treatment - Control): -0.565
This baseline imbalance results in a biased effect estimate when we use the difference in follow-up scores. Those differences do not take the imbalance into account, because they do not use the baseline score.
If we compare the analyses we have done so far, we see that analyzing time1 from the RCT gives a biased effect estimate with a confidence interval that is much wider than that from the potential outcomes analysis:
Code
```r
tbl_stack(
  list(
    tbl_time1_po |> modify_table_body(~ .x %>% mutate(label = "time1 (PO)")),
    tbl_change1_po |> modify_table_body(~ .x %>% mutate(label = "change1 (PO)")),
    tbl_time1_group |> modify_table_body(~ .x %>% mutate(label = "time1 (RCT)"))
  )
) |>
  modify_caption("**Treatment Effects Across Different Models and Estimators**") |>
  modify_footnote(label ~ "PO = Potential Outcomes; RCT = Randomized Controlled Trial") %>%
  modify_footnote(estimate ~ "CI = Confidence Interval") |>
  modify_table_styling(
    columns = everything(),
    rows = label == "time1 (RCT)",
    text_format = "bold"
  )
```
Treatment Effects Across Different Models and Estimators
Here we see that the means are statistically significantly different. Also note the tighter confidence intervals of the potential-outcomes estimates compared to the estimate from the analysis of the follow-up scores, time1 (RCT).
As we have seen, if the baseline values are imbalanced, the effect estimate of the difference in means of time1 will be biased. This is why many analysts recommend estimating the mean differences in follow-up scores or change scores using analysis of covariance (ANCOVA), where the outcome is regressed on the treatment and adjusted for the baseline. This corresponds to the estimands presented earlier: Equation 3 and Equation 4.
Let’s do this for the follow-up score, time1:
```r
res_time1_ANCOVA <- lm(time1 ~ group + baseline, data = dataMod2)
tbl_time1_ANCOVA <- tbl_regression(
  res_time1_ANCOVA,
  show_single_row = c("group"),
  include = "group"
)
tbl_time1_ANCOVA
```
| Characteristic | Beta | 95% CI | p-value |
|----------------|------|--------|---------|
| group | 9.7 | 8.5, 11 | <0.001 |

Abbreviation: CI = Confidence Interval
Using ANCOVA on the follow-up scores produces the same estimate as the change scores analysis. In other words, ANCOVA provides an unbiased estimate of the effect.
Now let's try ANCOVA on the change score:
```r
res_change1_ANCOVA <- lm(change1 ~ group + baseline, data = dataMod2)
tbl_change1_ANCOVA <- tbl_regression(
  res_change1_ANCOVA,
  show_single_row = c("group"),
  include = "group"
)
tbl_change1_ANCOVA
```
| Characteristic | Beta | 95% CI | p-value |
|----------------|------|--------|---------|
| group | 9.7 | 8.5, 11 | <0.001 |

Abbreviation: CI = Confidence Interval
Here, there is little change from the marginal (non-ANCOVA) effect estimate.
Let’s look at the two new results in the context of our prior analyses:
Here we see that analyzing change scores, using either the marginal effect of group on change1 or ANCOVA, recovers an unbiased estimate. When analyzing follow-up scores, the marginal effect of group on time1 (time1 (RCT)) is biased and has a wide confidence interval, but ANCOVA eliminates the bias and produces a confidence interval equivalent to that of the change score analyses.
Recall that with the noise in our data and the small sample size, the potential outcomes analysis (time1 (PO) and change1 (PO)) provide the gold standard estimate of the true effect, which in this model is 10.
Causal DAG Theory and Adjustment
According to causal DAG theory, since there is no open backdoor path from group to time1, we do not need to adjust for baseline in order to estimate the direct effect of group on time1; however, as we have seen above, the unadjusted analysis produced a biased estimate. The apparent conflict arises because adjustment serves two separate purposes:
1. To block confounding paths in causal DAGs; and
2. To improve precision and remove finite-sample bias when estimating effects.
Since baseline is not a confounder and there are no other confounders in the causal DAG for our data generating model, by causal DAG theory baseline does not need to be adjusted for. However, adjusting for baseline provides a more efficient estimator of the direct effect, because it reduces residual variance and eliminates bias from chance baseline imbalance.
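To see the efficiency gain directly, we can compare the standard error of the group coefficient with and without baseline adjustment (a sketch, assuming the dataMod2 data frame from Model 2):

```r
# Sketch: compare the standard error of the treatment effect with and without
# baseline adjustment (assumes dataMod2 from Model 2)
se_unadj <- summary(lm(time1 ~ group, data = dataMod2))$coefficients["groupTreatment", "Std. Error"]
se_adj <- summary(lm(time1 ~ group + baseline, data = dataMod2))$coefficients["groupTreatment", "Std. Error"]
c(unadjusted = se_unadj, adjusted = se_adj)
```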
Summary
Thus far we have seen that there are several nuances to analyzing change from baseline, even with an RCT. We also saw that ANCOVA provided an equally good estimate when analyzing both follow-up scores and change scores.
Now we turn to an RCT with differential loss to follow-up.
Model 3: RCT, Additive Effect, Differential Loss to Follow-Up (L Measured)
To be finished.
Model 3 uses the RCT data but simulates differential loss to follow-up, meaning that patients drop out of the study before the outcome time1 can be measured. When a patient is lost to follow-up, we say that the patient has been censored. In this simulation, patients are censored based on the presence of the side effect W, which is affected by their treatment group and the prognostic factor L according to the probabilities shown below. In this model we assume that L is measured for all participants.
Code
```r
# Calculate censoring probabilities
prob_ltfu_unfavorable_control <- 0.15    # Probability of LTFU for L=1 in control group
prob_ltfu_unfavorable_treatment <- 0.7   # Probability of LTFU for L=1 in treatment group
prob_ltfu_favorable <- 0.02              # Probability of LTFU for L=0 in both groups

# Create the data frame with the specified percentages
table_data <- data.frame(
  L = c("0", "1"),
  Control = c(paste0(prob_ltfu_favorable * 100, "%"),
              paste0(prob_ltfu_unfavorable_control * 100, "%")),
  Treatment = c(paste0(prob_ltfu_favorable * 100, "%"),
                paste0(prob_ltfu_unfavorable_treatment * 100, "%"))
)

# Create the gt table
gt_table <- table_data %>%
  gt() %>%
  tab_header(title = "Probability of Loss to Follow-up") %>%
  cols_label(L = "L", Control = "Control", Treatment = "Treatment") %>%
  tab_spanner(label = "Group", columns = c(Control, Treatment)) %>%
  tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels()) %>%
  tab_style(style = cell_text(weight = "bold"), locations = cells_title()) %>%
  opt_align_table_header(align = "center")

# Display the table
gt_table
```
Probability of Loss to Follow-up

| L | Control | Treatment |
|---|---------|-----------|
| 0 | 2% | 2% |
| 1 | 15% | 70% |
Code
```r
# First, show the probability of W (side effect) by L and Group
table_W <- data.frame(
  L = c("0", "1"),
  Control = c(paste0(prob_W_L0_control * 100, "%"),
              paste0(prob_W_L1_control * 100, "%")),
  Treatment = c(paste0(prob_W_L0_treatment * 100, "%"),
                paste0(prob_W_L1_treatment * 100, "%"))
)

# Create the gt table for W probabilities
gt_table_W <- table_W %>%
  gt() %>%
  tab_header(title = "Probability of Side Effect (W) by L and Treatment Group") %>%
  cols_label(L = "L", Control = "Control", Treatment = "Treatment") %>%
  tab_spanner(label = "Group", columns = c(Control, Treatment)) %>%
  tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels()) %>%
  tab_style(style = cell_text(weight = "bold"), locations = cells_title()) %>%
  opt_align_table_header(align = "center")

# Display the W probability table
gt_table_W
```
Probability of Side Effect (W) by L and Treatment Group

| L | Control | Treatment |
|---|---------|-----------|
| 0 | 5% | 20% |
| 1 | 15% | 70% |
Code
```r
# Now show the probability of censoring given W
prob_ltfu_W0 <- 0.02  # Low probability of LTFU when no side effect
prob_ltfu_W1 <- 0.80  # High probability of LTFU when side effect present

table_censoring <- data.frame(
  W = c("0 (No side effect)", "1 (Side effect present)"),
  Probability_of_LTFU = c(paste0(prob_ltfu_W0 * 100, "%"),
                          paste0(prob_ltfu_W1 * 100, "%"))
)

# Create the gt table for censoring probabilities
gt_table_censoring <- table_censoring %>%
  gt() %>%
  tab_header(title = "Probability of Loss to Follow-up by Side Effect Status") %>%
  cols_label(W = "W", Probability_of_LTFU = "Probability of LTFU") %>%
  tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels()) %>%
  tab_style(style = cell_text(weight = "bold"), locations = cells_title()) %>%
  opt_align_table_header(align = "center")

# Display the censoring probability table
gt_table_censoring
```
Probability of Loss to Follow-up by Side Effect Status

| W | Probability of LTFU |
|---|---------------------|
| 0 (No side effect) | 2% |
| 1 (Side effect present) | 80% |
Code
```r
# Calculate and show the overall LTFU probabilities by L and Group
# P(LTFU | L, Group) = P(LTFU | W=1) * P(W=1 | L, Group) + P(LTFU | W=0) * P(W=0 | L, Group)
prob_ltfu_L0_control <- prob_ltfu_W1 * prob_W_L0_control + prob_ltfu_W0 * (1 - prob_W_L0_control)
prob_ltfu_L1_control <- prob_ltfu_W1 * prob_W_L1_control + prob_ltfu_W0 * (1 - prob_W_L1_control)
prob_ltfu_L0_treatment <- prob_ltfu_W1 * prob_W_L0_treatment + prob_ltfu_W0 * (1 - prob_W_L0_treatment)
prob_ltfu_L1_treatment <- prob_ltfu_W1 * prob_W_L1_treatment + prob_ltfu_W0 * (1 - prob_W_L1_treatment)

table_overall <- data.frame(
  L = c("0", "1"),
  Control = c(paste0(round(prob_ltfu_L0_control * 100, 1), "%"),
              paste0(round(prob_ltfu_L1_control * 100, 1), "%")),
  Treatment = c(paste0(round(prob_ltfu_L0_treatment * 100, 1), "%"),
                paste0(round(prob_ltfu_L1_treatment * 100, 1), "%"))
)

# Create the gt table for overall LTFU probabilities
gt_table_overall <- table_overall %>%
  gt() %>%
  tab_header(
    title = "Overall Probability of Loss to Follow-up by L and Treatment Group",
    subtitle = "Calculated as: P(LTFU|L,Group) = P(LTFU|W=1)×P(W=1|L,Group) + P(LTFU|W=0)×P(W=0|L,Group)"
  ) %>%
  cols_label(L = "L", Control = "Control", Treatment = "Treatment") %>%
  tab_spanner(label = "Group", columns = c(Control, Treatment)) %>%
  tab_style(style = cell_text(weight = "bold"), locations = cells_column_labels()) %>%
  tab_style(style = cell_text(weight = "bold"), locations = cells_title()) %>%
  opt_align_table_header(align = "center")

# Display the overall table
gt_table_overall
```
Overall Probability of Loss to Follow-up by L and Treatment Group
Calculated as: P(LTFU|L,Group) = P(LTFU|W=1)×P(W=1|L,Group) + P(LTFU|W=0)×P(W=0|L,Group)

| L | Control | Treatment |
|---|---------|-----------|
| 0 | 5.9% | 17.6% |
| 1 | 13.7% | 56.6% |
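For example, for patients with \(L = 1\) in the treatment group, the overall probability is \(0.80 \times 0.70 + 0.02 \times 0.30 = 0.566\), or 56.6%.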
Patients with \(L = 1\) see an increase of 10 in time1 but are much more likely to drop out of the study if they are also in the treatment group. This tends to decrease the observed difference in time1 between the treatment and control groups, resulting in an underestimate of the treatment effect.
Let’s generate the censored dataset:
```r
# Copy dataMod2
dataMod3 <- dataMod2

# Simulate differential loss to follow-up (LTFU) based on W only
dataMod3$ltfu_prob <- ifelse(dataMod3$W == 1, prob_ltfu_W1, prob_ltfu_W0)

# Generate censoring indicator (1 = censored/dropped out, 0 = observed)
dataMod3$censored <- rbinom(n, 1, dataMod3$ltfu_prob)

# Apply censoring to the data (set values to NA)
dataMod3$time1[dataMod3$censored == 1] <- NA
dataMod3$change1[dataMod3$censored == 1] <- NA
```
Summary Statistics
As expected, the summary statistics comparing control and treatment groups show differential loss to follow-up, as seen in the different rates of censoring:
[Summary table of the Model 3 data by treatment group, including the censoring indicator. P-values: Wilcoxon rank sum test; Pearson's chi-squared test.]
Censored patients have missing outcomes, which means that we must drop them from the dataset prior to applying any of the methods we used above.
Dropping these patients from the dataset changes the population and the distribution of the variables in our data:
```r
# Create the new dataset retaining only uncensored participants
dataMod3_Censored <- dataMod3 |>
  filter(censored == 0) |>
  select(-censored)

table2_Mod3_Censored <- dataMod3_Censored |>
  select(-id) |>
  mutate(group = as.factor(group)) |>
  tbl_summary(by = group) |>
  add_p() |>
  bold_labels()

table2_Mod3_Censored
```
| Characteristic | Control, N = 1361 | Treatment, N = 941 | p-value2 |
|----------------|-------------------|--------------------|----------|
| baseline | 50 (44, 57) | 49 (43, 56) | 0.7 |
| L | 50 (37%) | 14 (15%) | <0.001 |
| W | 3 (2.2%) | 14 (15%) | <0.001 |
| time1 | 52 (47, 60) | 61 (54, 67) | <0.001 |
| change1 | 1.8 (-1.3, 8.1) | 10.1 (8.2, 14.4) | <0.001 |
| ltfu_prob | | | <0.001 |
| — 0.02 | 133 (98%) | 80 (85%) | |
| — 0.8 | 3 (2.2%) | 14 (15%) | |

1 Median (Q1, Q3); n (%)
2 Wilcoxon rank sum test; Pearson's Chi-squared test
In the uncensored participants, L, the prognostic factor that increases time1, is now unbalanced: participants in the treated group have a much lower incidence of L. We can see this in a covariate balance plot:
[Covariate balance plot showing the imbalance in L among uncensored participants]
Let’s look at how that affects our effect estimates on time1 and change1 using ANCOVA:
The naive analysis of the data, which simply drops censored patients, underestimates the treatment effect.
Adjusting for Loss To Follow-up
When we ignore patients who are lost to follow-up we create a selection bias. To see this, let’s look at the causal DAG for Model 3 including the censoring indicator:
Code
```r
# Create the DAG using dagify
dagMod3 <- dagify(
  time1 ~ baseline + group + L,
  W ~ L + group,          # Side effect influenced by both L and treatment
  censored ~ W,           # Censoring depends only on W (side effect)
  change1 ~ baseline + time1,
  exposure = "group",
  outcome = "time1",
  coords = data.frame(
    name = c("baseline", "L", "W", "censored", "group", "time1", "change1"),
    x = c(0, 0, 2, 2, 4, 3, 1.5) * .5,
    y = c(1, 2, 2.5, 2, 2, 1, 0) * .5
  )
)

ggdag_status(dagMod3, var = "censored", text = TRUE, node_size = 25, text_size = 4) +
  geom_dag_edges(
    start_cap = ggraph::circle(10, "mm"),
    end_cap = ggraph::circle(10, "mm"),  # Larger end cap for bigger nodes
    arrow_directed = grid::arrow(length = grid::unit(15, "pt"), type = "closed")
  ) +
  guides(shape = "none") +
  theme_dag() +
  theme(plot.margin = margin(20, 20, 20, 20))
```
Causal DAG for Model 3 where group is the treatment variable, censoring codes whether a patient is lost to follow-up, and either time1 or change1 is the outcome.
To estimate the causal effect of group on time1 we need to block any open backdoor paths between group and time1 while keeping the forward, causal, paths open. A backdoor path is an acyclic path from the treatment variable to the outcome variable in which at least one arrow points back toward the treatment. There are two paths between group and time1:
- group -> time1 is a causal path that is open, meaning that association can flow through it.
- group -> W <- L -> time1 is a backdoor path that is closed, because association is blocked by W, which is a collider on this path. A collider is a node on a path with both arrows pointing toward it. Colliders block the flow of association as long as we do not condition on (adjust for) the collider or a child node of the collider.
It can help to visualize only the open paths between the treatment and the outcome:
```r
ggdag_paths(dagMod3, from = "group", to = "time1", shadow = TRUE) +
  theme_dag()
```
The problem with censoring is that if we analyze only patients who remained in the study (were not censored), then we are selecting on censoring status, which means that we are conditioning on \(censored = 0\), hence we are adjusting for censored. Since censored is a child of the collider W, this opens the flow of association across W. This non-causal association can bias our effect estimates. Again, it is helpful to view the open paths in the context of the DAG when we adjust for both L and censored:
The naive analysis that simply drops censored participants answers the causal question: what is the effect of treatment on the outcome among patients who were not lost to follow-up, and it answers it in the presence of bias due to censoring. How do we get an unbiased effect? According to Pearl's causal inference framework (Pearl 2010), adjusting for L will block the non-causal path opened by conditioning on censoring. After adjustment for L, we see that the backdoor path is closed:
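We can check this directly with dagitty's paths() function (a sketch, assuming the dagMod3 object defined above):

```r
# Conditioning on `censored` alone leaves the non-causal path through W open
paths(dagMod3, from = "group", to = "time1", Z = "censored")

# Additionally adjusting for L closes it
paths(dagMod3, from = "group", to = "time1", Z = c("censored", "L"))
```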
Adjusting for L has given us an unbiased estimate of the effect of the treatment on the outcome. One subtle point that we will return to below is that adjusting for L allows us to estimate the effect of the treatment on the outcome had no one been lost to follow-up. We can think of this as estimating the effect of a joint intervention on treatment and censoring, where we intervene to avoid censoring. If \(Y_1\) is the outcome, \(X\) is the treatment indicator, and \(C = 0\) means a participant was not censored, then this estimand is

\[
\mathrm{E}(Y_1^{x,\, c=0}) - \mathrm{E}(Y_1^{x',\, c=0})
\]

In the presence of censoring, it is important to explicitly state the causal estimand as a joint intervention to be clear about the effect we are interested in.
In this example, we assumed that L was measured; however, covariates that affect censoring are often unmeasured and may even be unknown. In such cases, we can turn to other approaches, such as Inverse Probability of Censoring Weighting (IPCW), which we describe next.
Adjusting for Loss to Follow-up using IPCW
IPCW (Inverse Probability of Censoring Weighting) creates a pseudopopulation from our original population by weighting each participant's outcome by the inverse of their probability of remaining uncensored; censored patients get a weight of 0. In the most basic form of IPCW, the weighted means of the treated and control groups estimate the mean potential outcomes of the entire population.
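In equation form, the basic IPCW estimator of the mean outcome under treatment level \(x\) is a weighted mean over the uncensored participants (a sketch of the standard estimator, where \(C = 0\) denotes remaining uncensored and the weights come from a model of the censoring probability):

\[
\hat{\mathrm{E}}(Y_1^{x}) = \frac{\sum_{i:\, X_i = x,\; C_i = 0} w_i \, Y_{1,i}}{\sum_{i:\, X_i = x,\; C_i = 0} w_i},
\qquad
w_i = \frac{1}{\widehat{\Pr}(C_i = 0 \mid X_i, \text{covariates}_i)}
\]

For example, an uncensored participant with a predicted 25% probability of remaining in the study receives weight \(1/0.25 = 4\) and stands in for four similar participants, most of whom were censored.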
To create the weights, we use logistic regression to fit a model that predicts the probability of remaining uncensored, conditional on the treatment and any covariates required to block all backdoor paths from censoring to the outcome. Here we will use change1 as the outcome, because it incorporates the baseline value. For our model, there are two such paths:
```r
# Step 1: Create uncensored indicator (opposite of censored)
dataMod3$uncensored <- 1 - dataMod3$censored

# Step 2: Fit a logistic regression model to predict the probability of NOT
# being censored. Based on the DAG, censoring depends on W, so we model
# uncensored on the treatment group and W
censoring_model <- glm(uncensored ~ group + W,
                       data = dataMod3,
                       family = binomial(link = "logit"))
summary(censoring_model)
```
Call:
glm(formula = uncensored ~ group + W, family = binomial(link = "logit"),
data = dataMod3)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.6632 0.4738 7.732 1.06e-14 ***
groupTreatment -0.2312 0.5336 -0.433 0.665
W -4.7998 0.5351 -8.970 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 325.96 on 299 degrees of freedom
Residual deviance: 138.05 on 297 degrees of freedom
AIC: 144.05
Number of Fisher Scoring iterations: 6
Next we use the censoring model to predict the probability of being uncensored for each participant in the dataset.
```r
# Step 3: Calculate predicted probabilities of being uncensored
dataMod3$prob_uncensored <- predict(censoring_model, dataMod3, type = "response")
```
Next we calculate the IPCW weights as the inverse of the probability of remaining uncensored for all uncensored participants, and as 0 for censored participants:
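The weight-creation chunk is not shown in the rendered page; given the ipcw_weight column used in the next step, it was presumably something like:

```r
# Step 4: Calculate IPCW weights (a reconstruction, based on the `ipcw_weight`
# column used below). Uncensored participants are weighted by the inverse of
# their probability of remaining uncensored; censored participants get 0
dataMod3$ipcw_weight <- ifelse(dataMod3$uncensored == 1,
                               1 / dataMod3$prob_uncensored,
                               0)
```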
Finally, we calculate the weighted mean in the treated and control groups and take the difference to estimate the effect:
```r
# Step 5: Calculate weighted means for treated and control groups (uncensored only)
uncensored_data <- dataMod3[dataMod3$uncensored == 1, ]

# Weighted mean for treated group
treated_mean <- weighted.mean(
  uncensored_data$change1[uncensored_data$group == "Treatment"],
  uncensored_data$ipcw_weight[uncensored_data$group == "Treatment"],
  na.rm = TRUE
)

# Weighted mean for control group
control_mean <- weighted.mean(
  uncensored_data$change1[uncensored_data$group == "Control"],
  uncensored_data$ipcw_weight[uncensored_data$group == "Control"],
  na.rm = TRUE
)

# Step 6: Calculate treatment effect estimate
treatment_effect <- treated_mean - control_mean
cat("IPCW Treatment Effect Estimate:\n")
```
IPCW Treatment Effect Estimate:
cat("Treated group mean:", round(treated_mean, 3), "\n")
Treated group mean: 13.659
cat("Control group mean:", round(control_mean, 3), "\n")
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.026 1.026 1.026 1.306 1.032 4.927
See Howe et al. (2011) for a detailed discussion of the limitations of IPCW.
```r
ggdag_adjustment_set(dagMod3) +
  theme_dag()

dagMod3_censor_adjustment <- adjust_for(dagMod3, var = "censored")

gg_dagMod3_censored <- ggdag_adjust(dagMod3, var = "censored") +
  guides(shape = "none")  # remove redundant shapes legend
gg_dagMod3_censored

# Get the tidy DAG out of the graph
dagMod3_censored <- gg_dagMod3_censored$data

# List all paths from treatment `group` to outcome `time1`.
# `paths` currently ignores the adjusted nodes when determining which paths are
# open, so we have to specify adjusted nodes in the call
paths <- ggdag_paths(dagMod3_censored)  # Z = adjustedNodes(dagMod3)

df_paths <- data.frame(path = paths$paths, open = paths$open)
kable(df_paths,
      col.names = c("Path", "Open"),
      caption = "Causal Paths and Their Status",
      align = c("l", "c"))

# Print the adjustment sets (if any)
adjustmentSets(dagMod3)

tidy_dagMod3 <- tidy_dagitty(dagMod3) |>
  mutate(shape = ifelse(name == "censored", "square", "circle"))

# Indicate that `censored` is adjusted for in this model
tidy_dagMod3 <- adjust_for(tidy_dagMod3, "censored")

# Get the status (exposure, outcome) information
status_data <- ggdag_status(tidy_dagMod3)$data
status_data <- status_data %>%
  mutate(
    gstatus = as.character(status),
    gstatus = ifelse(name == "censored", "adjusted", as.character(status)),
    gstatus = factor(gstatus, levels = c(levels(status), "adjusted"))
  )

# Plot with custom node shapes while preserving status coloring
ggplot(status_data, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_edges(arrow_directed = grid::arrow(length = unit(0.5, "cm"), type = "closed")) +
  geom_dag_point(aes(color = gstatus, shape = shape), size = 24) +
  geom_dag_text(color = "white") +
  scale_shape_manual(values = c(circle = 16, square = 15), guide = "none") +
  scale_color_discrete(name = "Status", na.value = "grey50") +
  theme_dag() +
  theme(legend.position = "right")
```
```r
# `paths` currently ignores the adjusted nodes when determining which paths are
# open, so we have to specify the adjusted nodes in the call
# paths(dag1, Z = adjustedNodes(dag1))
```
We see that there are 18 paths from group to time1. $open is a boolean vector showing whether each path is open or closed. All of these paths are open, and since many of them are noncausal "backdoor" paths, meaning the arrows do not all point from group to time1, the association between group and time1 measured in the data is likely to be biased by confounding. For example, paths 1 through 11 are all backdoor paths that go through censored. A collider is a variable on a path with both arrows pointing toward it. As you can see, censored is a collider on every path it appears in. Colliders block association when they are not adjusted for, but allow association to flow, biasing the analysis, when they are adjusted for.
To properly measure the total causal effect of group on time1 we need to measure the association that flows through all paths that point from group to time1. Let's use paths to list only the causal (directed) paths:
```r
# paths(dag1, Z = adjustedNodes(dag1), directed = TRUE)
```
There are two causal paths, both open. To measure the total causal effect, we need to block the non-causal open paths by adjusting for variables, if possible. We can use ggdag to list the adjustment sets for the total causal effect of group on time1. This returns nothing, because there is no set of variables that blocks the paths opened by conditioning on the collider censored.
```r
# adjustmentSets(dag1, effect = "total")
```
We will use IP weighting for censoring below, but before that, let's analyze this dataset as if there were no loss to follow-up.
References
Glymour, M Maria. 2022. "Commentary: Modelling Change in a Causal Framework." International Journal of Epidemiology 51 (5): 1615–21. https://doi.org/10.1093/ije/dyac151.
Howe, Chanelle J., Stephen R. Cole, Joan S. Chmiel, and Alvaro Muñoz. 2011. "Limitation of Inverse Probability-of-Censoring Weights in Estimating Survival in the Presence of Strong Selection Bias." American Journal of Epidemiology 173 (5): 569–77. https://doi.org/10.1093/aje/kwq385.
Tennant, Peter W G, Kellyn F Arnold, George T H Ellison, and Mark S Gilthorpe. 2021. "Analyses of 'Change Scores' Do Not Estimate Causal Effects in Observational Data." International Journal of Epidemiology 51 (5): 1604–15. https://doi.org/10.1093/ije/dyab050.