Causal Effect Analysis of Baseline Change

Author

Todd R. Johnson

Code
#remotes::install_github("jtextor/dagitty/r")
library(dagitty)
library(ggdag)
library(tidyverse)
library(DT)
library(gt)
library(gtsummary)
library(knitr)
library(kableExtra)
#  usethis::use_github_pages("main") # Tell github to publish on github pages
Construction Zone

This is a work in progress and subject to change nearly daily. At this stage there are numerous inconsistencies as I work to create a single simulated dataset that demonstrates the desired issues at hand while analyzing change from baseline.

The goal of this notebook is to demonstrate how to estimate treatment effects of a dichotomous (binary) treatment on change from a continuous baseline measure in both experimental and observational designs. For example, suppose we want to estimate the effect of a new drug vs. placebo for reducing blood pressure after 6 months of use. Here, the blood pressure at treatment assignment is the baseline value and blood pressure measured 6 months after treatment assignment is the follow-up measure.

There are two common approaches to defining the outcome: (1) Compare the difference in the follow-up measures between the experimental and control group; or (2) Compare the difference in change scores between the two groups, where a change score for an individual is the difference between the follow-up blood pressure and the baseline blood pressure.

We will use five increasingly complex data generating models to explore this problem:

  1. A ground truth model where the potential outcomes (the outcome measure under both treatment and control) are known for all participants.
  2. A Randomized Controlled Trial (RCT) with no loss to follow-up.
  3. An RCT with differential loss to follow-up.
  4. An observational study with treatment assignment confounding
  5. An observational study with treatment assignment counfounding and loss to follow-up.

We will also consider four causal estimands based on the two approaches discussed above:

  1. The average treatment effect (ATE) on the outcome at follow-up, expressed as the difference between the outcome under treatment vs. control:
    \[\mathrm{E}(Y_1^x) - \mathrm{E}(Y_1^{x'}) \tag{1}\]
    Here, we use the potential outcomes notation, where \(Y_1\) is the follow-up outcome, \(x\) is the treatment, and \(x'\) is the control. \(Y_1^x\) represents the value that a participant would have if treated. Hence this effect estimate is the difference between the expected outcome if all patients in the study were treated and the expected outcome if all patients were not treated.

  2. The average treatment effect on the change score:
    \[ \mathrm{E}(Y_1^x - Y_0^x) - \mathrm{E}(Y_1^{x'} - Y_0^{x'}) \tag{2}\]
    This is the difference between the change score \(Y_1 - Y_0\) under treatment vs. control, where \(Y_0\) is the baseline score.

  3. The conditional treatment effect on the follow-up outcome, conditioned on the observed baseline value \(Y_0\):
    \[ E[Y_1^x - Y_1^{x'} | Y_0 = y_0] \tag{3}\]
    This estimand represents the effect of treatment on the part of the follow-up outcome that hasn’t already been determined by the baseline value.

  4. The conditional treatment effect on the change score, conditioned on the observed baseline value \(Y_0\):
    \[ E[(Y_1^x - Y_0^x) - (Y_1^{x'} - Y_0^{x'}) | Y_0 = y_0] \tag{4}\]
    This estimand represents the effect of treatment on the part of the change score that hasn’t already been determined by the baseline value. Note that because the change score includes the baseline value \(Y_0\), this estimand is equivalent to Equation 3.

Estimands vs Estimators

A causal estimand expresses the value that we wish to estimate. A statistical estimator is how we estimate that value. For any given estimand there are often multiple possible estimators, each with possible strengths and weaknesses. For example, with an RCT we can estimate ATE by subtracting the differences in the means of the outcome variable across the groups or by using linear regression or a number of other approaches.

There is considerable controversy in the literature over whether and when to use the difference in outcome at follow-up Equation 1 vs the difference in the change score Equation 2 as well as whether to adjust for baseline values. We will explore some of these issues below.

Model 1: Additive Treatment Effect, Potential Outcomes Known

The Causal DAG for the first and simplest model looks like this:

Code
# Create the DAG using dagify
dagMod1 <- dagify(
  time1 ~ baseline + group + L,
  change1 ~ baseline + time1,
  exposure = "group",
  outcome = "time1",
  coords = data.frame(
    name = c("baseline", "L", "group", "time1", "change1"),
    x = c(0, 0, 4, 3, 1.5)*.5,
    y = c(1, 2, 2, 1, 0)*.5
  )
)

# Convert the dagitty graph to a ggdag group for prettier printing
tidy_dagMod1 <- tidy_dagitty(dagMod1) 

# Plot with custom node shapes
# Get the status (exposure, outcome) information and set variables without a status to "observed"
status_dataMod1 <- ggdag_status(tidy_dagMod1)$data |>
  mutate(status = ifelse(is.na(status), "observed", as.character(status)))

# Plot with custom node shapes while preserving status coloring
ggplot(status_dataMod1, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_edges(arrow_directed = grid::arrow(length = unit(0.5, "cm"), type = "closed")) +
  geom_dag_point(aes(color = status), size = 24) +
  geom_dag_text(color = "white") +
  scale_shape_manual(values = c(circle = 16, square = 15), guide = "none") +
  scale_color_discrete(name = "Status", na.value = "grey50") +
  theme_dag() +
  theme(legend.position = "right")

Causal DAG for Model 1 where group is the treatment variable and either time1 or change1 is the outcome.

The DAG uses common sense names instead of the typical variables used in potential outcomes:

\(Y_0\): baseline

\(Y_1\): time1 or change1 depending on which outcome we are interested in

\(X\): group where 0 is control and 1 is treatment

L: A binary prognostic factor that increases time1 by a fixed amount when \(L = 1\). Depending on what we wish to demonstrate we may treat L as measured or unmeasured.

Now let’s simulate potential outcomes for the causal DAG shown above. In this simulation, the causal effect of treatment on time1 is 10, meaning that treated patients will see their baseline score increase by 10 plus a random amount of normally distributed noise with mean 0 and standard deviation of 3 (\(\mu = 0\), \(\sigma = 3\)), whereas control will see only the random change. The simulation calculates the potential outcomes time1_x0 and time1_x1 for all participants, where x1 is the outcome under treatment, and x0 under control. Likewise, the simulation also calculates potential outcomes of the change score: change1_x0, and change1_x1.

Finally, we use the potential outcomes for each participant to calculate the individual treatment effect for both time1 and change1. An individual treatment effect for a participant is the difference between that person’s outcome under treatment and outcome under control.

Data Generating Model

Here is the data generating model code:

# Set seed for reproducibility
set.seed(124)

# 300 participants
n <- 300

# Generate baseline scores
baseline <- rnorm(n, mean = 50, sd = 10)

# Set additive effect
additive_effect <- 10  # improvement of +10 to time1 for treatment group

# Generate a prognostic factor (L) - binary (0: favorable, 1: unfavorable)
L <- prognostic_factor <- rbinom(n, 1, 0.4)

baseline_outcome_effect <- 10 # Effect of prognostic factor on outcome: time1

# Calculate potential outcomes for time1 for all participants
# time1 under control adds a small amount of random noise, so on average there is no effect of control on the baseline value
time1_x0 <- baseline +
  baseline_outcome_effect*L + 
  rnorm(n, mean = 0, sd = 3)

# time1 under treatment is on average 10 units higher than baseline
time1_x1 <- baseline +
  additive_effect +
  baseline_outcome_effect*L + 
  rnorm(n, mean = 0, sd = 3)

# Calculate potential outcomes for the change score
change1_x0 <- time1_x0 - baseline
change1_x1 <- time1_x1 - baseline

# Calculate individual treatment effects (ITE) for time1 and change1 for each participant
ite_time1 <- time1_x1 - time1_x0
ite_change1 <- change1_x1 - change1_x0

dataMod1 <- data.frame(
  id = 1:n,
  baseline = baseline,
  L = L,
  time1_x0 = time1_x0,
  time1_x1 = time1_x1,
  change1_x0 = change1_x0,
  change1_x1 = change1_x1,
  ite_time1 = ite_time1,
  ite_change1 = ite_change1
)

# Create the kable table with first 10 rows
dataMod1 %>%
  head(10) %>%
  kable(format = "html", digits = 2) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "left"
  )
id baseline L time1_x0 time1_x1 change1_x0 change1_x1 ite_time1 ite_change1
1 36.15 0 40.38 49.00 4.23 12.85 8.62 8.62
2 50.38 0 49.84 55.03 -0.54 4.65 5.19 5.19
3 42.37 1 50.51 54.88 8.14 12.51 4.38 4.38
4 52.12 1 57.69 70.04 5.56 17.92 12.35 12.35
5 64.26 1 71.98 83.12 7.72 18.87 11.15 11.15
6 57.44 0 57.00 73.46 -0.45 16.02 16.46 16.46
7 57.00 1 68.11 77.16 11.11 20.16 9.05 9.05
8 47.71 1 55.23 65.35 7.52 17.64 10.12 10.12
9 51.97 0 47.60 62.51 -4.37 10.54 14.91 14.91
10 62.07 0 56.82 67.17 -5.25 5.10 10.35 10.35

In this data there is no variable for group, because we generated the potential outcomes for all participants under both control and treatment. We will use group in the next model to simulate treatment assignment for an RCT.

The two equations for determining time1 under control and treatment are:

\[ time1 = baseline + 5L \]

and

\[ time1 = baseline + 5L + 10 \]

The average treatment effect is the difference between these two equations:

\[ ATE = baseline + 5L + 10 - (baseline + 5L) \]

which reduces to:

\[ ATE = 10 \]

Summary Statistics

Let’s look at a summary of the data:

Code
library(gtsummary)
table1 <- dataMod1 |> select(-id) |> tbl_summary()
table1
Characteristic N = 3001
baseline 49 (43, 55)
L 114 (38%)
time1_x0 53 (47, 60)
time1_x1 63 (56, 69)
change1_x0 2.3 (-0.5, 8.3)
change1_x1 12.5 (9.3, 18.0)
ite_time1 9.9 (6.9, 12.5)
ite_change1 9.9 (6.9, 12.5)
1 Median (Q1, Q3); n (%)

From the summary, we can see, as expected, that under treatment time1_x1 and change1_x1 are close to 10 points higher than control.

ATE on Follow-Up (Mean difference)

Since we have the potential outcomes for all patients we can calculate the Average Treatment Effect of the treatment on time1 (Equation 1) as the difference between the mean follow-up outcome between the treated and control groups:

po_estimate <- mean(dataMod1$time1_x1) - mean(dataMod1$time1_x0)
cat("Mean difference time1 (Treatment - Control):", round(po_estimate, 3), "\n")
Mean difference time1 (Treatment - Control): 9.825 

This is close to the true value of 10, but not perfect due to random noise added by the model and the relatively small sample size (150 participants).

ATE on Change Scores (Mean difference)

We can also compute the difference in change scores between the treated and untreated (Equation 5):

po_change1 <- mean(dataMod1$change1_x1) - mean(dataMod1$change1_x0)
cat("Mean difference change1 (Treatment - Control):", round(po_change1, 3), "\n")
Mean difference change1 (Treatment - Control): 9.825 

This is identical to the difference in follow-up outcomes. This is because with potential outcomes the baseline value under treatment and control for each patient is the same and hence cancel out across the two means below. To see this, we start with the estimand for the change score:

\[ \mathrm{E}(Y_1^x - Y_0^x) - \mathrm{E}(Y_1^{x'} - Y_0^{x'}) \tag{5}\]

Since

\[\mathrm{E}(X - Y) = \mathrm{E}(X) - \mathrm{E}(Y)\]

we can rewrite Equation 5 as:

\[ \mathrm{E}(Y_1^x) - \mathrm{E}(Y_0^x) - \mathrm{E}(Y_1^{x'}) + \mathrm{E}(Y_0^{x'}) \]

When the baseline values under treatment and control are identical for each patient the second and fourth terms are identical and therefore \(- \mathrm{E}(Y_0^{x}) + \mathrm{E}(Y_0^{x'}) = 0\) leaving only

\[ \mathrm{E}(Y_1^x) - \mathrm{E}(Y_1^{x'}) \]

which is just the difference between the follow-up outcomes under treatment and control.

So why analyze change scores? First, Glymour (2022) notes that in many clinical and epidemiological studies change is more meaningful as an effect than the difference between the follow-up outcomes. Second if baseline scores are imbalanced between treatment and control, such as in an observational study or insufficiently randomized RCT, analyzing change in follow-up scores can bias the effect estimate.

To calculate a p-value and confidence interval, we can do a paired t-test (to reflect that we have two correlated measures per participant):

# Perform paired t-test
t_test_result <- t.test(dataMod1$time1_x1, dataMod1$time1_x0, paired = TRUE)

# View results
#print(t_test_result)

# Extract specific values
mean_difference <- t_test_result$estimate
confidence_interval <- t_test_result$conf.int
p_value <- t_test_result$p.value

# Display results nicely
cat("Mean difference time1 (Treatment - Control):", round(mean_difference, 3), "\n")
cat("95% Confidence Interval:", round(confidence_interval[1], 3), "to", round(confidence_interval[2], 3), "\n")
cat("P-value:", round(p_value, 4), "\n")
Mean difference time1 (Treatment - Control): 9.825 
95% Confidence Interval: 9.348 to 10.303 
P-value: 0 

The paired t-test shows a statistically significant difference between treatment and control conditions.

The t-test for the change score change1 gives identical results:

# Perform paired t-test
t_test_result <- t.test(dataMod1$change1_x1, dataMod1$change1_x0, paired = TRUE)

# View results
#print(t_test_result)

# Extract specific values
mean_difference <- t_test_result$estimate
confidence_interval <- t_test_result$conf.int
p_value <- t_test_result$p.value

# Display results nicely
cat("Mean difference change1 (Treatment - Control):", round(mean_difference, 3), "\n")
cat("95% Confidence Interval:", round(confidence_interval[1], 3), "to", round(confidence_interval[2], 3), "\n")
cat("P-value:", round(p_value, 4), "\n")
Mean difference change1 (Treatment - Control): 9.825 
95% Confidence Interval: 9.348 to 10.303 
P-value: 0 

For later comparison to regression results, we can use regression instead of t-tests. However, since we have two measures for each participant (both potential outcomes) we need to use a linear mixed-effects model to account for within unit correlation.

The results summarize our analysis thus far. Here, Beta is the effect estimate.

Code
library(lme4)
library(lmerTest)  # for p-values
library(lme4)
library(merDeriv)

# effect on time1: reshape data to long format (2 rows per participant) for time1
dataMod1_Long <- dataMod1 |> 
  pivot_longer(cols = c(time1_x0, time1_x1),
                         names_to = "treatment",
               values_to = "time1_outcome") |>
  mutate(group = ifelse(treatment == "time1_x1", 1, 0)) |>
  mutate(group = factor(group, labels = c("Control", "Treatment")))

# Mixed effects model accounting for within-unit correlation
res_time1_po <- lmer(time1_outcome ~ group + (1|id),
                   data = dataMod1_Long)
tbl_time1_po <- tbl_regression(res_time1_po,
                               show_single_row = group)

# effect on change1: reshape data to long format (2 rows per participant) for time1
dataMod1_Long <- dataMod1 |> 
  pivot_longer(cols = c(change1_x0, change1_x1),
                         names_to = "treatment",
               values_to = "change1_outcome") |>
  mutate(group = ifelse(treatment == "change1_x1", 1, 0)) |>
  mutate(group = factor(group, labels = c("Control", "Treatment")))

# Mixed effects model accounting for within-unit correlation
res_change1_po <- lmer(change1_outcome ~ group + (1|id), 
                   data = dataMod1_Long)
tbl_change1_po <- tbl_regression(res_change1_po,
                                 show_single_row = group)

tbl_stack(
  list(
    tbl_time1_po |> modify_table_body(~.x %>% mutate(label = "time1 (PO)")),
    tbl_change1_po |> modify_table_body(~.x %>% mutate(label = "change1 (PO)"))
  ) 
) |>
  modify_caption("**Treatment Effects (Beta) Across Different Models and Estimators**")  |>
  modify_footnote(
    label ~ "PO = Potential Outcomes"
  ) 
Treatment Effects (Beta) Across Different Models and Estimators
Characteristic1 Beta 95% CI p-value
time1 (PO) 9.8 9.3, 10 <0.001
change1 (PO) 9.8 9.3, 10 <0.001
Abbreviation: CI = Confidence Interval
1 PO = Potential Outcomes

Given the small sample size and random noise, the potential outcomes analysis gives us the gold standard causal effect estimate by which we can judge all other effect estimates that use the same data.

We can now consider an RCT using the same data, but with randomization of treatment assignment.

Model2: RCT with Additive Treatment Effect

This model uses the same causal DAG and potential outcomes generated for Model 1. We copy the dataset from Model 1, but now we add the group variable that reflects the treatment each patient was randomized to. Based on the assigned treatment, we then copy the appropriate potential outcomes to the observed time1 and change1. Finally we save all of the data, then create a new data frame that includes only the variables we would see in an RCT.

Data Generation

dataMod2 <- dataMod1

# Assign participants to treatment vs control
group <- rep(c(0, 1), each = n/2) # 0 = control, 1 = treatment

# Copy the potential outcome for time1 and change1 based on treatment assignment
time1 <- ifelse(group == 0, time1_x0, time1_x1)
change1 <- ifelse(group == 0, change1_x0, change1_x1)

# Add the new variables to the Model 2 dataset
# Convert group to a factor, but use levels to make sure Treatment is first so that t.test
# substracts treatment means from control means
dataMod2$group <- factor(group, levels = c(0, 1), labels = c("Control", "Treatment"))
dataMod2$time1 <- time1
dataMod2$change1 <- change1

# dataMod2 contains all of the potential outcomes, lets copy it to preserve it and keep only the data we would see in an RCT
dataMod2_all <- dataMod2
dataMod2 <- dataMod2 |> select(id, group, baseline, L, time1, change1)
real_cols <- which(sapply(dataMod2, function(x) {
  is.numeric(x) && any(x %% 1 != 0, na.rm = TRUE)
}))
datatable(dataMod2, 
          options = list(pageLength = 10),
          class = "display compact stripe") |>
  formatRound(columns = real_cols, digits = 2)

Summary Statistics

Here is the overall summary of the data:

Code
table1_Mod2 <- dataMod2 |> 
  select(-id) |> 
  tbl_summary() |>
  bold_labels()
table1_Mod2
Characteristic N = 3001
group
    Control 150 (50%)
    Treatment 150 (50%)
baseline 49 (43, 55)
L 114 (38%)
time1 58 (50, 66)
change1 9 (3, 14)
1 n (%); Median (Q1, Q3)

And here is a summary showing the difference between control and treatment:

Code
table2_Mod2 <- dataMod2 |>
  select(-id) |>
  mutate(group = as.factor(group)) |>
  tbl_summary(by = group) |>
  add_p() |>
  bold_labels()
table2_Mod2
Characteristic Control
N = 1501
Treatment
N = 1501
p-value2
baseline 50 (44, 57) 49 (43, 55) 0.6
L 61 (41%) 53 (35%) 0.3
time1 53 (47, 60) 63 (55, 69) <0.001
change1 3 (-1, 9) 13 (9, 18) <0.001
1 Median (Q1, Q3); n (%)
2 Wilcoxon rank sum test; Pearson’s Chi-squared test

We can see that both time1 and change1 differ significantly between the two groups. Note that even in this RCT, the baseline values between control and treatment groups are slightly different, though the difference is not significant. We can explore this more using a covariate balance plot:

 cobalt (Version 4.6.0, Build Date: 2025-04-15)
Note: `s.d.denom` not specified; assuming "pooled".

# Load required libraries
library(ggplot2)
library(plotly)

mean_baseline = mean(dataMod2$baseline)

# Calculate predictions at mean baseline for each group
pred1 <- predict(res_time1_group, newdata = data.frame(group = levels(dataMod2$group)[1], baseline = mean_baseline))
pred2 <- predict(res_time1_group, newdata = data.frame(group = levels(dataMod2$group)[2], baseline = mean_baseline))

# Create the plot
p <- ggplot(dataMod2, aes(x = baseline, y = time1, color = group)) +
  # Add points
  geom_point(alpha = 0.6, size = 2) +
  
  # Add regression lines for each group
  geom_smooth(method = "lm", se = TRUE, alpha = 0.2) +
  
  # Add vertical line at mean baseline
  geom_vline(xintercept = mean_baseline, linetype = "dashed", color = "black", size = 0.8) +
  
  # Add points for the endpoints of the vertical line (these will be invisible)
  geom_point(aes(x = mean_baseline, y = pred1, 
                text = paste("Group:", levels(dataMod2$group)[1], 
                           "<br>Baseline:", round(mean_baseline, 2),
                           "<br>Predicted Time2:", round(pred1, 2))),
             color = "red", size = 3, alpha = 0.8, inherit.aes = FALSE) +
  
  geom_point(aes(x = mean_baseline, y = pred2,
                text = paste("Group:", levels(dataMod2$group)[2], 
                           "<br>Baseline:", round(mean_baseline, 2),
                           "<br>Predicted Time2:", round(pred2, 2))),
             color = "red", size = 3, alpha = 0.8, inherit.aes = FALSE) +
  
  # Add the vertical line showing difference between groups at mean baseline
  geom_segment(
    x = mean_baseline, 
    xend = mean_baseline,
    y = pred1,
    yend = pred2,
    color = "red", 
    size = 2,
    arrow = arrow(ends = "both", angle = 90, length = unit(0.1, "inches"))
  ) +
  
  # Customize the plot
  labs(
    title = "Time2 vs Baseline by Group",
    subtitle = paste("Vertical line shows difference between groups at mean baseline (", 
                    round(mean_baseline, 2), ")", sep = ""),
    x = "Baseline",
    y = "Time2",
    color = "Group"
  ) +
  
  # Clean theme
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 11, color = "gray40"),
    legend.position = "bottom"
  )

# Convert to plotly for interactive tooltips
ggplotly(p, tooltip = "text") %>%
  layout(title = list(text = "Time1 vs Baseline by Group<br><sub>Vertical line shows difference between groups at mean baseline</sub>"))

A general rule of thumb is that the standardized mean differences between covariates should fall between \(+/-0.1\). As seen above, both L andbaseline fall within the threshold. If the baseline values of the treatment and control group are balanced, then an unbiased average treatment effect is just the mean of time2 for those in the treatment group minus the mean for those in the control group. Let’s calculate that here and see if the slight baseline imbalance creates a biased estimate:

means <- dataMod2 |>
  group_by(group) |>
  summarize(mean = mean(time1))
rct_time1_mean_diff <- filter(means, group=="Treatment")$mean - filter(means, group=="Control")$mean
cat("Mean difference in time1 from RCT (Treatment - Control):", round(rct_time1_mean_diff, 3), "\n")
Mean difference in time1 from RCT (Treatment - Control): 9.146 

Recall that the true causal effect is 10 and the estimate from this data based on potential outcomes is 9.8250863. The estimate from the RCT is lower than both. Let’s use regression to estimate the difference in means for time1 between the treatment and control groups and give us a p-value and confidence interval:

res_time1_group <- lm(time1 ~ group, data = dataMod2)
tbl_time1_group <- tbl_regression(res_time1_group, show_single_row = group)
tbl_time1_group
Characteristic Beta 95% CI p-value
group 9.1 6.6, 12 <0.001
Abbreviation: CI = Confidence Interval

The treatment group showed significantly higher time1 scores compared to the control group; however, the effect estimate is biased as a result of the slight baseline imbalance. Going back to the data, the mean baseline for the control group is higher than the mean in the treatment group:

baseline_means <- dataMod2 |>
  group_by(group) |>
  summarize(mean = mean(baseline))
gt(baseline_means) |>
  tab_style(style = cell_text(weight = 'bold'),
            locations = cells_column_labels()
  )
group mean
Control 49.87414
Treatment 49.30925

The difference between the mean baselines is

rct_mean_baseline_diff <- filter(baseline_means, group=="Treatment")$mean - 
  filter(baseline_means, group=="Control")$mean
cat("Mean difference in baseline from RCT (Treatment - Control):", round(rct_mean_baseline_diff, 3), "\n")
Mean difference in baseline from RCT (Treatment - Control): -0.565 

This baseline imbalance results in a biased effect estimate when we use the difference in follow-up scores. Those differences do not take the imbalance into effect, because they do not use the baseline score.

If we compare the analyses we have done so far, we see that analyzing time1 from the RCT gives a biased effect estimate with a confidence interval that is much wider than that from the potential outcomes analysis:

Code
tbl_stack(
  list(
    tbl_time1_po |> modify_table_body(~.x %>% mutate(label = "time1 (PO)")),
    tbl_change1_po |> modify_table_body(~.x %>% mutate(label = "change1 (PO)")),
    tbl_time1_group |> modify_table_body(~.x %>% mutate(label = "time1 (RCT)"))
  ) 
) |>
  modify_caption("**Treatment Effects Across Different Models and Estimators**") |>
  modify_footnote(
    label ~ "PO = Potential Outcomes; RCT = Randomized Controlled Trial"
  ) %>%
  modify_footnote(
    estimate ~ "CI = Confidence Interval"
  ) |>
  modify_table_styling(
    columns = everything(),
    rows = label == "time1 (RCT)",  # Replace with your specific row label
    text_format = "bold"
  )
Treatment Effects Across Different Models and Estimators
Characteristic1 Beta2 95% CI p-value
time1 (PO) 9.8 9.3, 10 <0.001
change1 (PO) 9.8 9.3, 10 <0.001
time1 (RCT) 9.1 6.6, 12 <0.001
Abbreviation: CI = Confidence Interval
1 PO = Potential Outcomes; RCT = Randomized Controlled Trial
2 CI = Confidence Interval

Now lets consider the effect of treatment on the change score, change1:

means <- dataMod2 |>
  group_by(group) |>
  summarize(mean = mean(change1))
rct_change1_mean_diff <- filter(means, group=="Treatment")$mean - filter(means, group=="Control")$mean
cat("Mean difference in change1 from RCT (Treatment - Control):", round(rct_change1_mean_diff, 3), "\n")
Mean difference in change1 from RCT (Treatment - Control): 9.711 

This estimate removes the bias due to the imbalanced baseline scores. Let’s use regression to generate stats:

res_change1_group <- lm(change1 ~ group, data = dataMod2)
tbl_change1_group <- tbl_regression(res_change1_group, show_single_row = group)
tbl_change1_group
Characteristic Beta 95% CI p-value
group 9.7 8.4, 11 <0.001
Abbreviation: CI = Confidence Interval

Here, we see that the means are statistically significantly different. Also, note the tighter confidence interval as compared to the estimates from analysis of the follow-up scores, time1 (RCT):

Code
tbl_stack(
  list(
    tbl_time1_po |> modify_table_body(~.x %>% mutate(label = "time1 (PO)")),
    tbl_change1_po |> modify_table_body(~.x %>% mutate(label = "change1 (PO)")),
    tbl_time1_group |> modify_table_body(~.x %>% mutate(label = "time1 (RCT)")),
      tbl_change1_group |> modify_table_body(~.x %>% mutate(label = "change1 (RCT)"))
  ) 
) |>
  modify_caption("**Treatment Effects Across Different Models and Estimators**") |>
  modify_footnote(
    label ~ "PO = Potential Outcomes; RCT = Randomized Controlled Trial"
  ) %>%
  modify_footnote(
    estimate ~ "CI = Confidence Interval"
  ) |>
  modify_table_styling(
    columns = everything(),
    rows = label == "change1 (RCT)",  # Replace with your specific row label
    text_format = "bold"
  )
Treatment Effects Across Different Models and Estimators
Characteristic1 Beta2 95% CI p-value
time1 (PO) 9.8 9.3, 10 <0.001
change1 (PO) 9.8 9.3, 10 <0.001
time1 (RCT) 9.1 6.6, 12 <0.001
change1 (RCT) 9.7 8.4, 11 <0.001
Abbreviation: CI = Confidence Interval
1 PO = Potential Outcomes; RCT = Randomized Controlled Trial
2 CI = Confidence Interval

As we have seen, if the baseline values are imbalanced, the effect estimate of the difference in means of time2 will be biased. This is why many analysts recommend estimating the mean differences in follow-up scores or change scores using analysis of covariance (ANCOVA) where the outcome is regressed on the treatment and adjusted for the baseline. This corresponds to the estimands presented earlier: Equation 3 and Equation 4.

Let’s do this for the follow-up score, time1:

res_time1_ANCOVA <- lm(time1 ~ group + baseline, data = dataMod2)
tbl_time1_ANCOVA <- tbl_regression(res_time1_ANCOVA,
                                   show_single_row = c("group"),
                                   include = "group")
tbl_time1_ANCOVA
Characteristic Beta 95% CI p-value
group 9.7 8.5, 11 <0.001
Abbreviation: CI = Confidence Interval

Using ANCOVA on the follow-up scores produces the same estimate as the change scores analysis. In other words, ANCOVA provides an unbiased estimate of the effect.

Now lets try ANCOVA on the change score:

res_change1_ANCOVA <- lm(change1 ~ group + baseline, data = dataMod2)
tbl_change1_ANCOVA <- tbl_regression(res_change1_ANCOVA, show_single_row = c("group"), include = "group")
tbl_change1_ANCOVA
Characteristic Beta 95% CI p-value
group 9.7 8.5, 11 <0.001
Abbreviation: CI = Confidence Interval

Here, there is little change from the marginal (non-ANCOVA) effect estimate.

Let’s look at the two new results in the context of our prior analyses:

Code
tbl_stack(
  list(
    tbl_time1_po |> modify_table_body(~.x %>% mutate(label = "time1 (PO)")),
    tbl_change1_po |> modify_table_body(~.x %>% mutate(label = "change1 (PO)")),
    tbl_time1_group |> modify_table_body(~.x %>% mutate(label = "time1 (RCT)")),
    tbl_change1_group |> modify_table_body(~.x %>% mutate(label = "change1 (RCT)")),
    tbl_time1_ANCOVA |> modify_table_body(~.x %>% mutate(label = "time1 (RCT ANCOVA)")),
        tbl_change1_ANCOVA |> modify_table_body(~.x %>% mutate(label = "change1 (RCT ANCOVA)"))
  ) 
) |>
  modify_caption("**Treatment Effects Across Different Models and Estimators**") |>
  modify_footnote(
    label ~ "PO = Potential Outcomes; RCT = Randomized Controlled Trial"
  ) %>%
  modify_footnote(
    estimate ~ "CI = Confidence Interval"
  ) |>
  modify_table_styling(
    columns = everything(),
    rows = label == c("time1 (RCT ANCOVA)", "change1 (RCT ANCOVA)"),
    text_format = "bold"
  )
Treatment Effects Across Different Models and Estimators
Characteristic1 Beta2 95% CI p-value
time1 (PO) 9.8 9.3, 10 <0.001
change1 (PO) 9.8 9.3, 10 <0.001
time1 (RCT) 9.1 6.6, 12 <0.001
change1 (RCT) 9.7 8.4, 11 <0.001
time1 (RCT ANCOVA) 9.7 8.5, 11 <0.001
change1 (RCT ANCOVA) 9.7 8.5, 11 <0.001
Abbreviation: CI = Confidence Interval
1 PO = Potential Outcomes; RCT = Randomized Controlled Trial
2 CI = Confidence Interval

Here we see that the analysis of change scores using either the marginal effect of group on change1 or ANCOVA recovers an unbiased estimate similar to the marginal effect of change1 (RCT). When analyzing follow-up scores, the marginal effect of group on time1 (time1 (RCT) is biased and has a wide confidence interval, but ANCOVA eliminates the bias and produces a confidence interval equivalent to the change score analyses.

Recall that with the noise in our data and the small sample size, the potential outcomes analysis (time1 (PO) and change1 (PO)) provide the gold standard estimate of the true effect, which in this model is 10.

Causal DAG Theory and Adjustment

According to Causal DAG theory, since there is no backdoor path from baseline to time1 we do not need to adjust for baseline in order to estimate the direct effect of group on time1; however, as we have seen above, the unadjusted analysis produces a biased estimate. This difference is due to the use of adjustment for two separate purposes:

  1. To block confounding paths in causal DAGs; and
  2. To improve precision and remove finite sample bias when estimating effects.

Since baseline is not a confounder and there are no other confounders in the causal DAG for our data generating model, by causal DAG theory baseline does not need to be adjusted for. However, adjusting for baseline provides a more efficient estimator of the direct effect, because it reduces residual variance and eliminates bias from chance baseline imbalance.

Summary

Thus far we have seen that there are several nuances to analyzing change from baseline, even with an RCT. We also saw that ANCOVA provided the best oror equally good estimate when analyzing both follow-up scores and change scores.

Now we turn to an RCT with loss to follow-up

Model 3: RCT, Additive Effect, Differential Loss to Follow-Up.

To be finished.

Model 3 uses the RCT data, but simulates differential loss to follow-up, meaning that patients drop out of the study before the outcome time1 can be measured. When a patient is lost to follow-up we say that that patient has been censored. In this simulation, patients are censored based on the presence of the prognostic factor L and which treatment group they are in, according to the following probabilities.

Code
# Create the data frame with the specified percentages

# Calculate censoring probabilities
prob_ltfu_unfavorable_control <- 0.15 # Probability of LTFU for L=1 in control group
prob_ltfu_unfavorable_treatment <- 0.7 # Probability of LTFU for L=1 in treatment group
prob_ltfu_favorable <- 0.02 # Probability of LTFU for L=0 in both groups


table_data <- data.frame(
L = c("0", "1"),
Control = c(paste0(prob_ltfu_favorable * 100, "%"), 
             paste0(prob_ltfu_unfavorable_control * 100, "%")),
Treatment = c(paste0(prob_ltfu_favorable * 100, "%"), 
               paste0(prob_ltfu_unfavorable_treatment * 100, "%"))
)

# Create the gt table
gt_table <- table_data %>%
  gt() %>%
  tab_header(
    title = "Probability of Loss to Follow-up"
  ) %>%
  cols_label(
    L = "L",
    Control = "Control",
    Treatment = "Treatment"
  ) %>%
  tab_spanner(
    label = "Group",
    columns = c(Control, Treatment)
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_column_labels()
  ) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_title()
  ) %>%
  opt_align_table_header(align = "center")

# Display the table
gt_table
Probability of Loss to Follow-up
L
Group
Control Treatment
0 2% 2%
1 15% 70%

Patients with L see an increase by 10 in time1, but are much more likely to drop out of the study if they are also in the treatment group. This will tend to decrease the difference in time1 between the treatment and control groups, hence resulting in underestimating the treatment effect.

Let’s generate the censored dataset:

# Copy dataMod2
dataMod3 <- dataMod2

# Simulate differential loss to follow-up (LTFU) based on L and Treatment
dataMod3$ltfu_prob <- ifelse(dataMod3$L == 1 & dataMod3$group == "Treatment", prob_ltfu_unfavorable_treatment,
                          ifelse(dataMod3$L == 1 & dataMod3$group == "Control", prob_ltfu_unfavorable_control,
                                 prob_ltfu_favorable))


# Generate censoring indicator (1 = censored/dropped out, 0 = observed)
dataMod3$censored <- rbinom(n, 1, dataMod3$ltfu_prob)

# Apply censoring to the data (set values to NA)
dataMod3$time1[dataMod3$censored == 1] <- NA
dataMod3$change1[dataMod3$censored == 1] <- NA

Summary Statistics

As expected, the summary statistics comparing control and treatment groups show differential loss to follow-up:

Code
table2_Mod3 <- dataMod3 |>
  select(-id) |>
  mutate(group = as.factor(group)) |>
  tbl_summary(by = group) |>
  add_p() |>
  bold_labels()
table2_Mod3
Characteristic Control
N = 1501
Treatment
N = 1501
p-value2
baseline 50 (44, 57) 49 (43, 55) 0.6
L 61 (41%) 53 (35%) 0.3
time1 52 (47, 59) 62 (54, 67) <0.001
    Unknown 9 43
change1 2.0 (-1.2, 8.1) 10.3 (8.4, 14.1) <0.001
    Unknown 9 43
ltfu_prob

<0.001
    0.02 89 (59%) 97 (65%)
    0.15 61 (41%) 0 (0%)
    0.7 0 (0%) 53 (35%)
censored 9 (6.0%) 43 (29%) <0.001
1 Median (Q1, Q3); n (%)
2 Wilcoxon rank sum test; Pearson’s Chi-squared test

Censored patients have missing outcomes, which means that we must drop them from the dataset prior to applying any of the methods we used above. Let’s look at how that affects our effect estimates on time1 and change1 using ANCOVA:

Code
tbl_time1_ANCOVA_LTFU_Naive <- 
  tbl_regression(lm(time1 ~ group + baseline, data = dataMod3),
                 show_single_row = c("group"),
                 include = "group")
tbl_change1_ANCOVA_LTFU_Naive <- 
  tbl_regression(lm(change1 ~ group + baseline, data = dataMod3),
                 show_single_row = c("group"),
                 include = "group")
tbl_stack(
  list(
    tbl_time1_po |> modify_table_body(~.x %>% mutate(label = "time1 (PO)")),
    tbl_change1_po |> modify_table_body(~.x %>% mutate(label = "change1 (PO)")),
    tbl_time1_group |> modify_table_body(~.x %>% mutate(label = "time1 (RCT)")),
    tbl_change1_group |> modify_table_body(~.x %>% mutate(label = "change1 (RCT)")),
    tbl_time1_ANCOVA |> modify_table_body(~.x %>% mutate(label = "time1 (RCT ANCOVA)")),
        tbl_change1_ANCOVA |> modify_table_body(~.x %>% mutate(label = "change1 (RCT ANCOVA)")),
        tbl_time1_ANCOVA_LTFU_Naive |> modify_table_body(~.x %>% mutate(label = "time1 (RCT LTFU ANCOVA)")),
    tbl_change1_ANCOVA_LTFU_Naive |> modify_table_body(~.x %>% mutate(label = "change1 (RCT LTFU ANCOVA)"))
  ) 
) |>
  modify_caption("**Treatment Effects Across Different Models and Estimators**") |>
  modify_footnote(
    label ~ "PO = Potential Outcomes; RCT = Randomized Controlled Trial"
  ) %>%
  modify_footnote(
    estimate ~ "LTFU = Loss To Follow-Up"
  ) |>
  modify_table_styling(
    columns = everything(),
    rows = label == c("time1 (RCT LTFU ANCOVA)", "change1 (RCT LTFU ANCOVA)"),
    text_format = "bold"
  )
Treatment Effects Across Different Models and Estimators
Characteristic1 Beta2 95% CI p-value
time1 (PO) 9.8 9.3, 10 <0.001
change1 (PO) 9.8 9.3, 10 <0.001
time1 (RCT) 9.1 6.6, 12 <0.001
change1 (RCT) 9.7 8.4, 11 <0.001
time1 (RCT ANCOVA) 9.7 8.5, 11 <0.001
change1 (RCT ANCOVA) 9.7 8.5, 11 <0.001
time1 (RCT LTFU ANCOVA) 8.0 6.7, 9.2 <0.001
change1 (RCT LTFU ANCOVA) 8.0 6.7, 9.2 <0.001
Abbreviation: CI = Confidence Interval
1 PO = Potential Outcomes; RCT = Randomized Controlled Trial
2 LTFU = Loss To Follow-Up

The naive analysis of the data, that completely drops censored patients, underestimates the treatment effect.

Adjusting for Loss To Follow-up

When we ignore patients who are lost to follow-up we run the risk of creating a selection bias. To see this, let’s look at the causal DAG for Model 3 including the censoring indicator:

Code
# Create the DAG using dagify
dagMod3 <- dagify(
  time1 ~ baseline + group + L,
  censored ~ L + group,
  change1 ~ baseline + time1,
  exposure = "group",
  outcome = "time1",
  coords = data.frame(
    name = c("baseline", "L", "censored", "group", "time1", "change1"),
    x = c(0, 0, 3, 4, 3, 1.5)*.5,
    y = c(1, 2, 2, 2, 1, 0)*.5
  )
)

ggdag_status(dagMod3, var = "censored",
             text = TRUE, node_size = 25, text_size = 4) +
  geom_dag_edges(start_cap = ggraph::circle(10, 'mm'),
                 end_cap = ggraph::circle(10, 'mm'),  # Larger end cap for bigger nodes
                 arrow_directed = grid::arrow(length = grid::unit(15, "pt"), 
                                             type = "closed")) +
  guides(shape = "none") + 
  theme_dag() +
  theme(plot.margin = margin(20, 20, 20, 20)) 

Causal DAG for Model 3 where group is the treatment variable, censoring codes whether a patient is lost to follow-up, and either time1 or change1 is the outcome.

To estimate the causal effect of group on time1 we need to block any open backdoor paths between group and time1 while keeping forward, causal, paths open. A backdoor path is an acyclic path from the treatment variable to the outcome variable where at least one arrow points back toward the treatment. There are two paths between group and time:

Code
paths <- paths(dagMod3)
df_paths <- data.frame(path = paths$paths,
                       open = paths$open)

df_paths %>%
  gt() %>%
  cols_label(
    path = "Path",
    open = "Open"
  ) %>%
  tab_header(
    title = "Causal Paths Analysis") %>%
  cols_align(
    align = "center",
    columns = open
  ) %>%
  cols_align(
    align = "left", 
    columns = path
  )
Causal Paths Analysis
Path Open
group -> censored <- L -> time1 FALSE
group -> time1 TRUE
Code
# To show open paths graphically with censoring use this:
# dag_paths(dagMod3, shadow=TRUE, adjust_for = "censored") + theme_dag()

The second path from group to time1 is open, meaning that association is allowed to flow along it. To get an unbiased estimate of the total effect of group on time1 all forward (causal) paths from group to time1 must be open and all backdoor (non-causal paths) must be closed. In this case, there is only one backdoor path: the path going through censored; however, as long as we don’t adjust for censored that path is blocked because censored along this path is a collider. A collider blocks the flow of association along the path. However, if we analyze only patients who remained in the study (were not censored) then we are selecting only uncensored patients, which means that we are setting \(censored = 1\), hence we are adjusting censored.

When we adjust censored, we get the following graph that allows non-causal association to flow between group and time1:

ggdag_adjust(dagMod3, var = "censored",
             text = TRUE, node_size = 25, text_size = 2.5) +
  guides(shape = "none") + 
  theme_dag() +
  theme(plot.margin = margin(20, 20, 20, 20)) 

The censored variable is shown with a square around it to indicate that by analyzing only uncensored patients, censored is adjusted for in the analysis. Since all arrows point into censored, any path from treatment group to effect time2 that includes censored will transmit spurious association through censored unless the path is blocked at another point.

The naive analysis answers the causal question: What is the effect of treatment on the outcome among patients who were not lost to follow-up. Instead, we want to know the causal effect of the treatment on the outcome had no one been lost to follow-up.

According to Pearl’s causal inference framework (Pearl 2010), adjusting for L will block the non-causal path. One way to do this is by adding L as a covariate to the regression:

tbl_time1_ANCOVA_LTFU_Adj <- 
  tbl_regression(lm(time1 ~ group + baseline + L, data = dataMod3),
                 show_single_row = c("group"),
                 include = "group")
tbl_change1_ANCOVA_LTFU_Adj <- 
  tbl_regression(lm(change1 ~ group + baseline + L, data = dataMod3),
                 show_single_row = c("group"),
                 include = "group")

To avoid selection bias we need to block the association that flows through the non-causal path.

ggdag_adjustment_set(dagMod3) + theme_dag()

dagMod3_censor_adjustment <- adjust_for(dagMod3,var = "censored")

gg_dagMod3_censored <- ggdag_adjust(dagMod3, var = "censored") +
  guides(shape = "none") # remove redundant shapes legend

gg_dagMod3_censored

# Get the tidy DAG out of the graph
dagMod3_censored <- gg_dagMod3_censored$data

#List all paths from treatment `group` to outcome `time2`

# `paths` currently ignores the adjusted nodes when determining which paths are open, so we have to specify adjusted nodes in the call

paths <- ggdag_paths(dagMod3_censored) #Z = adjustedNodes(dagMod3))
df_paths <- data.frame(path = paths$paths,
                       open = paths$open)
kable(df_paths, 
      col.names = c("Path", "Open"),
      caption = "Causal Paths and Their Status",
      align = c("l", "c"))



# Print the adjustment sets (if any)

adjustmentSets(dagMod3)

tidy_dagMod3 <- tidy_dagitty(dag1) |>
  mutate(shape = ifelse(name == "censored", "square", "circle"))

# Indicate that `censored` is adjusted for in this model

tidy_dag1 <- adjust_for(tidy_dag1, "censored")

# Plot with custom node shapes

# Get the status (exposure, outcome) information

status_data <- ggdag_status(tidy_dag1)$data

status_data <- status_data %>% mutate(gstatus = as.character(status), gstatus = ifelse(name == "censored", "adjusted", as.character(status)),

                                      gstatus = factor(gstatus,levels = c(levels(status), "adjusted")))

# Plot with custom node shapes while preserving status coloring

ggplot(status_data, aes(x = x, y = y, xend = xend, yend = yend)) +

  geom_dag_edges(arrow_directed = grid::arrow(length = unit(0.5, "cm"), type = "closed")) +

  geom_dag_point(aes(color = gstatus, shape = shape), size = 24) +

  geom_dag_text(color = "white") +

  scale_shape_manual(values = c(circle = 16, square = 15), guide = "none") +

  scale_color_discrete(name = "Status", na.value = "grey50") +

  theme_dag() +

  theme(legend.position = "right")
# `paths` currently ignores the adjusted nodes when determining which paths are open, so we have to specify the adjusted nodes in the call

#paths(dag1, Z = adjustedNodes(dag1))

We see that there are 18 paths from group to time2. $open is a boolean list showing whether each path is open or closed. All of these paths are open and since many of these paths are noncausal “backdoor” paths, meaning the arrows do not all point from group to time2 this means that if we measure the association between group and time2 in the data, that association is likely to exhibit bias from confounding. For example, paths 1 through 11 are all backdoor paths that go through censored. A collider is a variable on a path in which both arrows point toward it. As you can see, censored is a collider on every path it appears in. Colliders block association when they are not adjusted, but allow association to flow and bias the analysis when they are adjusted.

To properly measure the total causal effect of group on time2 we need to measure the association that flows through all paths that point from group to time2. Let’s use paths to list only the causal paths:

#paths(dag1, Z = adjustedNodes(dag1), directed = TRUE)

There are two causal paths, both open. To measure the total causal effect, we need to block the non-causal open paths by adjusting for variables, if possible. We can use ggdag to list the adjustment sets for the total causal effect of group on time2. This returns nothing because there is no way to adjust variables to block the open path caused through the collider censored.

#adjustmentSets(dag1, effect = "total")

We will use IP Weighting for censoring below, but before that, lets analyze this dataset as if there were no loss to follow-up.

References

Glymour, M Maria. 2022. “Commentary: Modelling Change in a Causal Framework.” International Journal of Epidemiology 51 (5): 1615–21. https://doi.org/10.1093/ije/dyac151.
Pearl, Judea. 2010. “An Introduction to Causal Inference.” The International Journal of Biostatistics 6 (2). https://doi.org/10.2202/1557-4679.1203.