Stage 3 · Causal Steering Effect

IPW · AIPW (doubly robust) · CATE by segment · marginaleffects

Why Naive Comparisons Fail

Network-steered claims are not a random sample. Claim type, severity, and region all influence both the probability of steering and the outcome (cost). A naive comparison confounds the treatment effect with selection.
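
The mechanism can be reproduced on simulated data. The sketch below is illustrative only: the variable names (`severity`, `steered`, `log_cost`) and all effect sizes are assumptions and are not taken from the claims data.

```r
# Simulated illustration only: names and effect sizes are assumptions,
# not values from the claims data.
set.seed(1)
n        <- 20000
severity <- runif(n)                                   # confounder in [0, 1]
steered  <- rbinom(n, 1, plogis(1.5 - 3 * severity))   # low severity -> steered more often
true_eff <- -0.10                                      # true effect on log cost
log_cost <- 7 + 2 * severity + true_eff * steered + rnorm(n, sd = 0.3)

# Naive difference in means mixes the effect with the severity gap
naive <- mean(log_cost[steered == 1]) - mean(log_cost[steered == 0])

# Adjusting for the confounder recovers something close to the true effect
adjusted <- unname(coef(lm(log_cost ~ steered + severity))["steered"])

round(c(naive = naive, adjusted = adjusted, truth = true_eff), 3)
```

Because steered claims are less severe on average in this simulation, the naive contrast overstates the saving by a wide margin, while the adjusted estimate lands near the true value.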

# Show covariate imbalance before adjustment
claims |>
  group_by(steering_flag, claim_type) |>
  summarise(avg_severity = mean(severity_score), .groups = "drop") |>
  mutate(steered = factor(steering_flag, labels = c("Not steered", "Steered"))) |>
  ggplot(aes(x = claim_type, y = avg_severity, fill = steered)) +
  geom_col(position = "dodge", width = 0.7) +
  scale_fill_manual(values = c("Not steered" = "#66B5E8", "Steered" = "#003781"),
                    name = NULL) +
  scale_y_continuous(limits = c(0, 0.5)) +
  labs(title    = "Selection Bias: Severity Differs Between Steered vs. Non-Steered",
       subtitle = "Steered claims tend to have different severity profiles",
       x = "Claim Type", y = "Average Severity Score") +
  theme_allianz()

Propensity Score Model

ps_result <- fit_propensity(claims)
claims_ps <- ps_result$data
m_ps      <- ps_result$model

# Summary
cat("Propensity model AUC:", round(
  pROC::auc(pROC::roc(claims$steering_flag, fitted(m_ps), quiet = TRUE)), 3
), "\n")
Propensity model AUC: 0.707 

Propensity Score Distribution

ggplot(claims_ps, aes(x = ps, fill = factor(steering_flag))) +
  geom_histogram(binwidth = 0.025, alpha = 0.7, position = "identity") +
  scale_fill_manual(
    values = c("0" = "#66B5E8", "1" = "#003781"),
    labels = c("0" = "Not steered", "1" = "Steered"),
    name   = NULL
  ) +
  facet_wrap(~claim_type, nrow = 1) +
  labs(title    = "Propensity Score Distribution by Claim Type",
       subtitle = "Good overlap is required for valid IPW estimates",
       x = "Propensity Score P(Steered | X)", y = "Count") +
  theme_allianz()

Propensity Model Coefficients (sjPlot)

plot_model(m_ps, type = "est",
           title = "Propensity Model: Log-Odds of Steering",
           colors = c("#FF6600", "#003781")) +
  theme_allianz()

Marginal Predictions (ggeffects)

gg_ps <- ggpredict(m_ps, terms = c("severity_score [all]", "claim_type"))
plot(gg_ps) +
  scale_colour_manual(values = c(
    glass="#003781", body="#0066CC", engine="#00A9CE", total_loss="#FF6600"
  )) +
  labs(title    = "P(Steered) by Severity and Claim Type",
       subtitle = "Marginal predictions from propensity model",
       x = "Severity Score", y = "P(Steered)") +
  theme_allianz()

Overlap (Positivity) Diagnostic

Before relying on the IPW weights, check for positivity violations. Both IPW and AIPW require 0 < P(treated | X) < 1 for all covariate combinations; propensity scores near zero or near one inflate the weights and destabilise the doubly-robust correction.

overlap_check <- claims_ps |>
  group_by(claim_type) |>
  summarise(
    n              = n(),
    pct_ps_low     = mean(ps < 0.05) * 100,   # near-zero: almost never steered
    pct_ps_high    = mean(ps > 0.95) * 100,   # near-one:  almost always steered
    max_trim_ipw   = max(trim_ipw),
    median_trim_ipw = median(trim_ipw),
    .groups = "drop"
  )

overlap_check |>
  gt() |>
  tab_header(
    title    = "Propensity Score Overlap Diagnostics",
    subtitle = "Rows with PS < 0.05 or > 0.95 signal potential positivity violations"
  ) |>
  fmt_number(columns = c(pct_ps_low, pct_ps_high), decimals = 1, pattern = "{x}%") |>
  fmt_number(columns = c(max_trim_ipw, median_trim_ipw), decimals = 2) |>
  tab_style(
    style = cell_fill(color = "#FFE0CC"),
    locations = cells_body(
      rows = pct_ps_low > 5 | pct_ps_high > 5
    )
  ) |>
  tab_style(
    style = cell_fill(color = "#003781"),
    locations = cells_column_labels()
  ) |>
  tab_style(
    style = list(cell_text(color = "white", weight = "bold")),
    locations = cells_column_labels()
  ) |>
  tab_options(table.font.size = px(13))
Propensity Score Overlap Diagnostics
Rows with PS < 0.05 or > 0.95 signal potential positivity violations

claim_type      n  pct_ps_low  pct_ps_high  max_trim_ipw  median_trim_ipw
glass        2890         0.0          0.4          2.69             0.83
body         4079         0.0          0.0          2.69             0.93
engine       1994         0.0          0.0          2.69             0.93
total_loss   1037         0.0          0.0          2.69             0.63

Warning

If pct_ps_low or pct_ps_high exceeds ~5% for any segment, AIPW estimates for that segment should be interpreted with caution. The 99th-percentile trimming of IPW weights (trim_ipw) mitigates but does not eliminate the instability.
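
`fit_propensity()` is not shown in this section, so the sketch below is only one common way to obtain stabilised weights capped at their 99th percentile; it should not be read as the helper's actual implementation. The data are simulated and every name other than `ps` and `trim_ipw` is hypothetical.

```r
# Sketch under assumptions: simulated data, hypothetical names.
set.seed(42)
n  <- 5000
x  <- rnorm(n)
tr <- rbinom(n, 1, plogis(0.5 + 1.2 * x))

ps_fit <- glm(tr ~ x, family = binomial())
ps     <- fitted(ps_fit)

# Stabilised ATE weights: marginal treatment probability in the numerator
p_treat <- mean(tr)
ipw     <- ifelse(tr == 1, p_treat / ps, (1 - p_treat) / (1 - ps))

# Cap weights at their 99th percentile (no rows are dropped)
cap      <- unname(quantile(ipw, 0.99))
trim_ipw <- pmin(ipw, cap)

summary(trim_ipw)
mean(ps < 0.05 | ps > 0.95) * 100   # % of units near the positivity bounds
```

Capping in this way would produce the same maximum weight in every segment, which matches the identical `max_trim_ipw` values in the table above, although that alone does not confirm how the helper works.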

IPW / AIPW Estimators

WeightIt: IPW (ATE weights, unstabilised)

w_out <- weightit(
  steering_flag ~ claim_type + vehicle_class + severity_score + region + vehicle_age,
  data   = claims,
  method = "ps",
  estimand = "ATE"
)
summary(w_out)
                  Summary of weights

- Weight ranges:

          Min                                  Max
treated 1.035 |-------|                      6.679
control 1.173 |---------------------------| 17.954

- Units with the 5 most extreme weights by group:
                                           
           5290   2274   2032   2026   1924
 treated  5.909  5.954  6.183  6.509  6.679
           2512   2444   1463   1420   1221
 control 13.951 14.289 16.605 17.492 17.954

- Weight statistics:

        Coef of Var   MAD Entropy # Zeros
treated       0.336 0.228   0.047       0
control       0.559 0.375   0.120       0

- Effective Sample Sizes:

           Control Treated
Unweighted 3817.   6183.  
Weighted   2907.31 5554.98

Balance Diagnostics (halfmoon)

Standardized Mean Differences (SMD) measure covariate balance between steered and non-steered groups before and after inverse-probability weighting. Good balance: |SMD| < 0.1.
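
For intuition, a weighted SMD can be computed by hand. The helper `smd()` below is a hypothetical illustration on simulated data; it uses the unweighted pooled standard deviation as the denominator, which is one common convention and not necessarily the one tidysmd applies.

```r
# Standardised mean difference with optional weights.
# Denominator: unweighted pooled SD, so "before" and "after" share one scale
# (an assumed convention; others exist).
smd <- function(x, group, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[group == 1], w[group == 1])
  m0 <- weighted.mean(x[group == 0], w[group == 0])
  pooled_sd <- sqrt((var(x[group == 1]) + var(x[group == 0])) / 2)
  (m1 - m0) / pooled_sd
}

set.seed(7)
n     <- 5000
x     <- rnorm(n)
group <- rbinom(n, 1, plogis(0.8 * x))
ps    <- fitted(glm(group ~ x, family = binomial()))
w     <- ifelse(group == 1, 1 / ps, 1 / (1 - ps))   # ATE weights

smd_before <- smd(x, group)
smd_after  <- smd(x, group, w)
round(c(before = smd_before, after = smd_after), 3)
```

In this simulation the unweighted SMD is far above 0.1 and the weighted one falls well below it, which is the pattern a love plot is meant to show.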

library(halfmoon)
library(tidysmd)

balance_vars <- c("severity_score", "vehicle_age", "ncd_class",
                   "deductible_amount", "notification_delay_days")

# tidy_smd with .wts returns method="observed" + method="trim_ipw" rows
smd_df <- tidy_smd(
  claims_ps,
  all_of(balance_vars),
  .group = steering_flag,
  .wts   = trim_ipw
) |>
  mutate(
    abs_smd = abs(smd),
    method  = factor(method,
                     levels = c("observed", "trim_ipw"),
                     labels = c("Unweighted", "IPW weighted"))
  )

love_plot(smd_df) +
  geom_vline(xintercept = 0.1, linetype = "dashed", colour = "#FF6600", linewidth = 0.8) +
  scale_colour_manual(values = c(Unweighted = "#FF6600", `IPW weighted` = "#003781"),
                      aesthetics = c("colour", "fill"), name = NULL) +
  labs(
    title    = "Love Plot: Covariate Balance Before vs. After IPW",
    subtitle = "Dashed line = |SMD| = 0.1 threshold · Values below = good balance",
    x        = "|Standardized Mean Difference|", y = NULL
  ) +
  theme_allianz(grid = "y")

Doubly-Robust AIPW (with mirai bootstrap CIs)

The doubly-robust AIPW estimator is consistent if either the propensity model or the outcome model is correctly specified — providing valuable insurance against model misspecification.
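
For reference, the standard AIPW estimator of the ATE on the cost scale can be written as below. The notation is introduced here for illustration and is not taken from utils_models.R: T_i is the steering flag, Y_i the repair cost, \hat{e}(X_i) the fitted propensity score, and \hat{\mu}_t(X_i) the outcome model's prediction under treatment t.

```latex
\hat{\tau}_{\mathrm{AIPW}}
  = \frac{1}{n} \sum_{i=1}^{n} \left[
      \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
      + \frac{T_i \,\bigl(Y_i - \hat{\mu}_1(X_i)\bigr)}{\hat{e}(X_i)}
      - \frac{(1 - T_i)\,\bigl(Y_i - \hat{\mu}_0(X_i)\bigr)}{1 - \hat{e}(X_i)}
    \right]
```

The first two terms are the plain outcome-regression estimate. The two weighted residual terms average to roughly zero when the outcome model is right, and they cancel the outcome model's bias when the propensity model is right.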

# Point estimate
ate_result <- aipw_ate(claims_ps)
cat(sprintf("AIPW ATE: %.1f%% (CHF difference: %.0f)\n",
            ate_result$ate_pct, ate_result$ate_chf))
AIPW ATE: -6.0% (CHF difference: -314)

Bootstrap CIs for AIPW (500 resamples via mirai)

cat("Bootstrapping AIPW CIs (500 resamples, 4 mirai workers)...\n")
Bootstrapping AIPW CIs (500 resamples, 4 mirai workers)...
daemons(4)

boot_ate <- mirai_map(
  seq_len(500L),
  function(b, claims, utils_path) {
    suppressMessages(library(tidyverse))
    source(utils_path, local = TRUE)
    idx <- sample(nrow(claims), nrow(claims), replace = TRUE)
    d   <- claims[idx, ]
    ps_res <- fit_propensity(d)
    res    <- aipw_ate(ps_res$data)
    res$ate_pct
  },
  .args = list(claims = claims, utils_path = "../R/utils_models.R")
)

boot_ate_vec <- vapply(boot_ate[], function(x) {
  tryCatch(as.numeric(x), error = function(e) NA_real_)
}, numeric(1))
daemons(0)

ci_lower <- quantile(boot_ate_vec, 0.025, na.rm = TRUE)
ci_upper <- quantile(boot_ate_vec, 0.975, na.rm = TRUE)

cat(sprintf("AIPW ATE: %.1f%% [95%% CI: %.1f%%, %.1f%%]\n",
            ate_result$ate_pct, ci_lower, ci_upper))
AIPW ATE: -6.0% [95% CI: -7.2%, -4.7%]

ATE Summary

Summary: ATE of Network Steering on Repair Cost

Estimator             ATE_pct  CI_95           Note
Naive (raw)             −44.2                  Confounded by selection
IPW (WeightIt)             NA                  IPW only
AIPW (doubly robust)     −6.0  [-7.2%, -4.7%]  Doubly robust

Heterogeneous Treatment Effects (CATE)

The overall ATE masks important heterogeneity. Steering benefits vary substantially by claim type — this is where the real business value lies.

CATE by Claim Type

Show code
cat("Fitting CATE models per segment (4 mirai workers)...\n")
Fitting CATE models per segment (4 mirai workers)...
daemons(4)

cate_results_mirai <- mirai_map(
  levels(claims_ps$claim_type),
  function(ct, claims_ps, utils_path) {
    suppressMessages(library(tidyverse))
    source(utils_path, local = TRUE)
    d   <- claims_ps[claims_ps$claim_type == ct, ]
    res <- tryCatch(aipw_ate(d), error = function(e) list(ate_pct = NA_real_))
    list(claim_type = ct, cate_pct = res$ate_pct, n = nrow(d))
  },
  .args = list(claims_ps = claims_ps, utils_path = "../R/utils_models.R")
)

cate_df <- do.call(rbind, lapply(cate_results_mirai[], function(x) {
  data.frame(claim_type = x$claim_type, cate_pct = x$cate_pct, n = x$n)
})) |>
  as_tibble()

daemons(0)

# Persist CATE results for the Shiny dashboard (avoids hard-coded values there)
saveRDS(cate_df, "../data/cate_results.rds")
# Add ground truth for reference
ground_truth <- tibble(
  claim_type = c("glass","body","engine","total_loss"),
  true_ate   = c(-20, -10, -8, 0)
)

cate_df |>
  left_join(ground_truth, by = "claim_type") |>
  ggplot(aes(x = claim_type)) +
  geom_col(aes(y = cate_pct, fill = cate_pct < 0), width = 0.65, alpha = 0.9) +
  geom_point(aes(y = true_ate), shape = 18, size = 5, colour = "#FF6600",
             position = position_nudge(x = 0)) +
  geom_hline(yintercept = 0, colour = "#6B7280", linewidth = 0.5, linetype = "dashed") +
  geom_text(aes(y = cate_pct, label = sprintf("%.1f%%", cate_pct)),
            vjust = ifelse(cate_df$cate_pct < 0, 1.5, -0.5),
            size = 3.5, fontface = "bold", colour = "#1A1A1A") +
  scale_fill_manual(values = c(`TRUE` = "#00A9CE", `FALSE` = "#FF6600"), guide = "none") +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  labs(
    title    = "CATE: Steering Effect on Cost by Claim Type",
    subtitle = "Bars = AIPW estimate · Orange diamonds = ground truth",
    caption  = "Ground truth: glass −20%, body −10%, engine −8%, total_loss ~0%",
    x        = "Claim Type",
    y        = "Cost Change (%)"
  ) +
  theme_allianz()

CATE by Region

cate_region <- claims_ps |>
  group_by(region) |>
  group_map(function(d, k) {
    res <- tryCatch(aipw_ate(d), error = function(e) list(ate_pct = NA_real_))
    tibble(region = k$region, cate_pct = res$ate_pct, n = nrow(d))
  }) |>
  bind_rows()

p_region <- ggplot(cate_region, aes(x = reorder(region, cate_pct), y = cate_pct,
                                     fill = cate_pct)) +
  geom_col(width = 0.7) +
  geom_hline(yintercept = 0, colour = "#6B7280", linetype = "dashed") +
  # hjust is mapped inside aes() so each label is justified by its own row's sign
  geom_text(aes(label = sprintf("%.1f%%\n(n=%d)", cate_pct, n),
                hjust = ifelse(cate_pct < 0, 1.1, -0.1)),
            size = 2.8, colour = "#1A1A1A") +
  scale_fill_gradient2(
    low = "#FF6600", mid = "#66B5E8", high = "#003781", midpoint = -10,
    guide = "none"
  ) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  coord_flip() +
  labs(
    title    = "CATE by Swiss Region",
    subtitle = "Regional heterogeneity in steering benefit",
    x        = NULL, y = "Estimated Cost Change (%)"
  ) +
  theme_allianz(grid = "y")

ggplotly(p_region)

Sensitivity Check: marginaleffects (OLS baseline)

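The AIPW estimator above and the causal forest in the Causal ML section below are the primary estimators. As a sensitivity check, marginaleffects::avg_comparisons() on a simple OLS interaction model should produce directionally consistent CATEs; if it does not, the more flexible methods should be trusted. The OLS estimates are on the log-cost scale, whereas the AIPW estimates are percentages, so only the signs are compared.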

# OLS with interaction — NOT the preferred estimator (assumes homoscedastic errors
# on log-cost), but serves as a quick sanity check against AIPW.
m_outcome <- lm(log(repair_cost) ~ steering_flag * claim_type +
                  vehicle_class + severity_score + region + vehicle_age,
                data = claims)

avg_comp <- avg_comparisons(
  m_outcome,
  variables = "steering_flag",
  by        = "claim_type"
)

# Check directional agreement with AIPW
left_join(
  as_tibble(avg_comp) |> select(claim_type, ols_est = estimate, ols_lo = conf.low, ols_hi = conf.high),
  cate_df |> select(claim_type, aipw_pct = cate_pct),
  by = "claim_type"
) |>
  mutate(agree = sign(ols_est) == sign(aipw_pct / 100)) |>
  gt() |>
  tab_header(title = "Sensitivity: OLS marginaleffects vs. AIPW",
             subtitle = "Both estimators should agree in direction if assumptions hold") |>
  fmt_number(columns = c(ols_est, ols_lo, ols_hi), decimals = 4) |>
  fmt_number(columns = aipw_pct, decimals = 1, pattern = "{x}%") |>
  tab_style(style = cell_fill(color = "#003781"), locations = cells_column_labels()) |>
  tab_style(style = list(cell_text(color = "white", weight = "bold")),
            locations = cells_column_labels()) |>
  tab_options(table.font.size = px(13))
Sensitivity: OLS marginaleffects vs. AIPW
Both estimators should agree in direction if assumptions hold

claim_type  ols_est  ols_lo   ols_hi   aipw_pct  agree
glass       −0.1807  −0.1952  −0.1662  NA        NA
body        −0.0972  −0.1079  −0.0865  NA        NA
engine      −0.0655  −0.0802  −0.0508  NA        NA
total_loss  −0.0473  −0.0687  −0.0259  NA        NA
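
Because the OLS model is fitted on log(repair_cost), its estimates are in log points rather than percent. A minimal sketch of the conversion, using the point estimates from the table above and the same (exp(b) - 1) * 100 transformation this report applies to the causal forest estimates:

```r
# OLS point estimates from the sensitivity table, in log points
ols_est <- c(glass = -0.1807, body = -0.0972, engine = -0.0655, total_loss = -0.0473)

# Convert log points to percentage change in cost
ols_pct <- (exp(ols_est) - 1) * 100
round(ols_pct, 1)
# glass -16.5, body -9.3, engine -6.3, total_loss -4.6
```

On the percentage scale the OLS estimates can be read directly against the ground-truth values quoted elsewhere in this report.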

Business Interpretation

Note

Steering delivers the most value for glass and body damage claims, and engine damage shows moderate benefits. For total loss, the ground-truth effect is approximately zero (the OLS check above estimates only a small reduction of about 0.05 log points), so resources should focus on quality (CSAT) and speed rather than cost optimisation in this segment.

Steering Strategy Recommendations by Claim Type

Claim Type  CATE (est.)  Ground Truth  Volume Share  Recommendation
Glass       NA%          −20%          30%           Prioritise steering: large cost saving
Body        NA%          −10%          40%           High priority: moderate saving at high volume
Engine      NA%          −8%           20%           Moderate priority: select quality partners
Total Loss  NA%          ~0%           10%           No cost advantage: focus on speed & CSAT

Action Items

Warning

Immediate steering optimisation opportunities:

  1. Glass claims: maximise steering rate — a −20% cost effect at scale (30% of claims) represents the single largest cost-reduction lever. Target steering rate > 85% for glass.
  2. Body damage: partner quality gate — steer to partners with composite score > 60; average O/E ratio < 1.05. The −10% effect is real but heterogeneous by partner.
  3. Total loss: switch priority metric — since steering provides no cost benefit, rank total loss partners by duration_days and csat_score instead. Speed of settlement drives customer satisfaction in total loss cases.
  4. Engine damage: segment by vehicle class — CATE for engine damage varies by vehicle class; premium vehicles with ADAS systems have a higher partner-skill dependency.
  5. New variables to activate: fraud_flag × notification_delay_days as a real-time triage signal; incorporate into the steering decision logic.
Tip

Further analysis required:

  • Longitudinal CATE analysis: does the steering benefit change over time (partner learning effects)?
  • Mediation analysis: what fraction of the cost reduction is due to partner quality vs. speed?
  • Compliance analysis: why do some steerable claims not get steered? (identify friction points)
  • Test coverage_type (Vollkasko vs Teilkasko) as a moderator for the steering effect

Model Choice Rationale

Why doubly-robust AIPW instead of simple regression or matching?

Assuming no unmeasured confounding, the doubly-robust AIPW estimator is consistent if either the propensity model or the outcome model is correctly specified; it does not need both. This is practically important: we cannot be certain our logistic propensity model fully captures the assignment mechanism, and our outcome model almost certainly simplifies the true functional form. AIPW uses both models simultaneously, offering a safety margin that neither IPW alone nor simple outcome regression provides.
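
The double-robustness property can be checked on simulated data. The sketch below is not the `aipw_ate()` helper from utils_models.R; the function `aipw()`, the data-generating process, and the deliberately misspecified models are all assumptions made for illustration.

```r
# Sketch under assumptions: simulated data with a known ATE of -0.5.
set.seed(123)
n  <- 20000
x  <- rnorm(n)
tr <- rbinom(n, 1, plogis(0.8 * x))
y  <- 1 + 2 * x + x^2 - 0.5 * tr + rnorm(n)
d  <- data.frame(y, tr, x)

# Generic AIPW estimate from propensity scores and outcome predictions
aipw <- function(y, tr, ps, mu1, mu0) {
  mean(mu1 - mu0 + tr * (y - mu1) / ps - (1 - tr) * (y - mu0) / (1 - ps))
}

# Case 1: correct propensity model, wrong outcome model (ignores x)
ps_ok   <- fitted(glm(tr ~ x, family = binomial(), data = d))
out_bad <- lm(y ~ tr, data = d)
mu1_bad <- predict(out_bad, transform(d, tr = 1))
mu0_bad <- predict(out_bad, transform(d, tr = 0))

# Case 2: wrong propensity model (constant), correct outcome model
ps_bad <- rep(mean(tr), n)
out_ok <- lm(y ~ tr + x + I(x^2), data = d)
mu1_ok <- predict(out_ok, transform(d, tr = 1))
mu0_ok <- predict(out_ok, transform(d, tr = 0))

est <- c(
  naive         = mean(y[tr == 1]) - mean(y[tr == 0]),
  ps_right      = aipw(y, tr, ps_ok,  mu1_bad, mu0_bad),
  outcome_right = aipw(y, tr, ps_bad, mu1_ok,  mu0_ok)
)
round(est, 2)
```

Both AIPW variants should land near the true value of -0.5 even though one nuisance model is wrong in each case, while the naive contrast does not. The variant that relies on the propensity model alone is noticeably noisier.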

Why not matching (MatchIt)? Common matching schemes such as 1:1 nearest-neighbour matching discard unmatched units (typically 20–40% of data), reducing statistical power. For a 10,000-claim dataset, this sacrifice is unnecessary. The IPW approach retains all observations and achieves balance through re-weighting. Capping the weights at their 99th percentile prevents a few extreme observations from dominating.

Why bootstrap CIs for AIPW instead of analytical variance estimation? The semiparametric variance of AIPW involves the influence function of both nuisance models simultaneously. The bootstrap approximation, parallelised via mirai, avoids these derivations and is asymptotically equivalent under mild regularity conditions.
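
A sequential, base-R sketch of the percentile bootstrap is shown below. For brevity it uses a simple normalised IPW estimator on simulated data rather than AIPW, and the function name `ipw_ate()` is hypothetical. The point it illustrates is that the propensity model is refitted inside every resample.

```r
# Sketch under assumptions: simulated data, IPW instead of AIPW for brevity.
set.seed(99)
n  <- 2000
x  <- rnorm(n)
tr <- rbinom(n, 1, plogis(0.5 * x))
y  <- 1 + x - 0.5 * tr + rnorm(n)          # true ATE = -0.5
d  <- data.frame(y, tr, x)

# The whole pipeline is rerun per resample, so uncertainty from the
# propensity model is carried into the interval
ipw_ate <- function(d) {
  ps <- fitted(glm(tr ~ x, family = binomial(), data = d))
  w  <- ifelse(d$tr == 1, 1 / ps, 1 / (1 - ps))
  weighted.mean(d$y[d$tr == 1], w[d$tr == 1]) -
    weighted.mean(d$y[d$tr == 0], w[d$tr == 0])
}

B    <- 200
boot <- vapply(seq_len(B), function(b) {
  ipw_ate(d[sample(nrow(d), replace = TRUE), ])
}, numeric(1))

point <- ipw_ate(d)
ci    <- quantile(boot, c(0.025, 0.975))   # percentile interval
round(c(estimate = point, ci), 3)
```

Each resample is independent of the others, which is why the loop parallelises cleanly across workers.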

Why segment CATE by claim type? The Conditional Average Treatment Effect (CATE) is the causally correct estimand when treatment effects are heterogeneous across subgroups. Reporting a single ATE would mask the zero effect in total loss (and could justify wasteful steering resources in that segment). Segment-level CATE directly maps to the steering strategy decision for each peril type.
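
To make the link between segment-level effects and a single pooled number concrete, the sketch below averages the ground-truth effects using the approximate volume shares from the recommendations table. This is a simplification: it weights by claim count and ignores differences in average cost between segments, so it is not expected to reproduce the estimated ATE.

```r
# Ground-truth effects (%) and approximate volume shares quoted in this report
ground_truth <- c(glass = -20, body = -10, engine = -8, total_loss = 0)
volume_share <- c(glass = 0.30, body = 0.40, engine = 0.20, total_loss = 0.10)

# Volume-weighted average of the segment effects
pooled <- sum(ground_truth * volume_share)
pooled   # about -11.6
```

A single pooled figure of this kind says nothing about the spread between a 20% saving for glass and no saving for total loss, which is the information the steering strategy depends on.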

Causal ML: Honest Causal Forests (grf)

Causal forests (Wager & Athey, 2018) estimate heterogeneous treatment effects at the individual level using honest random forests — a non-parametric causal ML alternative to the parametric AIPW approach used above. Key advantages:

  • Non-parametric: no functional form assumption for τ(x)
  • Honest splitting: the sample used to build the tree structure is separate from the sample used to estimate leaf values, providing valid confidence intervals
  • Adaptive: the forest automatically focuses splitting power on regions of covariate space with high treatment effect heterogeneity
library(grf)

# Prepare matrices
X_vars <- c("severity_score", "vehicle_age", "ncd_class", "deductible_amount",
             "notification_delay_days")
X_cat  <- c("claim_type", "vehicle_class", "region", "coverage_type", "fault_indicator")

# Numeric covariates as a matrix
X_num <- claims |>
  select(all_of(X_vars)) |>
  as.matrix()

# One-hot encode categoricals; the formula is built from X_cat
# (equivalent to ~ claim_type + ... + fault_indicator - 1)
X_enc <- model.matrix(
  reformulate(X_cat, intercept = FALSE),
  data = claims
)

X  <- cbind(X_num, X_enc)
# Note on scale: grf requires a continuous Y; we use log(cost) to reduce the
# influence of large outliers and stabilise variance. The AIPW in utils_models.R
# uses a Gamma GLM on the original CHF scale. When comparing CATE estimates
# below, both are converted to % change — they should agree directionally but
# may differ in magnitude due to the scale difference (log-normal vs Gamma).
Y  <- log(claims$repair_cost)
W  <- claims$steering_flag

set.seed(2024)
cf <- causal_forest(
  X         = X,
  Y         = Y,
  W         = W,
  num.trees = 2000,
  tune.parameters = "all"
)

# Individual-level CATE estimates
tau_hat <- predict(cf, estimate.variance = TRUE)
claims_cf <- claims |>
  mutate(
    cate        = tau_hat$predictions,
    cate_pct    = (exp(cate) - 1) * 100,
    cate_se     = sqrt(tau_hat$variance.estimates),
    cate_lo     = (exp(cate - 1.96 * cate_se) - 1) * 100,
    cate_hi     = (exp(cate + 1.96 * cate_se) - 1) * 100
  )

cat("Causal forest ATE:", round((exp(average_treatment_effect(cf)[["estimate"]]) - 1) * 100, 1), "%\n")
Causal forest ATE: -10.4 %
# Calibration test: checks whether CATE estimates have the right mean and
# whether heterogeneity is real or noise. Best Linear Projection of tau(X)
# onto forest predictions — slope ≈ 1 means well-calibrated heterogeneity.
tc <- test_calibration(cf)
cat("\nCalibration test (BLP):\n")

Calibration test (BLP):
print(tc)

Best linear fit using forest predictions (on held-out data)
as well as the mean forest prediction as regressors, along
with one-sided heteroskedasticity-robust (HC3) SEs:

                               Estimate Std. Error t value    Pr(>t)    
mean.forest.prediction         1.000894   0.030926  32.364 < 2.2e-16 ***
differential.forest.prediction 1.027284   0.074255  13.835 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Compute the forest ATE once, then back-transform the interval from log points to %
ate_cf <- average_treatment_effect(cf)
ci_cf  <- (exp(ate_cf[["estimate"]] + c(-1.96, 1.96) * ate_cf[["std.err"]]) - 1) * 100
cat("95% CI: [", round(ci_cf[1], 1), ",", round(ci_cf[2], 1), "]\n")
95% CI: [ -11.1 , -9.8 ]

CATE Distribution

ggplot(claims_cf, aes(x = cate_pct, fill = claim_type)) +
  geom_histogram(binwidth = 2, alpha = 0.75, colour = "white", linewidth = 0.2) +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "#FF6600", linewidth = 0.8) +
  scale_fill_manual(
    values = c(glass="#003781", body="#0066CC", engine="#00A9CE", total_loss="#FF6600"),
    name   = "Claim Type"
  ) +
  facet_wrap(~claim_type, nrow = 1, scales = "free_y") +
  scale_x_continuous(labels = scales::percent_format(scale = 1)) +
  labs(
    title    = "Distribution of Individual CATE Estimates (Causal Forest)",
    subtitle = "Each bar = number of claims in a 2-point bin · Dashed = zero effect · Colour = claim type",
    x        = "Estimated Treatment Effect on Cost (%)", y = "Count"
  ) +
  theme_allianz() +
  theme(legend.position = "none")

Comparing AIPW vs. Causal Forest CATEs

cf_segment <- claims_cf |>
  group_by(claim_type) |>
  summarise(
    cf_cate_pct = mean(cate_pct),
    cf_lo       = mean(cate_lo),
    cf_hi       = mean(cate_hi),
    .groups     = "drop"
  )

cate_compare <- left_join(
  cate_df,
  cf_segment |> rename(cf_cate = cf_cate_pct),
  by = "claim_type"
)

cate_compare_long <- cate_compare |>
  pivot_longer(c(cate_pct, cf_cate),
               names_to = "method", values_to = "estimate") |>
  mutate(method = recode(method,
                         cate_pct = "AIPW (doubly robust)",
                         cf_cate  = "Causal Forest (grf)"))

ggplot(cate_compare_long, aes(x = claim_type, y = estimate,
                               colour = method, group = method)) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "#6B7280") +
  geom_line(linewidth = 1, alpha = 0.7) +
  geom_point(size = 4) +
  geom_errorbar(
    data = filter(cate_compare_long, method == "Causal Forest (grf)"),
    aes(ymin = cf_lo, ymax = cf_hi), width = 0.2, linewidth = 0.8
  ) +
  scale_colour_manual(
    values = c("AIPW (doubly robust)" = "#003781", "Causal Forest (grf)" = "#00A9CE"),
    name   = NULL
  ) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  labs(
    title    = "AIPW vs. Causal Forest CATE Estimates",
    subtitle = "Error bars = mean of per-claim 95% bounds from causal forest (not a segment-level CI) · Both methods should broadly agree",
    x        = "Claim Type", y = "Estimated Cost Change (%)"
  ) +
  theme_allianz()

Note

Causal forest interpretation: If AIPW and causal forest estimates broadly agree (within their respective CIs), this triangulation increases confidence in the causal estimates. Disagreements reveal where parametric assumptions in the AIPW propensity or outcome model may be violated — the causal forest provides a more flexible non-parametric check. The causal forest additionally provides individual-level CATE estimates, enabling claim-by-claim decisions about whether network steering is expected to save costs.