Question

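I'm using the lifelines library to estimate a Cox PH model. For the regression I have many categorical features, which I one-hot encoded, dropping one column per feature to avoid multicollinearity (the dummy variable trap). I haven't attached code, since the example would be similar to the one given in the documentation here.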

Running cph.check_assumptions(data), I get a message that every dummy variable violates the assumption:

Variable 'dummy_a' failed the non-proportional test: p-value is 0.0063.
Advice: with so few unique values (only 2), you can try `strata=['dummy_a']` in the call in `.fit`. See documentation in link [A] and [B] below.

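How should I interpret this advice when several dummy variables come from a single categorical feature? Should I add all of them to the strata?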

Any help would be appreciated :)

Answer 1

@abu, your question exposes an obvious gap in the documentation: dummy variables that fail the proportionality test. In this case, I suggest *not* dummy-encoding the variable, and instead passing the original column as a stratification variable via `strata` in the call to `.fit`.

Cox PH model in lifelines: assumptions violated by dummy variables
