Cox PH model in lifelines - violated assumptions for dummy variables

Time: 2019-03-05 16:29:25

Tags: python survival-analysis cox-regression lifelines

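I am using the lifelines library to estimate a Cox PH model. For the regression I have many categorical features, which I one-hot encoded, dropping one column per feature to avoid multicollinearity (the dummy variable trap). I am not attaching code, since the example would be similar to the one given in the documentation here.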

When I run cph.check_assumptions(data), I get a message for every dummy variable saying that it violates the assumption:

Variable 'dummy_a' failed the non-proportional test: p-value is 0.0063.
Advice: with so few unique values (only 2), you can try `strata=['dummy_a']` in the call in `.fit`. See documentation in link [A] and [B] below.

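How should I interpret this advice when a single categorical feature is represented by several dummy variables? Should I add all of them to strata?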

I would appreciate any help :)

1 answer:

Answer 0: (score: 1)

@abu, your question exposes an obvious gap in the documentation: what to do with dummy variables that violate the proportional test. In this case, I suggest not dummy-encoding the variable, and instead adding the original column as a stratification variable, that is, passing the original column's name in `strata` in the call to `.fit`.