使用生存模型的R中AFT Weibull模型后的边际效应或预测值

时间:2018-07-16 20:24:09

标签: r survival-analysis marginal-effects

我正在使用生存软件包在R中运行加速故障时间(AFT)威布尔模型。它包括一个因子变量,并且具有群集稳健的标准误差。

我现在需要估计边际效应。换句话说,我需要为我的每个协变量估计一个标准差增加(或当协变量是虚拟变量时增加一个单位)的生存时间变化。我尝试使用margins包,但是没有对Surv对象进行测试,并且收到了一堆错误消息。

因此,我诉诸于使用预测函数来计算它们。但是,到目前为止,我一直没有成功。这是我到目前为止设计的:

  library(survival)
  library(dplyr)

  #Model
  ourdata$NOTCENSORED=1
  Execweibullmodel <-survreg(Surv(TOTALDAYSTORECEIVED, 
  NOTCENSORED)~DifferenceinPartyMeansfirstby10s + PartyImbalance2by10s + 
  LengthOfPlanningin10s +  FirstHundred_intent + WHWorkload_totalin10s + 
  AvgWkIntents + priorityappoint + IRC_Dummy + factor(Policy_num) +  
  RepublicanPresident + MonthlyPresApprovalAvg_Received + female 
  +cluster(presidentnumber), data=ourdata, dist="weibull", robust=TRUE)

  #Create mock dataset holding all continuous covariates constant at their 
  mean. Below I am changing LengthOfPlanningin10s, because I want to 
  calculate the change in survival time when I increase it from 70 to 248.

  ND <- with(ourdata, expand.grid(
  DifferenceinPartyMeansfirstby10s = 6.9695041,
  PartyImbalance2by10s=1.0748024,
  LengthOfPlanningin10s=c(70, 248),
  FirstHundred_intent=0.2840300,
  WHWorkload_totalin10s=74.7713290, 
  AvgWkIntents= 12.32512,
  priorityappoint=4, 
  IRC_Dummy=c(.176),
  Policy_num=levels(ourdata$Policy_num),
  RepublicanPresident=c(.638),
  MonthlyPresApprovalAvg_Received=c(56.2),
  female=c(.256),
  presidentnumber=levels(ourdata$presidentnumber)
  ))

 #Calculate predicted values

 prs <- predict(Execweibullmodel, ND, se.fit = TRUE, type = "lp")
 ND$pred <- prs[[1]]
 ND$se <- prs[[2]]
 ND$lo <- ND$pred - 1.96 * ND$se
 ND$up <- ND$pred + 1.96 * ND$se
 head(ND, 10)

 #Calculate the change in the predicted value from one level to the next,and 
 average across the factor variables
 newobject3=ND %>%
    group_by(Policy_num, presidentnumber ) %>% 
    dplyr::mutate(dydx2 = c(NA, diff(pred))) 

 #Calculate the average change
 mef1=mean(newobject3$dydx2, na.rm=TRUE)

这确实给了我一个价值,但它与Stata产生的边际效应有很大不同。 Stata使用略有不同的方法来计算聚类鲁棒性标准误差,因此系数估计值最多相差一个小数位。但是,Stata给我的边际影响要比使用此方法得到的幅度大。所以我想我做错了。

一些数据(我的数据集很大,所以前20个观察点可能不够用):

ourdata <-
structure(list(TOTALDAYSTORECEIVED = c(538, 561, 272, 384, 308, 
371, 184, 561, 167, 552, 254, 247, 37, 149, 55, 293, 440, 62, 
288, 461), DifferenceinPartyMeansfirstby10s = c(5.55999994277954, 
5.55999994277954, 5.55999994277954, 5.55999994277954, 5.55999994277954, 
5.55999994277954, 5.55999994277954, 5.55999994277954, 5.55999994277954, 
5.55999994277954, 5.55999994277954, 5.55999994277954, 5.55999994277954, 
5.55999994277954, 5.55999994277954, 5.55999994277954, 5.55999994277954, 
5.55999994277954, 5.55999994277954, 5.55999994277954), PartyImbalance2by10s = c(1.81818187236786, 
1.81818187236786, 1.41414141654968, 1.41414141654968, 1.41414141654968, 
1.41414141654968, 1.41414141654968, 1.81818187236786, 1.41414141654968, 
1.81818187236786, 1.41414141654968, 1.41414141654968, 1.41414141654968, 
1.41414141654968, 1.41414141654968, 1.41414141654968, 1.63265299797058, 
1.41414141654968, 1.41414141654968, 1.63265299797058), LengthOfPlanningin10s = c(35.4000015258789, 
35.4000015258789, 35.4000015258789, 35.4000015258789, 35.4000015258789, 
35.4000015258789, 35.4000015258789, 35.4000015258789, 35.4000015258789, 
35.4000015258789, 35.4000015258789, 35.4000015258789, 35.4000015258789, 
35.4000015258789, 35.4000015258789, 35.4000015258789, 35.4000015258789, 
35.4000015258789, 35.4000015258789, 35.4000015258789), FirstHundred_intent = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0), WHWorkload_totalin10s = c(30.6000003814697, 
28.6000003814697, 51.4000015258789, 43.5999984741211, 48, 45.4000015258789, 
59.2000007629395, 28.6000003814697, 63, 29.3999996185303, 52.4000015258789, 
53.2999992370605, 83.9000015258789, 65.5, 82.8000030517578, 48.9000015258789, 
38.4000015258789, 80.1999969482422, 49.5, 37.2999992370605), 
    AvgWkIntents = c(5, 11, 16, 16, 17, 17, 12, 11, 11, 3, 4, 
    17, 9, NA, NA, 10, 16, 9, 13, 4), priorityappoint = c(5L, 
    6L, 2L, 2L, 5L, 5L, 5L, 6L, 5L, 3L, 5L, 5L, 5L, 6L, 5L, 5L, 
    5L, 5L, 5L, 5L), IRC_Dummy = c(0, 1, 1, 1, 1, 1, 0, 1, 1, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Policy_num = structure(c(1L, 
    8L, 8L, 8L, 8L, 8L, 9L, 8L, 10L, 8L, 9L, 3L, 6L, 8L, 10L, 
    5L, 1L, 4L, 8L, 8L), .Label = c("SocialWelfare", "Agriculture", 
    "Commerce", "Infrastructure", "Justice", "Labor", "Treasury", 
    "Other", "Defense", "ForeignPolicy"), class = "factor"), 
    RepublicanPresident = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), MonthlyPresApprovalAvg_Received = c(42, 
    41.3333320617676, 56, 47, 52, 48, 58, 41.3333320617676, 58, 
    42, 56, 52, 53, 58.6666679382324, 60, 52, 44, 60, 52, 44), 
    female = c(0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 
    1, 0, 1, 0), presidentnumber = structure(c(1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = c("Reagan", "GHW Bush", "Clinton", "GW Bush", 
    "Obama", "Trump"), class = "factor")), .Names = c("TOTALDAYSTORECEIVED", 
"DifferenceinPartyMeansfirstby10s", "PartyImbalance2by10s", "LengthOfPlanningin10s", 
"FirstHundred_intent", "WHWorkload_totalin10s", "AvgWkIntents", 
"priorityappoint", "IRC_Dummy", "Policy_num", "RepublicanPresident", 
"MonthlyPresApprovalAvg_Received", "female", "presidentnumber"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", 
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20"
), class = "data.frame")

1 个答案:

答案 0 :(得分:1)

在收集示例数据时,我将提供以下基于this source的基于仿真的方法:

# Load data
data("CarpenterFdaData")

# Run a model  
model <- coxph(Surv(acttime, censor) ~ lethal*prevgenx, data = CarpenterFdaData)

# Simulate the Marginal Effect of interest:
marginal_effects <- coxsimInteract(model, 
                       b1 = "lethal", 
                       b2 = "prevgenx",
                       qi = "Marginal Effect",
                       X2 = seq(2, 115, by = 2), nsim = 1000)

enter image description here