
Time: 2015-03-23 15:33:12

Tags: r interaction


m1<-gam(death ~ pm10 + s(trend)+ s(temp), data=df1, na.action=na.omit, family=poisson)
m2<-gam(cvd ~ pm10 + s(trend)+ s(temp), data=df1, na.action=na.omit, family=poisson)
m3<-gam(others ~ pm10 + s(trend)+ s(temp), data=df1, na.action=na.omit, family=poisson)

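I have daily death counts, most of them broken down by cause. On 1 January 1987, for example, there were 130 deaths (65 from CVD and 65 from other causes). My aim is to find out whether the association between PM10 exposure and mortality differs between CVD deaths and deaths from other causes. The research question is: does the death rate respond differently to PM10 in the CVD group than in the others group?

In a stratified analysis I can fit the model separately for the CVD group and the others group, as in m2 and m3 above. In this exercise, however, I want to fit a single model with an interaction term, and I cannot figure out how to set it up. My idea is to duplicate every row and create a dummy variable for the two groups (1 for others, 0 for CVD) and a single outcome column (newdeath), so that each day has two rows: one with the deaths from other causes and one with the CVD deaths. With that setup (the dataset df2 is shown below), I would like to run the following code: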

minter<-gam(newdeath~ pm10*dummy  + s(trend)+ s(temp), data=df2, na.action=na.omit, family=poisson)
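
To make the meaning of the `pm10:dummy` term concrete, here is a minimal sketch on simulated data (the data frame `wide`, its coefficients, and the seed are hypothetical and not taken from chicagoNMMAPS). It uses plain `glm` from base R rather than `gam`, and no smooth terms, so that every term is interacted with `dummy`. Under that assumption the interaction coefficient from the stacked fit equals the difference between the two stratified pm10 slopes.

```r
# Simulated daily counts (hypothetical values, for illustration only)
set.seed(1)
n    <- 500
pm10 <- runif(n, 10, 80)
wide <- data.frame(
  pm10   = pm10,
  cvd    = rpois(n, exp(4.0 + 0.002 * pm10)),
  others = rpois(n, exp(4.2 + 0.001 * pm10))
)

# Stratified fits: one pm10 slope per cause of death
b_cvd    <- coef(glm(cvd    ~ pm10, family = poisson, data = wide))["pm10"]
b_others <- coef(glm(others ~ pm10, family = poisson, data = wide))["pm10"]

# Stacked data: two rows per day, dummy = 1 for others and 0 for CVD
long <- rbind(
  data.frame(pm10 = wide$pm10, newdeath = wide$others, dummy = 1),
  data.frame(pm10 = wide$pm10, newdeath = wide$cvd,    dummy = 0)
)
fit <- glm(newdeath ~ pm10 * dummy, family = poisson, data = long)

# The interaction term is the difference between the stratified slopes
interaction_coef <- coef(fit)["pm10:dummy"]
slope_difference <- b_others - b_cvd
```

In the model from the question, `s(trend)` and `s(temp)` are shared by both groups, so the interaction estimate would generally not match the stratified fits exactly. The `by` argument of mgcv's `s()` may be worth a look if group-specific smooths are wanted.
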



library(dlnm)  # provides the chicagoNMMAPS dataset
df <- chicagoNMMAPS
df1 <- df[, c("date", "dow", "death", "cvd", "temp", "pm10")]
df1$others <- df1$death - df1$cvd  # all other non-CVD deaths


> dput(df2)
structure(list(date = structure(c(6209, 6209, 6210, 6210, 6211, 
6211, 6212, 6212, 6213, 6213), class = "Date"), dow = structure(c(5L, 
5L, 6L, 6L, 7L, 7L, 1L, 1L, 2L, 2L), .Label = c("Sunday", "Monday", 
"Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"), class = "factor"), 
    death = c(130L, 130L, 150L, 150L, 101L, 101L, 135L, 135L, 
    126L, 126L), cvd = c(65L, 65L, 73L, 73L, 43L, 43L, 72L, 72L, 
    64L, 64L), temp = c(-0.277777777777778, -0.277777777777778, 
    0.555555555555556, 0.555555555555556, 0.555555555555556, 
    0.555555555555556, -1.66666666666667, -1.66666666666667, 
    0, 0), pm10 = c(26.956073733, 26.956073733, NA, NA, 32.838694951, 
    32.838694951, 39.9560737332, 39.9560737332, NA, NA), trend = c(1L, 
    1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L), newdeath = c(65L, 65L, 
    77L, 73L, 58L, 43L, 63L, 72L, 62L, 64L), dummy = c(1, 0, 
    1, 0, 1, 0, 1, 0, 1, 0)), datalabel = "Written by R.              ", time.stamp = "24 Mar 2015 00:00", .Names = c("date", 
"dow", "death", "cvd", "temp", "pm10", "trend", "newdeath", "dummy"
), formats = c("%dD_m_Y", "%9.0g", "%9.0g", "%9.0g", "%9.0g", 
"%9.0g", "%9.0g", "%9.0g", "%9.0g"), types = c(255L, 253L, 253L, 
253L, 255L, 255L, 253L, 253L, 254L), val.labels = c("", "dow", 
"", "", "", "", "", "", ""), var.labels = c("date", "dow", "death", 
"cvd", "temp", "pm10", "trend", "others", ""), row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"), version = 12L, label.table = structure(list(
    dow = structure(1:7, .Names = c("Sunday", "Monday", "Tuesday", 
    "Wednesday", "Thursday", "Friday", "Saturday"))), .Names = "dow"), class = "data.frame")

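The row expansion described in the question can be sketched in base R as follows. The block rebuilds the first five days of `df1` from the values in the `dput()` output above, so it runs without the dlnm package. Defining `trend` as a running day index is an assumption (the question does not show how it was created), and the helper names `long_others` and `long_cvd` are hypothetical.

```r
# First five days of df1, values copied from the dput() output above
df1 <- data.frame(
  date  = as.Date("1987-01-01") + 0:4,
  death = c(130L, 150L, 101L, 135L, 126L),
  cvd   = c(65L, 73L, 43L, 72L, 64L),
  temp  = c(-0.277777777777778, 0.555555555555556, 0.555555555555556,
            -1.66666666666667, 0),
  pm10  = c(26.956073733, NA, 32.838694951, 39.9560737332, NA)
)
df1$others <- df1$death - df1$cvd   # all other non-CVD deaths
df1$trend  <- seq_len(nrow(df1))    # assumed: running day index

# One copy of the data per cause, with the cause-specific count in newdeath
long_others <- transform(df1, newdeath = others, dummy = 1)  # 1 = others
long_cvd    <- transform(df1, newdeath = cvd,    dummy = 0)  # 0 = CVD

# Stack the copies and sort so each day has its "others" row first
df2 <- rbind(long_others, long_cvd)
df2 <- df2[order(df2$trend, -df2$dummy), ]
df2$others <- NULL        # not part of the df2 shown above
rownames(df2) <- NULL

df2[, c("date", "pm10", "trend", "newdeath", "dummy")]
```

On the full data the same steps apply to the complete `df1`; the five-day sample is far too small to fit the `gam` itself.
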
1 Answer:

Answer 0 (score: 0)

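I agree with @BondedDust: your data seem to be too aggregated already to answer your question with the tools you want to use. What you can measure, however, is the relationship between pm10 and the proportion of deaths that are due to CVD: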

df1$prop.cvd <- df1$cvd / df1$death


plot(prop.cvd ~ pm10, data = df1)

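You could then use this variable as the response in a regression model, as in the call that follows. Note that this answers a different question from yours, and it does not account for any lag in the pm10 effect or for time effects. For that you would need other tools, and I cannot help you with those here. Asking on Cross Validated may get you further.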

model <- glm(prop.cvd ~ pm10 + temp, data = df1)
summary(model)

Call:
glm(formula = prop.cvd ~ pm10 + temp, data = df1)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.1632  -0.0351  -0.0014   0.0349   0.3323  

              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.435e-01  1.529e-03 290.051   <2e-16 ***
pm10         5.294e-05  4.161e-05   1.272    0.203    
temp        -6.436e-04  7.437e-05  -8.654   <2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.002732995)

    Null deviance: 13.498  on 4862  degrees of freedom
Residual deviance: 13.282  on 4860  degrees of freedom
  (251 observations deleted due to missingness)
AIC: -14898

Number of Fisher Scoring iterations: 2
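
The model above treats the proportion as a Gaussian response. A common alternative for a proportion built from two counts is a binomial GLM on the counts themselves. This is a swapped-in technique, not what the answer did. The sketch below assumes the first five days of the question's data (values copied from its `dput()` output) and uses only base R; the names `sample_days` and `fit_binom` are hypothetical. Like the Gaussian model, it ignores lags and time trends.

```r
# First five days of the data, values copied from the question's dput() output
sample_days <- data.frame(
  death = c(130L, 150L, 101L, 135L, 126L),
  cvd   = c(65L, 73L, 43L, 72L, 64L),
  pm10  = c(26.956073733, NA, 32.838694951, 39.9560737332, NA)
)
sample_days$others <- sample_days$death - sample_days$cvd

# Binomial GLM: "successes" are CVD deaths, "failures" are all other deaths
fit_binom <- glm(cbind(cvd, others) ~ pm10, family = binomial,
                 data = sample_days, na.action = na.omit)

# Fitted CVD proportions for the days with a non-missing pm10 value
p_hat <- fitted(fit_binom)
```

Three usable days are only enough to show the call. On the full `df1` the same formula would be used, with `temp` or other covariates added as needed.
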