零膨胀模型

时间:2016-03-05 10:56:08

标签: r regression modeling

以下是我的回归数据:enter image description here

y是每2分钟时间内火车站平台的乘客数量,而A1至A17是广场上17个研究区域的乘客数量。通过改变X来考虑时间滞后。 有时,在大厅的研究区域内没有人等待,所以会出现过零。我计划使用零膨胀模型。我已经尝试过如下所示的代码,但它说“最小计数不为零”这意味着什么,我该如何解决?我已经完成了poisson,它没关系,但零膨胀不起作用。

    > setwd('C:/Users/zuzymelody/Desktop')
> try<-read.csv('0inflated_2mins27peak.csv',header=TRUE)
> attach(try)
> names(try)
 [1] "y"   "A1"  "A2"  "A3"  "A4"  "A5"  "A6"  "A7"  "A8"  "A9"  "A10" "A11"
[13] "A12" "A13" "A14" "A15" "A16" "A17"

    > model1<-glm(y~A1+A2+A3+A4+A5+A6+A7+A8+A9+A10+A11+A12+A13+A14+A15+A16+A17,family="poisson")
    > summary(model1)

Call:
glm(formula = y ~ A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8 + A9 + 
    A10 + A11 + A12 + A13 + A14 + A15 + A16 + A17, family = "poisson")

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-7.8598  -3.4571  -0.3663   2.1867  12.5183  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  6.102009   0.164497  37.095  < 2e-16 ***
A1          -0.017555   0.003665  -4.790 1.66e-06 ***
A2          -0.026101   0.017569  -1.486 0.137371    
A3          -0.179988   0.014976 -12.018  < 2e-16 ***
A4          -0.032584   0.007735  -4.213 2.52e-05 ***
A5          -0.019908   0.007014  -2.839 0.004532 ** 
A6          -0.044144   0.010266  -4.300 1.71e-05 ***
A7           0.049829   0.006518   7.645 2.09e-14 ***
A8          -0.080712   0.009819  -8.220  < 2e-16 ***
A9           0.007390   0.007105   1.040 0.298273    
A10          0.041116   0.004085  10.065  < 2e-16 ***
A11         -0.041420   0.008418  -4.921 8.62e-07 ***
A12         -0.008241   0.007304  -1.128 0.259171    
A13         -0.033161   0.008966  -3.699 0.000217 ***
A14          0.020818   0.005250   3.965 7.34e-05 ***
A15         -0.002995   0.006125  -0.489 0.624887    
A16         -0.061997   0.017122  -3.621 0.000294 ***
A17         -0.025025   0.008391  -2.982 0.002860 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1137.71  on 29  degrees of freedom
Residual deviance:  599.74  on 12  degrees of freedom
AIC: 840.1

Number of Fisher Scoring iterations: 5







     >with(model1, cbind(res.deviance = deviance, df = df.residual,
      p = pchisq(deviance, df.residual, lower.tail=FALSE)))

 res.deviance df             p
[1,]     599.7445 12 1.202013e-120

> require( pscl )
> Zip<-zeroinfl(model1,link="logit",dist="poisson")
**Error in zeroinfl(model1, link = "logit", dist = "poisson") : 
  invalid dependent variable, minimum count is not zero**
  

dput(尝试)   结构(列表(y = c)(156L,74L,221L,207L,168L,36L,128L​​,   208L,99L,117L,228L,211L,341L,173L,196L,310L,112L,203L,   104L,183L,325L,143L,218L,166L,218L,127L,136L,38L,102L,   34L),A1 = c(24L,24L,24L,19L,20L,9L,14L,23L,15L,23L,   14L,16L,15L,25L,25L,19L,24L,26L,25L,26L,22L,14L,13L,   15L,9L,12L,9L,12L,15L,18L),A2 = c(2L,4L,0L,3L,0L,   1L,1L,2L,1L,2L,0L,2L,2L,0L,1L,1L,3L,3L,2L,2L,3L,   2L,3L,5L,4L,3L,4L,1L,2L,1L),A3 = c(2L,2L,0L,1L,1L,   9L,3L,0L,0L,0L,1L,1L,3L,1L,0L,0L,1L,2L,3L,1L,0L,   1L,1L,0L,1L,1L,1L,0L,0L,2L),A4 = c(15L,11L,6L,7L,   10L,10L,5L,4L,5L,7L,9L,9L,4L,6L,6L,13L,9L,13L,9L,   10L,6L,6L,7L,6L,10L,9L,10L,7L,9L,2L),A5 = c(13L,10L,   6L,6L,11L,19L,13L,14L,7L,7L,6L,8L,10L,5L,7L,9L,   9L,11L,3L,13L,8L,8L,8L,6L,8L,9L,9L,14L,9L,6L),A6 = c(9L,   10L,9L,9L,4L,7L,7L,12L,11L,11L,12L,8L,6L,7L,8L,   5L,9L,6L,5L,6L,9L,11L,6L,6L,8L,9L,4L,11L,10L,7L   ),A7 = c(21L,16L,13L,13L,4L,9L,12L,13L,12L,12L,12L,   6L,7L,6L,6L,4L,5L,9L,8L,7L,9L,12L,10L,7L,8L,12L,   14L,2L,6L,6L),A8 = c(1L,5L,10L,10L,1L,9L,6L,6L,7L,   7L,5L,6L,3L,2L,4L,0L,4L,2L,5L,5L,5L,3L,2L,4L,3L,   8L,10L,8L,2L,5L),A9 = c(8L,9L,10L,10L,12L,19L,10L,   6L,6L,6L,0L,6L,8L,10L,2L,3L,6L,2L,2L,6L,5L,2L,   4L,1L,3L,7L,7L,4L,4L,2L),A10 = c(7L,10L,12L,20L,24L,   21L,24L,18L,20L,18L,26L,21L,12L,11L,18L,18L,19L,16L,   25L,21L,22L,14L,12L,17L,21L,14L,14L,10L,8L,7L),A11 = c(0L,   2L,1L,4L,2L,1L,1L,1L,13L,10L,12L,5L,2L,0L,5L,1L,   4L,4L,3L,3L,1L,1L,3L,3L,5L,5L,2L,10L,3L,4L),A12 = c(12L,   14L,14L,17L,10L,14L,13L,19L,7L,5L,6L,6L,8L,7L,13L,   11L,10L,8L,6L,6L,9L,14L,9L,10L,8L,9L,8L,9L,5L,7L   ),A13 = c(6L,2L,1L,5L,9L,6L,7L,4L,12L,5L,9L,10L,   3L,7L,4L,2L,2L,6L,4L,6L,7L,4L,9L,6L,11L,4L,5L,   4L,6L,6L),A14 = c(14L,13L,16L,11L,8L,6L,9L,13L,14L,   14L,9L,8L,12L,11L,13L,11L,18L,15L,20L,21L,17L,18L,   18L,18L,25L,20L,12L,9L,8L,8L),A15 = c(7L,6L,7L,5L,   4L,9L,12L,12L,11L,12L,9L,8L,7L,8L,10L,16L,8L,8L,   13L,10L,5L,5L,8L,10L,10L,4L,6L,6L,6L,7L),A16 = c(2L,   1L,3L,3L,1L,2L,3L,2L,3L,2L,2L,1L,2L,2L,3L,3L,2L,   1L,3L,4L,2L,5L,4L,8L,5L,2L,1L,2L,2L,2L),A17 = c(10L,   13L,13L,2L,5L,1L,3L,3L,5L,4L,4L,6L,4L,6L,3L,2L,   2L,2L,7L,8L,3L,7L,5L,6L,7L,6L,6L,3L,4L,3L))。。名称= c(“y”,   “A1”,“A2”,“A3”,“A4”,“A5”,“A6”,“A7”,“A8”,“A9”,“A10”,   “A11”,“A12”,“A13”,“A14”,“A15”,“A16”,“A17”),class =“data.frame”,row.names = c(NA,   -30L))

以上是可重复的例子。对不起,我第一次在这里发帖,不太了解规则

1 个答案:

答案 0 :(得分:1)

您的数据框在因变量$ y $:

中不包含零值
  

分钟(MYDATA $ y)的   [1] 34

您需要至少有一个$ y = 0 $。