我试图用glm
估算足球队的优势。
# data is dataframe (structure on bottom).
model <- glm(Goals ~ Home + Team + Opponent, family=poisson(link=log), data=data)
但得到错误:
Error in if (any(y < 0)) stop("negative values not allowed for the 'Poisson' family") :
missing value where TRUE/FALSE needed
In addition: Warning message:
In Ops.factor(y, 0) : ‘<’ not meaningful for factors
data
:
> data
Team Opponent Goals Home
1 5a51f2589d39c31899cce9d9 5a51f2579d39c31899cce9ce 3 1
2 5a51f2579d39c31899cce9ce 5a51f2589d39c31899cce9d9 0 0
3 5a51f2589d39c31899cce9da 5a51f2579d39c31899cce9cd 3 1
4 5a51f2579d39c31899cce9cd 5a51f2589d39c31899cce9da 0 0
> is.factor(data$Goals)
[1] TRUE
答案 0 :(得分:2)
来自glm()
功能文档的“详细信息”部分:
典型的预测变量具有形式响应〜术语,其中响应是(数字)响应向量,术语是一系列术语,用于指定响应的线性预测变量。
因此,您需要确保目标列为数字:
df <- data.frame( Team= c("5a51f2589d39c31899cce9d9", "5a51f2579d39c31899cce9ce", "5a51f2589d39c31899cce9da", "5a51f2579d39c31899cce9cd"),
Opponent=c("5a51f2579d39c31899cce9ce", "5a51f2589d39c31899cce9d9", "5a51f2579d39c31899cce9cd", "5a51f2589d39c31899cce9da "),
Goals=c(3,0,3,0),
Home=c(1,0,1,0))
str(df)
#'data.frame': 4 obs. of 4 variables:
# $ Team : Factor w/ 4 levels "5a51f2579d39c31899cce9cd",..: 3 2 4 1
# $ Opponent: Factor w/ 4 levels "5a51f2579d39c31899cce9cd",..: 2 3 1 4
# $ Goals : num 3 0 3 0
# $ Home : num 1 0 1 0
model <- glm(Goals ~ Home + Team + Opponent, family=poisson(link=log), data=df)
然后是输出:
> model
Call: glm(formula = Goals ~ Home + Team + Opponent, family = poisson(link = log),
data = df)
Coefficients:
(Intercept) Home Team5a51f2579d39c31899cce9ce
-2.330e+01 2.440e+01 -3.089e-14
Team5a51f2589d39c31899cce9d9 Team5a51f2589d39c31899cce9da Opponent5a51f2579d39c31899cce9ce
-6.725e-15 NA NA
Opponent5a51f2589d39c31899cce9d9 Opponent5a51f2589d39c31899cce9da
NA NA
Degrees of Freedom: 3 Total (i.e. Null); 0 Residual
Null Deviance: 8.318
Residual Deviance: 3.033e-10 AIC: 13.98