Error when running Poisson regression with a binary outcome

Time: 2017-07-13 03:58:27

Tags: r binary glm poisson

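I am trying to run a Poisson regression to predict a common binary outcome.

This is my first attempt at using dput; if I have used it incorrectly, please let me know so that I can correct it.

Example data: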

df <- structure(list(id = 1:30, sex = structure(c(1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 
2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L), .Label = c("Female", "Male"
), class = "factor"), migStat = structure(c(1L, 2L, 1L, 1L, 1L, 
1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L), .Label = c("Australian-born", 
"Migrant"), class = "factor"), mhAreaBi = structure(c(1L, 1L, 
1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 
1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L), .Label = c("Metropolitan", 
"Regional"), class = "factor"), empStatBi = structure(c(2L, 2L, 
1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 
2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Student / employed", 
"Unemployed"), class = "factor"), pensBenBi = structure(c(1L, 
2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 
1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L), .Label = c("No benefit", 
"In receipt of pension benefit"), class = "factor"), maritStatBi = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("Married (including de facto)", 
"Not married"), class = "factor"), cto = structure(c(1L, 2L, 
2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 
2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L), .Label = c("No", 
"Yes"), class = "factor")), .Names = c("id", "sex", "migStat", 
"mhAreaBi", "empStatBi", "pensBenBi", "maritStatBi", "cto"), row.names = c(NA, 
-30L), class = "data.frame")

When I run the regression with glm in R, I get an error:

fit <- glm(cto ~ sex + migStat + mhAreaBi + empStatBi + pensBenBi + maritStatBi, df, family = poisson)

Error in if (any(y < 0)) stop("negative values not allowed for the 'Poisson' family") : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In Ops.factor(y, 0) : ‘<’ not meaningful for factors

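The same error has been briefly explained in this thread:

  Because the "<" operator is not defined for factors, the result passed to if is of length 0. Having factor variables on the RHS and using the integer values on the LHS succeeds.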
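The mechanism behind the error message can be sketched in a few lines. This is a minimal illustration, not the internals of glm itself: the factor `y` is a hypothetical stand-in for the outcome `cto`, and the names `cmp`, `flag` and `msg` are made up for the example. In current versions of R, comparing an unordered factor with `<` warns and returns `NA`, `any()` of that is `NA`, and `if (NA)` then fails.

```r
# Hypothetical two-level factor standing in for the outcome `cto`
y <- factor(c("No", "Yes", "Yes"), levels = c("No", "Yes"))

# `<` is not meaningful for unordered factors: R warns and returns NA values
cmp <- suppressWarnings(y < 0)
print(cmp)

# any() over values that are all NA is itself NA, neither TRUE nor FALSE
flag <- any(cmp)
print(flag)

# if (NA) raises an error; capture its message instead of stopping the script
msg <- tryCatch(
  if (flag) "negative" else "ok",
  error = function(e) conditionMessage(e)
)
print(msg)
```

The captured message should correspond to the "missing value where TRUE/FALSE needed" line in the error above, which is why the Poisson family's check for negative values fails as soon as the response is a factor.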

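When I convert the outcome to an integer, the error does not occur; however, this:

  1. seems to defeat the purpose of predicting a binary outcome (unless a numeric variable with range 0-1 is treated like a factor variable with two levels); and
  2. seems unnecessary (at least according to this post, which uses {{1}} from geeglm to predict a binary outcome [unfortunately, I ran into the same error when adapting that code to my own data set]).

Questions:

    Could I get a further explanation of the error?

    If I convert the outcome to an integer with range 0-1, will geepack treat it as a binary variable? If not, is there a more appropriate method for running a regression on a common binary outcome?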

1 answer:

Answer 0: (score: 1)

I think the best option here is:

df$cto_binary <- as.numeric(df$cto == "Yes")
fit <- glm(cto_binary ~ sex + migStat + mhAreaBi + empStatBi + pensBenBi + maritStatBi, 
           df, family = poisson)

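This way, your code states explicitly which level counts as the 1 / success of your binary outcome, and you will not be tripped up by the ordering of the factor levels. Note that in R, as.numeric(c(FALSE, TRUE)) gives c(0, 1), so you always know what you will get from a logical comparison.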
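To illustrate the point about level ordering, here is a small sketch on a hypothetical four-element factor with the same levels as `cto` (the variable names are made up for the example). Calling as.numeric() directly on a factor returns its internal level codes, which depend on the level order, whereas the explicit comparison does not.

```r
# Hypothetical outcome factor with the same levels as `cto`
cto <- factor(c("No", "Yes", "Yes", "No"), levels = c("No", "Yes"))

# as.numeric() on a factor returns the internal level codes (1 and 2), not 0/1
codes <- as.numeric(cto)
print(codes)

# A logical comparison names the success level explicitly and yields 0/1
cto_binary <- as.numeric(cto == "Yes")
print(cto_binary)

# Reordering the levels changes the codes but not the explicit comparison
cto_rev <- factor(cto, levels = c("Yes", "No"))
codes_rev <- as.numeric(cto_rev)
binary_rev <- as.numeric(cto_rev == "Yes")
print(codes_rev)
print(binary_rev)
```

Subtracting 1 from the level codes would also give 0/1 here, but only as long as "No" remains the first level, which is the kind of hidden dependency the comparison avoids.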