Question

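I want to run a logistic regression on this data set. The first three columns are the predictors. The fourth column (value = no) and the fifth column (value = yes) are the response. For example, the first row has 53 "no" and 6 "yes", and the second row has 10 "no" and 4 "yes".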

Below is a link to the data.

How can I convert it into a four-column data frame? Thanks.

What I want is something like this:

I tried using the glm function directly:

K1=glm(formula = cbind(notUsing,using) ~ age + education + wantsMore, 
    data = as.data.frame(data_imported), family = "binomial")

The coefficients are:

> K1$coefficients
 (Intercept)     age25-29     age30-39     age40-49 educationlow wantsMoreyes 
   0.8082200   -0.3893816   -0.9086135   -1.1892389    0.3249947    0.8329548

Answer 1

Expanding on my comment here. Your data is already in the correct format for fitting a generalized linear model in R.

This is well hidden somewhere in the R documentation, but I would bet that there is a note on this behaviour somewhere in help(formula), help(glm), help(lm), or help(family).

If you have two columns giving the counts of successes and non-successes, the correct formula format is cbind(success, not success) ~ explanatory variables. In your specific case,

glm(cbind(notUsing, using) ~ age + education + wantsMore, data = [your df here], family = binomial)

can be used to fit such a model.
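One detail worth checking is which outcome the model treats as the "success". The sketch below fits the same kind of grouped binomial model on a small toy data frame and compares the two column orders. The data frame `toy` is hypothetical: its first two rows reuse the counts quoted in the question, the other two are invented for illustration, and `age` is left out to keep it short.

```r
# Hypothetical grouped data: one row per covariate pattern, with counts.
# Rows 1-2 use the counts quoted in the question; rows 3-4 are made up.
toy <- data.frame(
  education = factor(c("low", "low", "high", "high")),
  wantsMore = factor(c("yes", "no", "yes", "no")),
  notUsing  = c(53, 10, 40, 30),
  using     = c(6, 4, 20, 15)
)

# glm() treats the FIRST column of cbind() as the success count,
# so this models the probability of not using.
fit_not_using <- glm(cbind(notUsing, using) ~ education + wantsMore,
                     data = toy, family = binomial)

# Swapping the columns models the probability of using instead.
fit_using <- glm(cbind(using, notUsing) ~ education + wantsMore,
                 data = toy, family = binomial)

# The two fits describe the same model; only the coefficient signs differ.
cbind(not_using = coef(fit_not_using), using = coef(fit_using))
```

So if the coefficients seem to have the opposite sign from what you expected, the column order inside cbind() is the first thing to look at.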
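This is (to some extent) equivalent to repeating each row once per observation counted in notUsing and using. For row 1, for example, you would have 53 rows with usage = no, age = <25, education = low, and wantsMore = yes, plus 6 rows with the same predictor values and usage = yes.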
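If you do want the one-row-per-observation layout, here is a minimal sketch of that expansion in base R, together with a check that both layouts give the same coefficients. The data frame `grouped` is hypothetical (first two rows use the counts from the question, the rest are invented), and `usage` is just an illustrative column name. With the real data you would also carry `age` along in the same way to get the four columns.

```r
# Hypothetical grouped data; rows 1-2 use the counts quoted in the question.
grouped <- data.frame(
  education = factor(c("low", "low", "high", "high")),
  wantsMore = factor(c("yes", "no", "yes", "no")),
  notUsing  = c(53, 10, 40, 30),
  using     = c(6, 4, 20, 15)
)

# Stack the two count columns: one row per (covariate pattern, outcome).
predictors <- grouped[c("education", "wantsMore")]
long <- rbind(
  data.frame(predictors, usage = "no",  n = grouped$notUsing),
  data.frame(predictors, usage = "yes", n = grouped$using)
)

# Repeat each row n times so that every row is a single observation.
expanded <- long[rep(seq_len(nrow(long)), times = long$n),
                 c("education", "wantsMore", "usage")]
rownames(expanded) <- NULL
expanded$usage <- factor(expanded$usage, levels = c("no", "yes"))

# With a factor response, the first level ("no") is the failure,
# so this models the probability of usage == "yes".
fit_expanded <- glm(usage ~ education + wantsMore,
                    data = expanded, family = binomial)

# Grouped fit with the matching success column first.
fit_grouped <- glm(cbind(using, notUsing) ~ education + wantsMore,
                   data = grouped, family = binomial)

# Side-by-side comparison of the coefficient estimates.
cbind(grouped = coef(fit_grouped), expanded = coef(fit_expanded))
```

The estimates agree up to numerical tolerance; the reported deviance and residual degrees of freedom differ between the two layouts, which is why the equivalence is only "to some extent".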
