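R logistic regression model.matrix

Time: 2017-08-18 04:56:09

Tags: r statistics logistic-regression

I am new to R and am trying to understand a solution for a logistic regression problem. So far, the unused variables have been removed and the data has been split into training and test sets. I am trying to understand the part that deals with model.matrix. I am just getting started with R and statistics, and I am not sure what model.matrix is or what contrasts are. Here is the code: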

## create design matrix; indicators for categorical variables (factors)
Xdel <- model.matrix(delay~.,data=DataFD_new)[,-1]
xtrain <- Xdel[train,]
xnew <- Xdel[-train,]
ytrain <- del$delay[train]
ynew <- del$delay[-train]
m1=glm(delay~.,family=binomial,data=data.frame(delay=ytrain,xtrain))
summary(m1)

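Can someone explain what model.matrix is used for? Why can't we just create dummy variables for the categorical variables directly and pass them to glm? I am confused. What is model.matrix for?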

1 answer:

Answer 0: (score: 1)

Marius's comment explains how to do this; the code below just gives an example (which I think is useful, since the poster still seems confused).

# Create example dataset. 'catvar' represents a categorical variable despite being coded with numbers.
X = data.frame("catvar" = sample(c(1, 2, 3), 100, replace = TRUE),
               "numvar" = rnorm(100),
               "y" = sample(c(0, 1), 100, replace = TRUE))

# Check whether your categorical variables are coded correctly. (They'll say 'factor' if so)
sapply(X, class) # catvar is coded as 'numeric', which is wrong.

# Tell 'R' that catvar is categorical. If your categorical variables are already classed as factors, you can skip this step
X$catvar = factor(X$catvar)
sapply(X, class) # check all variables are coded correctly

# Fit model to dataframe (i.e. without needing to convert X to a model matrix)
fit = glm(y ~ numvar + catvar, data = X, family = "binomial")
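
To see what model.matrix actually does, here is a minimal sketch on the same kind of toy data (the seed and the names mm, fit_df and fit_mm are my own, purely for illustration). model.matrix expands each factor into 0/1 indicator columns according to its contrasts; with R's default treatment contrasts, the first level is the baseline and gets no column. glm builds this same matrix internally when given a data frame, so both routes should give the same coefficients:

```r
# Hypothetical toy data, mirroring the example above (seed chosen arbitrarily)
set.seed(1)
X <- data.frame(catvar = factor(sample(c(1, 2, 3), 100, replace = TRUE)),
                numvar = rnorm(100),
                y      = sample(c(0, 1), 100, replace = TRUE))

# Expand the formula into a numeric design matrix.
# Level 1 of catvar is the baseline, so only catvar2 and catvar3 appear.
mm <- model.matrix(y ~ numvar + catvar, data = X)
colnames(mm)  # "(Intercept)" "numvar" "catvar2" "catvar3"

# Route 1: let glm() handle the factor itself
fit_df <- glm(y ~ numvar + catvar, data = X, family = binomial)

# Route 2: build the dummy columns by hand, as in the question.
# mm[, -1] drops the intercept column because glm() adds its own.
fit_mm <- glm(y ~ ., data = data.frame(y = X$y, mm[, -1]), family = binomial)

# The two fits should agree
all.equal(unname(coef(fit_df)), unname(coef(fit_mm)))
```

So creating the design matrix by hand is not required for glm. It is mainly useful when a function only accepts a numeric matrix rather than a formula and a data frame.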