I run into two main problems when fitting a logistic regression:
My X variables include factor variables, for example immigrant status (immigrant, non-immigrant); my Y variable is binomial: low birth weight (non-lbw, lbw).
I run the following R script (I am using the plsRglm package):
library(plsRglm)
model.plsrglm <- plsRglm(yair, xair, 3, modele="pls-glm-logistic")
1) If I do not remove all NA values from y or x, R returns:
summary(model.plsrglm)
Call
plsRglmmodel.default(dataY = yair, dataX = xair, nt = 6,
modele = "pls-glm-logistic")
> model.plsrglm
Number of required components:
NULL
Number of successfully computed components:
NULL
Coefficients:
NULL
Information criteria and Fit statistics:
NULL
2) If I remove all NA values before running the model, R throws an error:
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
Should I remove all NA values before building the model?
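If you do drop incomplete rows, drop them from x and y together: removing NAs from each object separately leaves the rows of x and y misaligned. A minimal sketch using base R's complete.cases() on hypothetical toy data (x, y, keep, x_cc and y_cc are illustrative names, not from the question):

```r
# Hypothetical toy data: predictors and a response, each with some NAs
x <- data.frame(c1 = c(120, NA, 80, 67, 125),
                c2 = factor(c("yes", "no", NA, "no", "no")))
y <- factor(c("lbw", "lbw", "non-lbw", NA, "lbw"))

# TRUE only for rows with no NA in x AND no NA in y
keep <- complete.cases(x, y)

# Subset both objects with the same index so their rows stay aligned
x_cc <- x[keep, , drop = FALSE]
y_cc <- y[keep]
```

Here only rows 1 and 5 survive, in both x_cc and y_cc.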
Should I convert the factor variables to numeric? If so, how should I use as.numeric? Wouldn't that imply an ordering between non-immigrant and immigrant?
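On the ordering concern: for a two-level factor, as.numeric() returns the internal codes 1 and 2. These are just the 0/1 indicator shifted by one, so no extra ordering is introduced. For factors with three or more levels (such as c3 or c4 in the data below), as.numeric() would impose the level order, which by default is alphabetical, along with equal spacing between levels. That is why indicator (dummy) columns from model.matrix() are the usual choice. A small sketch with a toy factor (status, codes and dummy are illustrative names):

```r
# Toy version of the immigrant-status factor from the question
status <- factor(c("non-immigrant", "immigrant", "immigrant", "non-immigrant"))

# as.numeric() returns the internal level codes; levels sort alphabetically,
# so immigrant = 1 and non-immigrant = 2
codes <- as.numeric(status)

# model.matrix() builds a 0/1 indicator instead; the first level is the reference
dummy <- model.matrix(~ status)[, 2]

# For two levels the two codings differ only by a shift of 1
all(codes - 1 == dummy)
```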
For the Y variable, should I recode it as 0 and 1?
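Recoding explicitly is a simple way to control which outcome is treated as the event. For reference, base R's glm() with a binomial family treats the first level of a factor response as failure (0) and all other levels as success. Because levels sort alphabetically, a factor with the values lbw and non-lbw would make non-lbw the modelled event. Whether plsRglm follows the same convention for a factor response is an assumption worth checking in its documentation. A minimal sketch (outcome and y01 are illustrative names):

```r
# Toy response with the question's two labels
outcome <- factor(c("lbw", "non-lbw", "lbw", "non-lbw"))

# Default levels are alphabetical, so "lbw" is the first level
levels(outcome)

# Explicit recoding: 1 = lbw, 0 = non-lbw
y01 <- as.numeric(outcome == "lbw")
```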
I have added a reproducible dataset below.
outcome c1 c2 c3 c4
1 lbw 120 yes <30 good
2 lbw 124 yes <30 good
3 lbw 125 yes <30 good
4 lbw 135 yes <30 good
5 lbw 112 yes <30 good
6 lbw 168 yes <30 good
7 lbw 147 yes 30-40 good
8 lbw 174 yes 30-40 fair
9 lbw 153 yes 30-40 fair
10 lbw 145 yes 30-40 fair
11 lbw 145 yes 30-40 fair
12 lbw 125 no >40 fair
13 lbw 125 no >40 poor
14 lbw 111 no >40 poor
15 non-lbw 80 no >40 poor
16 non-lbw 85 no >40 poor
17 non-lbw 78 yes >40 poor
18 non-lbw 67 no >40 poor
xair <- bc1997[,c("c1","c2","c3","c4")]
yair <- bc1997[,"outcome"]
model.plsrglm <- plsRglm(yair, xair, 2, modele="pls-glm-logistic")
summary(model.plsrglm)
But I get this error:
> model.plsrglm <- plsRglm(yair, xair, 2, modele="pls-glm-logistic")
____************************************************____
Family: binomial
Link function: logit
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
Answer (score: 0)
Your 'x' term must be numeric. Your variables "c2", "c3" and "c4" are all of class logical or factor.
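The default for scaleX is TRUE, which uses colMeans() to scale the predictors. That is not possible for factors, so you can either convert each column to numeric or specify scaleX = FALSE.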
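A minimal sketch of the first option (converting to numeric). It uses indicator (dummy) coding via model.matrix() rather than as.numeric(), so that c3 and c4 do not pick up an artificial order. The data frame is rebuilt from the table in the question. Setting the level order of c2, c3 and c4 is my own choice, and it only determines which level serves as the reference. The plsRglm call is left as a comment because it needs the plsRglm package and has not been run here:

```r
# Rebuild the reproducible data from the question (values copied from the table)
bc1997 <- data.frame(
  outcome = rep(c("lbw", "non-lbw"), times = c(14, 4)),
  c1 = c(120, 124, 125, 135, 112, 168, 147, 174, 153,
         145, 145, 125, 125, 111, 80, 85, 78, 67),
  c2 = factor(c(rep("yes", 11), rep("no", 5), "yes", "no"),
              levels = c("no", "yes")),
  c3 = factor(rep(c("<30", "30-40", ">40"), times = c(6, 5, 7)),
              levels = c("<30", "30-40", ">40")),
  c4 = factor(rep(c("good", "fair", "poor"), times = c(7, 5, 6)),
              levels = c("good", "fair", "poor"))
)

# Expand the factors c2, c3, c4 into 0/1 indicator columns and drop the intercept
xair <- model.matrix(~ c1 + c2 + c3 + c4, data = bc1997)[, -1]
# Code the response explicitly: 1 = lbw, 0 = non-lbw
yair <- as.numeric(bc1997$outcome == "lbw")

# Every predictor is now numeric, so colMeans() (used when scaleX = TRUE) works
colMeans(xair)

# Next step, not run here (requires the plsRglm package):
# library(plsRglm)
# model.plsrglm <- plsRglm(yair, xair, 2, modele = "pls-glm-logistic")
```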