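Suppose my category is color, and my categorical variable takes the values "red", "orange", and "blue".
I want to regress my model on two variables, is_blue and is_red, where is_blue is 1 for blue (0 otherwise) and is_red is 1 for red (0 otherwise).
How can I use my categorical variable twice in the model?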
Answer 0: (score: 0)
It is hard to say without more detail about your data, but is this what you want?
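set.seed(100)
# Example data: 10 observations per color; color is stored explicitly as a factor
dat <- data.frame(
  color = factor(rep(c("red", "orange", "blue"), each = 10)),
  var = rnorm(3 * 10, 20, 1)
)
levels(dat$color)  # levels are sorted alphabetically: blue, orange, red
# Dummy variables: 1 for the named color, 0 otherwise
dat$is_red <- ifelse(dat$color == "red", 1, 0)
dat$is_blue <- ifelse(dat$color == "blue", 1, 0)
lm(var ~ is_blue + is_red, dat)  # orange is the implicit baseline
lm(var ~ factor(color), dat)     # baseline: blue (the first level)
lm(var ~ C(color, contr.treatment(3, base = 2)), data = dat)  # baseline: level 2 (orange)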
> lm(var ~ is_blue + is_red, dat)
Call:
lm(formula = var ~ is_blue + is_red, data = dat)
Coefficients:
(Intercept)      is_blue       is_red
    20.2337      -0.3628      -0.2516
> lm(var ~ factor(color), dat)  # baseline: blue
Call:
lm(formula = var ~ factor(color), data = dat)
Coefficients:
        (Intercept)  factor(color)orange     factor(color)red
            19.8709               0.3628               0.1112
> lm(var ~ C(color, contr.treatment(3, base = 2)), data = dat)
Call:
lm(formula = var ~ C(color, contr.treatment(3, base = 2)), data = dat)
Coefficients:
                            (Intercept)  C(color, contr.treatment(3, base = 2))1
                                20.2337                                  -0.3628
C(color, contr.treatment(3, base = 2))3
                                -0.2516
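The first and third fits agree because both treat orange as the baseline. If the goal is just to use the same categorical variable twice without creating extra columns, two other routes may be worth trying. The sketch below assumes the same simulated data as above; the names fit_inline, fit_relevel, and color_ref are illustrative only and not from the question.

```r
# Simulated data with the same shape as above (values are illustrative).
set.seed(100)
dat <- data.frame(
  color = factor(rep(c("red", "orange", "blue"), each = 10)),
  var   = rnorm(3 * 10, mean = 20, sd = 1)
)

# Option 1: refer to the factor twice directly in the formula.
# Each logical comparison wrapped in I() becomes one dummy column.
fit_inline <- lm(var ~ I(color == "blue") + I(color == "red"), data = dat)

# Option 2: make "orange" the reference level, so the two remaining
# coefficients are the blue and red contrasts against orange.
dat$color_ref <- relevel(dat$color, ref = "orange")
fit_relevel <- lm(var ~ color_ref, data = dat)

# The coefficient names differ, but the two fits should give the same
# intercept, blue effect, and red effect (in that order).
all.equal(unname(coef(fit_inline)), unname(coef(fit_relevel)))
```

relevel() is probably the tidier choice when the factor is reused in several models, since the baseline is then fixed once on the data rather than repeated in every formula.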