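I usually use SAS, but I am trying to use R more. I want to show how categorizing a continuous independent variable messes up a regression, so I created some data: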
set.seed(1234) #sets a seed so the example is reproducible
x <- rnorm(100) #x is normally distributed with mean 0 and sd 1, n = 100
y <- 3*x + rnorm(100,0,10) #Y is related to x, but with some noise
x2 <- cut(x, 2) #Cuts x into 2 parts
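To see what the cut actually produced, something like the following sketch can be run. It only uses the objects defined above; as far as I understand, `cut(x, 2)` splits the range of x into two equal-width intervals and returns a factor, so x2 holds interval labels rather than 0/1 values.

```r
set.seed(1234)
x  <- rnorm(100)
x2 <- cut(x, 2)      # two equal-width bins over the range of x

class(x2)            # cut() returns a factor
levels(x2)           # the two interval labels
table(x2)            # number of observations falling in each bin
```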
Then I ran a regression on x2:
m2 <- lm(y~as.factor(x2)) #A model with the cut variable
summary(m2)
The summary is what I expected, an intercept and a coefficient for the dummy variable:
Call:
lm(formula = y ~ as.factor(x2))

Residuals:
     Min       1Q   Median       3Q      Max
-30.4646  -6.5614   0.4409   5.4936  29.6696

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)
(Intercept)                 -1.403      1.290  -1.088   0.2795
as.factor(x2)(0.102,2.55]    4.075      2.245   1.815   0.0726 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.56 on 98 degrees of freedom
Multiple R-squared:  0.03253,  Adjusted R-squared:  0.02265
F-statistic: 3.295 on 1 and 98 DF,  p-value: 0.07257
But when I plot x against y and add the regression line from m2, the line is smooth and straight. I had expected a jump where x2 goes from 0 to 1.
plot(x,y)
abline(reg = m2)
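For reference, the picture I had in mind is a step: one flat level per bin of x2, with a jump at the cut point. Below is a rough sketch of that plot, built from the fitted values instead of `abline`. The names `fit` and `ord` are ones I made up for this illustration, and I am assuming that `fitted()` on this model returns one predicted value per observation.

```r
set.seed(1234)
x  <- rnorm(100)
y  <- 3*x + rnorm(100, 0, 10)
x2 <- cut(x, 2)
m2 <- lm(y ~ as.factor(x2))

fit <- fitted(m2)   # predicted y for each observation (one level per bin)
ord <- order(x)     # sort by x so the line is drawn left to right

plot(x, y)
lines(x[ord], fit[ord], type = "s")   # step line: flat within a bin, jump between bins
```

With only two bins the fitted values take just two distinct levels, and the gap between them should equal the dummy coefficient from the summary.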
What am I doing wrong? Or am I missing something basic?