So imagine these two sets of ages for females and males:
femalesage<-c(30,52,59,25,26,72,46,32,64,45)
malesage<-c(40,56,31,63,63,78,42,45,67)
I can easily run t.test(femalesage, malesage) to get the following result:
t.test(femalesage,malesage)
Welch Two Sample t-test
data: femalesage and malesage
t = -1.2013, df = 16.99, p-value = 0.2461
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-24.224797 6.647019
sample estimates:
mean of x mean of y
45.10000 53.88889
Now suppose I have the same data organized differently, something like this:
ages<-c(30,52,59,25,26,72,46,32,64,45,40,56,31,63,63,78,42,45,67)
genders<-c("F","F","F","F","F","F","F","F","F","F","M","M","M","M","M","M","M","M","M")
df<-data.frame(ages, genders)
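I would like to use some kind of regression to get a result similar to the Welch two-sample t-test, testing Beta1 = 0 against Beta1 not equal to 0, where Beta1 is the coefficient for gender and the response is age. Any idea how I can get the same result?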
Answer 0 (score: 1)
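The t-test and linear regression are both special cases of the general linear model. With a single two-level predictor, the significance test of the regression coefficient is equivalent to the significance test of the (equal-variance) two-sample t-test.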
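To make that equivalence concrete, here is a minimal sketch that rebuilds the regression slope and its t statistic by hand from the two group means and the pooled variance, then compares them with what lm reports. The variable names (n_f, n_m, beta1, pooled_var and so on) are illustrative and not part of the question; the sketch assumes the 10 female and 9 male ages given there.

```r
# Data from the question, split by group
femalesage <- c(30, 52, 59, 25, 26, 72, 46, 32, 64, 45)
malesage   <- c(40, 56, 31, 63, 63, 78, 42, 45, 67)

n_f <- length(femalesage)
n_m <- length(malesage)

# With a 0/1 dummy for "M", the least-squares estimates are:
beta0 <- mean(femalesage)                   # intercept = mean of the reference group (F)
beta1 <- mean(malesage) - mean(femalesage)  # slope = difference in group means (M - F)

# lm() uses a single residual variance, i.e. the pooled variance of both groups
pooled_var <- ((n_f - 1) * var(femalesage) + (n_m - 1) * var(malesage)) /
  (n_f + n_m - 2)
se_beta1 <- sqrt(pooled_var * (1 / n_f + 1 / n_m))
t_beta1  <- beta1 / se_beta1
p_beta1  <- 2 * pt(-abs(t_beta1), df = n_f + n_m - 2)

# The same quantities as reported by lm()
long <- data.frame(
  ages    = c(femalesage, malesage),
  genders = rep(c("F", "M"), times = c(n_f, n_m))
)
fit <- lm(ages ~ genders, data = long)

print(c(estimate = beta1, se = se_beta1, t = t_beta1, p = p_beta1))
print(coef(summary(fit))["gendersM", ])
```

The two printed rows should agree, which shows that the slope test in this model is the pooled (equal-variance) two-sample t-test under another name.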
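R's t.test function lets you specify the input data in two different ways: as two separate vectors, as you did, or through the formula interface, as I do here. Likewise, the lm function, which performs simple linear regression, requires the formula interface. In this case that makes the two function calls identical apart from the function name.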
Your data:
ages <- c(30,52,59,25,26,72,46,32,64,45,40,56,31,63,63,78,42,45,67)
genders <- c("F","F","F","F","F","F","F","F","F","F","M","M","M","M","M","M","M","M","M")
df <- data.frame(ages, genders)
The t-test:
t.test(ages ~ genders, data = df)
Welch Two Sample t-test
data: ages by genders
t = -1.2013, df = 16.99, p-value = 0.2461
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-24.224797 6.647019
sample estimates:
mean in group F mean in group M
45.10000 53.88889
The (almost) identical regression:
summary(lm(ages ~ genders, data = df))
Call:
lm(formula = ages ~ genders, data = df)
Residuals:
Min 1Q Median 3Q Max
-22.89 -13.49 0.90 11.11 26.90
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 45.100 5.060 8.914 8.12e-08 ***
gendersM 8.789 7.351 1.196 0.248
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 16 on 17 degrees of freedom
Multiple R-squared: 0.07756, Adjusted R-squared: 0.0233
F-statistic: 1.429 on 1 and 17 DF, p-value: 0.2483
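Note that the t and p-values for the gender coefficient are almost identical to those of the t-test. The sign of t is flipped because t.test reports F minus M, whereas the gendersM coefficient is M minus F. The small remaining differences arise because lm pools the variance across both groups, while the Welch test estimates a separate variance for each group.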
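If an exact match is wanted rather than an approximate one, one option is to ask t.test for the pooled-variance version via its var.equal argument, as sketched below. Going the other way, a regression that allows a separate variance per group can get close to the Welch result; the optional second part uses gls with varIdent from the nlme package as one possible way to do that. This is an assumption on my part rather than something from the original answer, and its degrees of freedom are computed differently from Welch's, so the p-value may differ slightly.

```r
ages <- c(30, 52, 59, 25, 26, 72, 46, 32, 64, 45,
          40, 56, 31, 63, 63, 78, 42, 45, 67)
genders <- factor(rep(c("F", "M"), times = c(10, 9)))
df <- data.frame(ages, genders)

# Pooled-variance (Student) t-test instead of the default Welch test
pooled <- t.test(ages ~ genders, data = df, var.equal = TRUE)

# Ordinary regression; the gendersM row holds the slope test
fit <- lm(ages ~ genders, data = df)
lm_row <- coef(summary(fit))["gendersM", ]

# The t statistics differ only in sign (t.test: F - M, lm: M - F)
print(c(t_test = unname(pooled$statistic), lm = unname(lm_row["t value"])))
print(c(t_test = pooled$p.value, lm = unname(lm_row["Pr(>|t|)"])))

# Optional: regression with a separate residual variance per gender.
# Guarded so the sketch still runs where nlme is not installed.
if (requireNamespace("nlme", quietly = TRUE)) {
  het <- nlme::gls(ages ~ genders, data = df,
                   weights = nlme::varIdent(form = ~ 1 | genders))
  print(summary(het)$tTable["gendersM", ])
}
```

The first two printed comparisons should show the same magnitude of t and the same p-value for t.test and lm. The gls row, when available, is meant to be compared against the Welch output shown earlier.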