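I am trying to run a Levene test and a t-test on one numeric variable (LungCap) and two qualitative variables (Smoke and Gender). Smoke and Gender each have their own column. Can I paste them together into a new column that combines gender and smoking status, and then run the Levene test on it? I tried, and got this error: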
Error in t.test.formula(LungCap ~ GenderSmoke, data = hwdata, var.equal = T) :
grouping factor must have exactly 2 levels
We are trying to find out how lung capacity differs between female smokers and male smokers.
Here is the hwdata table:
hwdata
    LungCap Age Height Smoke Gender Caesarean GenderSmoke
415   8.125  10   66.8    no   male        no     male no
463   7.125  10   60.2    no   male        no     male no
179   9.850  17   72.4   yes female       yes  female yes
526   8.350  11   68.1    no   male        no     male no
195  11.225  16   72.8    no   male        no     male no
118  10.275  18   71.0    no   male        no     male no
299   2.625   5   49.0    no   male        no     male no
229   4.700   3   52.7    no   male        no     male no
244   8.600  12   61.6    no   male       yes     male no
14    6.000  10   61.1    no female        no   female no
374  10.725  16   77.4    no female        no   female no
665  10.400  16   69.6    no   male        no     male no
602  11.800  19   74.6    no female        no   female no
603   9.375  15   73.1    no female        no   female no
709   6.900  15   64.5    no female        no   female no
91    6.950   9   63.9    no   male       yes     male no
348   5.025  12   55.0    no female        no   female no
649   6.825  13   60.2    no   male        no     male no
355   7.575  12   61.5    no female        no   female no
26    8.350  12   61.3    no   male       yes     male no
Here is what I have:
LungCapData <- read.delim(file="LungCapData.txt")
# We will use a t-test, so we need to select a sample from LungCapData
set.seed(123)
hwdata <- LungCapData[sample(x = rownames(LungCapData), size = 20), ]
hwdata$GenderSmoke <- paste(hwdata$Gender, hwdata$Smoke, sep=" ")
table(hwdata$GenderSmoke)
hwdata
# Evaluation of homogeneity of variance (Levene's test)
library(car)
leveneTest(LungCap ~ GenderSmoke, hwdata, center=mean)
Answer 0 (score: 0)
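The question title is about Levene's test; the error comes from the t-test, which is a separate issue. The new variable GenderSmoke can have up to 4 levels (only 3 occur in this sample, which contains no male smokers), and there is no such thing as a 4-sample t-test: t.test() needs a grouping factor with exactly 2 levels. One of the code comments says
We will use a t-test, so we need to select a sample from LungCapData
Why? If you have more data, use it. But that would be a question about statistics, not about R code.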
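Since t.test() only accepts a grouping factor with exactly two levels, one way around the error is to keep just the two groups being compared before calling it. The sketch below is an illustration, not the asker's analysis: it rebuilds the relevant columns of hwdata from the dput output at the end of this answer and compares female and male non-smokers, because this 20-row sample has no male smokers. The names nonsmokers and res are hypothetical.

```r
# Sketch: t.test() needs a grouping factor with exactly 2 levels, so subset
# to the two groups of interest first. Only the columns used are rebuilt
# here from the dput output (assumption: the other columns are not needed).
hwdata <- data.frame(
  LungCap = c(8.125, 7.125, 9.85, 8.35, 11.225, 10.275, 2.625, 4.7, 8.6, 6,
              10.725, 10.4, 11.8, 9.375, 6.9, 6.95, 5.025, 6.825, 7.575, 8.35),
  Smoke   = c("no", "no", "yes", rep("no", 17)),
  Gender  = c("male", "male", "female", "male", "male", "male", "male", "male",
              "male", "female", "female", "male", "female", "female", "female",
              "male", "female", "male", "female", "male")
)

# Non-smokers only: Gender now takes exactly two values (female, male).
nonsmokers <- subset(hwdata, Smoke == "no")
table(nonsmokers$Gender)

# Equal variances assumed, as in the question's var.equal = T.
res <- t.test(LungCap ~ Gender, data = nonsmokers, var.equal = TRUE)
res
```

For female versus male smokers, the same call would use subset(LungCapData, Smoke == "yes") on the full data, assuming it contains smokers of both genders.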
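A more useful test is a chi-squared test of independence between the two variables Gender and Smoke. In the code below the p-value is simulated, because there are no male smokers (male, yes) in the data and the expected cell counts are very small.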
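As for Levene's test, there is no error, but since the question also asks about using paste to create GenderSmoke, here is the R way of doing it, with interaction (see help("interaction")).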
library(car)
chisq_tbl <- table(hwdata[c("Gender", "Smoke")])
chisq_tbl
#        Smoke
#Gender   no yes
#  female  7   1
#  male   12   0
chisq.test(chisq_tbl, simulate.p.value = TRUE)
#
# Pearson's Chi-squared test with simulated p-value (based on
# 2000 replicates)
#
#data: chisq_tbl
#X-squared = 1.5789, df = NA, p-value = 0.3938
It is reasonable to assume that Gender and Smoke are independent, but with only one Smoke == "yes" the test result is not reliable.
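Why the test is unreliable can be checked directly from the expected counts under independence. This sketch rebuilds the Gender and Smoke columns from the dput output, prints the expected counts, and, as a swapped-in alternative that the answer itself does not use, runs Fisher's exact test, which is commonly applied to small 2x2 tables. The names expected and ft are illustrative.

```r
# Sketch: expected counts behind the chi-squared test, plus Fisher's exact
# test as an alternative for a tiny 2x2 table (not part of the answer above).
Gender <- c("male", "male", "female", "male", "male", "male", "male", "male",
            "male", "female", "female", "male", "female", "female", "female",
            "male", "female", "male", "female", "male")
Smoke  <- c("no", "no", "yes", rep("no", 17))
chisq_tbl <- table(Gender = Gender, Smoke = Smoke)

# Expected count = row total * column total / n. chisq.test() warns that the
# approximation may be incorrect when these are this small, so the warning
# is suppressed here only to read off the expected counts.
expected <- suppressWarnings(chisq.test(chisq_tbl)$expected)
expected

# Fisher's exact test conditions on the table margins and does not rely on
# a large-sample approximation.
ft <- fisher.test(chisq_tbl)
ft$p.value
```

With a single smoker, the expected counts in the Smoke == "yes" column are only 0.4 and 0.6, far below the usual rule of thumb of 5, and the exact p-value is 0.4, in line with the simulated one.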
Now Levene's test.
hwdata <- within(hwdata, GS <- interaction(Gender, Smoke))
leveneTest(LungCap ~ GS, hwdata, center = mean)
#Levene's Test for Homogeneity of Variance (center = mean)
#      Df F value Pr(>F)
#group  2  0.9758  0.397
#      17
The null hypothesis of homogeneity of variance is not rejected.
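The sketch below shows two things from the answer on the same data. First, interaction() returns a factor with every Gender by Smoke combination as a level, including the empty male.yes, whereas paste() returns a plain character vector. Second, with center = mean, Levene's test amounts to a one-way ANOVA on the absolute deviations of each observation from its group mean, so it can be reproduced without car. The columns are rebuilt from the dput output; gs_paste, gs_inter, absdev and lev are hypothetical names. On these data it should give the same F value of about 0.976 on 2 and 17 degrees of freedom as leveneTest().

```r
# Sketch: paste() vs interaction(), then Levene's test (center = mean) done
# by hand. Only the columns used are rebuilt from the dput output.
hwdata <- data.frame(
  LungCap = c(8.125, 7.125, 9.85, 8.35, 11.225, 10.275, 2.625, 4.7, 8.6, 6,
              10.725, 10.4, 11.8, 9.375, 6.9, 6.95, 5.025, 6.825, 7.575, 8.35),
  Smoke   = c("no", "no", "yes", rep("no", 17)),
  Gender  = c("male", "male", "female", "male", "male", "male", "male", "male",
              "male", "female", "female", "male", "female", "female", "female",
              "male", "female", "male", "female", "male")
)

# paste() gives a character vector; interaction() gives a factor whose
# levels are all combinations (first factor varying fastest).
gs_paste <- paste(hwdata$Gender, hwdata$Smoke, sep = " ")
gs_inter <- interaction(hwdata$Gender, hwdata$Smoke)
levels(gs_inter)
table(gs_inter)            # male.yes has a count of 0

# Drop the empty level so only the 3 observed groups remain.
hwdata$GS <- droplevels(gs_inter)

# Absolute deviation of each value from its group mean
# (ave() returns the group mean for every row).
hwdata$absdev <- abs(hwdata$LungCap - ave(hwdata$LungCap, hwdata$GS))

# One-way ANOVA on the absolute deviations.
lev <- anova(lm(absdev ~ GS, data = hwdata))
lev
F_value <- lev[["F value"]][1]
F_value
```

Passing drop = TRUE to interaction() removes unused combinations directly, instead of calling droplevels() afterwards.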
Data in dput format
hwdata <-
structure(list(LungCap = c(8.125, 7.125, 9.85, 8.35, 11.225,
10.275, 2.625, 4.7, 8.6, 6, 10.725, 10.4, 11.8, 9.375, 6.9, 6.95,
5.025, 6.825, 7.575, 8.35), Age = c(10L, 10L, 17L, 11L, 16L,
18L, 5L, 3L, 12L, 10L, 16L, 16L, 19L, 15L, 15L, 9L, 12L, 13L,
12L, 12L), Height = c(66.8, 60.2, 72.4, 68.1, 72.8, 71, 49, 52.7,
61.6, 61.1, 77.4, 69.6, 74.6, 73.1, 64.5, 63.9, 55, 60.2, 61.5,
61.3), Smoke = c("no", "no", "yes", "no", "no", "no", "no", "no",
"no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no",
"no"), Gender = c("male", "male", "female", "male", "male", "male",
"male", "male", "male", "female", "female", "male", "female",
"female", "female", "male", "female", "male", "female", "male"
), Caesarean = c("no", "no", "yes", "no", "no", "no", "no", "no",
"yes", "no", "no", "no", "no", "no", "no", "yes", "no", "no",
"no", "yes"), GenderSmoke = c("male no", "male no", "female yes",
"male no", "male no", "male no", "male no", "male no", "male no",
"female no", "female no", "male no", "female no", "female no",
"female no", "male no", "female no", "male no", "female no",
"male no")), class = "data.frame", row.names = c("415", "463",
"179", "526", "195", "118", "299", "229", "244", "14", "374",
"665", "602", "603", "709", "91", "348", "649", "355", "26"))