Levene's test with 1 quantitative variable and 2 qualitative variables

Time: 2021-02-19 05:50:49

Tags: r hypothesis-test

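I am trying to run a Levene test and a t-test on 1 numeric variable (LungCap) and 2 qualitative variables (Smoke and Gender). Smoke and Gender each have their own column. Can I paste together a new column that combines gender and smoking, and then run the Levene test on it? I tried that, and it gave me this error: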

Error in t.test.formula(LungCap ~ GenderSmoke, data = hwdata, var.equal = T) : 
  grouping factor must have exactly 2 levels

We are trying to work out the difference in lung capacity between women who smoke and men who smoke.

Here is the hwdata table:

hwdata
    LungCap Age Height Smoke Gender Caesarean GenderSmoke
415   8.125  10   66.8    no   male        no     male no
463   7.125  10   60.2    no   male        no     male no
179   9.850  17   72.4   yes female       yes  female yes
526   8.350  11   68.1    no   male        no     male no
195  11.225  16   72.8    no   male        no     male no
118  10.275  18   71.0    no   male        no     male no
299   2.625   5   49.0    no   male        no     male no
229   4.700   3   52.7    no   male        no     male no
244   8.600  12   61.6    no   male       yes     male no
14    6.000  10   61.1    no female        no   female no
374  10.725  16   77.4    no female        no   female no
665  10.400  16   69.6    no   male        no     male no
602  11.800  19   74.6    no female        no   female no
603   9.375  15   73.1    no female        no   female no
709   6.900  15   64.5    no female        no   female no
91    6.950   9   63.9    no   male       yes     male no
348   5.025  12   55.0    no female        no   female no
649   6.825  13   60.2    no   male        no     male no
355   7.575  12   61.5    no female        no   female no
26    8.350  12   61.3    no   male       yes     male no

Here is what I have:

LungCapData <- read.delim(file="LungCapData.txt")

# WE will use a t-test so we need select a sample from LungCapData 
set.seed(123) 
hwdata <- LungCapData[sample(x =rownames(LungCapData), size=20 ), ]
hwdata$GenderSmoke <- paste(hwdata$Gender, hwdata$Smoke, sep=" ")
table(hwdata$GenderSmoke)
hwdata

# Evaluation of homogeneity of variance (Levene's test)
library(car)
leveneTest(LungCap ~ GenderSmoke, hwdata, center=mean)

1 answer:

Answer 0 (score: 0)

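The question title is about Levene's test, but the error comes from the t-test, so it has nothing to do with the title. The new variable GenderSmoke can take 4 levels (3 of them occur in this sample), and there is no such thing as a t-test for more than 2 samples. One of the code comments says:

"WE will use a t-test so we need select a sample from LungCapData"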

Why is that? If you have more data, use it. But that would be a question about statistics, not about R code.
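On the R side, the error is easy to reproduce and to avoid. Below is a minimal sketch with made-up data: the toy data frame and its values are invented for illustration and are not taken from LungCapData. It shows that t.test() fails when the grouping variable has more than 2 levels, and that restricting the data to smokers and grouping by Gender gives a valid two-sample comparison, which seems to be what the question is after.

```r
# Toy data, invented purely for illustration (NOT the LungCapData values)
set.seed(42)
toy <- data.frame(
  LungCap = rnorm(40, mean = 8, sd = 2),
  Gender  = rep(c("female", "male"), times = 20),
  Smoke   = rep(c("no", "yes"), each = 20)
)
toy$GenderSmoke <- paste(toy$Gender, toy$Smoke)

# Four groups: t.test() refuses, as in the question
res <- try(t.test(LungCap ~ GenderSmoke, data = toy), silent = TRUE)
failed <- inherits(res, "try-error")
failed

# Female smokers vs. male smokers: keep the smokers, group by Gender
smokers <- subset(toy, Smoke == "yes")
tt <- t.test(LungCap ~ Gender, data = smokers, var.equal = TRUE)
tt
```

With the 20-row sample in the question this subset would hold a single smoker, so a comparison like this only makes sense on the full data set.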

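A more useful test would be a chi-squared test of independence of the two variables Gender and Smoke. In the code below the p-value is simulated, because there is no male, yes combination in the data.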

As for Levene's test, there is no error, but since the question also asks about using paste to create GenderSmoke, here is the R way, using interaction; see help("interaction").

library(car)

chisq_tbl <- table(hwdata[c("Gender", "Smoke")])
chisq_tbl
#        Smoke
#Gender   no yes
#  female  7   1
#  male   12   0

chisq.test(chisq_tbl, simulate.p.value = TRUE)
#
#   Pearson's Chi-squared test with simulated p-value (based on
#   2000 replicates)
#
#data:  chisq_tbl
#X-squared = 1.5789, df = NA, p-value = 0.3938
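As a quick check on why the p-value is simulated, the expected counts under independence can be computed by hand from the table above. This is only a sketch: the matrix is typed in from the printed counts, and the variable names tbl, expected and stat are my own.

```r
# Counts copied from the printed chisq_tbl above
tbl <- matrix(c(7, 12, 1, 0), nrow = 2,
              dimnames = list(Gender = c("female", "male"),
                              Smoke  = c("no", "yes")))

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(tbl), colSums(tbl)) / sum(tbl)
expected

# Pearson statistic computed from observed and expected counts
stat <- sum((tbl - expected)^2 / expected)
stat
```

Both expected counts in the yes column are below 1, so the usual chi-squared approximation is doubtful here.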

The test gives no reason to doubt that Gender and Smoke are independent, but with only one Smoke == "yes" the result is not reliable. Now the Levene test.

hwdata <- within(hwdata, GS <- interaction(Gender, Smoke))

leveneTest(LungCap ~ GS, hwdata, center = mean)
#Levene's Test for Homogeneity of Variance (center = mean)
#      Df F value Pr(>F)
#group  2  0.9758  0.397
#      17    
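To see what the table above measures: Levene's test with center = mean amounts to a one-way ANOVA on the absolute deviations of each observation from its group mean. The sketch below does that in base R, with the values typed in from the hwdata printout in the question. The names group, abs_dev and fit are my own, and I am assuming this matches what leveneTest does internally, so treat it as an illustration and not as the package's code.

```r
# Values copied from the hwdata printout in the question
LungCap <- c(8.125, 7.125, 9.85, 8.35, 11.225, 10.275, 2.625, 4.7, 8.6, 6,
             10.725, 10.4, 11.8, 9.375, 6.9, 6.95, 5.025, 6.825, 7.575, 8.35)
Gender <- c("male", "male", "female", "male", "male", "male", "male", "male",
            "male", "female", "female", "male", "female", "female", "female",
            "male", "female", "male", "female", "male")
Smoke <- c("no", "no", "yes", rep("no", 17))

# Grouping factor, keeping only the combinations that actually occur
group <- interaction(Gender, Smoke, drop = TRUE)

# Absolute deviation of each observation from its group mean
abs_dev <- abs(LungCap - ave(LungCap, group, FUN = mean))

# One-way ANOVA on those deviations
fit <- anova(lm(abs_dev ~ group))
fit
```

The degrees of freedom come out as 2 and 17, as in the leveneTest output, because only 3 of the 4 possible groups are present among the 20 rows.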

The null hypothesis of homogeneity of variance is not rejected.
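One difference between paste and interaction is worth knowing. paste returns a character vector, while interaction returns a factor whose levels cover every combination, including empty ones. A small sketch, where the vectors are rebuilt from the counts in the Gender by Smoke table above and the names gs_paste, gs_inter and gs_drop are my own:

```r
# Rebuild Gender and Smoke from the cell counts: 7 female/no,
# 1 female/yes, 12 male/no (order of rows does not matter here)
Gender <- rep(c("female", "female", "male"), times = c(7, 1, 12))
Smoke  <- rep(c("no", "yes", "no"), times = c(7, 1, 12))

gs_paste <- paste(Gender, Smoke)        # character vector
gs_inter <- interaction(Gender, Smoke)  # factor with all 4 combinations
gs_drop  <- interaction(Gender, Smoke, drop = TRUE)  # unused levels removed

class(gs_paste)
table(gs_inter)   # male.yes appears with a count of 0
nlevels(gs_drop)
```

The empty male.yes level explains why the Levene table reports 2 degrees of freedom for the groups and not 3: only three groups contain data.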

Data in dput format

hwdata <-
structure(list(LungCap = c(8.125, 7.125, 9.85, 8.35, 11.225, 
10.275, 2.625, 4.7, 8.6, 6, 10.725, 10.4, 11.8, 9.375, 6.9, 6.95, 
5.025, 6.825, 7.575, 8.35), Age = c(10L, 10L, 17L, 11L, 16L, 
18L, 5L, 3L, 12L, 10L, 16L, 16L, 19L, 15L, 15L, 9L, 12L, 13L, 
12L, 12L), Height = c(66.8, 60.2, 72.4, 68.1, 72.8, 71, 49, 52.7, 
61.6, 61.1, 77.4, 69.6, 74.6, 73.1, 64.5, 63.9, 55, 60.2, 61.5, 
61.3), Smoke = c("no", "no", "yes", "no", "no", "no", "no", "no", 
"no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no", 
"no"), Gender = c("male", "male", "female", "male", "male", "male", 
"male", "male", "male", "female", "female", "male", "female", 
"female", "female", "male", "female", "male", "female", "male"
), Caesarean = c("no", "no", "yes", "no", "no", "no", "no", "no", 
"yes", "no", "no", "no", "no", "no", "no", "yes", "no", "no", 
"no", "yes"), GenderSmoke = c("male no", "male no", "female yes", 
"male no", "male no", "male no", "male no", "male no", "male no", 
"female no", "female no", "male no", "female no", "female no", 
"female no", "male no", "female no", "male no", "female no", 
"male no")), class = "data.frame", row.names = c("415", "463", 
"179", "526", "195", "118", "299", "229", "244", "14", "374", 
"665", "602", "603", "709", "91", "348", "649", "355", "26"))