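I want to analyse categorical data with chisq.test in R. I'm working with transplant data and want to compare outcomes between patients who were on or off bypass during surgery. I asked a similar question about my categorical variables before and was given this answer for testing group differences by sex: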
df <- read.table(text="Group, Age, Sex, Height, Weight, Diagnosis, Blood loss, Intubation time, Survival
On bypass,59,Male,165,102,Diagnosis 1,57,53,29
On bypass,44,Female,164,140,Diagnosis 1,114,15,35
On bypass,45,Male,165,119,Diagnosis 2,118,31,81
On bypass,26,Male,178,125,Diagnosis 1,171,36,31
On bypass,41,Female,177,105,Diagnosis 1,76,53,91
On bypass,43,Male,161,119,Diagnosis 3,97,38,63
Off bypass,53,Female,164,139,Diagnosis 1,125,49,51
Off bypass,26,Female,165,137,Diagnosis 3,29,7,86
Off bypass,30,Male,174,121,Diagnosis 1,174,43,100
Off bypass,59,Female,174,133,Diagnosis 1,40,16,43
Off bypass,63,Male,172,132,Diagnosis 2,32,46,10 ", header = TRUE, sep = ",")
library(dplyr)
# tally number of participants in each Group by Sex
tab <- tally(group_by(df, Group, Sex))
chisq.test(tab$n) # test for Group differences by Sex
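I use this to test for differences in variables with two levels (e.g. sex, where the two levels are male and female), but some of my variables have more than two levels, e.g. diagnosis (see the example dataset below). For those variables, I want to compare each diagnosis between the on/off bypass groups.
Here is my exampledata: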
exampledata <- read.table(text="ID,Bypass,Sex,Age,Height,Weight,Diagnosis
559,Bypass on,Male,33,167,78,Other
662,Bypass off,Male,63,175,55,UIP
956,Bypass off,Female,40,158,88,Other
460,Bypass on,Female,34,173,86,UIP
153,Bypass off,Female,31,171,74,UIP
192,Bypass off,Male,33,163,64,Other
658,Bypass on,Male,50,161,60,Other
529,Bypass off,Female,55,179,75,Cystic fibrosis
981,Bypass on,Male,36,166,81,Other
367,Bypass on,Female,46,152,85,PH
728,Bypass off,Male,30,169,88,Other
185,Bypass on,Female,65,162,57,UIP
160,Bypass on,Male,54,176,62,PH
175,Bypass off,Male,29,156,78,Other
167,Bypass off,Male,20,175,86,PH
149,Bypass on,Male,24,169,82,Cystic fibrosis
446,Bypass off,Male,38,162,69,PH
667,Bypass on,Male,55,150,55,Cystic fibrosis
488,Bypass off,Female,41,162,56,Other
169,Bypass off,Female,60,154,55,Cystic fibrosis
787,Bypass on,Male,41,169,52,Cystic fibrosis
443,Bypass on,Male,35,159,77,Other
593,Bypass off,Female,28,167,53,Other
653,Bypass off,Female,22,176,75,Other
685,Bypass off,Male,26,170,88,Cystic fibrosis
676,Bypass on,Male,32,172,58,Cystic fibrosis
556,Bypass off,Male,26,168,88,PH
943,Bypass off,Male,40,176,80,PH
940,Bypass off,Male,37,180,69,Cystic fibrosis
740,Bypass on,Female,58,153,72,UIP
624,Bypass on,Female,40,156,81,UIP
194,Bypass on,Male,33,155,60,PH
162,Bypass on,Female,23,170,64,PH
283,Bypass off,Male,60,180,61,Other
404,Bypass on,Male,26,170,63,PH
312,Bypass on,Male,36,171,83,PH
995,Bypass on,Female,48,161,67,Other
254,Bypass on,Female,35,175,62,UIP
364,Bypass on,Female,65,161,55,UIP
771,Bypass off,Male,37,157,72,Other
698,Bypass on,Male,31,163,87,PH
286,Bypass on,Female,60,154,80,UIP
189,Bypass off,Male,42,168,57,PH
463,Bypass on,Female,32,176,50,PH
634,Bypass off,Male,53,152,64,UIP
198,Bypass off,Female,20,171,70,Cystic fibrosis
356,Bypass off,Male,55,161,72,Cystic fibrosis
254,Bypass on,Female,49,169,61,UIP
921,Bypass on,Male,47,152,63,UIP
185,Bypass on,Male,63,174,71,Other
953,Bypass on,Male,32,169,63,PH
336,Bypass on,Female,33,164,52,Other
651,Bypass off,Female,55,172,54,PH
200,Bypass off,Male,43,179,55,UIP
625,Bypass off,Male,43,158,75,Other
986,Bypass on,Female,32,151,81,Other
437,Bypass off,Female,53,152,57,Other
433,Bypass on,Male,35,180,74,Cystic fibrosis
673,Bypass on,Female,27,159,58,Cystic fibrosis
901,Bypass off,Male,30,169,72,PH", header = TRUE, sep = ",")
I used this to create a table of counts:
mytable <- table(exampledata$Bypass,exampledata$Diagnosis)
which returns
             Cystic fibrosis Other PH UIP
  Bypass off               6    11  7   4
  Bypass on                6     8  9   9
However, since I want to look at each diagnosis separately, the output I need is
             Cystic fibrosis Not Cystic fibrosis
  Bypass off               6                  22
  Bypass on                6                  26
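With this output I hope to compare the number of patients with cystic fibrosis between the on/off pump groups.
Ideally, I could quickly repeat this for each diagnosis.
If anyone thinks there is a better way to do this (or that I'm simply going about it the wrong way), please let me know.
Any help is much appreciated.
Thanks, Tom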
Answer 0 (score: 1)
You can do it like this:
mytable <- table(exampledata$Bypass, exampledata$Diagnosis == 'Cystic fibrosis')
colnames(mytable) <- c('Not Cystic fibrosis', 'Cystic fibrosis')
             Not Cystic fibrosis Cystic fibrosis
  Bypass off                  22               6
  Bypass on                   26               6
If you want to do the same for every category, you can do it in a function or a loop.
Edit: added a loop option to get all the tables you need:
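# sort(unique(...)) works whether Diagnosis is a factor or a character column;
# levels() returns NULL for character columns (the read.table default since R 4.0)
lapply(sort(unique(as.character(exampledata$Diagnosis))), function(x) {
  # cross-tabulate Bypass against "is this diagnosis?" (FALSE column first, then TRUE)
  mytable <- table(exampledata$Bypass, exampledata$Diagnosis == x)
  colnames(mytable) <- c(paste('Not ', x, sep = ''), x)
  mytable
})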
The output looks like this:
[[1]]
             Not Cystic fibrosis Cystic fibrosis
  Bypass off                  22               6
  Bypass on                   26               6
[[2]]
             Not Other Other
  Bypass off        17    11
  Bypass on         24     8
[[3]]
             Not PH PH
  Bypass off     21  7
  Bypass on      23  9
[[4]]
             Not UIP UIP
  Bypass off      24   4
  Bypass on       23   9
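If it helps to look tables up by diagnosis rather than by position, the loop can be wrapped in a small helper and the list given names. This is only a sketch: one_vs_rest is a hypothetical helper name, and to keep the block self-contained the Bypass and Diagnosis columns are rebuilt from the counts shown in the question rather than read from the full dataset.

```r
# Rebuild the two relevant columns from the counts in the question
# (assumption: only Bypass and Diagnosis matter for these tables).
counts <- as.table(matrix(c(6, 11, 7, 4,
                            6,  8, 9, 9), nrow = 2, byrow = TRUE,
                          dimnames = list(c("Bypass off", "Bypass on"),
                                          c("Cystic fibrosis", "Other", "PH", "UIP"))))
long <- as.data.frame(counts, stringsAsFactors = FALSE)   # columns Var1, Var2, Freq
exampledata <- long[rep(seq_len(nrow(long)), long$Freq), c("Var1", "Var2")]
names(exampledata) <- c("Bypass", "Diagnosis")

# Hypothetical helper: 2 x 2 table of Bypass against one diagnosis vs. all others.
one_vs_rest <- function(data, level) {
  tab <- table(data$Bypass, data$Diagnosis == level)
  colnames(tab) <- c(paste("Not", level), level)   # FALSE column first, then TRUE
  tab
}

diagnoses <- sort(unique(exampledata$Diagnosis))
# setNames() makes lapply() return a list named by diagnosis.
l <- lapply(setNames(diagnoses, diagnoses), one_vs_rest, data = exampledata)

l[["UIP"]]
```

With a named list, l[["UIP"]] or l$PH picks out a single table without having to remember its index.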
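To run a chi-squared test on each of the tables above, save the output of the lapply call above to a variable; let's call it l.
Then use:
lapply(l, chisq.test)
The output is a list of four test results, one per diagnosis. (sapply(l, chisq.test) also runs the tests, but it simplifies the results into a matrix of components, which is harder to read.)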
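Since this runs four tests on the same patients, it may be worth collecting the p-values in one place and adjusting them for multiple comparisons. The sketch below rebuilds the four tables from the counts in the question so that it runs on its own; the Holm method is my choice for illustration, not something the question asks for.

```r
# Counts from the question's Bypass x Diagnosis table.
counts <- matrix(c(6, 11, 7, 4,
                   6,  8, 9, 9), nrow = 2, byrow = TRUE,
                 dimnames = list(c("Bypass off", "Bypass on"),
                                 c("Cystic fibrosis", "Other", "PH", "UIP")))

# One 2 x 2 table per diagnosis: "not this diagnosis" vs. "this diagnosis".
l <- lapply(colnames(counts), function(d) {
  cbind(rowSums(counts) - counts[, d], counts[, d])
})
names(l) <- colnames(counts)

tests <- lapply(l, chisq.test)                    # list of htest objects
p_raw <- sapply(tests, function(res) res$p.value) # named numeric vector
p_holm <- p.adjust(p_raw, method = "holm")        # assumption: Holm correction

round(cbind(raw = p_raw, holm = p_holm), 3)
```

p.adjust also accepts other methods such as "bonferroni" or "BH" if a different correction suits the analysis better.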
Of course, once the lapply output is saved to the list l, you can also run an individual chi-squared test, for example:
chisq.test(l[[1]])
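With groups this small, it can be useful to check the expected counts behind a test, because the chi-squared approximation is usually considered unreliable when any of them falls below about 5. A minimal sketch for the cystic fibrosis table, entered by hand from the output above; fisher.test is shown as a common alternative for sparse tables, not as a required step:

```r
# Cystic fibrosis table, typed in from the output above.
cf <- matrix(c(22, 6,
               26, 6), nrow = 2, byrow = TRUE,
             dimnames = list(c("Bypass off", "Bypass on"),
                             c("Not Cystic fibrosis", "Cystic fibrosis")))

res <- chisq.test(cf)   # applies Yates' continuity correction for 2 x 2 tables
res$expected            # expected counts under independence

# Fisher's exact test does not rely on the large-sample approximation.
fisher <- fisher.test(cf)
fisher$p.value
```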