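Creating a table from a dataset to run a chi-squared test

Asked: 2016-05-13 17:58:01

Tags: r dplyr

I want to analyze categorical data with a chi-squared test in R. I am working with transplant data and want to compare outcomes between patients operated on and off bypass. I previously asked a similar question about my categorical variables and was given this answer for testing differences between groups by sex: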

df <- read.table(text="Group, Age, Sex, Height, Weight, Diagnosis, Blood loss, Intubation time, Survival
                 On bypass,59,Male,165,102,Diagnosis 1,57,53,29
                 On bypass,44,Female,164,140,Diagnosis 1,114,15,35
                 On bypass,45,Male,165,119,Diagnosis 2,118,31,81
                 On bypass,26,Male,178,125,Diagnosis 1,171,36,31
                 On bypass,41,Female,177,105,Diagnosis 1,76,53,91
                 On bypass,43,Male,161,119,Diagnosis 3,97,38,63
                 Off bypass,53,Female,164,139,Diagnosis 1,125,49,51
                 Off bypass,26,Female,165,137,Diagnosis 3,29,7,86
                 Off bypass,30,Male,174,121,Diagnosis 1,174,43,100
                 Off bypass,59,Female,174,133,Diagnosis 1,40,16,43
                 Off bypass,63,Male,172,132,Diagnosis 2,32,46,10  ", header = TRUE, sep = ",")

library(dplyr)

# tally number of participants in each Group by Sex
tab <- tally(group_by(df, Group, Sex))
chisq.test(tab$n)  # test for Group differences by Sex
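
One point worth checking in the snippet above: when `chisq.test` receives a plain vector of counts, it runs a goodness-of-fit test (are all counts equal?) rather than a test of independence between Group and Sex. The sketch below contrasts the two calls. It rebuilds only the Group and Sex columns of the small data frame by hand; the names `group`, `sex`, `tab2`, `gof` and `ind` are illustrative and not part of the original code.

```r
# Group and Sex values copied by hand from the 11-row data frame above
group <- c(rep("On bypass", 6), rep("Off bypass", 5))
sex <- c("Male", "Female", "Male", "Male", "Female", "Male",
         "Female", "Female", "Male", "Female", "Male")

# Two-way contingency table of counts
tab2 <- table(Group = group, Sex = sex)

# Vector input: goodness-of-fit test across the four cells.
# Warnings about small expected counts are suppressed for this tiny sample.
gof <- suppressWarnings(chisq.test(as.vector(tab2)))

# Table input: test of independence between Group and Sex
ind <- suppressWarnings(chisq.test(tab2))

gof$parameter  # degrees of freedom of the goodness-of-fit test
ind$parameter  # degrees of freedom of the independence test
```

The degrees of freedom differ between the two calls, which is a quick way to confirm which test was actually run.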

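I use this to test for differences between the categories of a variable with two levels (e.g. sex, where the two levels are male and female), but some of my variables have more than two levels, e.g. diagnosis (see the example dataset below). For these variables, I want to compare each diagnosis individually between the on-bypass and off-bypass groups.

Here is my example data: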

exampledata <- read.table(text="ID,Bypass,Sex,Age,Height,Weight,Diagnosis
                 559,Bypass on,Male,33,167,78,Other
                 662,Bypass off,Male,63,175,55,UIP
                 956,Bypass off,Female,40,158,88,Other
                 460,Bypass on,Female,34,173,86,UIP
                 153,Bypass off,Female,31,171,74,UIP
                 192,Bypass off,Male,33,163,64,Other
                 658,Bypass on,Male,50,161,60,Other
                 529,Bypass off,Female,55,179,75,Cystic fibrosis
                 981,Bypass on,Male,36,166,81,Other
                 367,Bypass on,Female,46,152,85,PH
                 728,Bypass off,Male,30,169,88,Other
                 185,Bypass on,Female,65,162,57,UIP
                 160,Bypass on,Male,54,176,62,PH
                 175,Bypass off,Male,29,156,78,Other
                 167,Bypass off,Male,20,175,86,PH
                 149,Bypass on,Male,24,169,82,Cystic fibrosis
                 446,Bypass off,Male,38,162,69,PH
                 667,Bypass on,Male,55,150,55,Cystic fibrosis
                 488,Bypass off,Female,41,162,56,Other
                 169,Bypass off,Female,60,154,55,Cystic fibrosis
                 787,Bypass on,Male,41,169,52,Cystic fibrosis
                 443,Bypass on,Male,35,159,77,Other
                 593,Bypass off,Female,28,167,53,Other
                 653,Bypass off,Female,22,176,75,Other
                 685,Bypass off,Male,26,170,88,Cystic fibrosis
                 676,Bypass on,Male,32,172,58,Cystic fibrosis
                 556,Bypass off,Male,26,168,88,PH
                 943,Bypass off,Male,40,176,80,PH
                 940,Bypass off,Male,37,180,69,Cystic fibrosis
                 740,Bypass on,Female,58,153,72,UIP
                 624,Bypass on,Female,40,156,81,UIP
                 194,Bypass on,Male,33,155,60,PH
                 162,Bypass on,Female,23,170,64,PH
                 283,Bypass off,Male,60,180,61,Other
                 404,Bypass on,Male,26,170,63,PH
                 312,Bypass on,Male,36,171,83,PH
                 995,Bypass on,Female,48,161,67,Other
                 254,Bypass on,Female,35,175,62,UIP
                 364,Bypass on,Female,65,161,55,UIP
                 771,Bypass off,Male,37,157,72,Other
                 698,Bypass on,Male,31,163,87,PH
                 286,Bypass on,Female,60,154,80,UIP
                 189,Bypass off,Male,42,168,57,PH
                 463,Bypass on,Female,32,176,50,PH
                 634,Bypass off,Male,53,152,64,UIP
                 198,Bypass off,Female,20,171,70,Cystic fibrosis
                 356,Bypass off,Male,55,161,72,Cystic fibrosis
                 254,Bypass on,Female,49,169,61,UIP
                 921,Bypass on,Male,47,152,63,UIP
                 185,Bypass on,Male,63,174,71,Other
                 953,Bypass on,Male,32,169,63,PH
                 336,Bypass on,Female,33,164,52,Other
                 651,Bypass off,Female,55,172,54,PH
                 200,Bypass off,Male,43,179,55,UIP
                 625,Bypass off,Male,43,158,75,Other
                 986,Bypass on,Female,32,151,81,Other
                 437,Bypass off,Female,53,152,57,Other
                 433,Bypass on,Male,35,180,74,Cystic fibrosis
                 673,Bypass on,Female,27,159,58,Cystic fibrosis
                 901,Bypass off,Male,30,169,72,PH", header = TRUE, sep = ",")

I used this to create a table of counts:

mytable <- table(exampledata$Bypass,exampledata$Diagnosis)

which returns

             Cystic fibrosis Other PH UIP
  Bypass off               6    11  7   4
  Bypass on                6     8  9   9

However, since I want to look at each diagnosis separately, the output I need is

             Cystic fibrosis Not Cystic fibrosis
  Bypass off               6                  22
  Bypass on                6                  26

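With this output, I hope to compare the number of patients with cystic fibrosis between the on/off pump groups.

Ideally, I could quickly repeat this for each diagnosis.

If anyone thinks there is a better way to do this (or that I am simply going about it the wrong way), please let me know.

Any help is much appreciated.

Thanks, Tom

1 Answer:

Answer 0 (score: 1)

You can do this: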

mytable <- table(exampledata$Bypass, exampledata$Diagnosis == 'Cystic fibrosis')
colnames(mytable) <- c('Not Cystic fibrosis', 'Cystic fibrosis')

             Not Cystic fibrosis Cystic fibrosis
  Bypass off                  22               6
  Bypass on                   26               6
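
As an alternative sketch, the same 2×2 table can be derived from the full counts table by subtracting one column from the row totals. To stay self-contained, the example rebuilds the counts table from the numbers printed in the question instead of reading `exampledata`; the names `counts`, `dx` and `collapsed` are illustrative.

```r
# Full Bypass x Diagnosis counts, copied from the table printed in the question
counts <- matrix(c(6, 6, 11, 8, 7, 9, 4, 9), nrow = 2,
                 dimnames = list(c("Bypass off", "Bypass on"),
                                 c("Cystic fibrosis", "Other", "PH", "UIP")))

dx <- "Cystic fibrosis"

# "Not dx" is the row total minus the dx column
collapsed <- cbind(rowSums(counts) - counts[, dx], counts[, dx])
colnames(collapsed) <- c(paste("Not", dx), dx)

collapsed
```

This gives the same counts as the logical comparison above and can be handy when only the summary table, not the raw data, is available.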

If you want to do the same for all categories, you can do it in a function or a loop.

Edit: added a loop option to get all the tables you need:

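# unique() works whether Diagnosis is a factor or a character vector
# (since R 4.0, read.table no longer creates factors by default)
diagnoses <- sort(unique(as.character(exampledata$Diagnosis)))
lapply(diagnoses, function(x) {
         mytable <- table(exampledata$Bypass, exampledata$Diagnosis == x)
         colnames(mytable) <- c(paste('Not ', x, sep = ''), x)
         mytable
       })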

The output looks like this:

[[1]]

             Not Cystic fibrosis Cystic fibrosis
  Bypass off                  22               6
  Bypass on                   26               6

[[2]]

             Not Other Other
  Bypass off        17    11
  Bypass on         24     8

[[3]]

             Not PH PH
  Bypass off     21  7
  Bypass on      23  9

[[4]]

             Not UIP UIP
  Bypass off      24   4
  Bypass on       23   9

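To run a chi-squared test on each of the tables above, save the output of the lapply call to a variable; let's call it l.

Then use:

lapply(l, chisq.test)

The output is a list of four test results, one per diagnosis.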
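
Because this runs four tests on the same patients, it may be worth collecting the p-values in one place and adjusting them for multiple comparisons. The following is a minimal sketch under assumptions: it rebuilds the four 2×2 tables from the counts printed in the question, and the Holm correction is just one possible choice, not something the original answer prescribes. The names `counts`, `tests`, `p_raw` and `p_adj` are illustrative.

```r
# Bypass x Diagnosis counts, copied from the table printed in the question
counts <- matrix(c(6, 6, 11, 8, 7, 9, 4, 9), nrow = 2,
                 dimnames = list(c("Bypass off", "Bypass on"),
                                 c("Cystic fibrosis", "Other", "PH", "UIP")))

# One 2x2 table per diagnosis: "Not x" versus "x"
l <- lapply(colnames(counts), function(x) {
  tab <- cbind(rowSums(counts) - counts[, x], counts[, x])
  colnames(tab) <- c(paste("Not", x), x)
  tab
})
names(l) <- colnames(counts)  # label each table by its diagnosis

# Run the tests and pull out the raw p-values
tests <- lapply(l, chisq.test)
p_raw <- sapply(tests, function(t) t$p.value)

# Holm adjustment for four comparisons (an assumed choice of method)
p_adj <- p.adjust(p_raw, method = "holm")

data.frame(p_raw, p_adj)
```

Naming the list elements means the resulting data frame is labelled by diagnosis, which makes the four results easier to read side by side.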

Of course, once the lapply output is saved to the list l, you can also run an individual chi-squared test, e.g.:

chisq.test(l[[1]])
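
The chi-squared approximation depends on the expected cell counts not being too small, so it can help to inspect them and, if in doubt, compare against Fisher's exact test. The sketch below does this for the cystic fibrosis table only, with the counts typed in from the output above; `cf` and `res` are illustrative names, and whether Fisher's test is preferable for this dataset is a judgment call rather than something stated in the original answer.

```r
# Cystic fibrosis 2x2 table, copied from the output above
cf <- matrix(c(22, 26, 6, 6), nrow = 2,
             dimnames = list(c("Bypass off", "Bypass on"),
                             c("Not Cystic fibrosis", "Cystic fibrosis")))

res <- chisq.test(cf)

res$expected  # expected counts under independence
res$p.value   # p-value from the chi-squared test (with continuity correction)

# Fisher's exact test as a cross-check for small samples
fisher.test(cf)$p.value
```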