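Apply analysis to multiple columns in a dataset split by factor, including continuous and categorical data

Time: 2016-05-07 21:53:15

Tags: r dataframe statistics

I want to apply a t-test to many columns of a dataset split by a factor in R, and I found a solution here: Apply t-test on many columns in a dataframe split by factor

This code is taken from that question: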

df <- read.table(text="Group   var1    var2    var3    var4    var5
1           3   5   7   3   7
1           3   7   5   9   6
1           5   2   6   7   6
1           9   5   7   0   8
1           2   4   5   7   8
1           2   3   1   6   4
2           4   2   7   6   5
2           0   8   3   7   5
2           1   2   3   5   9
2           1   5   3   8   0
2           2   6   9   0   7
2           3   6   7   8   8
2           10  6   3   8   0", header = TRUE)


t(sapply(df[-1], function(x)
  unlist(t.test(x ~ df$Group)[c("estimate", "p.value", "statistic", "conf.int")])))

Result:

 estimate.mean in group 1 estimate.mean in group 2   p.value statistic.t conf.int1 conf.int2
var1                 4.000000                 3.000000 0.5635410   0.5955919 -2.696975  4.696975
var2                 4.333333                 5.000000 0.5592911  -0.6022411 -3.104788  1.771454
var3                 5.166667                 5.000000 0.9028444   0.1249164 -2.770103  3.103436
var4                 5.333333                 6.000000 0.7067827  -0.3869530 -4.497927  3.164593
var5                 6.500000                 4.857143 0.3053172   1.0925986 -1.803808  5.089522

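This is exactly what I am after, but my dataset also includes categorical data, such as sex and diagnosis (which has several possible values).

Is there a way to incorporate this into the code above? I am new to statistics, but I believe a chi-squared test is used to test for differences in categorical data?

If this cannot be incorporated into the code above, separate code that tests the categorical data and produces a similar result would also be a great help.

Any help is much appreciated.

Thanks, Tom

Edit:

Thanks for your replies.

I am working with transplant data, and I want to compare outcomes between on-bypass and off-bypass surgery. I am not sure of the best way to show my data; I have copied this from a .csv file, and I hope that is OK.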

Group,Age,Sex,Height,Weight,Diagnosis,Blood loss,Intubation time,Survival
On bypass,59,Male,165,102,Diagnosis 1,57,53,29
On bypass,44,Female,164,140,Diagnosis 1,114,15,35
On bypass,45,Male,165,119,Diagnosis 2,118,31,81
On bypass,26,Male,178,125,Diagnosis 1,171,36,31
On bypass,41,Female,177,105,Diagnosis 1,76,53,91
On bypass,43,Male,161,119,Diagnosis 3,97,38,63
Off bypass,53,Female,164,139,Diagnosis 1,125,49,51
Off bypass,26,Female,165,137,Diagnosis 3,29,7,86
Off bypass,30,Male,174,121,Diagnosis 1,174,43,100
Off bypass,59,Female,174,133,Diagnosis 1,40,16,43
Off bypass,63,Male,172,132,Diagnosis 2,32,46,10

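I plan to first make sure that my two groups do not differ significantly in age, sex, height, weight, and diagnosis.

I will then test the patient outcomes, including blood loss, intubation time, and survival.

Could anyone suggest the best tests to use for this analysis? And, if possible, provide some help with the code to run it in R?

Thanks again, Tom

1 answer:

Answer 0: (score: 0)

It is worth consulting a good text on matched-subjects designs, but assuming you already have or will, this (together with what you already have) should help you do what you need to do in R: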

 df <- read.table(text="Group, Age, Sex, Height, Weight, Diagnosis, Blood loss, Intubation time, Survival
                 On bypass,59,Male,165,102,Diagnosis 1,57,53,29
                 On bypass,44,Female,164,140,Diagnosis 1,114,15,35
                 On bypass,45,Male,165,119,Diagnosis 2,118,31,81
                 On bypass,26,Male,178,125,Diagnosis 1,171,36,31
                 On bypass,41,Female,177,105,Diagnosis 1,76,53,91
                 On bypass,43,Male,161,119,Diagnosis 3,97,38,63
                 Off bypass,53,Female,164,139,Diagnosis 1,125,49,51
                 Off bypass,26,Female,165,137,Diagnosis 3,29,7,86
                 Off bypass,30,Male,174,121,Diagnosis 1,174,43,100
                 Off bypass,59,Female,174,133,Diagnosis 1,40,16,43
                 Off bypass,63,Male,172,132,Diagnosis 2,32,46,10", header = TRUE, sep = ",", strip.white = TRUE)

library(dplyr)

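# cross-tabulate the number of participants in each Group by Sex
tab <- table(df$Group, df$Sex)
# test whether Sex is independent of Group; passing a plain vector of counts
# would run a goodness-of-fit test instead. With so few patients, R may warn
# that the chi-squared approximation is unreliable.
chisq.test(tab)

# group_by() with no grouping variables just returns an ungrouped tibble
df <- group_by(df)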

# do any of these variables differ by Group?
summary(manova(cbind(Age, Height, Weight) ~ Group, data = df))

# investigate all main effects
summary(aov(Survival ~ ., data = df))

# what about some main effects and interactions?
summary(aov(Survival ~ (Group+Age+Sex)^2, data = df))
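To get closer to the one-row-per-variable table asked for in the question, the per-column test can be chosen by column type. The following is only a rough sketch, not part of the original answer: `test_by_group` and `baseline` are hypothetical names introduced here, and Fisher's exact test is swapped in for the chi-squared test on the assumption that, with only 11 patients, the expected cell counts are too small for the chi-squared approximation to be trusted.

```r
# Data as posted in the question. read.csv() sanitises the column names,
# so "Blood loss" becomes Blood.loss and "Intubation time" becomes Intubation.time.
df <- read.csv(text = "Group,Age,Sex,Height,Weight,Diagnosis,Blood loss,Intubation time,Survival
On bypass,59,Male,165,102,Diagnosis 1,57,53,29
On bypass,44,Female,164,140,Diagnosis 1,114,15,35
On bypass,45,Male,165,119,Diagnosis 2,118,31,81
On bypass,26,Male,178,125,Diagnosis 1,171,36,31
On bypass,41,Female,177,105,Diagnosis 1,76,53,91
On bypass,43,Male,161,119,Diagnosis 3,97,38,63
Off bypass,53,Female,164,139,Diagnosis 1,125,49,51
Off bypass,26,Female,165,137,Diagnosis 3,29,7,86
Off bypass,30,Male,174,121,Diagnosis 1,174,43,100
Off bypass,59,Female,174,133,Diagnosis 1,40,16,43
Off bypass,63,Male,172,132,Diagnosis 2,32,46,10", stringsAsFactors = FALSE)

# Hypothetical helper (not from the original answer):
# Welch t-test for numeric columns, Fisher's exact test for categorical ones.
test_by_group <- function(x, group) {
  if (is.numeric(x)) {
    res <- t.test(x ~ group)
    data.frame(test = "Welch t-test", p.value = res$p.value)
  } else {
    res <- fisher.test(table(group, x))
    data.frame(test = "Fisher exact", p.value = res$p.value)
  }
}

# Baseline variables named in the question
baseline <- c("Age", "Sex", "Height", "Weight", "Diagnosis")

# Run the appropriate test on each column and stack the results
tests <- lapply(df[baseline], test_by_group, group = df$Group)
results <- cbind(variable = baseline, do.call(rbind, tests))
rownames(results) <- NULL
results
```

The same helper could be applied to the outcome columns by passing `c("Blood.loss", "Intubation.time", "Survival")` instead of `baseline`. Whether a t-test is appropriate for those outcomes depends on their distribution, which a sample of this size cannot really establish.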