x ~ y[ == ""] - comparing groups selected by a grouping variable

Asked: 2014-02-26 02:19:19

Tags: r variables grouping

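I'm still getting my head around some R syntax, and I'd like to ask how to run the following analysis efficiently without having to reshape the whole data frame from long to wide and so on.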

Here is my data frame:

> data.frame(SOExample)
    StudyID TimePoint   Group      Conc
1  N0920235        BL Control 0.7998743
2  N1020555        BL Control 0.3839061
3  N1020621        BL Control 0.5446354
4  N1121951        BL Control 0.5146689
5  N1122107        BL Control 0.5431685
6  N1122225        BL Control 0.5775356
7  N1122221        BL Control 0.9474015
8  N1222611        BL Control 0.6194468
9  N1222745        BL Control 0.7110226
10 N1222781        BL Control 0.5347863
11 N1223363        BL Control 0.5079631
12 N1223541        BL Control 0.5054484
13 N1223579        BL Control 0.8162196
14 N1122171        BL Control 0.4997904
15 N0920198        BL Control 0.5924141
16 N0920367        BL Control 0.6244761
17 N1021085        BL Control 0.7759849
18 N1121329        BL Control 0.3845348
19 N1121389        BL Control 1.1695306
20 N1121475        BL Control 1.7254820
21 N1121871        BL Control 0.7080889
22 N1121875        BL Control 0.8214585
23 N1122021        BL Control 0.7384744
24 N1122103        BL Control 0.6026823
25 N1122283        BL Control 0.7581727
26 N1122321        BL Control 0.5282900
27 N1222493        BL Control 0.4258173
28 N1222529        BL Control 0.1538139
29 N1222587        BL Control 0.7663453
30 N1222705        BL Control 0.5873847
31 N1222693        BL Control 0.6584241
32 N1222761        BL Control 0.3321459
33   MP0001        BL Patient 0.8216681
34   MP0002        BL Patient 0.4800922
35   MP0007        BL Patient 0.8822297
36   MP0008        BL Patient 0.8975272
37   MP0010        BL Patient 0.7567058
38   MP0011        BL Patient 0.4893127
39   MP0017        BL Patient 0.5840319
40   MP0022        BL Patient 0.8053227
41   MP0023        BL Patient 0.7837370
42   MP0024        BL Patient 0.3938870
43   MP0027        BL Patient 0.6345636
44   MP0028        BL Patient 0.6234141
45   MP0029        BL Patient 0.7101115
46   MP0001        3M Patient 0.5415225
47   MP0002        3M Patient 0.3986928
48   MP0007        3M Patient 0.5722799
49   MP0008        3M Patient 0.5140331
50   MP0010        3M Patient 0.4913495
51   MP0011        3M Patient 0.5288351
52   MP0017        3M Patient 0.2931565
53   MP0023        3M Patient 0.2149173
54   MP0024        3M Patient 0.3794694
55   MP0028        3M Patient 0.6322568
56   MP0029        3M Patient 0.5297962

What I want to do is really simple: compare patients with controls at TimePoint "BL". But for some reason R won't accept my code:

t.test(Conc~Group[TimePoint=="BL"], data=SOExample)

This is the error message I get:

Error in model.frame.default(formula = Conc ~ Group[TimePoint == "BL"],  : 
  variable lengths differ (found for 'Group[TimePoint == "BL"]')

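Further down the line, I'd like to run a pairwise.t.test comparing BL patients with controls and 3M patients with controls. I thought something like the following would work, but as you can see, R doesn't like it: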

> pairwise.t.test(SOExample$Conc~Group|TimePoint, data=SOExample)
Error in factor(g) : argument "g" is missing, with no default

So I also tried the following:

> t.test(Conc~Group, data=SOExample[SOExample$TimePoint=="BL",])

    Welch Two Sample t-test

data:  Conc by Group
t = -0.452, df = 36.94, p-value = 0.6539
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.1638470  0.1040813
sample estimates:
mean in group Control mean in group Patient 
            0.6518559             0.6817387 

But now, when I want to compare patients at 3M with controls, I get this message:

> t.test(Conc~Group, data=SOExample[SOExample$TimePoint=="3M",])
Error in t.test.formula(Conc ~ Group, data = SOExample[SOExample$TimePoint ==  : 
  grouping factor must have exactly 2 levels

Any ideas? Of course I could change my whole data format, but that's just a pain. I don't want multiple text files for the same data set.

1 Answer:

Answer 0: (score: 1)

I'm not entirely sure what you're asking, because the wording of the question is a bit confusing to me, but here are some options:

All patients vs. all controls at TimePoint BL:

t.test(Conc~Group, data=SOExample[SOExample$TimePoint=="BL",])
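An equivalent sketch uses the `subset` argument of the formula interface to `t.test`. It avoids indexing inside the formula, which is what triggered "variable lengths differ" in the question: `Group[TimePoint == "BL"]` has 45 elements while `Conc` has 56. The `toy` data frame below is a hypothetical stand-in for `SOExample`, built from a few of its rows so the snippet runs on its own.

```r
# Hypothetical stand-in for SOExample (a handful of rows from the question).
toy <- data.frame(
  StudyID   = c("N0920235", "N1020555", "N1020621",
                "MP0001", "MP0002", "MP0007",
                "MP0001", "MP0002", "MP0007"),
  TimePoint = c(rep("BL", 6), rep("3M", 3)),
  Group     = c(rep("Control", 3), rep("Patient", 6)),
  Conc      = c(0.7998743, 0.3839061, 0.5446354,
                0.8216681, 0.4800922, 0.8822297,
                0.5415225, 0.3986928, 0.5722799),
  stringsAsFactors = TRUE
)

# subset= is evaluated inside the data frame, so Conc and Group
# are filtered together and always have matching lengths.
res_bl <- t.test(Conc ~ Group, data = toy, subset = TimePoint == "BL")
print(res_bl)
```

With the real data, replacing `toy` by `SOExample` should give the same test as the row-indexing version above.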

All patients at 3M vs. all controls at BL:

with(SOExample,t.test(Conc[TimePoint=="BL" & Group=="Control"],
                      Conc[TimePoint=="3M" & Group=="Patient"]))
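The "grouping factor must have exactly 2 levels" error in the question follows from the data itself: no control has a 3M measurement, so the 3M subset contains only patients. A cross-tabulation makes this visible before running any test. As a minimal sketch, the `toy` data frame is a hypothetical stand-in for `SOExample` using a few of its rows.

```r
# Hypothetical stand-in for SOExample (a handful of rows from the question).
toy <- data.frame(
  TimePoint = c(rep("BL", 6), rep("3M", 3)),
  Group     = c(rep("Control", 3), rep("Patient", 6)),
  Conc      = c(0.7998743, 0.3839061, 0.5446354,
                0.8216681, 0.4800922, 0.8822297,
                0.5415225, 0.3986928, 0.5722799),
  stringsAsFactors = TRUE
)

# Count observations per Group x TimePoint cell; the Control / 3M cell is empty.
counts <- table(toy$Group, toy$TimePoint)
print(counts)

# Passing two explicit vectors sidesteps the formula interface,
# so the two samples may come from different time points.
res_cross <- with(toy, t.test(Conc[TimePoint == "BL" & Group == "Control"],
                              Conc[TimePoint == "3M" & Group == "Patient"]))
print(res_cross)
```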

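Paired comparison of patients at 3M vs. patients at BL (paired by StudyID):

ID.3M <- SOExample[SOExample$TimePoint=="3M",]$StudyID
df    <- SOExample[SOExample$StudyID %in% ID.3M,]
df    <- df[order(df$StudyID),]   # align the pairs by StudyID
# two-vector form: recent R versions reject paired= in the formula interface
with(df, t.test(Conc[TimePoint=="3M"], Conc[TimePoint=="BL"], paired=TRUE))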
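As for the `pairwise.t.test` attempt in the question: that function has no formula interface. It takes a response vector `x` and a grouping vector `g`, which is why R complains that `g` is missing. One possible sketch, assuming you want every Group-by-TimePoint cell compared with every other, builds a combined factor with `interaction()`. The `toy` data frame is a hypothetical stand-in for `SOExample`, and `pool.sd = FALSE` is chosen here so each comparison is a separate Welch test, like the `t.test` calls above.

```r
# Hypothetical stand-in for SOExample (a handful of rows from the question).
toy <- data.frame(
  TimePoint = c(rep("BL", 6), rep("3M", 3)),
  Group     = c(rep("Control", 3), rep("Patient", 6)),
  Conc      = c(0.7998743, 0.3839061, 0.5446354,
                0.8216681, 0.4800922, 0.8822297,
                0.5415225, 0.3986928, 0.5722799),
  stringsAsFactors = TRUE
)

# Combine the two factors into one; drop = TRUE removes the empty Control.3M cell.
cell <- interaction(toy$Group, toy$TimePoint, drop = TRUE)

# x = response, g = grouping factor; p-values are Holm-adjusted by default.
res_pw <- pairwise.t.test(toy$Conc, cell, pool.sd = FALSE)
print(res_pw)
```

Note that this treats Patient.3M vs. Patient.BL as independent samples; for the paired version, the StudyID-matched `t.test` above is the appropriate comparison.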