Question

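I have inherited some code that generates a matrix of significance levels from pairwise comparisons of predicted means. The model includes data from multiple sites and treatments, but I only want to compare genotypes within a treatment within a site, so only a subset of the comparisons is meaningful.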

Here is a dummy version of what is currently generated.

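# Dummy data: every site x treatment x genotype combination
effect.nam <- expand.grid(site=c("A","B","C"), treat=c("low","high"), genotype=c("A1","B2"))
labels <- paste(effect.nam[,1], effect.nam[,2], effect.nam[,3], sep=".")
# 12 x 12 matrix of random TRUE/FALSE "significance" values
mat <- matrix(sample(c(TRUE,FALSE), 144, replace=TRUE), 12, 12)
dimnames(mat) <- list(labels, labels)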

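Obviously the TRUE/FALSE values are random in this case. What I want is to see only the comparisons within a site and treatment. It would also be good to drop the self-comparisons. Ideally, I would like the result returned as a data frame of the following form: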

   Site Treat Genotype1 Genotype2   Sig
1     A   low         A1         2  TRUE
2     A   low         A1         3  TRUE
3     A   low         B2         3  TRUE
4     A  high         A1         2  TRUE
5     A  high         A1         3 FALSE
6     A  high         B2         3 FALSE
7     B   low         A1         2 FALSE
8     B   low         A1         3  TRUE
9     B   low         B2         3 FALSE
10    B  high         A1         2  TRUE
11    B  high         A1         3  TRUE
12    B  high         B2         3  TRUE
13    C   low         A1         2  TRUE
14    C   low         A1         3  TRUE
15    C   low         B2         3 FALSE
16    C  high         A1         2  TRUE
17    C  high         A1         3  TRUE
18    C  high         B2         3  TRUE

I have made a few false starts, so any quick pointers in the right direction would be much appreciated.

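With the very helpful answer from Chase, you can see that although the meaningless comparisons have been removed, every useful comparison is included twice (genotype 1 vs. genotype 2, and vice versa). I can't see how to remove these easily, since they aren't really duplicates...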

- Update -

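Apologies, I needed to change mat so that, when implementing Chase's solution, Genotype1 and Genotype2 are factors rather than ints, as they are in my real case. I have made a couple of additions to the solution below (adding a sort column to avoid doubled-up comparisons).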

It works, which is great, but adding these columns seems awkward to me. Is there a more elegant way?

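library(reshape2)
# varnames keeps the X1/X2 column names; current reshape2 would otherwise name them Var1/Var2
mat.m <- melt(mat, varnames = c("X1", "X2"))
mat.m[,c("site1", "treat1", "genotype1")] <-  colsplit(mat.m$X1, "\\.", c("site1", "treat1", "genotype1"))
mat.m[,c("site2", "treat2", "genotype2")] <-  colsplit(mat.m$X2, "\\.", c("site2", "treat2", "genotype2"))
str(mat.m)
# Sort columns: recode the genotypes as 1 and 2 so that each pair can be kept only once.
# factor() makes this work whether colsplit() returned character or factor columns.
mat.m$genotype1sort <- factor(mat.m$genotype1)
mat.m$genotype2sort <- factor(mat.m$genotype2)
levels(mat.m$genotype1sort) <- c(1, 2)
levels(mat.m$genotype2sort) <- c(1, 2)
mat.m$genotype1sort <- as.numeric(levels(mat.m$genotype1sort))[mat.m$genotype1sort]
mat.m$genotype2sort <- as.numeric(levels(mat.m$genotype2sort))[mat.m$genotype2sort]

# Same site, same treatment, different genotypes, and each pair in one direction only
subset(mat.m, site1 == site2 & treat1 == treat2 & genotype1 != genotype2 & genotype1sort < genotype2sort,
   select = c("site1", "treat1", "genotype1", "genotype2", "value"))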

#-----
    site1 treat1 genotype1 genotype2 value
73      A    low        A1        B2  TRUE
86      B    low        A1        B2  TRUE
99      C    low        A1        B2  TRUE
112     A   high        A1        B2  TRUE
125     B   high        A1        B2 FALSE
138     C   high        A1        B2 FALSE

Answer 1

I think you can get what you want with a few functions from reshape2. First, melt the data into long format:

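library(reshape2)
# varnames keeps the X1/X2 column names; current reshape2 would otherwise name them Var1/Var2
mat.m <- melt(mat, varnames = c("X1", "X2"))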
#--------
        X1      X2 value
1  A.low.1 A.low.1  TRUE
2  B.low.1 A.low.1  TRUE

Next, split columns X1 and X2 into three columns each on the ".":

mat.m[,c("site1", "treat1", "genotype1")] <-  colsplit(mat.m$X1, "\\.", c("site1", "treat1", "genotype1"))
mat.m[,c("site2", "treat2", "genotype2")] <-  colsplit(mat.m$X2, "\\.", c("site2", "treat2", "genotype2"))

Now mat.m looks like this (the genotypes are plain integers here, as in the original version of the question):

> head(mat.m,3)
        X1      X2 value site1 treat1 genotype1 site2 treat2 genotype2
1  A.low.1 A.low.1  TRUE     A    low         1     A    low         1
2  B.low.1 A.low.1  TRUE     B    low         1     A    low         1
3  C.low.1 A.low.1 FALSE     C    low         1     A    low         1

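Finally, subset the rows you want and take the columns in the desired order. Note that there are many ways to do this other than subset, but I used it here for clarity: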

subset(mat.m, site1 == site2 & treat1 == treat2 & genotype1 != genotype2,
       select = c("site1", "treat1", "genotype1", "genotype2", "value"))
#--------
    site1 treat1 genotype1 genotype2 value
7       A    low         2         1  TRUE
20      B    low         2         1  TRUE
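
The subset above still returns every pair in both directions. One possible way to keep each pair only once, without adding extra sort columns, is to compare the genotype labels themselves as strings. This is only a minimal sketch under assumptions: cmp is a made-up stand-in for the split columns of mat.m, its values are invented, and the genotype labels are assumed to have a usable order as plain strings.

```r
# Toy stand-in for mat.m after the colsplit() step; the values are made up.
g <- c("A1", "B2")
cmp <- expand.grid(genotype1 = g, genotype2 = g,
                   treat1 = c("low", "high"), site1 = c("A", "B"),
                   stringsAsFactors = FALSE)
cmp$site2  <- cmp$site1    # this toy only contains within-site,
cmp$treat2 <- cmp$treat1   # within-treatment rows
cmp$value  <- rep(c(TRUE, FALSE), length.out = nrow(cmp))

# as.character() lets the comparison work for factors as well as strings.
# "<" keeps A1 vs. B2 but drops B2 vs. A1 and the self-comparisons.
keep <- with(cmp, site1 == site2 & treat1 == treat2 &
                  as.character(genotype1) < as.character(genotype2))
res <- cmp[keep, c("site1", "treat1", "genotype1", "genotype2", "value")]
print(res)
```

Because "<" is strict, the separate genotype1 != genotype2 condition is no longer needed in this variant.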

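You could probably do something more clever and avoid splitting the two columns, but this seems to do what you want and should be reasonably straightforward.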
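
As one possible take on the "more clever" route, the sketch below uses only base R: it splits the dimnames once and walks the upper triangle of the matrix, so each unordered pair appears once and self-comparisons never arise. It is not part of the approach above, and it rests on assumptions: the labels contain no "." other than the two separators, the seed is arbitrary, and in real data the matrix is symmetric so that reading only mat[row, col] with row < col loses nothing.

```r
set.seed(42)  # arbitrary seed, only so the dummy matrix is reproducible
effect.nam <- expand.grid(site = c("A", "B", "C"), treat = c("low", "high"),
                          genotype = c("A1", "B2"))
labels <- paste(effect.nam$site, effect.nam$treat, effect.nam$genotype, sep = ".")
mat <- matrix(sample(c(TRUE, FALSE), 144, replace = TRUE), 12, 12,
              dimnames = list(labels, labels))

# Split each label into site / treat / genotype (one row per label)
parts <- do.call(rbind, strsplit(rownames(mat), ".", fixed = TRUE))
colnames(parts) <- c("site", "treat", "genotype")

# Index pairs above the diagonal: every unordered pair once, no self-pairs
idx <- which(upper.tri(mat), arr.ind = TRUE)
r  <- idx[, "row"]
cl <- idx[, "col"]

# Keep only pairs from the same site and the same treatment
same <- parts[r, "site"] == parts[cl, "site"] &
        parts[r, "treat"] == parts[cl, "treat"]

res <- data.frame(Site      = parts[r[same], "site"],
                  Treat     = parts[r[same], "treat"],
                  Genotype1 = parts[r[same], "genotype"],
                  Genotype2 = parts[cl[same], "genotype"],
                  Sig       = mat[idx[same, , drop = FALSE]],
                  row.names = NULL)
print(res)
```

With the dummy labels this gives one row per site and treatment combination, with the first genotype in Genotype1 and the second in Genotype2.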

How to do specific pairwise comparisons in R
