Question

I have a data frame of the following form:

my.df = data.frame(ID=c(1,2,3,4,5,6,7), STRAND=c('+','+','+','-','+','-','+'), COLLAPSE=c(0,0,1,0,1,0,0))

and a matrix of dimension nrow(my.df) by nrow(my.df). It is a correlation matrix, but that is not important for this discussion.

For example:

mat = matrix(rnorm(n=nrow(my.df)*nrow(my.df),mean=1,sd=1), nrow = nrow(my.df), ncol=nrow(my.df))

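The question is how to retrieve only those upper-triangle elements of mat whose row and column both have COLLAPSE == 0 in my.df and belong to the same strand.

In this specific example, I am interested in retrieving the following entries of mat into a vector: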

mat[1,2]
mat[1,7]
mat[2,7]
mat[4,6]

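The logic is as follows: rows 1 and 2 are on the same strand and both have a collapse value of zero, so mat[1,2] should be retrieved; row 3 is never combined with any other row because its collapse value is 1; rows 1 and 7 are on the same strand and both have collapse value 0, so mat[1,7] should also be retrieved; and so on.

I could write a for loop, but I am looking for a more practical way to achieve this result.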
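For reference, the selection rule described above can be spelled out as a plain double loop. This is only a baseline sketch of the loop the question alludes to, not code from the original post; the set.seed call and the variable names n and vals are assumptions added so the example runs on its own.

```r
my.df <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7),
                    STRAND = c('+', '+', '+', '-', '+', '-', '+'),
                    COLLAPSE = c(0, 0, 1, 0, 1, 0, 0))
set.seed(42)  # assumption: fixed seed, only for reproducibility
n <- nrow(my.df)
mat <- matrix(rnorm(n * n, mean = 1, sd = 1), nrow = n, ncol = n)

vals <- numeric(0)
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {            # j > i, so only the upper triangle is visited
    same_strand <- my.df$STRAND[i] == my.df$STRAND[j]
    both_zero <- my.df$COLLAPSE[i] == 0 && my.df$COLLAPSE[j] == 0
    if (same_strand && both_zero) vals <- c(vals, mat[i, j])
  }
}
vals
```

The loop visits pairs row by row, so vals holds mat[1,2], mat[1,7], mat[2,7], mat[4,6] in that order.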

Answer 1

Here is one way to do this using outer:

First, find the index pairs that have the same STRAND value and COLLAPSE == 0:

# TRUE where rows i and j share a strand and both have COLLAPSE == 0
idx <- with(my.df, outer(STRAND, STRAND, "==") &
              outer(COLLAPSE == 0, COLLAPSE == 0, "&"))

# The diagonal is TRUE for every row with COLLAPSE == 0; it is dropped in the next step.
#       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]
# [1,]  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE
# [2,]  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE
# [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [4,] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE
# [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [6,] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE
# [7,]  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE

Second, set the values in the lower triangle and on the diagonal to FALSE, and create a numeric index:

idx2 <- which(idx & upper.tri(idx), arr.ind = TRUE)
#      row col
# [1,]   1   2
# [2,]   4   6
# [3,]   1   7
# [4,]   2   7

Extract the values:

mat[idx2]
# [1] 1.72165093 0.05645659 0.74163428 3.83420241
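The steps above can be wrapped in a small helper. This is a minimal sketch: the function name same_strand_upper, the set.seed call, and the variable names are hypothetical and not part of the original answer.

```r
# Hypothetical helper: returns (row, col) pairs in the upper triangle whose
# rows share a STRAND and both have COLLAPSE == 0.
same_strand_upper <- function(df) {
  keep <- df$COLLAPSE == 0
  idx <- outer(df$STRAND, df$STRAND, "==") & outer(keep, keep, "&")
  which(idx & upper.tri(idx), arr.ind = TRUE)
}

my.df <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7),
                    STRAND = c('+', '+', '+', '-', '+', '-', '+'),
                    COLLAPSE = c(0, 0, 1, 0, 1, 0, 0))
set.seed(1)  # assumption: fixed seed, only for reproducibility
n <- nrow(my.df)
mat <- matrix(rnorm(n * n, mean = 1, sd = 1), nrow = n, ncol = n)

ij <- same_strand_upper(my.df)
vals <- mat[ij]   # a two-column index matrix picks one entry per row of ij
```

Because which() walks the matrix column by column, the values come back in the order mat[1,2], mat[4,6], mat[1,7], mat[2,7], which differs from the row-by-row order listed in the question.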

Answer 2

Here is one approach.

# select only the 0 collapse records
sel <- my.df$COLLAPSE==0

# split the data frame by strand
groups <- split(my.df$ID[sel], my.df$STRAND[sel])

# generate all possible pairs of IDs within the same strand
pairs <- lapply(groups, combn, 2)

# subset the entries from the matrix
lapply(pairs, function(ij) mat[t(ij)])
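The code above returns a list with one element per strand, while the question asks for a single vector. The following sketch flattens the result and also skips strands with fewer than two eligible IDs, since combn would otherwise fail or, for a single number k, silently enumerate pairs from 1:k. The guard, the set.seed call, and the names vals_by_strand and vals are assumptions added for illustration.

```r
my.df <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7),
                    STRAND = c('+', '+', '+', '-', '+', '-', '+'),
                    COLLAPSE = c(0, 0, 1, 0, 1, 0, 0))
set.seed(1)  # assumption: fixed seed, only for reproducibility
n <- nrow(my.df)
mat <- matrix(rnorm(n * n, mean = 1, sd = 1), nrow = n, ncol = n)

sel <- my.df$COLLAPSE == 0
groups <- split(my.df$ID[sel], my.df$STRAND[sel])

# Guard (not in the original answer): keep only strands with at least two IDs
groups <- Filter(function(g) length(g) >= 2, groups)

pairs <- lapply(groups, combn, 2)
vals_by_strand <- lapply(pairs, function(ij) mat[t(ij)])

# Flatten the per-strand list into one plain numeric vector
vals <- unlist(vals_by_strand, use.names = FALSE)
```

The order of the strands in the list follows the sorted group names, so it is safer not to rely on the position of individual values in vals.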

Answer 3

df <- my.df[my.df$COLLAPSE == 0, ]
strand <- c("+", "-")
idx <- do.call(rbind, lapply(strand, function(strand){
  t(combn(x = df$ID[df$STRAND == strand], m = 2))
}))
idx
#      [,1] [,2]
# [1,]    1    2
# [2,]    1    7
# [3,]    2    7
# [4,]    4    6

mat[idx]
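Note that the code above indexes mat by ID, which works here because each ID equals its row number. A variant that uses row positions instead might look like the sketch below; the modified ID values, the set.seed call, and the name rows are assumptions for illustration, and the sketch assumes every strand has at least two eligible rows.

```r
# Hypothetical data: same as the question, but IDs no longer match row numbers
my.df <- data.frame(ID = c(101, 102, 103, 104, 105, 106, 107),
                    STRAND = c('+', '+', '+', '-', '+', '-', '+'),
                    COLLAPSE = c(0, 0, 1, 0, 1, 0, 0))
set.seed(1)  # assumption: fixed seed, only for reproducibility
n <- nrow(my.df)
mat <- matrix(rnorm(n * n, mean = 1, sd = 1), nrow = n, ncol = n)

# Row positions (not IDs) of the rows with COLLAPSE == 0
rows <- which(my.df$COLLAPSE == 0)

# All pairs of row positions within each strand, stacked into a two-column matrix
idx <- do.call(rbind, lapply(split(rows, my.df$STRAND[rows]), function(r) {
  t(combn(r, 2))
}))

vals <- mat[idx]
```

Since combn emits each pair with the smaller position first, every row of idx already points into the upper triangle.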

Retrieving specific entries of a matrix based on values from a data frame
