Question

我有一个相关相似性数据框，例如下面显示的那个：

      1         2         3         4
1 1.0000000 0.9991846 0.7648333 0.3517951
2 0.9991846 1.0000000 0.7563048 0.3569311
3 0.7648333 0.7563048 1.0000000 0.6568740
4 0.3517951 0.3569311 0.6568740 1.0000000

我想提取超过某个阈值的值的簇（在这种情况下为0.95）。因此，我将以下内容作为单独的数据框返回：

      1         2        
1 1.0000000 0.9991846 
2 0.9991846 1.0000000 

      3
3 1.0000000 

      4
4 1.0000000

我考虑过分组或使用逻辑运算符，例如：

subset(blah, blah[1,] >.95)
blah > .95

我认为这是正确的方向，但我对如何解决这个问题感到难过？任何帮助将不胜感激？

Answer 1

这是一个没有循环的解决方案：

mytableTxt <- "      1         2         3         4
1 1.0000000 0.9991846 0.7648333 0.3517951
2 0.9991846 1.0000000 0.7563048 0.3569311
3 0.7648333 0.7563048 1.0000000 0.6568740
4 0.3517951 0.3569311 0.6568740 1.0000000"
mytable <- read.table(textConnection(mytableTxt), header = TRUE, row.names = 1)
mytable <- mytable[mytable > .95]
newbloc <- mytable==1 & c(1, mytable[-length(mytable)])==1
blocid <- rep(1:sum(newbloc), c(which(newbloc), length(newbloc) + 1)[-1] - which(newbloc))
blocsplit <- split(mytable, factor(blocid))
lapply(blocsplit, function(x)
    tmp <- as.data.frame(matrix(x, ncol = max(c(1, length(x)/2)))))
# $`1`
#          V1        V2
# 1 1.0000000 0.9991846
# 2 0.9991846 1.0000000
# 
# $`2`
#   V1
# 1  1
# 
# $`3`
#   V1
# 1  1

Answer 2

我想出了一个完整的答案，但它仍然使用循环和一些ifs：

#clusters ordered correlation similarity matrix
#takes csm and threshold as input
man_cluster_threshold <- function(M.AOE, threshold = .95){
   list_of_clusters <- list()
   y <- 1
   for(x in 1:ncol(M.AOE)){
    #if its not the last column this will check to see if the value to the right is below the theshold 

       if(x != ncol(M.AOE)){
          if(M.AOE[x,x+1] < threshold){
             #if it is below the threshold then that block of the matrix gets saved as its own block
                 qwer <- as.data.frame(M.AOE[y:x,y:x])
                 name_trans <- colnames(M.AOE[])[y:x]
                  y <- x + 1
                  names(qwer) <- name_trans
                list_of_clusters[[x]] <- qwer
              }
         }
       # if it is above the threshold
        else{
             qwer <- M.AOE[y:x,y:x]
              list_of_clusters[[x]] <- qwer
       }
        #this checks the last transcript
     if(x == ncol(M.AOE)){
          if(M.AOE[x,x-1] < threshold){
           #if it is below the threshold then that block of the matrix gets saved as its own block
                qwer <- as.data.frame(M.AOE[y:x,y:x])
                name_trans <- colnames(M.AOE[])[y:x]
                y <- x + 1
                names(qwer) <- name_trans
                  list_of_clusters[[x]] <- qwer
                 }
              }
            }
               list_of_clusters <- rmNullObs(list_of_clusters)
                return(list_of_clusters)
           }

有人有更好的主意吗？

R，如何从相关相似矩阵中提取值块？

2 个答案: