我有一个相关相似性数据框,例如下面显示的那个:
1 2 3 4
1 1.0000000 0.9991846 0.7648333 0.3517951
2 0.9991846 1.0000000 0.7563048 0.3569311
3 0.7648333 0.7563048 1.0000000 0.6568740
4 0.3517951 0.3569311 0.6568740 1.0000000
我想提取超过某个阈值的值的簇(在这种情况下为0.95)。因此,我将以下内容作为单独的数据框返回:
1 2
1 1.0000000 0.9991846
2 0.9991846 1.0000000
3
3 1.0000000
4
4 1.0000000
我考虑过分组或使用逻辑运算符,例如:
subset(blah, blah[1,] >.95)
blah > .95
我认为这是正确的方向,但我对如何解决这个问题感到难过?任何帮助将不胜感激?
答案 0 :(得分:1)
这是一个没有循环的解决方案:
mytableTxt <- " 1 2 3 4
1 1.0000000 0.9991846 0.7648333 0.3517951
2 0.9991846 1.0000000 0.7563048 0.3569311
3 0.7648333 0.7563048 1.0000000 0.6568740
4 0.3517951 0.3569311 0.6568740 1.0000000"
mytable <- read.table(textConnection(mytableTxt), header = TRUE, row.names = 1)
mytable <- mytable[mytable > .95]
newbloc <- mytable==1 & c(1, mytable[-length(mytable)])==1
blocid <- rep(1:sum(newbloc), c(which(newbloc), length(newbloc) + 1)[-1] - which(newbloc))
blocsplit <- split(mytable, factor(blocid))
lapply(blocsplit, function(x)
tmp <- as.data.frame(matrix(x, ncol = max(c(1, length(x)/2)))))
# $`1`
# V1 V2
# 1 1.0000000 0.9991846
# 2 0.9991846 1.0000000
#
# $`2`
# V1
# 1 1
#
# $`3`
# V1
# 1 1
答案 1 :(得分:0)
我想出了一个完整的答案,但它仍然使用循环和一些ifs:
#clusters ordered correlation similarity matrix
#takes csm and threshold as input
man_cluster_threshold <- function(M.AOE, threshold = .95){
list_of_clusters <- list()
y <- 1
for(x in 1:ncol(M.AOE)){
#if its not the last column this will check to see if the value to the right is below the theshold
if(x != ncol(M.AOE)){
if(M.AOE[x,x+1] < threshold){
#if it is below the threshold then that block of the matrix gets saved as its own block
qwer <- as.data.frame(M.AOE[y:x,y:x])
name_trans <- colnames(M.AOE[])[y:x]
y <- x + 1
names(qwer) <- name_trans
list_of_clusters[[x]] <- qwer
}
}
# if it is above the threshold
else{
qwer <- M.AOE[y:x,y:x]
list_of_clusters[[x]] <- qwer
}
#this checks the last transcript
if(x == ncol(M.AOE)){
if(M.AOE[x,x-1] < threshold){
#if it is below the threshold then that block of the matrix gets saved as its own block
qwer <- as.data.frame(M.AOE[y:x,y:x])
name_trans <- colnames(M.AOE[])[y:x]
y <- x + 1
names(qwer) <- name_trans
list_of_clusters[[x]] <- qwer
}
}
}
list_of_clusters <- rmNullObs(list_of_clusters)
return(list_of_clusters)
}
有人有更好的主意吗?