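I want to compute the variation of information between each row of a matrix and every other row of the same matrix. This distance metric is not included in dist, so I have to iterate manually. Each row is a clustering and each column is a sample. The matrix values are in {1, 0} and indicate whether a sample is a member of the cluster. Below is an example matrix and what I have so far. It can take a while to run; is there a more efficient way to do this computation?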
# example membership matrix: m clusterings (rows) by n samples (columns)
m <- 100
n <- 70
membership <- matrix(sample(0:1, m * n, replace = TRUE), m, n)
# create distance matrix, set diagonal to 0
dist.matrix <- matrix(NA_real_, nrow = m, ncol = m)
diag(dist.matrix) <- 0
# iterate over each pair of rows (i < j), compute the distance once,
# and fill both triangles of the distance matrix
for (i in seq_len(m - 1)) {
  for (j in (i + 1):m) {
    vi <- igraph::compare(membership[i, ], membership[j, ], method = "vi")
    dist.matrix[i, j] <- vi
    dist.matrix[j, i] <- vi
  }
}
Answer 0 (score: 0)
You can define the combinations with expand.grid, compute the values with apply, and reshape the result to produce the final matrix:
df_combs <- expand.grid(1:nrow(membership), 1:nrow(membership))
df_combs$compare <- apply(df_combs, 1, function(x) igraph::compare(membership[x[1],], membership[x[2],], method = "vi"))
df_wide <- reshape(df_combs, direction = "wide", timevar = "Var1", idvar = "Var2")
df_wide$Var2 <- NULL
df_wide then contains the same values as dist.matrix, as a data frame rather than a matrix.
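Note that expand.grid evaluates all m * m pairs, so it is not necessarily faster than the loop. Because every row here is a binary partition, one possible speed-up is to skip the per-pair calls and get all four joint counts (1/1, 1/0, 0/1, 0/0) from matrix products, then use VI(X, Y) = 2 H(X, Y) - H(X) - H(Y). The following is only a sketch under stated assumptions: vi_binary is a hypothetical helper name, and it uses the natural logarithm, which I assume matches igraph::compare(method = "vi"); it is worth checking against the loop result with all.equal on a small matrix before relying on it.

```r
# Sketch: pairwise variation of information for a 0/1 membership matrix.
# Assumption: natural log, intended to match igraph::compare(method = "vi").
vi_binary <- function(membership) {
  n <- ncol(membership)   # number of samples
  a <- membership         # indicator of label 1
  b <- 1 - membership     # indicator of label 0

  # entropy term -p * log(p), with 0 * log(0) treated as 0
  plogp <- function(counts) {
    p <- counts / n
    ifelse(p > 0, -p * log(p), 0)
  }

  # joint entropy H(X, Y) for every pair of rows, from the four joint counts
  h_joint <- plogp(tcrossprod(a, a)) + plogp(tcrossprod(a, b)) +
    plogp(tcrossprod(b, a)) + plogp(tcrossprod(b, b))

  # H(X, X) equals the marginal entropy H(X)
  h_marg <- diag(h_joint)

  vi <- 2 * h_joint - outer(h_marg, h_marg, "+")
  vi[vi < 0] <- 0         # clamp tiny negative values from rounding
  diag(vi) <- 0
  vi
}
```

With the example data, dist.fast <- vi_binary(membership) returns an m by m matrix that can be compared against dist.matrix.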