I have a matrix filled with discrete elements, and I need to cluster them into contiguous groups. For example, take this matrix:
[A B B C A]
[A A B A A]
[A B B C C]
[A A A A A]
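A has two separate clusters, C has two separate clusters, and B has one cluster.
The output I'm looking for would ideally assign a unique ID to each cluster, like this: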
[1 2 2 3 4]
[1 1 2 4 4]
[1 2 2 5 5]
[1 1 1 1 1]
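Right now I have R code that does this recursively by repeatedly checking nearest neighbors, but it quickly overflows when the matrix gets large (e.g. 100x100).
Is there a built-in function in R that can do this? I looked into raster and image processing packages, but had no luck. I'm sure it must exist somewhere.
Thanks!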
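Answer 0 (score: 7)
You can approach this by building a lattice graph that represents the matrix, in which an edge is kept only if the two vertices it joins have the same type: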
# Build initial matrix and lattice graph
# (uses the current igraph names; graph.lattice, get.edgelist, delete.edges
# and layout.auto are the older names for the same functions)
library(igraph)
mat <- matrix(c(1, 1, 1, 1, 2, 1, 2, 1, 2, 2, 2, 1, 3, 1, 3, 1, 1, 1, 3, 1), nrow=4)
labels <- as.vector(mat)
g <- make_lattice(dim(mat))
lyt <- layout_nicely(g)
# Remove edges between elements of different types
edgelist <- as_edgelist(g)
retain <- labels[edgelist[,1]] == labels[edgelist[,2]]
g <- delete_edges(g, E(g)[!retain])
# Take a look at what we have
plot(g, layout=lyt)
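Vertices are numbered down the columns, which matches the column-major order of as.vector(mat). It is easy to see that all we need to do is grab the connected components of this graph: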
matrix(components(g)$membership, nrow=nrow(mat))
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    2    2    3    4
# [2,]    1    1    2    4    4
# [3,]    1    2    2    5    5
# [4,]    1    1    1    1    1
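Since the question mentions that a recursive neighbor search overflows on large matrices, here is a minimal sketch of the same labeling done with an explicit queue in base R, so nothing recurses. It is not part of the answer above; the function name label_regions is made up for illustration, and it assumes 4-connectivity (no diagonals) and a column-major scan.

```r
# Hypothetical helper (not from the answer): label 4-connected regions of
# equal values using an explicit queue instead of recursion.
label_regions <- function(mat) {
  nr <- nrow(mat)
  nc <- ncol(mat)
  ids <- matrix(0L, nr, nc)          # 0 means "not labelled yet"
  next_id <- 0L
  # Row/column offsets of the four orthogonal neighbours
  dr <- c(-1L, 1L, 0L, 0L)
  dc <- c(0L, 0L, -1L, 1L)
  for (start in seq_along(mat)) {    # column-major scan over all cells
    if (ids[start] != 0L) next
    next_id <- next_id + 1L
    ids[start] <- next_id
    queue <- start                   # linear indices waiting to be expanded
    while (length(queue) > 0) {
      cur <- queue[1]
      queue <- queue[-1]
      r <- (cur - 1L) %% nr + 1L     # linear index -> row
      cc <- (cur - 1L) %/% nr + 1L   # linear index -> column
      for (k in 1:4) {
        r2 <- r + dr[k]
        c2 <- cc + dc[k]
        if (r2 < 1L || r2 > nr || c2 < 1L || c2 > nc) next
        if (ids[r2, c2] == 0L && mat[r2, c2] == mat[r, cc]) {
          ids[r2, c2] <- next_id
          queue <- c(queue, (c2 - 1L) * nr + r2)
        }
      }
    }
  }
  ids
}

mat <- matrix(c(1, 1, 1, 1, 2, 1, 2, 1, 2, 2, 2, 1, 3, 1, 3, 1, 1, 1, 3, 1), nrow=4)
label_regions(mat)
```

Because the scan visits cells in column-major order and starts a new ID at the first unlabelled cell, the numbering on this example should agree with the igraph result. Growing the queue with c() is simple but not the fastest option for very large matrices.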
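If you want to include diagonals in the lattice, you can start from a lattice with neighborhood size 2 and then restrict it to pairs of elements that are no more than one row and one column apart. Consider the following matrix: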
[A B C B]
[B A A A]
Because diagonal links are included, the code here finds 4 groups rather than 6:
# Build initial matrix and lattice graph (neighborhood size 2)
mat <- matrix(c(1, 2, 2, 1, 3, 1, 2, 1), nrow=2)
labels <- as.vector(mat)
rows <- (seq(length(labels)) - 1) %% nrow(mat)
cols <- ceiling(seq(length(labels)) / nrow(mat))
g <- make_lattice(dim(mat), nei=2)
# Remove edges between elements of different types, and edges that span
# more than one row or more than one column
edgelist <- as_edgelist(g)
retain <- labels[edgelist[,1]] == labels[edgelist[,2]] &
  abs(rows[edgelist[,1]] - rows[edgelist[,2]]) <= 1 &
  abs(cols[edgelist[,1]] - cols[edgelist[,2]]) <= 1
g <- delete_edges(g, E(g)[!retain])
# Cluster to obtain final groups
matrix(components(g)$membership, nrow=nrow(mat))
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    2    1    1    1
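To check the "4 groups rather than 6" claim directly, the two variants can be folded into one function that only returns the number of groups. This is a sketch, not part of the original answer: count_groups and its diagonal argument are hypothetical names, and it assumes an igraph version that provides make_lattice, as_edgelist, delete_edges and components.

```r
library(igraph)

# Hypothetical helper (not from the answer): number of same-label groups,
# with or without diagonal links.
count_groups <- function(mat, diagonal = FALSE) {
  labels <- as.vector(mat)
  idx <- seq_along(labels)
  rows <- (idx - 1) %% nrow(mat)     # zero-based row of each vertex
  cols <- (idx - 1) %/% nrow(mat)    # zero-based column of each vertex
  # nei = 2 adds the diagonal links (plus longer ones that are filtered below)
  g <- make_lattice(dim(mat), nei = if (diagonal) 2 else 1)
  el <- as_edgelist(g)
  keep <- labels[el[, 1]] == labels[el[, 2]] &
    abs(rows[el[, 1]] - rows[el[, 2]]) <= 1 &
    abs(cols[el[, 1]] - cols[el[, 2]]) <= 1
  g <- delete_edges(g, E(g)[!keep])
  components(g)$no                   # number of connected components
}

mat <- matrix(c(1, 2, 2, 1, 3, 1, 2, 1), nrow = 2)
count_groups(mat, diagonal = FALSE)  # orthogonal neighbours only
count_groups(mat, diagonal = TRUE)   # diagonal neighbours included
```

On the 2x4 example the first call should report 6 groups and the second 4, matching the counts stated above.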
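Answer 1 (score: 0)
I'm not entirely sure this solves the same problem, but I recently wrote some code that groups the wall segments of a maze in the same way, i.e. by nearest neighbor. I did it iteratively, using the dist() function. Here is some of the code I used.
I start with an N*4 matrix containing all the wall segments (generated with Prim's tree algorithm); the columns (x0, y0, x1, y1) define the endpoints of a given segment. All segments start and end on integer grid points and have length 1. Each element of treelist ends up holding all the segments of one cluster. For the posted problem this should be somewhat easier, because each item has only one coordinate (row, column) rather than two.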
# Assumes dowalls (the N x 4 segment matrix) and the maze size dimx, dimy already exist
treelist <- list()
treecnt <- 1
# Kill edge walls, i.e. wall segments on the border of the maze.
# edges <- which(dowalls[,1]==dowalls[,3] | dowalls[,2]==dowalls[,4])
vedges <- which((dowalls[,1] == dowalls[,3]) & (dowalls[,1] == 1 | dowalls[,1] == dimx+1))
# Fixed: the second test originally compared column 1 (x) against dimy+1
hedges <- which((dowalls[,2] == dowalls[,4]) & (dowalls[,2] == 1 | dowalls[,2] == dimy+1))
# setdiff() keeps every row when no border segment is found;
# dowalls[-integer(0), ] would drop all rows instead
dowalls <- dowalls[setdiff(seq_len(nrow(dowalls)), c(vedges, hedges)), , drop=FALSE]
# Now sort into trees
while (nrow(dowalls) > 0) {
  tree <- matrix(dowalls[1,], nrow=1)  # force dimensions
  dowalls <- dowalls[-1, , drop=FALSE]
  treerow <- 1  # current row of tree we're looking at
  while (treerow <= nrow(tree)) {
    # Only examine the first 'column' of the dist() matrix, because those are
    # the distances from the tree[] endpoints
    n <- seq_len(nrow(dowalls))
    touch <- c(
      which(dist(rbind(tree[treerow,1:2], dowalls[,1:2]))[n] == 0),
      which(dist(rbind(tree[treerow,1:2], dowalls[,3:4]))[n] == 0),
      which(dist(rbind(tree[treerow,3:4], dowalls[,1:2]))[n] == 0),
      which(dist(rbind(tree[treerow,3:4], dowalls[,3:4]))[n] == 0)
    )
    if (length(touch)) {
      tree <- rbind(tree, dowalls[touch, , drop=FALSE])
      dowalls <- dowalls[-touch, , drop=FALSE]
    }
    # Now be careful: want to track the row of tree[] we're working with AND
    # track how many rows there currently are in tree[]
    treerow <- treerow + 1
  }  # end of while treerow <= nrow
  treelist[[treecnt]] <- tree
  treecnt <- treecnt + 1
}  # end; all walls have been classified
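The remark that the posted problem is easier because each item has a single (row, column) coordinate could be sketched roughly as follows. This is an adaptation written for illustration, not the answerer's code: cluster_cells is a hypothetical name, cells count as touching when their Manhattan distance is exactly 1 (so no diagonals), and clusters are grown one label at a time.

```r
# Hypothetical adaptation (not the answerer's code): grow clusters of matrix
# cells with dist(), one label at a time.
cluster_cells <- function(mat) {
  ids <- matrix(0L, nrow(mat), ncol(mat))
  next_id <- 0L
  for (lab in unique(as.vector(mat))) {
    # (row, col) coordinates of the cells with this label not yet clustered
    todo <- which(mat == lab, arr.ind = TRUE)
    while (nrow(todo) > 0) {
      tree <- todo[1, , drop = FALSE]      # seed a new cluster
      todo <- todo[-1, , drop = FALSE]
      treerow <- 1
      while (treerow <= nrow(tree) && nrow(todo) > 0) {
        # The first nrow(todo) entries of dist() are the distances from
        # tree[treerow, ] to every remaining cell
        d <- dist(rbind(tree[treerow, ], todo),
                  method = "manhattan")[seq_len(nrow(todo))]
        touch <- which(d == 1)
        if (length(touch)) {
          tree <- rbind(tree, todo[touch, , drop = FALSE])
          todo <- todo[-touch, , drop = FALSE]
        }
        treerow <- treerow + 1
      }
      next_id <- next_id + 1L
      ids[tree] <- next_id                 # two-column matrix indexing
    }
  }
  ids
}

mat <- matrix(c(1, 1, 1, 1, 2, 1, 2, 1, 2, 2, 2, 1, 3, 1, 3, 1, 1, 1, 3, 1), nrow=4)
cluster_cells(mat)
```

On the question's example this should find the same five clusters, but the IDs are assigned label by label, so the numbers themselves differ from the igraph output. Each dist() call compares one cell against all remaining cells of the same label, so the cost grows quickly with matrix size.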