Question

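I am using cmeans from the e1071 R package to cluster my data. I want to predict cluster membership for new data, but I am stuck on how to write the predict function. Predicting hard cluster membership is simple (just assign each point to the nearest cluster center), but I don't know how to compute the membership values given in cl$membership: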

cl <- cmeans( train, centers= 10, m= 1.08 )
# cl$membership contains the "soft" cluster membership
# the following line does not work, unfortunately
cl.new <- predict( cl, test )

# getting the hard cluster assignments is easy
predict.fclust <- function( cl, x ) { 
  which.cl <- function( xx ) 
    which.min( apply( cl$centers, 1, function( y ) sum( ( y - xx )^2 ) ) ) 
  ret <- apply( x, 1, which.cl )
  names( ret ) <- rownames( x )
  ret
}
# this works, but only predicts hard clustering
cl.new <- predict( cl, test )

Answer 1

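The membership of observation x_i in cluster j is defined as (Wikipedia):

w_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{2/(m-1)}}

where c_1, ..., c_c are the cluster centers and m > 1 is the fuzzifier.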

Consider this example from the cmeans help page:

library("e1071")
set.seed(1)
x <- rbind(matrix(rnorm(100,sd=0.3), ncol=2),
           matrix(rnorm(100,mean=1,sd=0.3), ncol=2))
cl <- cmeans(x, 2, 20, verbose=TRUE, method="cmeans", m=2)

The membership values can then be computed as follows:

## compute distances between samples and cluster centers for default setting
## dist="euclidean"; use absolute values for dist="manhattan"
cc <- cl$centers
dm <- sapply(seq_len(nrow(x)),
             function(i) apply(cc, 1, function(v) sqrt(sum((x[i, ]-v)^2))))

m <- 2
## compute cluster membership values
ms <- t(apply(dm, 2,
              function(d) {  # d: distances from one sample to all centers
                tmp <- 1/((d/sum(d))^(2/(m-1)))  # formula above
                tmp/sum(tmp)  # normalization
              }))

Compare:

R> head(cl$membership)
           1      2
[1,] 0.02669 0.9733
[2,] 0.01786 0.9821
[3,] 0.03622 0.9638
[4,] 0.13481 0.8652
[5,] 0.13708 0.8629
[6,] 0.20024 0.7998

R> head(ms)
           1      2
[1,] 0.02669 0.9733
[2,] 0.01786 0.9821
[3,] 0.03622 0.9638
[4,] 0.13481 0.8652
[5,] 0.13708 0.8629
[6,] 0.20024 0.7998

R> all.equal(ms, cl$membership, tolerance=1e-15)
[1] TRUE
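
To apply the same calculation to new data, the steps above can be wrapped in a small function. The following is only a sketch: `fcm_membership` is a hypothetical helper name (not part of e1071), it assumes the default dist="euclidean", and it assumes you pass the same m that was used when fitting. Distances are rescaled by the row minimum before exponentiation, which leaves the result unchanged but avoids underflow when m is close to 1.

```r
# Soft (fuzzy) memberships of new observations, given fitted cluster centers.
# Hypothetical helper; assumes Euclidean distance and the fuzzifier m used for fitting.
fcm_membership <- function(centers, newdata, m = 2) {
  centers <- as.matrix(centers)
  newdata <- as.matrix(newdata)
  stopifnot(ncol(centers) == ncol(newdata), m > 1)

  # n x k matrix of Euclidean distances from each observation to each center
  d <- sapply(seq_len(nrow(centers)), function(j) {
    sqrt(rowSums(sweep(newdata, 2, centers[j, ])^2))
  })
  d <- matrix(d, nrow = nrow(newdata))  # keep matrix shape when n == 1

  # membership is proportional to d^(-2/(m-1)); dividing by the row minimum
  # first does not change the normalized result but avoids underflow
  w <- (d / apply(d, 1, min))^(-2 / (m - 1))
  u <- w / rowSums(w)

  # an observation that coincides with a center gets full membership there
  hit <- which(d == 0, arr.ind = TRUE)
  if (nrow(hit) > 0) {
    u[hit[, "row"], ] <- 0
    u[hit] <- 1
  }

  dimnames(u) <- list(rownames(newdata), rownames(centers))
  u
}
```

With the objects from the question, this would be called as fcm_membership(cl$centers, test, m = 1.08). Taking which.max over each row of the result should reproduce the hard assignment to the nearest center.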
