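I am using cmeans from the e1071 R package to cluster my data. I would like to predict cluster membership for new data, but I am at a loss as to how to write the prediction function. Predicting hard cluster membership is easy (just assign each point to the nearest cluster center), but I do not know how to compute the membership values as given in cl$membership: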
cl <- cmeans( train, centers= 10, m= 1.08 )
# cl$membership contains the "soft" cluster membership
# the following line does not work, unfortunately
cl.new <- predict( cl, test )
# getting the hard cluster assignments is easy
predict.fclust <- function( cl, x ) {
  # index of the center with the smallest squared Euclidean distance to xx
  which.cl <- function( xx )
    which.min( apply( cl$centers, 1, function( y ) sum( ( y - xx )^2 ) ) )
  ret <- apply( x, 1, which.cl )
  names( ret ) <- rownames( x )
  ret
}
# this works, but only predicts hard clustering
cl.new <- predict( cl, test )
Answer 0 (score: 4)
The membership values are defined by the standard fuzzy c-means formula (Wikipedia).
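For reference, the textbook form of that definition can be written as below. The notation is an assumption chosen for this sketch: u_{ij} is the membership of sample x_i in cluster j, c_k are the cluster centers, c is the number of clusters, and m > 1 is the fuzzifier (the `m` argument of cmeans). The norm is the Euclidean distance for the default dist="euclidean".

```latex
u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}}
```

Equivalently, u_{ij} is proportional to \lVert x_i - c_j \rVert^{-2/(m-1)}, normalized so that the memberships of each sample sum to 1.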
Consider this example from the cmeans help page:
library("e1071")
set.seed(1)
x <- rbind(matrix(rnorm(100,sd=0.3), ncol=2),
           matrix(rnorm(100,mean=1,sd=0.3), ncol=2))
cl <- cmeans(x, 2, 20, verbose=TRUE, method="cmeans", m=2)
The membership values can then be computed as follows:
## compute distances between samples and cluster centers for default setting
## dist="euclidean"; use absolute values for dist="manhattan"
cc <- cl$centers
dm <- sapply(seq_len(nrow(x)),
             function(i) apply(cc, 1, function(v) sqrt(sum((x[i, ]-v)^2))))
m <- 2
## compute cluster membership values
ms <- t(apply(dm, 2,
              function(d) {  # d: distances from one sample to all centers
                tmp <- 1/((d/sum(d))^(2/(m-1)))  # formula above
                tmp/sum(tmp)  # normalization
              }))
Compare:
R> head(cl$membership)
           1      2
[1,] 0.02669 0.9733
[2,] 0.01786 0.9821
[3,] 0.03622 0.9638
[4,] 0.13481 0.8652
[5,] 0.13708 0.8629
[6,] 0.20024 0.7998
R> head(ms)
           1      2
[1,] 0.02669 0.9733
[2,] 0.01786 0.9821
[3,] 0.03622 0.9638
[4,] 0.13481 0.8652
[5,] 0.13708 0.8629
[6,] 0.20024 0.7998
R> all.equal(ms, cl$membership, tolerance=1e-15)
[1] TRUE
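To apply this to new data, as asked in the question, the same computation can be wrapped in a function that takes the fitted centers and a matrix of new rows. The following is a minimal sketch, not part of e1071: `predict_membership` is a hypothetical name, and it assumes Euclidean distance and that `m` is the same fuzzifier that was used when fitting.

```r
# Hypothetical helper (not part of e1071): soft memberships for new rows,
# assuming Euclidean distance and the fuzzifier m used to fit the model.
predict_membership <- function(centers, newdata, m = 2) {
  centers <- as.matrix(centers)
  newdata <- as.matrix(newdata)
  stopifnot(ncol(centers) == ncol(newdata), m > 1)

  # n x k matrix of Euclidean distances from each new row to each center
  d <- apply(centers, 1, function(v) sqrt(rowSums(sweep(newdata, 2, v)^2)))
  d <- matrix(d, nrow = nrow(newdata), ncol = nrow(centers))

  # u_ij is proportional to d_ij^(-2 / (m - 1)); normalize each row to sum to 1
  w <- d^(-2 / (m - 1))
  u <- w / rowSums(w)

  # a row that coincides with a center yields Inf / Inf = NaN above;
  # give such rows full membership in the coinciding center(s)
  hit <- d == 0
  exact <- rowSums(hit) > 0
  if (any(exact)) {
    u[exact, ] <- hit[exact, , drop = FALSE] / rowSums(hit[exact, , drop = FALSE])
  }

  dimnames(u) <- list(rownames(newdata), rownames(centers))
  u
}

# Small hand-checkable example with two centers on the x axis
centers <- rbind(c(0, 0), c(2, 0))
newdata <- rbind(c(1, 0), c(0, 0), c(0.5, 0))
u <- predict_membership(centers, newdata, m = 2)
```

With the objects from the question this would be called as predict_membership(cl$centers, test, m = 1.08). Note that for m close to 1 the exponent -2/(m-1) becomes very large in magnitude (here -25), so the power may overflow or underflow for extreme distances; the sketch does not guard against that.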