Covariance matrices from BayesianGaussianMixture are affected by the wrong clusters

Time: 2019-02-04 21:02:36

Tags: python scikit-learn cluster-analysis

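I am trying to fit a Gaussian mixture model in which the clusters (whose number is unknown) may be separated by distances that are large compared to their covariances. GaussianMixture works well when the number of clusters is chosen appropriately. With BayesianGaussianMixture, however, each cluster seems to be strongly influenced by the other clusters, even when they are far away. The following minimal example shows this: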

import matplotlib.pyplot as plt

from numpy.random import randn
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture


# Three unit-variance blobs whose centers are spread on a scale of about 500.
n_clusters = 3
X, y = make_blobs(1000, n_features=2, centers=randn(n_clusters, 2) * 500)

plt.rcParams['figure.figsize'] = [20, 20]

# Panel 1: the original data, colored by true cluster label.
plt.subplot(3, 2, 1)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.viridis, alpha=.25)

# Panel 2: samples drawn from a plain GMM fitted with the correct cluster count.
plt.subplot(3, 2, 2)
gmm = GaussianMixture(n_components=3, covariance_type='full')
gmm.fit(X)
X1, y1 = gmm.sample(1000)
plt.scatter(X1[:, 0], X1[:, 1], c=y1, cmap=plt.cm.viridis, alpha=.25)

# Panels 3-6: samples from Bayesian GMMs with different weight concentration priors.
for i, prior in enumerate([1.0, 0.00001, 10., 9999999999999999]):
    print(i)
    plt.subplot(3, 2, i + 3)
    gmm = BayesianGaussianMixture(n_components=100, covariance_type='full',
                                  weight_concentration_prior=prior)
    gmm.fit(X)
    Xi, yi = gmm.sample(1000)
    plt.scatter(Xi[:, 0], Xi[:, 1], c=yi, cmap=plt.cm.viridis, alpha=.25)

Result of the program

With two clusters the effect is even more pronounced. Is this a bug in BayesianGaussianMixture, or am I using it incorrectly?
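
One possible way to narrow this down, sketched under assumptions and not a confirmed diagnosis: according to the scikit-learn documentation, when `covariance_prior` and `mean_prior` are left as `None`, BayesianGaussianMixture sets them to the covariance and mean of the whole of X, and `mean_precision_prior` defaults to 1. With clusters this far apart, that global covariance is far larger than the covariance of any single cluster, so it may be what inflates the fitted covariances. The sketch below compares the default priors with a deliberately small prior. The fixed centers, the helper `active_covariance_scale`, the 0.01 weight threshold, and the prior values `np.eye(2)` and `1e-6` are illustrative choices that are not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import BayesianGaussianMixture

# Fixed, well-separated centers (hypothetical values, chosen only so the run is repeatable).
centers = np.array([[-500.0, 0.0], [0.0, 500.0], [500.0, -500.0]])
X, y = make_blobs(n_samples=1000, n_features=2, centers=centers, random_state=0)


def active_covariance_scale(model, threshold=0.01):
    """Return weights and largest covariance eigenvalue of components above `threshold`."""
    active = model.weights_ > threshold
    largest_eig = np.array(
        [np.linalg.eigvalsh(cov)[-1] for cov in model.covariances_[active]]
    )
    return model.weights_[active], largest_eig


# Default priors: covariance_prior and mean_prior are derived from all of X.
default_bgmm = BayesianGaussianMixture(
    n_components=10, covariance_type='full', max_iter=500, random_state=0
).fit(X)

# Assumed alternative: a unit covariance prior and a very weak prior on the means.
small_prior_bgmm = BayesianGaussianMixture(
    n_components=10,
    covariance_type='full',
    covariance_prior=np.eye(2),
    mean_precision_prior=1e-6,
    max_iter=500,
    random_state=0,
).fit(X)

default_w, default_eig = active_covariance_scale(default_bgmm)
small_w, small_eig = active_covariance_scale(small_prior_bgmm)

print("default priors: weights", default_w.round(3), "largest eigenvalues", default_eig.round(2))
print("small priors:   weights", small_w.round(3), "largest eigenvalues", small_eig.round(2))
```

If the hypothesis holds, the default fit reports eigenvalues far above the unit variance that make_blobs uses for each blob, while the small-prior fit stays close to it. The small-prior fit may still split one blob across several components, so this addresses only the covariance scale, not the number of components.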

0 answers:

No answers