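I am trying to fit a Gaussian mixture model in which the clusters (of unknown number) may be separated by large distances compared to their covariances. If the number of clusters is chosen appropriately, GaussianMixture works well. However, with BayesianGaussianMixture the clusters seem to be strongly influenced by the other clusters, even when they are far apart. The following minimal example shows this: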
import matplotlib.pyplot as plt
from numpy.random import randn
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
# n_clusters blobs with unit standard deviation (make_blobs default) and centres scaled by 500
n_clusters = 3
X, y = make_blobs(1000, n_features=2, centers=randn(n_clusters, 2)*500)
plt.rcParams['figure.figsize'] = [20, 20]
plt.subplot(3,2,1)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.viridis, alpha=.25);
plt.subplot(3,2,2)
gmm = GaussianMixture(n_components=3, covariance_type='full')
gmm.fit(X)
X1,y1=gmm.sample(1000)
plt.scatter(X1[:, 0], X1[:, 1], c=y1, cmap=plt.cm.viridis, alpha=.25);
# BayesianGaussianMixture with 100 components and several weight_concentration_prior values
for i, prior in enumerate([1.0, 0.00001, 10., 9999999999999999]):
    print(i)
    plt.subplot(3, 2, i + 3)
    gmm = BayesianGaussianMixture(n_components=100, covariance_type='full', weight_concentration_prior=prior)
    gmm.fit(X)
    Xi, yi = gmm.sample(1000)
    plt.scatter(Xi[:, 0], Xi[:, 1], c=yi, cmap=plt.cm.viridis, alpha=.25)
plt.show()
With two components this effect is even more pronounced. Is this a bug in BayesianGaussianMixture, or am I using it incorrectly?
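One hypothesis worth checking (not a confirmed diagnosis): with its defaults, BayesianGaussianMixture sets covariance_prior to the empirical covariance of X and mean_prior to the mean of X. With centres roughly 500 units apart, both priors reflect the spread *between* clusters rather than the unit spread *within* a cluster, which would inflate every component's covariance. The sketch below compares the average component variance of GaussianMixture, BayesianGaussianMixture with default priors, and BayesianGaussianMixture with priors on the scale of a single cluster. The values covariance_prior=np.eye(2) and mean_precision_prior=1e-5, and the helper mean_component_variance, are illustrative assumptions, not recommended settings.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

# Same kind of data as above: 3 blobs with unit std (make_blobs default
# cluster_std=1.0) and centres roughly 500 units apart; seeds fixed for repeatability.
rng = np.random.RandomState(0)
n_clusters = 3
X, y = make_blobs(1000, n_features=2, centers=rng.randn(n_clusters, 2) * 500,
                  random_state=0)


def mean_component_variance(model, min_weight=0.01):
    """Hypothetical helper: average per-feature variance of components with weight > min_weight."""
    used = model.weights_ > min_weight
    return model.covariances_[used].diagonal(axis1=1, axis2=2).mean()


# Baseline: plain EM with the correct number of components.
gmm = GaussianMixture(n_components=n_clusters, covariance_type='full',
                      random_state=0).fit(X)

# Default priors: covariance_prior=None uses the empirical covariance of X and
# mean_prior=None uses the mean of X, both dominated by the between-cluster spread.
bgmm_default = BayesianGaussianMixture(n_components=10, covariance_type='full',
                                       max_iter=500, random_state=0).fit(X)

# Illustrative (assumed) priors on the scale of one cluster: identity covariance
# prior and a very weak prior pulling the component means towards mean(X).
bgmm_scaled = BayesianGaussianMixture(n_components=10, covariance_type='full',
                                      covariance_prior=np.eye(2),
                                      mean_precision_prior=1e-5,
                                      max_iter=500, random_state=0).fit(X)

print("empirical covariance of X:\n", np.cov(X, rowvar=False))
for name, model in [("GaussianMixture", gmm),
                    ("BGMM, default priors", bgmm_default),
                    ("BGMM, cluster-scale priors", bgmm_scaled)]:
    print(f"{name}: mean component variance = {mean_component_variance(model):.2f}")
```

If the default-prior model reports component variances far above those of GaussianMixture while the cluster-scale variant does not, that would point to the priors rather than a bug; the same idea could then be applied to the weight_concentration_prior loop above.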