In a previous scikit-learn version (0.17), I used the following code to automatically determine the best Gaussian mixture model for unsupervised clustering, tuning the hyperparameters (alpha, covariance type) with the BIC score as the selection criterion.
import math
import numpy as np
from sklearn import mixture

# Gaussian Mixture Model
try:
    # Determine the most suitable covariance_type by BIC
    lowest_bic = np.infty
    bic = []
    cv_types = ['spherical', 'tied', 'diag', 'full']
    for cv_type in cv_types:
        # Fit a mixture of Gaussians with EM
        gmm = mixture.GMM(n_components=NUMBER_OF_CLUSTERS, covariance_type=cv_type)
        gmm.fit(transformed_features)
        bic.append(gmm.bic(transformed_features))
        if bic[-1] < lowest_bic:
            lowest_bic = bic[-1]
            best_gmm = gmm
            best_covariance_type = cv_type
    gmm = best_gmm
except Exception as e:
    print('Error with GMM estimator. Error: %s' % e)
# Dirichlet Process Gaussian Mixture Model
try:
    # Determine the most suitable alpha parameter
    alpha = 2 / math.log(len(transformed_features))
    # Determine the most suitable covariance_type by BIC
    lowest_bic = np.infty
    bic = []
    cv_types = ['spherical', 'tied', 'diag', 'full']
    for cv_type in cv_types:
        # Fit a Dirichlet process mixture of Gaussians
        dpgmm = mixture.DPGMM(n_components=NUMBER_OF_CLUSTERS, covariance_type=cv_type, alpha=alpha)
        dpgmm.fit(transformed_features)
        bic.append(dpgmm.bic(transformed_features))
        if bic[-1] < lowest_bic:
            lowest_bic = bic[-1]
            best_dpgmm = dpgmm
            best_covariance_type = cv_type
    dpgmm = best_dpgmm
except Exception as e:
    print('Error with DPGMM estimator. Error: %s' % e)
# Variational Inference for Gaussian Mixture Model
try:
    # Determine the most suitable alpha parameter
    alpha = 2 / math.log(len(transformed_features))
    # Determine the most suitable covariance_type by BIC
    lowest_bic = np.infty
    bic = []
    cv_types = ['spherical', 'tied', 'diag', 'full']
    for cv_type in cv_types:
        # Fit a variational Gaussian mixture
        vbgmm = mixture.VBGMM(n_components=NUMBER_OF_CLUSTERS, covariance_type=cv_type, alpha=alpha)
        vbgmm.fit(transformed_features)
        bic.append(vbgmm.bic(transformed_features))
        if bic[-1] < lowest_bic:
            lowest_bic = bic[-1]
            best_vbgmm = vbgmm
            best_covariance_type = cv_type
    vbgmm = best_vbgmm
except Exception as e:
    print('Error with VBGMM estimator. Error: %s' % e)
How can I achieve the same (or similar) behavior with the new GaussianMixture / BayesianGaussianMixture classes introduced in scikit-learn 0.18?
According to the scikit-learn documentation, there is no longer an "alpha" parameter; there is a "weight_concentration_prior" parameter instead. Are these the same? http://scikit-learn.org/stable/modules/generated/sklearn.mixture.BayesianGaussianMixture.html#sklearn.mixture.BayesianGaussianMixture
weight_concentration_prior : float | None, optional.
The dirichlet concentration of each component on the weight distribution (Dirichlet). The higher concentration puts more mass in the center and will lead to more components being active, while a lower concentration parameter will lead to more mass at the edge of the mixture weights simplex. The value of the parameter must be greater than 0. If it is None, it is set to 1. / n_components.
http://scikit-learn.org/0.17/modules/generated/sklearn.mixture.VBGMM.html
alpha : float, default 1.
Real number representing the concentration parameter of the dirichlet distribution. Intuitively, the higher the value of alpha the more likely the variational mixture of Gaussians model will use all the components it can.
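The behavior quoted for weight_concentration_prior can be illustrated on synthetic data. A small sketch (the data, the two prior values, and the 1e-2 activity threshold are all made up for illustration):

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# Two well-separated blobs; the 10-component model is deliberately oversized.
X = np.vstack([rng.randn(200, 2), rng.randn(200, 2) + 10])

for prior in (1e-3, 1e3):
    bgmm = BayesianGaussianMixture(n_components=10,
                                   weight_concentration_prior=prior,
                                   random_state=0).fit(X)
    # Count components whose fitted mixture weight is not negligible.
    active = np.sum(bgmm.weights_ > 1e-2)
    print('prior=%g -> %d active components' % (prior, active))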
If these two parameters (alpha and weight_concentration_prior) are indeed the same, does that mean the formula alpha = 2/math.log(len(transformed_features)) would still apply as weight_concentration_prior = 2/math.log(len(transformed_features))?
Answer 0 (score: 0)
The BIC score can still be used with the classic (EM) implementation of GMMs, now provided by the GaussianMixture class.
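For example, the BIC-based search over covariance types from the question carries over almost unchanged. A minimal sketch, assuming transformed_features and NUMBER_OF_CLUSTERS are defined as in the question:

import numpy as np
from sklearn.mixture import GaussianMixture

lowest_bic = np.infty
best_gmm = None
for cv_type in ['spherical', 'tied', 'diag', 'full']:
    gmm = GaussianMixture(n_components=NUMBER_OF_CLUSTERS, covariance_type=cv_type)
    gmm.fit(transformed_features)
    current_bic = gmm.bic(transformed_features)  # bic() is still available here
    if current_bic < lowest_bic:
        lowest_bic = current_bic
        best_gmm = gmm
        best_covariance_type = cv_type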
For a given alpha value, the BayesianGaussianMixture class can automatically tune the number of effective components (n_components just needs to be set large enough).
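A minimal sketch of that, assuming the question's alpha formula is simply carried over to weight_concentration_prior (whether that carry-over is justified is exactly the open question above), and with MAX_COMPONENTS as a hypothetical generous upper bound:

import math
from sklearn.mixture import BayesianGaussianMixture

# Assumption: reuse the question's formula for the new concentration prior.
prior = 2 / math.log(len(transformed_features))

bgmm = BayesianGaussianMixture(n_components=MAX_COMPONENTS,
                               covariance_type='full',
                               weight_concentration_prior=prior)
bgmm.fit(transformed_features)

# Components whose fitted weight stays near zero are effectively unused.
print('Mixture weights: %s' % bgmm.weights_)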
You can also use standard cross-validation on the log-likelihood, via the model's score method.
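For instance, a sketch with 5-fold cross-validation, again assuming the question's variables; cross_val_score falls back on the estimator's score method, which for GaussianMixture is the mean per-sample log-likelihood:

from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_score

gmm = GaussianMixture(n_components=NUMBER_OF_CLUSTERS, covariance_type='full')
# Each fold fits on the training split and scores the held-out log-likelihood.
scores = cross_val_score(gmm, transformed_features, cv=5)
print('Mean held-out log-likelihood: %.3f' % scores.mean())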