Old (sklearn 0.17) GMM, DPGMM, VBGMM vs. new (sklearn 0.18) GaussianMixture and BayesianGaussianMixture

Asked: 2016-10-08 11:41:12

Tags: python scikit-learn cluster-analysis gaussian unsupervised-learning

In a previous scikit-learn version (0.17), I used the following code to automatically determine the best Gaussian mixture model and to tune the hyperparameters (alpha, covariance type, BIC) for unsupervised clustering.

# Gaussian Mixture Model 
try:       
    # Determine the most suitable covariance_type
    lowest_bic = np.infty
    bic = []
    cv_types = ['spherical', 'tied', 'diag', 'full']
    for cv_type in cv_types:
        # Fit a mixture of Gaussians with EM
        gmm = mixture.GMM(n_components=NUMBER_OF_CLUSTERS, covariance_type=cv_type)
        gmm.fit(transformed_features)
        bic.append(gmm.bic(transformed_features))
        if bic[-1] < lowest_bic:
            lowest_bic = bic[-1]
            best_gmm = gmm
            best_covariance_type = cv_type
    gmm = best_gmm
except Exception as e:
    print('Error with GMM estimator. Error: %s' % e)

# Dirichlet Process Gaussian Mixture Model  
try:
    # Determine the most suitable alpha parameter
    alpha = 2/math.log(len(transformed_features))     
    # Determine the most suitable covariance_type
    lowest_bic = np.infty
    bic = []
    cv_types = ['spherical', 'tied', 'diag', 'full']
    for cv_type in cv_types:
        # Fit a Dirichlet process mixture of Gaussians with variational inference
        dpgmm = mixture.DPGMM(n_components=NUMBER_OF_CLUSTERS, covariance_type=cv_type, alpha = alpha)
        dpgmm.fit(transformed_features)
        bic.append(dpgmm.bic(transformed_features))
        if bic[-1] < lowest_bic:
            lowest_bic = bic[-1]
            best_dpgmm = dpgmm
            best_covariance_type = cv_type        
    dpgmm = best_dpgmm                
except Exception as e:
    print('Error with DPGMM estimator. Error: %s' % e)

# Variational Inference for Gaussian Mixture Model   
try: 
    # Determine the most suitable alpha parameter 
    alpha = 2/math.log(len(transformed_features))  
    # Determine the most suitable covariance_type
    lowest_bic = np.infty
    bic = []
    cv_types = ['spherical', 'tied', 'diag', 'full']
    for cv_type in cv_types:
        # Fit a variational Bayesian mixture of Gaussians
        vbgmm = mixture.VBGMM(n_components=NUMBER_OF_CLUSTERS, covariance_type=cv_type, alpha = alpha)
        vbgmm.fit(transformed_features)
        bic.append(vbgmm.bic(transformed_features))
        if bic[-1] < lowest_bic:
            lowest_bic = bic[-1]
            best_vbgmm = vbgmm
            best_covariance_type = cv_type
    vbgmm = best_vbgmm     
except Exception as e:
    print('Error with VBGMM estimator. Error: %s' % e)

How can I achieve the same or similar behaviour with the new GaussianMixture / BayesianGaussianMixture models introduced in scikit-learn 0.18?

According to the scikit-learn documentation, there is no "alpha" parameter any more; there is a "weight_concentration_prior" parameter instead. Are these the same? http://scikit-learn.org/stable/modules/generated/sklearn.mixture.BayesianGaussianMixture.html#sklearn.mixture.BayesianGaussianMixture


weight_concentration_prior : float | None, optional. The dirichlet concentration of each component on the weight distribution (Dirichlet). The higher concentration puts more mass in the center and will lead to more components being active, while a lower concentration parameter will lead to more mass at the edge of the mixture weights simplex. The value of the parameter must be greater than 0. If it is None, it's set to 1. / n_components.

http://scikit-learn.org/0.17/modules/generated/sklearn.mixture.VBGMM.html


alpha : float, default 1. Real number representing the concentration parameter of the dirichlet distribution. Intuitively, the higher the value of alpha, the more likely the variational mixture of Gaussians model will use all the components it can.

If these two parameters (alpha and weight_concentration_prior) are indeed the same, does that mean the formula alpha = 2 / math.log(len(transformed_features)) still applies as weight_concentration_prior = 2 / math.log(len(transformed_features))?

1 answer:

Answer 0 (score: 0)

The BIC score can still be used with the classic/EM implementation of GMMs provided by the GaussianMixture class.
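A minimal sketch of how the question's BIC selection loop could be ported to the 0.18 API: `mixture.GMM` becomes `mixture.GaussianMixture`, and `bic()` works as before. The toy data and the cluster count of 2 are assumptions for the demo, standing in for `transformed_features` and `NUMBER_OF_CLUSTERS`.

```python
import numpy as np
from sklearn import mixture

# Toy two-cluster data standing in for `transformed_features` (demo assumption).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 4])

lowest_bic = np.infty
best_gmm, best_covariance_type = None, None
for cv_type in ['spherical', 'tied', 'diag', 'full']:
    # Fit a mixture of Gaussians with EM using the 0.18 class
    gmm = mixture.GaussianMixture(n_components=2, covariance_type=cv_type,
                                  random_state=0)
    gmm.fit(X)
    bic = gmm.bic(X)
    if bic < lowest_bic:
        lowest_bic, best_gmm, best_covariance_type = bic, gmm, cv_type

print(best_covariance_type, lowest_bic)
```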

For a given value of alpha, the BayesianGaussianMixture class can adapt the number of effective components automatically (n_components just needs to be large enough).
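A sketch of that idea, assuming (as the question does, not as something the docs guarantee) that the old `alpha` heuristic carries over to `weight_concentration_prior`: set `n_components` to a generous upper bound and let unused components collapse to near-zero weights. The toy data is a demo assumption.

```python
import math
import numpy as np
from sklearn import mixture

# Toy two-cluster data standing in for `transformed_features` (demo assumption).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 4])

# weight_concentration_prior plays the role the old `alpha` did; the
# 2 / log(n_samples) heuristic from the question is kept as-is (unverified).
prior = 2 / math.log(len(X))

bgmm = mixture.BayesianGaussianMixture(
    n_components=10,                    # upper bound on the number of clusters
    weight_concentration_prior=prior,
    max_iter=500, random_state=0)
bgmm.fit(X)

# Components that actually carry weight; the rest were pruned by the prior.
active = int((bgmm.weights_ > 1e-2).sum())
print(active)
```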

You can also use standard cross-validation on the log-likelihood (via the score method of the model).
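One way that cross-validation could look in practice: `GaussianMixture.score` returns the mean per-sample log-likelihood, and since it ignores labels, `cross_val_score` can compare covariance types on held-out data. Again, the toy data and cluster count are demo assumptions.

```python
import numpy as np
from sklearn import mixture
from sklearn.model_selection import cross_val_score

# Toy two-cluster data standing in for `transformed_features` (demo assumption).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 4])

# With no explicit scorer, cross_val_score falls back to the estimator's
# score(), i.e. the out-of-sample mean log-likelihood.
for cv_type in ['spherical', 'tied', 'diag', 'full']:
    gmm = mixture.GaussianMixture(n_components=2, covariance_type=cv_type,
                                  random_state=0)
    scores = cross_val_score(gmm, X, cv=3)
    print(cv_type, scores.mean())
```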