Question

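I am trying to learn expectation maximization for parameter estimation in a mixture of Gaussians (1D). However, the algorithm rarely seems to find the correct parameters. I am wondering what I am doing wrong.

The data is generated from three Gaussians at three different locations (x=-10, x=5, and x=10):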

import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt

# dataset is generated with 3 gaussians with means at -10, 10 and 5.
x1 = 1.0 * np.random.randn(10000) - 10.0
x2 = 1.0 * np.random.randn(10000) + 10.0
x3 = 1.0 * np.random.randn(10000) + 5.0
x = np.hstack([x1,x2,x3]) # final data set

I checked the histogram, and x is correct. The parameters are learned with the EM updates:

# model and initialization
M = 3 # number of mixtures
alpha = np.ones(M)*.5 # -> likelihood of the mixture
mu = np.random.random(M)*10 # -> mean of the gaussian
sigma = np.ones(M)*1.0 # -> std of the gaussian

w_mt = np.zeros((M,len(x))) # -> q(mixture | data, parameter)

# EM
for i in range(100):
    print("alpha:", alpha, "mu:", mu, "sigma:", sigma)

    # E-step
    for m in range(M):
        w_mt[m] = alpha[m] * mlab.normpdf(x,mu[m],sigma[m])
    C = np.sum(w_mt, axis=0) # normalization
    w_mt = w_mt / C

    # M-step
    alpha = np.sum(w_mt,axis=1) / len(x)
    mu = np.sum(w_mt*x,axis=1)/np.sum(w_mt,axis=1)
    sigma = np.sum(w_mt*pow(x - mu[:,np.newaxis],2),axis=1) / np.sum(w_mt,axis=1)

    sigma[sigma < 0.1] = 0.1 # avoid numerical problems

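I would expect the algorithm to find (at least sometimes) the correct mu (i.e. -10, 5, 10) with std ~= 1.0. However, it never seems to manage that. Any help is appreciated.

Update

Ted Kim's fix seems to have solved the problem. I forgot to take the square root when computing the std. If anyone is interested, here is a link to the updated code: link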

Answer 1

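sigma is the standard deviation, but the sigma in your code is the variance (sigma ** 2). The M-step computes a weighted mean of squared deviations, while the E-step passes sigma to normpdf as a standard deviation.

Try: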

sigma = np.sqrt(np.sum(w_mt*pow(x - mu[:,np.newaxis],2),axis=1) / np.sum(w_mt,axis=1))
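
For completeness, here is a minimal sketch of the whole loop with that square root applied. It is not the asker's updated code (that link is not reproduced here), just one way to write it. Several things are my own assumptions: the helper names normal_pdf and fit_gmm_1d, the fixed random seed, and the percentile-based initialization of mu, which replaces the random initialization in the question so that the run is repeatable. The density is written out in plain NumPy so the sketch does not depend on mlab.normpdf.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma**2) at x; sigma is a standard deviation."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def fit_gmm_1d(x, mu_init, n_iter=100, min_sigma=0.1):
    """EM for a 1D Gaussian mixture; returns (alpha, mu, sigma)."""
    mu = np.asarray(mu_init, dtype=float).copy()
    M = len(mu)
    alpha = np.full(M, 1.0 / M)  # mixing weights, start uniform
    sigma = np.ones(M)           # standard deviations
    for _ in range(n_iter):
        # E-step: w_mt[m, t] = q(mixture m | x[t], current parameters)
        w_mt = alpha[:, None] * normal_pdf(x[None, :], mu[:, None], sigma[:, None])
        w_mt /= w_mt.sum(axis=0, keepdims=True)
        # M-step: weighted maximum-likelihood updates
        n_m = w_mt.sum(axis=1)
        alpha = n_m / len(x)
        mu = (w_mt * x).sum(axis=1) / n_m
        var = (w_mt * (x - mu[:, None]) ** 2).sum(axis=1) / n_m
        # the fix: convert the variance to a std, then floor it to avoid numerical problems
        sigma = np.maximum(np.sqrt(var), min_sigma)
    return alpha, mu, sigma

# Same synthetic data as in the question, with a fixed seed (assumption).
rng = np.random.default_rng(0)
x = np.hstack([
    rng.normal(-10.0, 1.0, 10000),
    rng.normal(10.0, 1.0, 10000),
    rng.normal(5.0, 1.0, 10000),
])

# Hypothetical deterministic initialization: spread the means over the data.
mu_init = np.percentile(x, [10, 50, 90])
alpha, mu, sigma = fit_gmm_1d(x, mu_init)
print("alpha:", alpha, "mu:", mu, "sigma:", sigma)
```

With a random initialization, as in the question, EM can still end up in a local optimum on some runs (for example two components sharing one cluster), so restarting from a few different initial means and keeping the best fit is a common precaution.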

Answer 2

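This also generalizes to higher dimensions by treating the mean mu as a vector and Sigma as a covariance matrix instead of a variance.

For example, the MLE of the covariance matrix Sigma_k of the k-th Gaussian can be computed in the M-step of EM with the following, where phi[i,k] denotes the probability that the i-th data point belongs to the k-th cluster, as computed in the preceding E-step of the EM algorithm.

Sigma[k,:,:] = sum([phi[i,k]*np.outer(X[i,:]-mu[k,:], X[i,:]-mu[k,:]) for i in range(n)]) / n_k[k]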
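
The same update can be written for all clusters at once. Below is a rough sketch of a full multivariate M-step under the notation above (X of shape (n, d), phi of shape (n, K), n_k as the column sums of phi). The function name m_step, the einsum formulation, and the random test data are my own assumptions for illustration, not taken from the blog post.

```python
import numpy as np

def m_step(X, phi):
    """M-step of GMM-EM for data X (n, d) and responsibilities phi (n, K)."""
    n, d = X.shape
    n_k = phi.sum(axis=0)                   # effective number of points per cluster, shape (K,)
    alpha = n_k / n                         # mixing weights
    mu = (phi.T @ X) / n_k[:, None]         # weighted means, shape (K, d)
    diff = X[None, :, :] - mu[:, None, :]   # deviations from each mean, shape (K, n, d)
    # Weighted sum of outer products over the data points, one (d, d) matrix per cluster
    Sigma = np.einsum("nk,kni,knj->kij", phi, diff, diff) / n_k[:, None, None]
    return alpha, mu, Sigma

# Illustrative input: random 2D points with random soft assignments to 3 clusters.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
phi = rng.dirichlet(np.ones(3), size=300)   # each row sums to 1
alpha, mu, Sigma = m_step(X, phi)
print(Sigma.shape)
```

The einsum call sums phi[i,k] times the outer product of the deviations over i, which is the list comprehension above without the Python-level loop over data points.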

The following figure shows how GMM-EM soft clustering works with k=3 cluster centers:

For details, see this blog post.

Expectation Maximization (GMM-EM) never finds the correct parameters. (Mixture of Gaussians)
