Question

Suppose we have a linear combination of two normal distributions. I think one would call the result a multimodal distribution.

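import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm  # mlab.normpdf is no longer available in recent Matplotlib releases

ls = np.linspace(0, 60, 1000)

# Sum of two normal densities evaluated on the grid
distribution = norm.pdf(ls, 0, 5) + norm.pdf(ls, 20, 10)
# Quantise to integers, then normalise so the values sum to 1
distribution = (distribution * 1000).astype(int)
distribution = distribution / distribution.sum()

plt.plot(ls, distribution)
plt.show()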

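As you can see, we are using a linear combination of two normal distributions with parameters (mu1 = 0, s1 = 5) and (mu2 = 20, s2 = 10). But of course, we usually don't know these parameters in advance.

I would like to know how to estimate or fit these parameters (the mus and sigmas). I am sure there are methods for doing this, but I have not found one yet.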

Answer 1

The problem you describe is a special case of the Gaussian Mixture model. To estimate these parameters you need some samples. If you have no samples but are given the curve, you can generate samples from the curve. You can then use the Expectation–maximization algorithm to estimate the parameters. Scikit-learn has an estimator that lets you do this: sklearn.mixture.GaussianMixture. You only need to provide the samples, the number of components (n_components), which in your case is 2, and the covariance type (covariance_type), which in your case is full, because you have no prior assumption about the covariance matrix.
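The steps in this answer can be sketched roughly as follows. This is a minimal illustration, not the answerer's own code. The wider grid, the sample size, the seed, and the helper name normal_pdf are assumptions made for the example. The question's grid starts at 0, which cuts off the left half of the first component, so the sketch widens the grid so that both components are fully represented.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), written out to keep the example self-contained."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# Assumption: a grid wide enough to contain both components in full.
grid = np.linspace(-30, 60, 1000)
curve = normal_pdf(grid, 0, 5) + normal_pdf(grid, 20, 10)

# Treat the normalised curve as a discrete probability distribution and draw from it.
rng = np.random.default_rng(0)
samples = rng.choice(grid, size=20000, p=curve / curve.sum())

# scikit-learn expects a 2-D array of shape (n_samples, n_features).
X = samples.reshape(-1, 1)

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)

# Components come back in arbitrary order, so sort them by mean.
order = np.argsort(gmm.means_.ravel())
means = gmm.means_.ravel()[order]
sigmas = np.sqrt(gmm.covariances_.ravel()[order])
weights = gmm.weights_[order]

print("means:  ", means)
print("sigmas: ", sigmas)
print("weights:", weights)
```

The estimates should land in the neighbourhood of the true parameters rather than exactly on them, since the two components overlap and the fit is based on a finite random sample.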

Answer 2

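You may want to use the Expectation Maximization algorithm.

It is an iterative method that lets you fit a model made up of a mixture of components. There is a very convenient implementation in scikit-learn: GaussianMixture

I found it hard to work out how to structure the data for this algorithm, so I set up a sample for you: https://nbviewer.jupyter.org/gist/lhk/e566e2d6b67992eca062f9d96e2a14a2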
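To make the "iterative method" concrete, here is a hand-written sketch of Expectation Maximization for a one-dimensional mixture of two Gaussians, using only NumPy. It is not the code from the linked notebook and not scikit-learn's implementation. The function name em_two_gaussians, the median-split initialisation, the fixed iteration count, and the synthetic samples are all assumptions chosen for illustration.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2); broadcasts over arrays."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def em_two_gaussians(x, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    x = np.asarray(x, dtype=float)

    # Crude initialisation (an assumption): split the data at its median.
    median = np.median(x)
    lower, upper = x[x <= median], x[x > median]
    mu = np.array([lower.mean(), upper.mean()])
    sigma = np.array([lower.std(), upper.std()])
    weight = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample, shape (n, 2).
        dens = weight * normal_pdf(x[:, None], mu, sigma)
        resp = dens / dens.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means and standard deviations.
        nk = resp.sum(axis=0)
        weight = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

    return mu, sigma, weight


# Synthetic samples drawn from the two distributions in the question.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(0, 5, 5000), rng.normal(20, 10, 5000)])

mu, sigma, weight = em_two_gaussians(samples)
print("means:  ", mu)
print("sigmas: ", sigma)
print("weights:", weight)
```

A real implementation would add a convergence check and guard against numerical underflow, which is one reason to prefer the scikit-learn estimator in practice.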

Fitting a multimodal distribution
