Suppose we have a linear combination of two normal distributions. I think the result would be called a multimodal distribution.
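import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm  # matplotlib.mlab.normpdf was removed in Matplotlib 3.1

# Evaluate the sum of the two normal densities on a grid
ls = np.linspace(0, 60, 1000)
distribution = norm.pdf(ls, 0, 5) + norm.pdf(ls, 20, 10)
# Quantise to integers, then normalise so the values sum to 1
distribution = (distribution * 1000).astype(int)
distribution = distribution / distribution.sum()
plt.plot(ls, distribution)
plt.show()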
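As you can see, we are using a linear combination of two normal distributions with parameters (mu1 = 0, s1 = 5) and (mu2 = 20, s2 = 10). Of course, we usually do not know these parameters in advance.
I would like to know how to estimate or fit these parameters (the mus and sigmas). I am sure there are methods for this, but I have not found one yet.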
Answer 0 (score: 2)
The problem you describe is a special case of a Gaussian Mixture model. To estimate these parameters you need some samples. If you have no samples but do have the curve, you can generate samples from the curve. You can then use the Expectation-maximization algorithm to estimate the parameters. Scikit-learn has a class that does this: sklearn.mixture.GaussianMixture. You only need to provide the samples, the number of components (n_components), which is 2 in your case, and the covariance type, which is full in your case, since you make no prior assumption about the covariance matrices.
Answer 1 (score: 1)
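You probably want to use the Expectation Maximization algorithm.
It is an iterative method that lets you fit a model made up of a mixture of components. scikit-learn has a very convenient implementation: GaussianMixture.
I found it hard to work out how to structure the data for this algorithm, so I set up a sample for you: https://nbviewer.jupyter.org/gist/lhk/e566e2d6b67992eca062f9d96e2a14a2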
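To make the "iterative" part concrete, here is a hand-rolled sketch of Expectation Maximization for a one-dimensional mixture of two Gaussians. It is not the code from the linked notebook and not scikit-learn's implementation. The function name fit_two_gaussians, the quartile-based starting values and the fixed iteration count are all assumptions chosen for the illustration.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def fit_two_gaussians(x, n_iter=500):
    """Fit weights, means and standard deviations of a 2-component 1-D mixture."""
    # Crude starting point (an assumption): quartiles as means, shared spread.
    mu = np.percentile(x, [25, 75]).astype(float)
    sigma = np.array([x.std(), x.std()])
    weight = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample, shape (2, n).
        dens = weight[:, None] * normal_pdf(x[None, :], mu[:, None], sigma[:, None])
        resp = dens / dens.sum(axis=0)

        # M-step: re-estimate the parameters from the weighted samples.
        nk = resp.sum(axis=1)
        weight = nk / x.size
        mu = (resp * x).sum(axis=1) / nk
        sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk)

    return weight, mu, sigma


# Assumption: equal numbers of samples drawn directly from the two normals.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 5.0, 10_000), rng.normal(20.0, 10.0, 10_000)])

weight, mu, sigma = fit_two_gaussians(x)
print("weights:", weight)
print("means:", mu)
print("sigmas:", sigma)
```

In practice GaussianMixture is the better choice, since it handles initialisation, convergence checks and numerical safeguards that this sketch leaves out.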