Accounting for noise in a 2D Gaussian model

Time: 2020-05-03 15:08:36

Tags: python machine-learning cluster-analysis gaussian mixture-model

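I need to fit a 2D Gaussian that is embedded in mostly uniform noise, as shown in the left panel of the figure below. I tried sklearn.mixture.GaussianMixture with two components (code at the bottom), but this clearly fails, as the right panel shows.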

[Figure: input data (left) and the two-component GaussianMixture result (right)]

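I want to assign to every element a probability of belonging to the 2D Gaussian versus the uniform background noise. This seems simple enough, but I have not found a "simple" way to do it.

Any suggestions? It does not have to be a GMM; I am open to other methods/packages.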


import numpy as np
import matplotlib.pyplot as plt
from sklearn import mixture

# Generate 2D Gaussian data
N_c = 100
xy_c = np.random.normal((.5, .5), .05, (N_c, 2))

# Generate uniform noise
N_n = 1000
xy_n = np.random.uniform(.0, 1., (N_n, 2))

# Combine into a single data set
data = np.concatenate([xy_c, xy_n])

# fit a Gaussian Mixture Model with two components
model = mixture.GaussianMixture(n_components=2, covariance_type='full')
model.fit(data)
probs = model.predict_proba(data)
labels = model.predict(data)
# Separate the two clusters for plotting
msk0 = labels == 0
c0, p0 = data[msk0], probs[msk0].T[0]
msk1 = labels == 1
c1, p1 = data[msk1], probs[msk1].T[1]

# Plot
plt.subplot(121)
plt.scatter(*xy_n.T, c='b', alpha=.5)
plt.scatter(*xy_c.T, c='r', alpha=.5)
plt.xlim(0., 1.)
plt.ylim(0., 1.)

plt.subplot(122)
plt.scatter(*c0.T, c=p0, alpha=.75)
plt.scatter(*c1.T, c=p1, alpha=.75)
plt.colorbar()
# display predicted scores by the model as a contour plot
X, Y = np.meshgrid(np.linspace(0., 1.), np.linspace(0., 1.))
XX = np.array([X.ravel(), Y.ravel()]).T
Z = -model.score_samples(XX)
Z = Z.reshape(X.shape)
plt.contour(X, Y, Z)

plt.show()

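1 answer:

Answer 0 (score: 1)

I think kernel density estimation can help you locate the Gaussian and exclude the points that lie outside it (i.e., those in low-density regions).

Here is sample code: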

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KernelDensity


# Generate 2D Gaussian data
N_c = 100
xy_c = np.random.normal((.2, .2), .05, (N_c, 2))

# Generate uniform noise
N_n = 1000
xy_n = np.random.uniform(.0, 1., (N_n, 2))

# Combine into a single data set
data = np.concatenate([xy_c, xy_n])
print(data.shape)  # (1100, 2)

# Fit a kernel density estimate; the bandwidth matches the cluster's std here
model = KernelDensity(kernel='gaussian', bandwidth=0.05)
model.fit(data)
# Note: score_samples returns the LOG of the density, not a probability
log_dens = model.score_samples(data)

# Plot the input data: noise in blue, Gaussian cluster in red
plt.subplot(131)
plt.scatter(*xy_n.T, c='b', alpha=.5)
plt.scatter(*xy_c.T, c='r', alpha=.5)
plt.xlim(0., 1.)
plt.ylim(0., 1.)

# Plot every point coloured by its kernel score (log-density)
plt.subplot(132)
plt.scatter(*data.T, c=log_dens, alpha=.5)

# Display the negative log-density on a grid as a contour plot
X, Y = np.meshgrid(np.linspace(0., 1.), np.linspace(0., 1.))
XX = np.array([X.ravel(), Y.ravel()]).T
Z = -model.score_samples(XX)
Z = Z.reshape(X.shape)
plt.contour(X, Y, Z)
plt.xlim(0, 1)
plt.ylim(0, 1)

# Plot the kernel score with a threshold applied.
# The threshold is on the log-density; adjust it to suit your data.
plt.subplot(133)
plt.scatter(*data.T, c=log_dens > 0.5, alpha=.5)
plt.colorbar()
plt.xlim(0, 1)
plt.ylim(0, 1)

plt.show()

Here is the output figure:

Output figure

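I moved the center of the Gaussian to make sure my code works. The right panel shows the kernel score with a threshold applied. In your case you can use that threshold to filter out the noisy points outside the Gaussian, but you cannot filter out the noise that falls inside the Gaussian.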
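
Since the question asks for a probability per point rather than a hard cut, one possible way around that last limitation is to fit a mixture of one Gaussian and one uniform component with EM. The following is a minimal sketch and not part of the original answer. The names `gaussian_pdf`, `fit_gaussian_plus_uniform`, `area`, and `init_radius` are illustrative; the background is assumed to be uniform over a region of known area (the unit square here), and the starting mean is assumed to be the data point with the most close neighbours, in the spirit of the density estimate above.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Multivariate normal density evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mean
    # Squared Mahalanobis distance of every row from the mean
    maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm


def fit_gaussian_plus_uniform(data, area=1.0, init_radius=0.05,
                              n_iter=500, tol=1e-8):
    """EM for one Gaussian plus a uniform background of known area.

    Returns (weight, mean, cov, resp); resp[i] is the posterior
    probability that data[i] belongs to the Gaussian component.
    """
    n, d = data.shape
    uniform_pdf = 1.0 / area  # assumption: background is flat over `area`

    # Initialisation (an assumption, not the only choice): start at the
    # point with the most neighbours within init_radius.
    sq_dist = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    counts = (sq_dist < init_radius ** 2).sum(axis=1)
    mean = data[counts.argmax()].copy()
    cov = init_radius ** 2 * np.eye(d)
    weight = 0.5

    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability of the Gaussian for each point
        g = weight * gaussian_pdf(data, mean, cov)
        u = (1.0 - weight) * uniform_pdf
        resp = g / (g + u)

        # Stop once the log-likelihood no longer improves
        ll = np.log(g + u).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll

        # M-step: responsibility-weighted updates of the Gaussian
        nk = resp.sum()
        weight = nk / n
        mean = resp @ data / nk
        diff = data - mean
        cov = (resp[:, None] * diff).T @ diff / nk
    return weight, mean, cov, resp


# Same setup as in the question, with a fixed seed for repeatability
rng = np.random.default_rng(0)
xy_c = rng.normal((.5, .5), .05, (100, 2))
xy_n = rng.uniform(0., 1., (1000, 2))
data = np.concatenate([xy_c, xy_n])

weight, mean, cov, resp = fit_gaussian_plus_uniform(data)
print("Gaussian weight:", weight)
print("Gaussian mean:", mean)
```

Here `resp[i]` is the posterior probability that point `i` belongs to the Gaussian, so points inside the Gaussian's footprint get a graded value instead of a yes/no label; it can be passed directly as `c=resp` to `plt.scatter`. The sketch has no guard against the Gaussian weight collapsing to zero, and it assumes a single cluster.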