How to get the standard deviation of each component of an sklearn GMM after fitting

Asked: 2016-11-29 19:30:38

Tags: python-2.7 scikit-learn

How do I get the standard deviation of each component of an sklearn GMM after fitting?

model.fit(dataSet)
model.means_ gives the mean of each component.
model.weights_ gives the mixing weight of each component.

Where can I find the standard deviation of each Gaussian component?

Thanks,

2 answers:

Answer 0: (score: 0)

model.covariances_ gives you the covariance information.

The shape of the returned covariances depends on covariance_type, which is a parameter of the GMM.

For example, if covariance_type = 'diag', the returned covariances form a [p x q] matrix, where p is the number of Gaussian components and q is the dimensionality of the input.

See http://scikit-learn.org/stable/auto_examples/mixture/plot_gmm_covariances.html for details.
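
For reference, here is a minimal sketch (not from the original answer) of how per-component standard deviations can be read off for each covariance_type; the toy data X is made up for illustration:

import numpy as np
from sklearn.mixture import GaussianMixture

np.random.seed(0)
X = np.vstack((np.random.normal(0, 1, (500, 2)),     # toy cluster 1
               np.random.normal(5, 3, (500, 2))))    # toy cluster 2

model = GaussianMixture(n_components=2, covariance_type='full').fit(X)

if model.covariance_type == 'full':           # covariances_ has shape (p, q, q)
    stds = np.sqrt(np.diagonal(model.covariances_, axis1=1, axis2=2))
elif model.covariance_type == 'diag':         # shape (p, q)
    stds = np.sqrt(model.covariances_)
elif model.covariance_type == 'spherical':    # shape (p,): one variance per component
    stds = np.sqrt(model.covariances_)
else:                                         # 'tied': shape (q, q), shared by all components
    stds = np.sqrt(np.diag(model.covariances_))

print(stds)    # per-axis sigma of each component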

Answer 1: (score: 0)

You can get the variances on the diagonal of the covariance matrix: the first diagonal element is sigma_x^2 and the second is sigma_y^2.

Basically, if you have N mixture components and C is your GaussianMixture instance:

cov = C.covariances_    # shape (N, d, d) when covariance_type='full'
d = cov.shape[1]        # data dimension; divide the trace by d, not by N
[ np.sqrt( np.trace(cov[i]) / d ) for i in range(N) ]

will give you the standard deviation of each component (the root-mean-square of its per-axis sigmas; in the 2-D, two-component example below d happens to equal N, so trace/N also works there).
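
If you want the per-axis sigmas rather than one averaged value, take the square root of the diagonal instead (a sketch continuing the snippet above, assuming covariance_type='full'):

per_axis_std = [ np.sqrt(np.diag(cov[i])) for i in range(N) ]    # [sigma_x, sigma_y] per component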

I checked this in the simulation below, and with a few hundred to a few thousand points it seems to converge to within about 1% of the true values:

# -*- coding: utf-8 -*-
"""
Created on Wed Jul 24 12:37:38 2019

Simulate two normally distributed 2-D point clouds.
Fit a GMM to them and check how the covariance elements relate to sigma.

@author: Adrien MAU / ISMO & Abbelight
"""

import numpy as np
import matplotlib
import matplotlib.pyplot as plt

from sklearn import mixture

colorsList = ['c','r','g']
CustomCmap = matplotlib.colors.ListedColormap(colorsList)


# Two isotropic Gaussian clouds with known standard deviations.
sigma1 = 16
sigma2 = 4
npoints = 2000

x1 = np.random.normal(50, sigma1, npoints)
y1 = np.random.normal(70, sigma1, npoints)

x2 = np.random.normal(20, sigma2, npoints)
y2 = np.random.normal(50, sigma2, npoints)

x = np.hstack((x1, x2))
y = np.hstack((y1, y2))


# Fit a two-component GMM with full covariance matrices.
C = mixture.GaussianMixture(n_components=2, covariance_type='full')
subdata = np.transpose(np.vstack((x, y)))    # shape (2 * npoints, 2)
C.fit(subdata)

m = C.means_
w = C.weights_
cov = C.covariances_    # shape (2, 2, 2): one 2x2 matrix per component


# sqrt(trace / 2) averages the x and y variances; the estimates should be
# close to the true sigma1 = 16 and sigma2 = 4 (component order is arbitrary).
print('\n')
print('estimated sigma 1 : ', np.sqrt(np.trace(cov[0]) / 2))
print('estimated sigma 2 : ', np.sqrt(np.trace(cov[1]) / 2))

plt.scatter(x1, y1)
plt.scatter(x2, y2)

plt.scatter(m[0, 0], m[0, 1])
plt.scatter(m[1, 0], m[1, 1])
plt.title('Initial data and fitted centroids')
plt.axis('equal')



# Hard-assign each point to a component using sigma-normalized distances.
gmm_sub_sigmas = [np.sqrt(np.trace(cov[i]) / 2) for i in range(2)]
xdiff = (np.transpose(np.repeat([x], 2, axis=0)) - m[:, 0]) / gmm_sub_sigmas
ydiff = (np.transpose(np.repeat([y], 2, axis=0)) - m[:, 1]) / gmm_sub_sigmas
# np.hypot(xdiff, ydiff) alone is not the right measure for Gaussian
# distributions: for an isotropic 2-D Gaussian, the negative log-density is
# 0.5 * r**2 + 2 * log(sigma) + const, where r is the sigma-normalized radius.
distances = 0.5 * np.hypot(xdiff, ydiff) ** 2 + 2 * np.log(gmm_sub_sigmas)
res2 = np.argmin(distances, axis=1)    # index of the most likely component

plt.figure()
plt.scatter(x, y, c=res2, cmap=CustomCmap)
plt.axis('equal')
plt.title('GMM associated data')
plt.show()
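
As a side note (not part of the original answer): scikit-learn can do this assignment directly. C.predict uses the full covariance matrices and mixing weights, so its labels should closely match res2 above, up to component ordering:

res_sklearn = C.predict(subdata)    # built-in hard assignment from the fitted GMM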