sklearn KernelDensity score_samples returns values greater than 0

Time: 2019-05-23 10:13:25

Tags: python-3.x scikit-learn gaussian kernel-density

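I am using sklearn's KernelDensity to estimate a density, and then score_samples to evaluate the pdf at some points. However, score_samples returns values greater than 0, which I did not expect, because according to the documentation it returns log(density) [docs: "The array of log(density) evaluations. These are normalized to be probability densities, so values will be low for high-dimensional data."]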

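# KernelDensity is imported from sklearn.neighbors; the old
# sklearn.neighbors.kde module path is not available in recent releases.
from sklearn.neighbors import KernelDensity
import numpy as np

# 50 data points, dimension = 10
data = np.random.normal(0, 1, size=(50, 10))
# Fit a Gaussian KDE, then evaluate log(density) at the training points
data_kde = KernelDensity(kernel="gaussian", bandwidth=0.2).fit(data)
output = data_kde.score_samples(data)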

#print(output)
output = array([19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645])

Since the density lies in [0, 1], log(density) should lie in (-Inf, 0], unlike the 19.9448 shown above.

1 answer:

Answer 0: (score: 0)

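Probability densities do not have to lie in [0, 1]. They are densities, not probabilities: a density only has to be non-negative and integrate to 1, so it can exceed 1 wherever the probability mass is concentrated, and its log is then positive. The Wikipedia page gives a good overview of pdfs.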

  

https://en.wikipedia.org/wiki/Probability_density_function
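
To make this concrete, here is a minimal sketch using only the standard library. The helper names gaussian_pdf and kde_self_term_log_density are made up for this illustration and are not part of scikit-learn. The second helper assumes the textbook Gaussian KDE formula, in which each training point contributes a kernel of height 1 / (N * h^d * (2*pi)^(d/2)) at its own location; that contribution is a lower bound on the estimated density there.

```python
import math


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a 1-D normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def kde_self_term_log_density(n_samples, dim, bandwidth):
    """Log of the contribution a training point makes to its own KDE value.

    Hypothetical helper, assuming the textbook Gaussian KDE formula.
    The full log-density at a training point is at least this large.
    """
    return (-math.log(n_samples)
            - dim * math.log(bandwidth)
            - 0.5 * dim * math.log(2.0 * math.pi))


# A narrow normal (sigma = 0.1) has a peak density above 1,
# so its log-density at the peak is positive.
peak = gaussian_pdf(0.0, sigma=0.1)
log_peak = math.log(peak)

# The density still integrates to 1: midpoint Riemann sum over [-1, 1],
# which covers 10 standard deviations on each side of the mean.
steps = 20000
width = 2.0 / steps
area = width * sum(
    gaussian_pdf(-1.0 + (i + 0.5) * width, sigma=0.1) for i in range(steps)
)

# Same setup as the question (50 points, 10 dimensions): a small bandwidth
# gives a positive lower bound, a large bandwidth gives a negative one.
narrow = kde_self_term_log_density(n_samples=50, dim=10, bandwidth=0.2)
wide = kde_self_term_log_density(n_samples=50, dim=10, bandwidth=1.0)

print("peak density:", peak, "log:", log_peak)
print("integral over [-1, 1]:", area)
print("self-term log-density, bandwidth 0.2:", narrow)
print("self-term log-density, bandwidth 1.0:", wide)
```

With a small bandwidth in 10 dimensions, the kernels of different training points barely overlap, so each point's log-density is dominated by its own kernel. That would also explain why every entry of the output array is identical.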