Question

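I am using sklearn's KernelDensity to estimate a density, and then score_samples to evaluate the pdf at some points. However, score_samples returns values greater than 0, which should not happen, because according to the documentation it returns log(density) [Docs: "The array of log(density) evaluations. These are normalized to be probability densities, so values will be low for high-dimensional data."]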

from sklearn.neighbors import KernelDensity
import numpy as np

data = np.random.normal(0, 1, [50, 10]) #50 data points, dimension=10
data_kde = KernelDensity(kernel="gaussian", bandwidth=0.2).fit(data)
output = data_kde.score_samples(data)

#print(output)
output = array([19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
       19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645])

Since a density lies in [0, 1], log(density) should lie in (-Inf, 0], unlike the 19.9448 shown above.

Answer 1

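Probability densities do not have to lie in [0, 1]. They are densities, not actual probabilities. A density only has to be non-negative and integrate to 1, so it can exceed 1 wherever probability mass is concentrated in a small volume, and its logarithm is then positive. The Wikipedia page gives a good overview of pdfs.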

https://en.wikipedia.org/wiki/Probability_density_function
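
To make this concrete, here is a minimal sketch that writes a Gaussian kernel density estimate out by hand in NumPy, using the same shapes and bandwidth as in the question. The function name gaussian_kde_log_density and the fixed seed are my own choices for illustration, and this is not sklearn's implementation, so the numbers need not match score_samples exactly. At each training point, that point's own kernel contributes exp(0) = 1 to the sum, so the log-density there is at least -log(n) - (d/2) * log(2 * pi * h^2), which is positive for n = 50, d = 10, h = 0.2.

```python
import numpy as np


def gaussian_kde_log_density(train, query, bandwidth):
    """Log of a Gaussian kernel density estimate, written out by hand.

    Hypothetical helper for illustration only; not sklearn's code.
    """
    n, d = train.shape
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    diff = query[:, None, :] - train[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Average of the unnormalized Gaussian kernels centered on the training points.
    kernel_mean = np.mean(np.exp(-sq_dist / (2.0 * bandwidth ** 2)), axis=1)
    # Log normalization constant of a d-dimensional Gaussian with variance bandwidth**2.
    log_norm = -0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2)
    return np.log(kernel_mean) + log_norm


rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice
data = rng.normal(0.0, 1.0, size=(50, 10))  # 50 points, dimension 10
log_density = gaussian_kde_log_density(data, data, bandwidth=0.2)

# Lower bound at a training point: only its own kernel counts.
lower_bound = -np.log(50) - 0.5 * 10 * np.log(2.0 * np.pi * 0.2 ** 2)

print("smallest log-density:", log_density.min())
print("lower bound:", lower_bound)
print("all positive:", bool(np.all(log_density > 0)))
```

With a small bandwidth in ten dimensions, each kernel is a very tall, narrow bump, so the estimated density at the training points is far above 1 even though the estimate still integrates to 1 over the whole space.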
