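I am using sklearn's KernelDensity to estimate a density, and then score_samples to evaluate the pdf at some points. However, score_samples returns values greater than 0, which I did not expect, because according to the documentation it returns log(density).
[Documentation: Array of log(density) evaluations. These are normalized to be probability densities, so values will be low for high-dimensional data.]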
from sklearn.neighbors import KernelDensity
import numpy as np
data = np.random.normal(0, 1, [50, 10])  # 50 data points, dimension = 10
data_kde = KernelDensity(kernel="gaussian", bandwidth=0.2).fit(data)
output = data_kde.score_samples(data)
#print(output)
output = array([19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645,
19.94484645, 19.94484645, 19.94484645, 19.94484645, 19.94484645])
Since a density lies in [0, 1], log(density) should lie in (-Inf, 0], unlike the 19.9448 shown above.
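Answer 0 (score: 0)
Probability densities do not have to lie in [0, 1]. They are densities, not probabilities. A density only has to be non-negative and integrate to 1, so it can exceed 1 wherever the probability mass is concentrated in a small region, and log(density) is then positive. The Wikipedia page on probability density functions gives a good overview.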
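To make this concrete, here is a minimal sketch using only the standard library. The first part shows that a narrow 1-D Gaussian has a peak density above 1 while still integrating to about 1. The second part assumes the standard normalized Gaussian kernel for the KDE (an assumption about the estimator, not something checked against sklearn's source) and computes the contribution a training point receives from its own kernel alone, which is a lower bound on the log-density at that point. The helper name gaussian_pdf and the variable names are illustrative only.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a univariate normal distribution evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Part 1: a narrow Gaussian (sigma = 0.1) has a peak density above 1.
sigma = 0.1
peak = gaussian_pdf(0.0, sigma=sigma)
log_peak = math.log(peak)  # positive, because peak > 1

# Riemann sum over [-1, 1] (ten standard deviations on each side):
# the density still integrates to roughly 1.
step = 0.001
area = sum(gaussian_pdf(-1.0 + i * step, sigma=sigma) * step
           for i in range(2001))

# Part 2: lower bound on the log-density at a training point of a
# Gaussian KDE, counting only that point's own kernel.
# Assumed form: density(x_i) >= (1 / N) * (2 * pi * h**2) ** (-d / 2)
n_points, dim, bandwidth = 50, 10, 0.2  # values from the question
log_floor = -math.log(n_points) - 0.5 * dim * math.log(2.0 * math.pi * bandwidth ** 2)

print(peak, log_peak, area, log_floor)
```

With a bandwidth as small as 0.2 in 10 dimensions, each kernel is so concentrated that this lower bound is already positive, so positive values from score_samples on the training data are expected rather than a sign of a bug.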