Question

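I am using MATLAB and the code provided at http://www.mathworks.com/matlabcentral/fileexchange/14034-kernel-density-estimator/content/kde.m

to cluster 1D data. Specifically, I estimate the density function of my data and then analyze its peaks, which should let me identify the different distributions that make up my data set. (Is that correct?) I then cluster the points around these cluster centroids (the peaks of the density function).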

You can find my data (z) here: https://drive.google.com/file/d/0B3vXKJ_zYaCJLUE3YkVBMmFtbUk/view?usp=sharing

and a plot of the probability density function here: https://drive.google.com/file/d/0B3vXKJ_zYaCJTjVobHRBOXo4Tmc/view?usp=sharing

All I did was run

   [bandwidth,density,xmesh]=kde(z);

   plot(xmesh,density);

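What I get (see the second link) is one peak in the density function for every data point. I think I am doing something wrong. Could the default parameters of the kde function be the cause?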

kde(data,n,MIN,MAX)
%     data    - a vector of data from which the density estimate is constructed;
%          n  - the number of mesh points used in the uniform discretization of the
%               interval [MIN, MAX]; n has to be a power of two; if n is not a power of two, then
%               n is rounded up to the next power of two, i.e., n is set to n=2^ceil(log2(n));
%               the default value of n is n=2^12;
%   MIN, MAX  - defines the interval [MIN,MAX] on which the density estimate is constructed;
%               the default values of MIN and MAX are:
%               MIN=min(data)-Range/10 and MAX=max(data)+Range/10, where Range=max(data)-min(data);

Is that possible? If so, on what basis should I change them?

Answer 1

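You pointed to the solution in your own question. The documentation suggests that the algorithm sets an upper bound of 2^N on the number of peaks it creates from the data. The default (16k, or 2^14) is larger than the number of points you provide (~8k), which leads to the "spiky" behavior.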

If you instead run

 [bandwidth,density,xmesh]=kde(z,2^N);

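for different values of 2^N (the function needs a power of two, which must be an FFT thing), you get plots like the following:

[Figure: density estimates for several values of 2^N]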

You can then choose a suitable value of N based on those plots.
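As a rough sketch of that workflow (not a definitive implementation), the snippet below sweeps a few mesh sizes, then treats the local maxima of one chosen estimate as cluster centres and assigns each point to the nearest one. It assumes kde.m from the File Exchange link in the question is on the MATLAB path. The synthetic z, the list of exponents, the choice n = 2^7, and the 5% height threshold are all illustrative assumptions, not values from the original post.

```matlab
% Stand-in data: two well-separated groups (assumption; replace with your own z).
rng(0);
z = [randn(4000, 1); 6 + randn(4000, 1)];

% Compare the estimate for several mesh sizes; each must be a power of two.
exponents = [5 7 9 11];                 % illustrative choice
figure;
for k = 1:numel(exponents)
    n = 2^exponents(k);
    [~, density, xmesh] = kde(z, n);    % kde.m from the File Exchange
    subplot(2, 2, k);
    plot(xmesh, density);
    title(sprintf('n = 2^{%d}', exponents(k)));
end

% Pick one mesh size after inspecting the plots (2^7 is only a placeholder).
n = 2^7;
[~, density, xmesh] = kde(z, n);
density = density(:);                   % force column vectors
xmesh   = xmesh(:);

% A mesh point counts as a peak if it is strictly higher than both
% neighbours and above a small height threshold (hypothetical 5% of the max).
higherThanNeighbours = density(2:end-1) > density(1:end-2) & ...
                       density(2:end-1) > density(3:end);
isPeak  = [false; higherThanNeighbours; false] & density > 0.05 * max(density);
centres = xmesh(isPeak);

% Assign every data point to its nearest peak.
distances  = abs(bsxfun(@minus, z(:), centres(:).'));
[~, label] = min(distances, [], 2);
```

Here label(i) is the index into centres of the peak closest to z(i). If the estimate still shows many small peaks, a smaller n or a higher threshold may be worth trying before clustering.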
