Can dbscan be used on a 1D array?

Time: 2018-04-06 15:28:39

Tags: r dbscan

I want to find clusters in an array. I tried this code:

mydata <- c(0.067238904, -0.102679881, 0.01940899, -0.131117488, -0.214517613, 0.157258923, 0.036706008, 0.016978233, 0.116067734, 4.743973742, 14.45681545, 19.79653307, 19.63551697, 14.75640964, 14.49508407, 29.57162957, 34.35154035, 24.5891771, 19.5566106, 19.77786917, 19.48045239, 19.39253524, 19.6119075, 19.37288854, 14.46814558, 5.045143817, -0.179989144, 0.028726364, 5.095571357, 9.611555878, 9.782350203, 9.816313554, 4.669270539, 0.168666591, 0.145820734, 0.098501045, 0.227520096, 9.570195928, 19.3275607, 9.893992329, -0.183070026, -0.234127009, 0.009692396, -0.043350227, 0.086534462, 4.940506347, 9.682493476, 9.6797441, 9.912886934, 4.702649696, 0.126017184, 0.067977594, 9.808998855, 19.74575552, 9.908506244, 0.078706378, 9.901372568, 19.48938819, 19.75414373, 19.30717806, 9.715180742, 4.753063059, 14.84621102, 24.5142621, 24.22609497, 9.819711948, 4.860427965, 19.62910875, 29.48940595, 24.46262038, 24.39358348, 29.31943171, 19.67473155, 14.5374882, 29.39737594, 34.29607172, 29.09359081, 29.21900907, 24.33754818, 19.62927235, 24.50864647, 29.55191414, 29.15532645, 29.23586306, 19.69262392, 14.88864931, 29.31430615, 39.08977936, 34.37013456, 29.28457452, 29.09823037, 24.59405531, 19.72552198, 24.28197776, 34.20368783, 38.96958544, 29.5214294, 14.42708676, 19.5855616, 29.42845242, 24.59078733, 29.32780403, 34.19078719, 19.59049443, 14.89861361, 24.50865539, 29.34039008, 24.27815921, 14.78998033, 14.58721547, 29.52443582, 29.56073152, 19.30874611, 19.30472237, 9.912708087, 9.741318791, 29.23817381, 29.24338455, 14.66415896, 14.82204758, 24.32628072, 24.36577297, 19.5338725, 19.51281431, 19.57161821, 19.73853609, 19.5444779, 29.51936609, 29.52085292, 9.840828548, 9.95537852, 19.3793856, 19.70600151, 29.4517574, 29.56955801, 14.7456921, 19.53452657, 24.66074808, 19.7398255, 24.30808533, 19.70857809, 9.841699767, 24.28906266, 33.98590267, 24.36929409, 19.34572709, 9.626587523, 9.854661829, 24.54829185, 19.61713169, 19.73651064, 34.29485221, 24.63946819, 
9.679601386, 19.60449283, 24.45146344, 19.30179531, 19.72184805, 14.43924964, 19.42170776, 29.0984513, 19.78242071, 14.57892748, 19.39415279, 14.88312006, 19.55170865)
mydata <- matrix(mydata, nrow = 1)

library("dbscan")

db <- dbscan(mydata, eps = 1)

Does dbscan require 2D data?

  • The code "works", but I get zero clusters: db$cluster is 0.
  • I get no errors.
  • How should I set the parameter eps? In general, I cannot predict this value.
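On choosing eps, a common heuristic (not from the original post) is to compute, for every point, the distance to its k-th nearest neighbour with k = minPts - 1 (minPts counts the point itself), sort those distances, and pick eps near the "knee" where the curve rises sharply. The dbscan package provides kNNdistplot() for this; knn_dist_1d() below is a hypothetical base-R stand-in that handles only 1D data and is O(n²), which is acceptable for a few hundred values.

```r
# Hypothetical helper (not part of the dbscan package): distance from each
# 1D point to its k-th nearest neighbour, returned in input order.
knn_dist_1d <- function(x, k) {
  vapply(seq_along(x), function(i) {
    d <- sort(abs(x - x[i]))  # d[1] is the point's distance to itself (0)
    d[k + 1]                  # skip self, take the k-th neighbour
  }, numeric(1))
}

# Toy example with k = 1 (nearest neighbour)
knn_dist_1d(c(6, 0, 3, 1), k = 1)  # returns c(3, 1, 2, 1)
```

With the question's numeric vector and the default minPts = 5, plot(sort(knn_dist_1d(mydata, k = 4)), type = "l") draws the curve to read eps from.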

3 answers:

Answer 0 (score: 0)

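The problem is the format of the matrix. You have 1 row and 166 columns, but dbscan treats each row as one data point, so it looks as if you have a single point in 166-dimensional space. You want: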

mydata <- matrix(mydata, ncol = 1)
db <- dbscan(mydata, eps = 1)
db
DBSCAN clustering for 166 objects.
Parameters: eps = 1, minPts = 5
The clustering contains 8 cluster(s) and 2 noise points.

 0  1  2  3  4  5  6  7  8 
 2 23  8 17 42 25  7 21 21
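The orientation can be checked before clustering: a 1D dataset has to be an n × 1 matrix so that each value becomes its own row. A small sketch with a toy vector (hypothetical values, not the question's data):

```r
# Toy vector standing in for the question's data (hypothetical values)
x <- c(0.1, -0.2, 19.6, 19.8, 29.5)

wrong <- matrix(x, nrow = 1)  # 1 row x 5 columns: one point in 5-D space
right <- matrix(x, ncol = 1)  # 5 rows x 1 column: five points on a line

dim(wrong)        # 1 5
dim(right)        # 5 1
dim(as.matrix(x)) # 5 1: as.matrix() on a plain vector also gives a column
```

Only the n × 1 form gives dbscan() one observation per value.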

Answer 1 (score: 0)

Yes, you can.

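But the result is essentially a very primitive variant of kernel density estimation (KDE), and the usual libraries will be much slower than necessary.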

I would rather use KDE.
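The answer recommends KDE without showing code. A minimal base-R sketch, assuming that clusters are the intervals between strict local minima of the estimated density; kde_cluster_1d() is a hypothetical helper, and the bandwidth bw plays a role similar to eps (smaller values split the data into more groups):

```r
# Hypothetical helper: split 1D data at the local minima of a kernel
# density estimate computed with stats::density().
kde_cluster_1d <- function(x, bw = "nrd0", n = 512) {
  d <- stats::density(x, bw = bw, n = n)
  # Grid indices where the density switches from decreasing to increasing
  minima <- which(diff(sign(diff(d$y))) == 2) + 1
  cuts <- d$x[minima]
  # Points between consecutive cut points share a label: 1, 2, ...
  findInterval(x, cuts) + 1L
}

# Toy data with three well-separated groups
x <- c(0, 0.1, 0.2, 10, 10.1, 10.2, 20, 20.1)
kde_cluster_1d(x, bw = 1)
```

Unlike DBSCAN, this sketch has no noise label: every point falls into some interval. Flat regions where the estimated density underflows to zero are not detected as minima by this simple test.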

Answer 2 (score: 0)

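Sklearn's DBSCAN implementation does not take advantage of some significant speedups that are possible in one dimension. For example, after sorting the array once, each point's neighbours within eps form a contiguous range that can be found directly, instead of computing the full distance matrix.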
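The sorted-array idea can be sketched in R, the question's language. dbscan_1d() below is a hypothetical helper, not the package this answer links to: after one sort, each point's eps-neighbourhood is a contiguous slice located by binary search with findInterval(), so no n × n distance matrix is built and the whole run is O(n log n). Assumptions: minPts counts the point itself, border points are attached to the nearest core point within eps (this tie-breaking may differ from dbscan::dbscan()), and cluster IDs are numbered from left to right, so compare results with table() rather than label by label.

```r
# Hypothetical helper: DBSCAN specialised to 1D data (one sort + binary search).
# Labels: 0 = noise, clusters numbered 1, 2, ... from left to right.
dbscan_1d <- function(x, eps, minPts = 5) {
  n <- length(x)
  ord <- order(x)
  s <- x[ord]                                            # sorted values

  # The eps-neighbourhood of s[i] is the contiguous slice s[lo[i]:hi[i]]
  lo <- findInterval(s - eps, s, left.open = TRUE) + 1L  # first index with s >= s[i] - eps
  hi <- findInterval(s + eps, s)                         # last index with s <= s[i] + eps
  core <- (hi - lo + 1L) >= minPts                       # count includes the point itself

  lab <- integer(n)                                      # 0 = noise
  ci <- which(core)
  if (length(ci) > 0) {
    # Neighbouring core points within eps are density-connected;
    # a gap larger than eps between consecutive core points starts a new cluster.
    lab[ci] <- cumsum(c(TRUE, diff(s[ci]) > eps))

    # Border points: attach each non-core point to its nearest core point,
    # provided that core point lies within eps.
    nc <- which(!core)
    m <- length(ci)
    k <- findInterval(s[nc], s[ci])                      # last core point <= s[i], 0 if none
    left <- ifelse(k >= 1L, s[nc] - s[ci[pmax(k, 1L)]], Inf)
    right <- ifelse(k < m, s[ci[pmin(k + 1L, m)]] - s[nc], Inf)
    nearest <- ifelse(left <= right, pmax(k, 1L), pmin(k + 1L, m))
    ok <- pmin(left, right) <= eps
    lab[nc[ok]] <- lab[ci[nearest[ok]]]
  }

  out <- integer(n)
  out[ord] <- lab                                        # restore input order
  out
}

# Two dense groups and one isolated value (toy data)
dbscan_1d(c(50, 10.4, 0, 10, 0.1, 10.1, 0.2, 10.2, 0.3, 10.3, 0.4), eps = 1)
```

On the question's vector, table(dbscan_1d(mydata, eps = 1)) gives a size table comparable to the one printed in answer 0.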

I wrote a small package that has (almost) the same interface as sklearn's DBSCAN but is significantly faster. Check it out here.