I have been reading about DBSCAN on Wikipedia. The article gives the following pseudocode:
DBSCAN(DB, distFunc, eps, minPts) {
    C = 0                                                  /* Cluster counter */
    for each point P in database DB {
        if label(P) ≠ undefined then continue              /* Previously processed in inner loop */
        Neighbors N = RangeQuery(DB, distFunc, P, eps)     /* Find neighbors */
        if |N| < minPts then {                             /* Density check */
            label(P) = Noise                               /* Label as Noise */
            continue
        }
        C = C + 1                                          /* next cluster label */
        label(P) = C                                       /* Label initial point */
        Seed set S = N \ {P}                               /* Neighbors to expand */
        for each point Q in S {                            /* Process every seed point */
            if label(Q) = Noise then label(Q) = C          /* Change Noise to border point */
            if label(Q) ≠ undefined then continue          /* Previously processed */
            label(Q) = C                                   /* Label neighbor */
            Neighbors N = RangeQuery(DB, distFunc, Q, eps) /* Find neighbors */
            if |N| ≥ minPts then {                         /* Density check */
                S = S ∪ N                                  /* Add new neighbors to seed set */
            }
        }
    }
}
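I am fairly sure that |N| is the number of elements in N.
What does this line mean:
Seed set S = N \ {P} /* Neighbors to expand */
I assume S is the seed set, something like a list of objects. But what does N \ {P} mean?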
Answer 0 (score: 2)
\ is the set complement (set difference) operation, so N \ {P} is the set of neighbors N without the point P. In other words, it is every point around P returned by RangeQuery(DB, distFunc, P, eps), except P itself (the query result includes P).
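To make the notation concrete, here is a minimal Python sketch of the pseudocode in which the set difference operator - plays the role of \ and len() plays the role of |N|. The function names range_query and dbscan, the NOISE constant, the use of point indices instead of point objects, and the sample points are all assumptions made for illustration; none of them come from the Wikipedia article.

```python
import math

NOISE = -1  # hypothetical marker for noise points (an assumption of this sketch)

def range_query(db, dist, p, eps):
    """Return the indices of all points within eps of point p, including p itself."""
    return {q for q in range(len(db)) if dist(db[p], db[q]) <= eps}

def dbscan(db, dist, eps, min_pts):
    labels = {}  # index -> cluster id or NOISE; a missing key means "undefined"
    c = 0        # cluster counter
    for p in range(len(db)):
        if p in labels:
            continue                      # previously processed in inner loop
        n = range_query(db, dist, p, eps)
        if len(n) < min_pts:              # |N| is the number of elements in N
            labels[p] = NOISE
            continue
        c += 1
        labels[p] = c
        seeds = sorted(n - {p})           # S = N \ {P}: the neighbors without P
        seen = set(n)                     # avoids queueing the same point twice
        i = 0
        while i < len(seeds):             # the seed list may grow while we iterate
            q = seeds[i]
            i += 1
            if labels.get(q) == NOISE:
                labels[q] = c             # noise becomes a border point
            if q in labels:
                continue                  # previously processed
            labels[q] = c
            n_q = range_query(db, dist, q, eps)
            if len(n_q) >= min_pts:       # S = S ∪ N
                for r in sorted(n_q):
                    if r not in seen:
                        seen.add(r)
                        seeds.append(r)
    return labels

# Made-up sample data: two tight groups of three points and one outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
neighbors = range_query(points, math.dist, 0, 1.5)
print(sorted(neighbors))          # [0, 1, 2]  (the query result includes point 0)
print(sorted(neighbors - {0}))    # [1, 2]     (N \ {P} drops point 0)
labels = dbscan(points, math.dist, eps=1.5, min_pts=3)
```

Because the range query returns P along with its neighbors, removing P before building the seed set simply avoids revisiting the point that has just been labeled. The sketch uses a list plus a seen set for the seeds so that it can append to them while looping, which a plain Python set does not allow.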