Question

Given a list of pairs of one-dimensional coordinates (or segments), like the following:

[1]: 1200, 1210
[2]: 1212, 1222
[3]: 1190, 1200
[4]: 300, 310
...
[n]: 800, 810

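(You can take the center of each pair to represent each element.) I would like to know what algorithm, or what kind of algorithm, I could use to find "hot spots" or clusters.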

A hot spot is a segment that contains a certain number of items (say, k).

For example, [3], [1] and [2] belong to the same group, and the resulting list would look like this:

[1']: 1190, 1222 ([1], [2], [3])

(start, end, contained elements)

Answer 1

The problem is not very well defined, but maybe this will help you.

KMeans is a method for clustering items by distance. Scikit-learn has an implementation that is easy to use. For an example, see http://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_digits.html#example-cluster-plot-kmeans-digits-py.

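This lets you define the number of clusters you want to find. However, you cannot know in advance how many points will end up in each cluster. Anyway, here is a small example: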

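from sklearn.cluster import KMeans

data = [[1200, 1210], [1212, 1222], [1190, 1200], [300, 310], [800, 810]]
# Represent each segment by its midpoint; KMeans expects a 2-D input
# of shape (n_samples, n_features), hence the one-element inner lists.
centers = [[sum(x) / len(x)] for x in data]

clf = KMeans(n_clusters=3)
clf.fit(centers)

for points in data:
    center = sum(points) / len(points)
    # predict() also expects a 2-D input, so wrap the midpoint twice.
    print(points, center, clf.predict([[center]]))

Output (the cluster label numbers are arbitrary and can differ between runs):

[1200, 1210] 1205.0 [1]
[1212, 1222] 1217.0 [1]
[1190, 1200] 1195.0 [1]
[300, 310] 305.0 [0]
[800, 810] 805.0 [2]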
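The question asks for results in the form (start, end, contained elements), which the cluster labels alone do not give. The following is a minimal sketch of one way to get there, assuming you already have one label per segment. The helper name `merge_by_label` is hypothetical and not part of scikit-learn, and the labels are hard-coded here to the ones shown in the output above so that the snippet runs without scikit-learn.

```python
from collections import defaultdict


def merge_by_label(segments, labels):
    """Group segments by cluster label; return sorted (start, end, member_ids) tuples.

    Hypothetical helper for illustration; member ids are 1-based positions
    in the input list, matching the [1], [2], ... numbering in the question.
    """
    groups = defaultdict(list)
    for idx, (segment, label) in enumerate(zip(segments, labels), start=1):
        groups[label].append((idx, segment))

    merged = []
    for members in groups.values():
        start = min(seg[0] for _, seg in members)  # leftmost start in the group
        end = max(seg[1] for _, seg in members)    # rightmost end in the group
        merged.append((start, end, sorted(idx for idx, _ in members)))
    return sorted(merged)


data = [[1200, 1210], [1212, 1222], [1190, 1200], [300, 310], [800, 810]]
labels = [1, 1, 1, 0, 2]  # assumed: the labels from the KMeans output above

hot_spots = merge_by_label(data, labels)
for spot in hot_spots:
    print(spot)
```

To keep only groups with at least k members, you could filter the result on the length of the third tuple element.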

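Edit: Another algorithm available in scikit-learn is Affinity Propagation, which does not require you to set the number of clusters in advance. I don't know how it works internally, but you should be able to find some information on it yourself.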

Example:

from sklearn.cluster import AffinityPropagation
import numpy as np

data = [[1200, 1210], [1212, 1222], [1190, 1200], [300, 310], [800, 810]]
# Midpoints as a column vector of shape (n_samples, 1).
centers = np.array([[sum(x) / len(x)] for x in data])

clf = AffinityPropagation()

# fit_predict() fits the model and returns one cluster label per midpoint.
for points, cluster in zip(data, clf.fit_predict(centers)):
    center = sum(points) / len(points)
    print(points, center, cluster)

Output:

[1200, 1210] 1205.0 0
[1212, 1222] 1217.0 0
[1190, 1200] 1195.0 0
[300, 310] 305.0 1
[800, 810] 805.0 2
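Neither approach uses the "at least k items" part of the question directly. Since the data is one-dimensional, a simpler alternative (not one of the scikit-learn algorithms above) is to sort the midpoints and start a new group whenever the gap to the previous midpoint is too large. This is only a sketch under stated assumptions: the function name `find_hot_spots` and the parameters `max_gap` and `min_items` are hypothetical, and `max_gap=20` is an arbitrary value chosen for this sample data.

```python
def find_hot_spots(segments, max_gap, min_items):
    """Gap-based 1-D grouping (illustrative sketch, not a library function).

    Segments are sorted by midpoint; consecutive midpoints that are at most
    max_gap apart fall into the same group. Groups with fewer than min_items
    members are dropped. Returns (start, end, member_ids) tuples, where
    member ids are 1-based positions in the input list.
    """
    # Pair each segment with its 1-based id and sort by midpoint.
    indexed = sorted(enumerate(segments, start=1),
                     key=lambda item: sum(item[1]) / 2)

    groups = []
    current = []
    prev_mid = None
    for idx, seg in indexed:
        mid = sum(seg) / 2
        # A large gap to the previous midpoint closes the current group.
        if current and mid - prev_mid > max_gap:
            groups.append(current)
            current = []
        current.append((idx, seg))
        prev_mid = mid
    if current:
        groups.append(current)

    return [
        (min(seg[0] for _, seg in group),
         max(seg[1] for _, seg in group),
         sorted(idx for idx, _ in group))
        for group in groups
        if len(group) >= min_items
    ]


data = [[1200, 1210], [1212, 1222], [1190, 1200], [300, 310], [800, 810]]
result = find_hot_spots(data, max_gap=20, min_items=3)
print(result)
```

With these assumed parameters, only the group made of [1], [2] and [3] survives the `min_items` filter, which matches the example result in the question. The outcome depends heavily on `max_gap`, so it would need tuning for real data.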

Finding "hot spots" in a one-dimensional list in Python
