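For example, consider the list:
a = [123,234,345,456,124,568,10000,15000,564]
I then need to group the numbers above into
a = [123,234,345,456,124,568,564]
Example 2:
a = [10000,12345,11000,10,5000,10500,13000]
I then need to group the numbers above into
a = [10000,12345,11000,10500,13000]
So the basic idea is to group elements that are close to one another. Could you suggest general Python code to do this?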
Answer 0 (score: 0)
If you already know how many clusters you have, I suggest using K-Means from scikit-learn's cluster module:
from sklearn.cluster import KMeans

numbers = [123,234,345,456,124,568,10000,15000,564]
X = [[x] for x in numbers]  # scikit-learn expects a 2D input: one row per sample
kmeans = KMeans(n_clusters=2)  # split the numbers into 2 clusters based on Euclidean distance
kmeans.fit(X)
# zip() is lazy in Python 3, so wrap it in list() to see the pairs
list(zip(numbers, kmeans.predict(X).tolist()))
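Output (the first element is the number, the second is the assigned cluster ID; the IDs themselves are arbitrary and may be swapped between runs):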
[(123, 1),
(234, 1),
(345, 1),
(456, 1),
(124, 1),
(568, 1),
(10000, 0),
(15000, 0),
(564, 1)]
Or, for your second example:
[(10000, 0),
(12345, 0),
(11000, 0),
(10, 1),
(5000, 1),
(10500, 0),
(13000, 0)]
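To get from the (number, cluster ID) pairs to the lists shown in the question, one option is to collect the numbers per cluster and keep the biggest group. This is only a sketch: the helper name `largest_cluster` is made up for illustration, and "keep the cluster with the most members" is my reading of what the question wants, not something it states. `n_init` and `random_state` are set only to make runs repeatable.

```python
from collections import defaultdict

from sklearn.cluster import KMeans


def largest_cluster(numbers, n_clusters=2):
    """Cluster 1D numbers with K-Means and return the most populated group.

    Hypothetical helper: assumes the group you want is the one with the
    most members, which matches both examples in the question.
    """
    X = [[x] for x in numbers]  # one row per sample, one feature per row
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = model.fit_predict(X)

    # Collect the original numbers under their cluster label, keeping input order.
    groups = defaultdict(list)
    for number, label in zip(numbers, labels):
        groups[label].append(number)

    return max(groups.values(), key=len)


print(largest_cluster([123, 234, 345, 456, 124, 568, 10000, 15000, 564]))
print(largest_cluster([10000, 12345, 11000, 10, 5000, 10500, 13000]))
```

If two clusters tie on size, `max` simply returns whichever it meets first, so you may want a different rule (for example, the cluster with the smallest spread) for real data.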
There are many other techniques, e.g. for choosing the best number of clusters, but I'm afraid that's a bit out of scope. Hope this helps.
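If you don't know the number of clusters up front, one common heuristic is to try several values and compare their silhouette scores. Below is a rough sketch of that idea, not a definitive recipe: `best_k` and `max_k` are names I made up, the silhouette score is just one of several possible criteria, and it can only compare k >= 2 (it can't tell you that the data is really a single cluster).

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k(numbers, max_k=5):
    """Try k = 2..max_k and return (k with the highest silhouette score, all scores).

    Hypothetical helper for illustration; higher silhouette means tighter,
    better separated clusters.
    """
    X = [[x] for x in numbers]
    # silhouette_score needs fewer clusters than samples, so cap k accordingly.
    upper = min(max_k, len(set(numbers)) - 1)

    scores = {}
    for k in range(2, upper + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)

    return max(scores, key=scores.get), scores


k, scores = best_k([123, 234, 345, 456, 124, 568, 10000, 15000, 564])
print(k, scores)
```

You could then pass the chosen `k` as `n_clusters` to `KMeans` as in the snippet above.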