Question

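For exploratory data, which clustering method would be best? I am currently using HDBSCAN. The problem is that the results I get from HDBSCAN in R differ from the results I get from HDBSCAN in Python.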

R version: https://rdrr.io/cran/largeVis/man/hdbscan.html

Link to the data file for R: https://www.dropbox.com/s/731hjrj0geibi3f/test.csv?dl=0

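# "data" is a placeholder for the loaded data file
test_r <- data.frame("data")
# R package names are case-sensitive: the package and its main function are both spelled largeVis
vis <- largeVis::largeVis(test_r)
cluster <- largeVis::hdbscan(vis)
largeVis::gplot(cluster, t(vis$coords), text = TRUE)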

OUTPUT of R

Python version: https://github.com/scikit-learn-contrib/hdbscan/tree/master/hdbscan

Link to the data file for Python: https://www.dropbox.com/s/640elbjr1xt8q3e/test_projection.txt?dl=0

%pylab
import hdbscan
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# "data" is a placeholder for the path to the projection file
projection = np.loadtxt("data")
# Keep rows 1..1000 (row 0 is skipped)
projection = projection[1:1001,:]

clusterer = hdbscan.HDBSCAN(min_cluster_size=20, gen_min_span_tree=True)
clusterer.fit(projection)

# Colour each point by cluster, desaturated by membership probability; noise (label -1) is grey
palette = sns.color_palette()
cluster_colors = [sns.desaturate(palette[col], sat)
              if col >= 0 else (0.5, 0.5, 0.5) for col, sat in
              zip(clusterer.labels_, clusterer.probabilities_)]

fig = plt.scatter(projection.T[0], projection.T[1], c=cluster_colors)

OUTPUT of Python

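What causes the difference between the outputs of the two versions, and how can accuracy be judged from the results (i.e., number of clusters, cluster sizes, and noise)?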
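To make the comparison concrete, here is a minimal sketch of how the two label vectors could be summarised and compared once both are exported as plain lists over the same points in the same order. The helper names `summarize_labels` and `adjusted_rand_index` are hypothetical, not part of either library. It assumes noise is encoded as -1, as in the Python `labels_` attribute; labels from R would need to be mapped to that convention first. The adjusted Rand index used here ignores how clusters are numbered, and in this sketch the noise points are treated as one more group.

```python
from collections import Counter
from math import comb


def summarize_labels(labels, noise_label=-1):
    """Return cluster count, cluster sizes (largest first) and noise count.

    Assumption: noise points carry `noise_label` (-1 by default).
    """
    counts = Counter(labels)
    n_noise = counts.pop(noise_label, 0)
    return {
        "n_clusters": len(counts),
        "cluster_sizes": sorted(counts.values(), reverse=True),
        "n_noise": n_noise,
    }


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same points.

    1.0 means identical partitions; values near 0 mean agreement no
    better than chance. Noise is treated as an ordinary group here.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of points grouped together in both labelings, in A, and in B
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Toy labels standing in for the R and Python outputs (not real results)
labels_r = [0, 0, 1, 1, -1]
labels_py = [1, 1, 0, 0, -1]
print(summarize_labels(labels_r))
print(summarize_labels(labels_py))
print(adjusted_rand_index(labels_r, labels_py))
```

A comparison like this is only meaningful if both implementations were given the same points and comparable parameters, so it measures agreement between the two runs rather than accuracy against a ground truth.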

http://hdbscan.readthedocs.io/en/latest/basic_hdbscan.html

Clustering algorithm: HDBSCAN in R vs. HDBSCAN in Python?

0 answers: