Using fractional distance as the distance metric for K-means clustering of a high-dimensional dataset in Python

Time: 2019-12-22 04:11:20

Tags: python scikit-learn k-means

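I have a dataset with at least 30 features that I want to cluster with k-means in Python. Does anyone know how to set the fractional distance (https://medium.com/@amit02093/the-right-distance-approximation-in-high-dimensions-fractional-distances-bb0b8cd858b2) as the distance metric for k-means clustering? scikit-learn does not seem to offer a fractional distance. Thanks in advance!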
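For reference, the fractional distance is usually written as the Minkowski form with an exponent between zero and one. The symbols below (p, q for the two points, n for the number of features, f for the exponent) are my own notation, not taken from the linked article:

```latex
d_f(p, q) = \left( \sum_{i=1}^{n} \lvert p_i - q_i \rvert^{f} \right)^{1/f}, \qquad 0 < f < 1
```

With f < 1 the triangle inequality no longer holds, so this is a dissimilarity measure rather than a true metric.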

Here is the fractional distance code I got from another Stack Overflow post:

import numpy as np

def fractional_distance(p_coord_array, q_coord_array):
    # f is an arbitrary value, but must be greater than zero and
    # less than one. In this case, I used 3/10. I took advantage
    # of the difference of cubes in this case, so that I wouldn't
    # encounter an overflow error.
    # NOTE: this works on the coordinate sums of p and q rather than
    # on per-coordinate differences.

    a = np.sum(np.array(p_coord_array, dtype=np.float64))
    b = np.sum(np.array(q_coord_array, dtype=np.float64))
    a2 = np.sum(np.power(p_coord_array, 2))
    ab = np.sum(p_coord_array) * np.sum(q_coord_array)
    b2 = np.sum(np.power(q_coord_array, 2))
    diffab = a - b
    suma2abb2 = a2 + ab + b2

    # Difference-of-cubes factorisation: (a - b) * (a^2 + ab + b^2)
    temp_dist = abs(diffab * suma2abb2)
    # Shrink with a small root first, then raise to 10/3 (net exponent 1/3)
    temp_dist = np.power(temp_dist, 1./10)

    dist = np.power(temp_dist, 10./3)
    return dist
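Since scikit-learn's KMeans only works with Euclidean distance, one possible workaround is a small hand-written Lloyd-style loop. The sketch below is only an illustration under my own assumptions: the names `fractional_distances` and `fractional_kmeans`, the default f = 0.3, and the random initialisation are all hypothetical choices. It assigns points with the per-coordinate fractional distance but still updates centers with the plain mean, which does not minimise the fractional distance, so treat it as a heuristic; a medoid-based update would be a stricter alternative.

```python
import numpy as np

def fractional_distances(X, centers, f=0.3):
    """Pairwise fractional distances, shape (n_samples, n_centers)."""
    # |x_i - c_i|**f summed over features, then raised to 1/f
    diff = np.abs(X[:, None, :] - centers[None, :, :])
    return np.sum(diff ** f, axis=2) ** (1.0 / f)

def fractional_kmeans(X, k, f=0.3, n_iter=100, seed=0):
    """Lloyd-style loop: fractional-distance assignment, mean update (heuristic)."""
    X = np.asarray(X, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Assumption: initialise centers from k distinct random samples
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        new_labels = np.argmin(fractional_distances(X, centers, f), axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster goes empty
                centers[j] = members.mean(axis=0)
    return labels, centers

# Usage on synthetic data: two well-separated blobs with 30 features each
demo_rng = np.random.default_rng(1)
X_demo = np.vstack([demo_rng.normal(0, 0.1, (20, 30)),
                    demo_rng.normal(5, 0.1, (20, 30))])
demo_labels, demo_centers = fractional_kmeans(X_demo, k=2)
```

Because raising to 1/f is monotonic, the final power could be skipped when only the nearest center is needed; it is kept here so the returned values match the usual definition.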

0 answers:

No answers