scikit-learn - What does affinity='precomputed' mean in feature agglomeration dimensionality reduction? - Thinbug


Time: 2018-09-27 13:45:58

Tags: scikit-learn feature-extraction feature-selection dimensionality-reduction

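What does affinity='precomputed' mean in feature agglomeration dimensionality reduction (scikit-learn), and how is it used? I get much better results with it than with the other affinity options (such as 'euclidean', 'l1', 'l2' or 'manhattan'), but I'm not sure what 'precomputed' actually means, or whether I have to supply something "precomputed" to the feature agglomeration algorithm. What does "precomputed" actually mean?

I haven't passed anything other than the preprocessed (scaled) raw data (a numpy array). After fit_transform with feature agglomeration, I pass the result to the Birch clustering algorithm, and I get better results than with the other affinities mentioned. The results are comparable to PCA, but with much lower memory overhead, so I would like to use feature agglomeration for dimensionality reduction. Am I doing something wrong?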

1 answer:

Answer 0: (score: 3)

Great question.

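`affinity == 'precomputed'` means the input is taken to be a `distance matrix` already computed from the original data; internally it is reduced to a flat array containing the upper triangle of that matrix.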

Reference (source code):

    if affinity == 'precomputed':
        # for the linkage function of hierarchy to work on precomputed
        # data, provide as first argument an ndarray of the shape returned
        # by pdist: it is a flat array containing the upper triangular of
        # the distance matrix.
        i, j = np.triu_indices(X.shape[0], k=1)
        X = X[i, j]
    elif affinity == 'l2':
        # Translate to something understood by scipy
        affinity = 'euclidean'
    elif affinity in ('l1', 'manhattan'):
        affinity = 'cityblock'
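
To make the flattening step concrete, here is a minimal sketch using only NumPy and SciPy rather than scikit-learn itself. The toy array `data` and the variable names are assumptions made for illustration. It builds a square distance matrix, applies the same `np.triu_indices` indexing as the excerpt, and checks that the result matches the condensed form returned by `pdist`.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical toy data: 6 items to cluster, 3 coordinates each.
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 3))

# Square, symmetric distance matrix: the kind of input 'precomputed' assumes.
D = squareform(pdist(data, metric="euclidean"))

# Same flattening as in the excerpt: keep the strict upper triangle.
i, j = np.triu_indices(D.shape[0], k=1)
condensed = D[i, j]

# The flattened upper triangle is scipy's condensed distance form.
same_as_pdist = np.allclose(condensed, pdist(data, metric="euclidean"))

# Linkage on the condensed distances gives the same tree as linkage on the raw data.
Z_precomputed = linkage(condensed, method="average")
Z_raw = linkage(data, method="average", metric="euclidean")
same_tree = np.allclose(Z_precomputed, Z_raw)

print(condensed.shape, same_as_pdist, same_tree)
```

If this reading is right, passing raw scaled data together with 'precomputed' would cause entries of the data array to be read as distances, which is probably not what was intended. Feature agglomeration appears to cluster the columns, so the matrix would presumably need to hold feature-to-feature distances; treat that as an assumption to check against the scikit-learn version in use.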