Question

I have the following square DataFrame:

In [104]: d
Out[104]:
           a          b          c          d          e
a        inf   5.909091   8.636364   7.272727   4.454545
b   7.222222        inf   8.666667   7.666667   1.777778
c  15.833333  13.000000        inf   9.166667  14.666667
d   4.444444   3.833333   3.055556        inf   4.833333
e  24.500000   8.000000  44.000000  43.500000        inf

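This is a modified distance matrix representing the pairwise distances between the objects ['a','b','c','d','e'], where each row has been divided by a coefficient (weight) and all diagonal elements have been artificially set to np.inf.

How can I get, in an efficient (vectorized) way, a list/vector of indices like the following: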

d   # index of minimal element in the column `a`
a   # index of minimal element in the column `b` (excluding already found indices: [d]) 
b   # index of minimal element in the column `c` (excluding already found indices: [d,a]) 
c   # index of minimal element in the column `d` (excluding already found indices: [d,a,b])

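I.e. in the first column we found the index d, so when we search for the minimum in the second column we exclude the row with index d (found previously in the first column) - this gives a.

When we look for the minimum in the third column, we exclude the rows with the previously found indices (['d','a']) - this gives b.

When we look for the minimum in the fourth column, we exclude the rows with the previously found indices (['d','a','b']) - this gives c.

I don't need the diagonal (inf) elements, so the resulting list/vector will contain d.shape[0] - 1 elements.

I.e. the resulting list will look like this: ['d','a','b','c'], or, in the case of a Numpy solution, the corresponding numerical indices: [3,0,1,2]

Solving this with a slow for loop is not a problem, but I can't wrap my head around a vectorized (fast) solution...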
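For reference, the loop-based baseline mentioned above could look roughly like the sketch below. The function name `greedy_argmin` and the boolean `used` mask are illustrative choices, not part of the original post; the array values are copied from the printed DataFrame.

```python
import numpy as np

# Values copied from the DataFrame printed in the question (rows/columns a..e).
A = np.array([
    [np.inf,    5.909091,  8.636364,  7.272727,  4.454545],
    [7.222222,  np.inf,    8.666667,  7.666667,  1.777778],
    [15.833333, 13.000000, np.inf,    9.166667, 14.666667],
    [4.444444,  3.833333,  3.055556,  np.inf,    4.833333],
    [24.500000, 8.000000, 44.000000, 43.500000,  np.inf],
])

def greedy_argmin(A):
    """Hypothetical helper: per-column argmin, skipping rows already chosen."""
    n_rows, n_cols = A.shape
    used = np.zeros(n_rows, dtype=bool)   # rows picked by earlier columns
    result = []
    # The last column is skipped, so the result has n_cols - 1 entries.
    for j in range(n_cols - 1):
        # Hide already chosen rows behind inf without modifying A itself.
        column = np.where(used, np.inf, A[:, j])
        i = int(np.argmin(column))
        used[i] = True
        result.append(i)
    return result

print(greedy_argmin(A))
```

Each step depends on the rows chosen in all earlier steps, which is what makes a purely vectorized formulation hard to find.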

Answer 1

A loop is the only solution I can see here.

But you can use numpy + numba to speed it up.

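import numpy as np
from numba import jit

@jit(nopython=True)
def get_min_lookback(A, res):
    # For each column, store the row of its minimum, then set that whole row
    # to inf so later columns cannot pick it again. A is modified in place.
    for i in range(A.shape[1]):
        res[i] = np.argmin(A[:, i])
        A[res[i], :] = np.inf
    return res

# `d` is the DataFrame from the question; copy so it is not overwritten.
arr = d.values.copy()

get_min_lookback(arr, np.zeros(arr.shape[1], dtype=int))

# array([3, 0, 1, 2, 0])
# The last entry is not meaningful: by then every row is inf, so argmin returns 0.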
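To get the label form the question asks for, the positions can be mapped back onto the index. This is a small sketch that hard-codes the output shown above; the variable names `labels` and `positions` are only illustrative. The final position is dropped because it belongs to the last column, where only the diagonal inf is left.

```python
import numpy as np
import pandas as pd

labels = pd.Index(list("abcde"))          # row labels of the question's DataFrame
positions = np.array([3, 0, 1, 2, 0])     # output of the loop shown above

# Drop the trailing entry, then look up the label for each remaining position.
found_labels = list(labels[positions[:-1]])
print(found_labels)
```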

Answer 2

Here is my own solution, which I'm sure is not the best one:

The result list:

res = []

The main function, which searches a column for its minimum, excluding the previously found indices, and appends the index it finds to res:

def f(col):
    ret = col.loc[~col.index.isin(res)].idxmin()
    if ret not in res:
        res.append(ret)

Apply the function to each column:

_ = d.apply(f)

The result:

In [55]: res
Out[55]: ['d', 'a', 'b', 'c', 'e']

Excluding the last element:

In [56]: res[:-1]
Out[56]: ['d', 'a', 'b', 'c']
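The same idea can be written without a module-level list, which makes it safe to call more than once. The sketch below is one possible variant, not the answer's original code; the name `greedy_idxmin` is hypothetical and the frame is rebuilt from the values printed in the question.

```python
import numpy as np
import pandas as pd

labels = list("abcde")
d = pd.DataFrame(
    [[np.inf,    5.909091,  8.636364,  7.272727,  4.454545],
     [7.222222,  np.inf,    8.666667,  7.666667,  1.777778],
     [15.833333, 13.000000, np.inf,    9.166667, 14.666667],
     [4.444444,  3.833333,  3.055556,  np.inf,    4.833333],
     [24.500000, 8.000000, 44.000000, 43.500000,  np.inf]],
    index=labels, columns=labels)

def greedy_idxmin(frame):
    """Hypothetical helper: label of each column's minimum among unused rows."""
    found = []
    # Skip the last column up front instead of slicing the result afterwards.
    for name in frame.columns[:-1]:
        col = frame[name]
        # Keep only the rows whose labels have not been chosen yet.
        found.append(col[~col.index.isin(found)].idxmin())
    return found

result = greedy_idxmin(d)
print(result)
```

This is still a Python-level loop over the columns, so it trades speed for readability compared with a compiled loop.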

Vectorized approach for finding the index of the minimum value in each column (excluding all already found indices)
