Question

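I want to get all unique combinations of a numpy.array vector (or a pandas.Series). I used itertools.combinations, but it is very slow: for an array of size (1000,) it takes hours. Here is my code using itertools (I actually need the differences of the combinations):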

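import itertools
import numpy as np
import pandas as pd

def a(array):
    temp = pd.Series([], dtype=float)
    for i in itertools.combinations(array, 2):
        # Series.append was removed in pandas 2.0; pd.concat is the equivalent
        temp = pd.concat([temp, pd.Series(np.abs(i[0] - i[1]))])
    temp.index = range(len(temp))
    return temp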

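As you can see, there are no repetitions! sklearn.utils.extmath.cartesian is really fast and works well, but it gives me repetitions, which I do not want. I need help rewriting the function above without itertools so that it is much faster on large vectors.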

Answer 1

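You can apply the binary operation (here subtraction, as in your example) across the Cartesian product and take the upper triangular part of the resulting matrix: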

import numpy as np
n = 3
a = np.random.randn(n)
print(a)
print(a - a[:, np.newaxis])
print((a - a[:, np.newaxis])[np.triu_indices(n, 1)])

which gives

[ 0.04248369 -0.80162228 -0.44504522]
[[ 0.         -0.84410597 -0.48752891]
 [ 0.84410597  0.          0.35657707]
 [ 0.48752891 -0.35657707  0.        ]]
[-0.84410597 -0.48752891  0.35657707]

With n = 1000 (and the output piped to /dev/null), this runs in 0.131 seconds on my relatively modest laptop.
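
Wrapped up as a replacement for the function in the question, the approach might look like the sketch below. The name pairwise_abs_diff is hypothetical, and indexing with np.triu_indices before subtracting (instead of building the full n-by-n matrix first) is a variation on the code above rather than part of it. It assumes a 1-D numeric input.

```python
import itertools

import numpy as np
import pandas as pd


def pairwise_abs_diff(values):
    """Absolute difference of every unordered pair (i < j) as a pandas Series.

    Hypothetical helper name; assumes a 1-D numeric input.
    """
    arr = np.asarray(values)
    # Row/column indices of the strict upper triangle: all pairs with i < j,
    # in the same order that itertools.combinations(arr, 2) yields them.
    i, j = np.triu_indices(len(arr), k=1)
    return pd.Series(np.abs(arr[i] - arr[j]))


# Sanity check against the itertools version on a small input.
sample = np.array([3.0, 1.0, 4.0, 1.5])
expected = [abs(x - y) for x, y in itertools.combinations(sample, 2)]
result = pairwise_abs_diff(sample)
```

For an input of length n the result has n(n-1)/2 entries, so 499500 values for n = 1000.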

Answer 2

For a random array:

    import itertools as it
    import numpy as np
    import pandas as pd
    b = np.random.randint(0, 8, (6,))
    # e.g. array([7, 0, 6, 7, 1, 5])
    pd.Series(list(it.combinations(np.unique(b), 2)))

which gives

    0    (0, 1)
    1    (0, 5)
    2    (0, 6)
    3    (0, 7)
    4    (1, 5)
    5    (1, 6)
    6    (1, 7)
    7    (5, 6)
    8    (5, 7)
    9    (6, 7)
    dtype: object
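
If the pairs themselves are needed rather than their differences, the upper-triangle index trick from Answer 1 could be applied to the unique values to avoid itertools altogether. This is a sketch combining the two answers, not code from either one, and the function name unique_pairs is hypothetical.

```python
import numpy as np


def unique_pairs(values):
    """All unordered pairs of the distinct values, as an (m, 2) array.

    Hypothetical helper; combines np.unique with np.triu_indices.
    """
    uniq = np.unique(values)  # sorted distinct values
    i, j = np.triu_indices(len(uniq), k=1)  # index pairs with i < j
    return np.column_stack((uniq[i], uniq[j]))


b = np.array([7, 0, 6, 7, 1, 5])
pairs = unique_pairs(b)
```

Each row of pairs holds one combination, in the same order as the Series shown above.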

Python, Numpy: all UNIQUE combinations of a numpy.array() vector
