Question

I am trying to properly optimize code similar to this MWE. I am using a list comprehension, but I believe I should be able to vectorize it somehow.

import numpy

A = numpy.arange(20000)
B = numpy.arange(20000, 50000)
C = [bin(i^j).count('1') for i in A for j in B].count(1)

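(This searches for all members of group A that are at Hamming distance 1 from a member of group B.) The sizes are the right order of magnitude, but I will repeat the whole sequence about 100 times. The average size of C is expected to be about 10k.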

I tried using numpy.frompyfunc to create a universal function, uhamming, for bin(i^j).count('1'), without success; I am getting

module 'numpy' has no attribute 'uhamming'

I am happy for C to be an array. Thanks in advance!

FYI, here is the profiling output for a reduced version using (2000) and (2000, 5000):

     12000007 function calls in 5.442 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    2.528    2.528    5.342    5.342 <string>:4(<listcomp>)
6000000    1.527    0.000    1.527    0.000 {method 'count' of 'str' objects}
6000000    1.287    0.000    1.287    0.000 {built-in method builtins.bin}
    1    0.089    0.089    0.089    0.089 {method 'count' of 'list' objects}
    1    0.011    0.011    5.442    5.442 <string>:2(<module>)
    1    0.000    0.000    5.442    5.442 {built-in method builtins.exec}
    2    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.arange}
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Answer 1

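The error indicates that somewhere in your code you are accessing the function as an attribute of the numpy module, as in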

numpy.unhamming

A quick-and-dirty use of frompyfunc would be:

In [126]: def unhamming(i,j):
     ...:     return bin(i^j).count('1')
     ...: 
In [127]: f = np.frompyfunc(unhamming, 2,1)

The function takes 2 inputs and returns 1 output.

Using smaller arrays:

In [124]: A = np.arange(200)
In [125]: B = np.arange(200,500)
In [128]: C = [bin(i^j).count('1') for i in A for j in B].count(1)
In [131]: C
Out[131]: 336

Using the 'vectorized' function:

In [129]: f(A,B[:,None])
Out[129]: 
array([[3, 4, 4, ..., 3, 3, 4],
       [4, 3, 5, ..., 2, 4, 3],
       [4, 5, 3, ..., 4, 2, 3],
       ..., 
       [6, 5, 7, ..., 4, 6, 5],
       [6, 7, 5, ..., 6, 4, 5],
       [7, 6, 6, ..., 5, 5, 4]], dtype=object)

To count how many 1s there are, convert the result to a list, or use a numpy sum.

In [130]: _.ravel().tolist().count(1)
Out[130]: 336
In [132]: (f(A,B[:,None])==1).sum()
Out[132]: 336

The speed is basically the same:

In [133]: timeit C = [bin(i^j).count('1') for i in A for j in B].count(1)
45 ms ± 380 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [134]: timeit (f(A,B[:,None])==1).sum()
46.1 ms ± 97.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

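Sometimes frompyfunc gives up to a 2x speed improvement over direct iteration. But it still has to call the unhamming function once per pair of elements. It is not true 'vectorization' (in the sense of moving the calculation into C-level code).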

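I suspect it is possible to do the same calculation with numpy expressions, broadcasting A against B[:,None]. But I will leave that for another time or another poster.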

C = A ^ B[:,None]

performs part of the function. But I haven't found a version of bin that works on arrays (np.binary_repr doesn't help).

In [160]: f1 = np.frompyfunc(lambda x: bin(x).count('1'),1,1)
In [161]: f1(A^B[:,None])
Out[161]: 
array([[3, 4, 4, ..., 3, 3, 4],
       [4, 3, 5, ..., 2, 4, 3],
       [4, 5, 3, ..., 4, 2, 3],
       ..., 
       [6, 5, 7, ..., 4, 6, 5],
       [6, 7, 5, ..., 6, 4, 5],
       [7, 6, 6, ..., 5, 5, 4]], dtype=object)
In [162]: (f1(A^B[:,None])==1).sum()
Out[162]: 336
In [163]: timeit (f1(A^B[:,None])==1).sum()
37 ms ± 295 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Only a small improvement.
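For what it's worth, the bit counting can also be kept inside numpy instead of calling bin once per element. The following is only a minimal sketch, not a tuned solution: it assumes every XOR result fits in 16 bits (true for values below 65536), and popcount_u16 is a hypothetical helper name, not a numpy function. It reinterprets each value as two bytes and expands them with np.unpackbits.

```python
import numpy as np

def popcount_u16(x):
    """Count the set bits of each element (assumes values fit in 16 bits)."""
    x16 = np.ascontiguousarray(x, dtype=np.uint16)
    # Reinterpret each 16-bit value as two bytes: shape (..., 2).
    as_bytes = x16.view(np.uint8).reshape(x16.shape + (2,))
    # Expand each byte into 8 bits: shape (..., 16), then add them up.
    bits = np.unpackbits(as_bytes, axis=-1)
    return bits.sum(axis=-1)

A = np.arange(200)
B = np.arange(200, 500)
counts = popcount_u16(A ^ B[:, None])   # same shape as A ^ B[:, None]
print((counts == 1).sum())              # 336 for these arrays
```

The intermediate bit array is 16 times larger than the XOR array, so for full-size inputs this would probably need to be applied in chunks.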

Searching for [numpy] and hamming turned up this question: Fastest way to get hamming distance for integer array

Here is an adaptation of one of @Divakar's answers:

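def foo(A, B):
    x = np.bitwise_xor(A,B[:,None])     # pairwise XOR, shape (len(B), len(A))
    r = (1 << np.arange(15))[:,None]    # 15 single-bit masks; covers values below 2**15
    xr = (r[:,None]&x)                  # isolate each bit, shape (15, len(B), len(A))
    xrc = (xr>0).sum(axis=0)            # number of set bits for each pair
    return (xrc==1).sum()               # count pairs at Hamming distance 1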
In [280]: foo(A,B)
Out[280]: 336

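It can be tweaked, for example by adjusting the size of r (15 bits only covers values below 32768) or by changing the broadcasting and reshaping. But the final sum matches.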
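One possible tweak, offered as a sketch under stated assumptions rather than as part of the linked answer: since only distance 1 matters, the full bit count can be skipped. A non-zero integer x has exactly one set bit exactly when x & (x - 1) == 0. The name count_distance_one and the chunk parameter are hypothetical; the chunking is there only to keep the temporary array small for inputs like 20000 by 30000. Non-negative integers are assumed.

```python
import numpy as np

def count_distance_one(A, B, chunk=1024):
    """Count pairs (a, b) whose XOR has exactly one set bit.

    Assumes A and B hold non-negative integers.
    """
    A = np.asarray(A, dtype=np.int64)
    B = np.asarray(B, dtype=np.int64)
    total = 0
    # Process B in slices so the temporary array stays at chunk * len(A) items.
    for start in range(0, len(B), chunk):
        x = A ^ B[start:start + chunk, None]
        # Exactly one set bit: x is non-zero and clearing its lowest bit gives 0.
        one_bit = (x != 0) & ((x & (x - 1)) == 0)
        total += int(np.count_nonzero(one_bit))
    return total

A = np.arange(200)
B = np.arange(200, 500)
print(count_distance_one(A, B))   # 336, matching foo(A, B)
```

This needs no bit-width parameter, so the 15-bit limit of r does not apply, but it only answers the "distance equals 1" question, not the general Hamming distance.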

Vectorizing a list comprehension using a non-universal function
