如何使用numpy找到两个非常大的矩阵的行之间的成对差异?

时间:2017-03-24 04:18:55

标签: python numpy matrix numpy-broadcasting

给定两个矩阵,我想计算所有行之间的成对差异。每个矩阵有1000行和100列,所以它们相当大。我尝试使用for循环和纯广播,但for循环似乎工作得更快。难道我做错了什么?这是代码:

from numpy import *
A = random.randn(1000,100)
B = random.randn(1000,100)

start = time.time()
for a in A:
   sum((a - B)**2,1)
print time.time() - start

# pure broadcasting
start = time.time()
((A[:,newaxis,:] - B)**2).sum(-1)
print time.time() - start

广播方法需要大约1秒的时间,对于大型矩阵来说甚至更长。任何想法如何加速纯粹使用numpy?

4 个答案:

答案 0 :(得分:5)

这是另一种表现方式:

  

(a-b)^ 2 = a ^ 2 + b ^ 2 - 2ab

前两个词为np.einsum,第三个词为dot-product -

import numpy as np

np.einsum('ij,ij->i',A,A)[:,None] + np.einsum('ij,ij->i',B,B) - 2*np.dot(A,B.T)

运行时测试

方法 -

def loopy_app(A,B):
    m,n = A.shape[0], B.shape[0]
    out = np.empty((m,n))
    for i,a in enumerate(A):
       out[i] = np.sum((a - B)**2,1)
    return out

def broadcasting_app(A,B):
    return ((A[:,np.newaxis,:] - B)**2).sum(-1)

# @Paul Panzer's soln
def outer_sum_dot_app(A,B):
    return np.add.outer((A*A).sum(axis=-1), (B*B).sum(axis=-1)) - 2*np.dot(A,B.T)

# @Daniel Forsman's soln
def einsum_all_app(A,B):
    return np.einsum('ijk,ijk->ij', A[:,None,:] - B[None,:,:], \
                                        A[:,None,:] - B[None,:,:])

# Proposed in this post
def outer_einsum_dot_app(A,B):
    return np.einsum('ij,ij->i',A,A)[:,None] + np.einsum('ij,ij->i',B,B) - \
                                                            2*np.dot(A,B.T)

计时 -

In [51]: A = np.random.randn(1000,100)
    ...: B = np.random.randn(1000,100)
    ...: 

In [52]: %timeit loopy_app(A,B)
    ...: %timeit broadcasting_app(A,B)
    ...: %timeit outer_sum_dot_app(A,B)
    ...: %timeit einsum_all_app(A,B)
    ...: %timeit outer_einsum_dot_app(A,B)
    ...: 
10 loops, best of 3: 136 ms per loop
1 loops, best of 3: 302 ms per loop
100 loops, best of 3: 8.51 ms per loop
1 loops, best of 3: 341 ms per loop
100 loops, best of 3: 8.38 ms per loop

答案 1 :(得分:2)

这是一种避免环路和大中间体的解决方案:

from numpy import *
import time

A = random.randn(1000,100)
B = random.randn(1000,100)

start = time.time()
for a in A:
   sum((a - B)**2,1)
print time.time() - start

# pure broadcasting
start = time.time()
((A[:,newaxis,:] - B)**2).sum(-1)
print time.time() - start

#matmul
start = time.time()
add.outer((A*A).sum(axis=-1), (B*B).sum(axis=-1)) - 2*dot(A,B.T)
print time.time() - start

打印:

0.546781778336
0.674743175507
0.10723400116

答案 2 :(得分:2)

np.einsum

的另一项工作
np.einsum('ijk,ijk->ij', A[:,None,:] - B[None,:,:], A[:,None,:] - B[None,:,:])

答案 3 :(得分:0)

类似于@ paul-panzer,一种计算任意维数组的成对差异的通用方法是广播如下:

让v成为大小为(n,d)的NumPy数组:

import numpy as np
v_tiled_across = np.tile(v[:, np.newaxis, :], (1, v.shape[0], 1))
v_tiled_down = np.tile(v[np.newaxis, :, :], (v.shape[0], 1, 1))
result = v_tiled_across - v_tiled_down

为更好地了解正在发生的情况,请想象v的每个d维行都像旗杆一样支撑起来,然后上下复制。现在,当您进行逐分量减法时,您将获得每个成对的组合。

__

还有scipy.spatial.distance.pdist,它以成对方式计算指标。

from scipy.spatial.distance import pdist, squareform
pairwise_L2_norms = squareform(pdist(v))