With numpy

Time: 2015-05-11 01:46:45

Tags: python logging numpy filter

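I have a Python function that takes two lists, finds the pairs where both inputs have a positive value at the same index, and builds two output lists by appending each of those two positive values. I have a working function: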

def get_pairs_in_first_quadrant(x_in, y_in):
    """If both x_in[i] and y_in[i] are > 0, both are appended to the output lists. If either is
    zero or negative, the pair is absent from the output lists.
    :param x_in: A list of positive or negative floats
    :param y_in: A list of positive or negative floats
    :return: Two lists of positive floats, each <= in length to the inputs.
    """
    x_filtered, y_filtered = [], []
    for x, y in zip(x_in, y_in):
        if x > 0 and y > 0:
            x_filtered.append(x)
            y_filtered.append(y)
    return x_filtered, y_filtered

How can I speed this up using numpy?

1 answer:

Answer 0: (score: 3)

You can do this by simply finding the indices where both values are positive:

import numpy as np

a = np.random.random(10) - .5
b = np.random.random(10) - .5

def get_pairs_in_first_quadrant(x_in, y_in):
    i = np.nonzero( (x_in>0) & (y_in>0) )   # main line of interest
    return x_in[i], y_in[i]

print(a)  # [-0.18012451 -0.40924713 -0.3788772   0.3186816   0.14811581 -0.04021951 -0.21278312 -0.36762629 -0.45369899 -0.46374929]
print(b)  # [ 0.33005969 -0.03167875  0.11387641  0.22101336  0.38412264 -0.3880842 0.08679424  0.3126209  -0.08760505 -0.40921421]
print(get_pairs_in_first_quadrant(a, b))   # (array([ 0.3186816 ,  0.14811581]), array([ 0.22101336,  0.38412264]))
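
The question passes plain Python lists, whereas the function above relies on element-wise comparison and index-array selection, which lists do not support. Below is a minimal sketch of one way to handle that, assuming the inputs can be converted once with np.asarray. The name get_pairs_from_lists and the sample values are made up for illustration and are not part of the original answer:

```python
import numpy as np

def get_pairs_from_lists(x_in, y_in):
    # Hypothetical wrapper: accept lists (or arrays) and convert them once.
    x = np.asarray(x_in, dtype=float)
    y = np.asarray(y_in, dtype=float)
    # Indices where both values are strictly positive.
    keep = np.nonzero((x > 0) & (y > 0))
    return x[keep], y[keep]

# Illustrative inputs only.
xs = [-1.0, 2.0, 3.0, -4.0, 5.0]
ys = [1.0, 2.5, -3.0, -4.0, 0.5]
x_pos, y_pos = get_pairs_from_lists(xs, ys)
print(x_pos, y_pos)
```

Note that this returns numpy arrays rather than lists. If the caller needs lists, calling .tolist() on each result would convert them back, at some extra cost.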

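I was curious about Jaime's suggestion to use boolean indexing without calling nonzero, so I ran some timing tests. The results are somewhat interesting, since the speed ratio between the two approaches varies non-monotonically with the number of positive matches. Basically, though, at least as far as speed goes, it doesn't much matter which one you use (although nonzero is usually a bit faster, and can be about twice as fast):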

from timeit import timeit  # needed for the timing calls below

threshold = .6
a = np.random.random(10000) - threshold
b = np.random.random(10000) - threshold

def f1(x_in, y_in):
    i = np.nonzero( (x_in>0) & (y_in>0) )   # main line of interest
    return x_in[i], y_in[i]

def f2(x_in, y_in):
    i = (x_in>0) & (y_in>0)  # main line of interest
    return x_in[i], y_in[i]

# threshold, then the number of matching pairs found by each function
print(threshold, len(f1(a,b)[0]), len(f2(a,b)[0]))
# total time in seconds for 1000 calls of each function
print(timeit("f1(a, b)", "from __main__ import a, b, f1, f2", number = 1000))
print(timeit("f2(a, b)", "from __main__ import a, b, f1, f2", number = 1000))

For different thresholds, this gave:

threshold  len(f1)  len(f2)  f1, nonzero (s per 1000 calls)  f2, boolean mask (s per 1000 calls)
0.05       9086     9086     0.0815141201019                 0.104746818542
0.5        2535     2535     0.0715141296387                 0.153401851654
0.95       21       21       0.027126789093                  0.0324990749359
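
To repeat the comparison across several thresholds in one run, something like the following sketch could be used. It assumes a recent NumPy (for np.random.default_rng) and Python 3. The fixed seed, the results list, and the equality check are additions for illustration, not part of the original test. Timings depend on the machine and NumPy version, so no expected numbers are given:

```python
from timeit import timeit

import numpy as np

def f1(x_in, y_in):
    i = np.nonzero((x_in > 0) & (y_in > 0))  # integer indices
    return x_in[i], y_in[i]

def f2(x_in, y_in):
    i = (x_in > 0) & (y_in > 0)  # boolean mask
    return x_in[i], y_in[i]

rng = np.random.default_rng(0)  # fixed seed is an assumption, for repeatability
results = []
for threshold in (0.05, 0.5, 0.95):
    a = rng.random(10000) - threshold
    b = rng.random(10000) - threshold
    # Check that both approaches select exactly the same pairs.
    same = all(np.array_equal(p, q) for p, q in zip(f1(a, b), f2(a, b)))
    # Total seconds for 1000 calls of each function.
    t1 = timeit(lambda: f1(a, b), number=1000)
    t2 = timeit(lambda: f2(a, b), number=1000)
    results.append((threshold, len(f1(a, b)[0]), same, t1, t2))
    print(threshold, len(f1(a, b)[0]), same, t1, t2)
```

Passing a callable to timeit avoids the "from __main__ import ..." setup string, at the cost of a small per-call overhead from the lambda that applies equally to both functions.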