Question

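I have a numpy array of Python objects. I want to compare the array against a Python object, and I don't want the comparison to go through the == operator; a reference (identity) comparison is enough for my needs.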

import numpy as np
a = np.array(["abc", "def"], dtype="object")
a == "abc"

I know for sure that a reference comparison is sufficient for my array. Assume that all the strings in my array are interned.

This is mainly to improve performance when comparing values. Python object comparisons are very slow.

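a is "abc" won't do what I want, because it compares the array object itself with the string rather than comparing each element: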

In [1]: import numpy as np

In [2]: a = np.array(["abc", "def"], dtype="object")

In [3]: a == "abc"
Out[3]: array([ True, False], dtype=bool)

In [4]: a is "abc"
Out[4]: False

I want the result of a == "abc", but with the is operator applied to each element instead of Python's __eq__ method.

Answer 1

"a reference comparison is enough for my needs"

To compare object identity, use is instead of ==:

if a is b:
   ...

From the documentation:

The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. x is not y yields the inverse truth value.

Edit: to apply is to every element of the array, you can use:

In [6]: list(map(lambda x: x is "abc", a))
Out[6]: [True, False]

Or simply:

In [9]: [x is "abc" for x in a]
Out[9]: [True, False]
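
As a self-contained script, the same idea might look like the sketch below. It assumes the strings are interned explicitly with sys.intern (the question only states that they are interned), and the names target and mask are illustrative. Wrapping the list comprehension in np.array gives back a boolean numpy array rather than a list:

```python
import sys
import numpy as np

# Intern explicitly so that equal strings are guaranteed to be the same object.
target = sys.intern("abc")
a = np.array([sys.intern("abc"), sys.intern("def")], dtype=object)

# Identity test per element; __eq__ is never called.
mask = np.array([x is target for x in a], dtype=bool)
print(mask)
```

Comparing against a variable that holds the interned string, rather than against a literal, also avoids depending on how the interpreter happens to treat string literals.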

Answer 2

Use np.vectorize:

vector_is = np.vectorize(lambda x, y: x is y, otypes=[bool])

Then you have

>>> a = np.array(["abc", "def"], dtype="object")

>>> vector_is(a, "abc")
array([ True, False], dtype=bool)

Unfortunately, I don't know whether you can use operator.is_ directly here, because when I try it I get

ValueError: failed to determine the number of arguments for <built-in function is_>
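
One possible workaround, offered as a sketch rather than something this answer tested, is np.frompyfunc, which takes the number of inputs and outputs explicitly and so does not need to inspect the signature of the built-in. The name ufunc_is is illustrative. The scalar is wrapped in an object array as a precaution, so that numpy holds a reference to the original str instead of converting it to one of its own string types on the way in:

```python
import operator
import numpy as np

# The argument counts are given explicitly (2 inputs, 1 output), so numpy
# does not have to introspect the built-in operator.is_.
ufunc_is = np.frompyfunc(operator.is_, 2, 1)

target = "abc"
a = np.array([target, "def"], dtype=object)

# Keep the scalar as an object-dtype array so the original str object is
# what gets passed to operator.is_.
target_obj = np.asarray(target, dtype=object)

# frompyfunc returns an object array, so cast the result to bool.
mask = ufunc_is(a, target_obj).astype(bool)
print(mask)
```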

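The np.vectorize version appears to be slightly slower than the list comprehension (probably because of the lambda calls), although it has the advantage of being more flexible in the arguments it accepts: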

python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' 'vector_is(a, "abcd")'
10 loops, best of 3: 28.3 msec per loop

python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' '[x is "abcd" for x in a]'
100 loops, best of 3: 20 msec per loop

python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' 'np.fromiter((x is "abcd" for x in a), bool, len(a))'
10 loops, best of 3: 23.8 msec per loop

The last approach, np.fromiter((x is "abcd" for x in a), bool, len(a)), is one way to get a numpy array out of the list-comprehension approach.

Unfortunately, all of these are much slower than just using ==:

python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' 'a == "abcd"'                                        
1000 loops, best of 3: 1.42 msec per loop
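
If the same array is compared against many different objects, one option (not part of the answer above, and untimed here) is to pay for a single Python-level pass that records each element's id() in an integer array, then do the comparisons as plain integer comparisons inside numpy. This sketch assumes CPython, where id() is the object's address, and it assumes the array's elements are not replaced after ids is built. The name ids is illustrative:

```python
import sys
import numpy as np

target = sys.intern("abc")
a = np.array([target, sys.intern("def"), target], dtype=object)

# One pass in Python to record each element's identity as an integer.
# Assumption: CPython, where id() fits in a pointer-sized unsigned int.
ids = np.fromiter((id(x) for x in a), dtype=np.uintp, count=len(a))

# Each later comparison is an integer comparison done inside numpy.
mask = ids == np.uintp(id(target))
print(mask)
```

The ids array is only meaningful while a keeps the same objects alive in the same positions; it has to be rebuilt after any assignment into a.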

Comparing only object references in numpy
