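I want to create a boolean 'mask' index array for an array, based on whether each of its elements is a member of some set. What I have is the following: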
import numpy as np
x = np.arange(20)
interesting_numbers = {1, 5, 7, 17, 18}
x_mask = np.array([xi in interesting_numbers for xi in x])
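I would like to know whether there is a faster way to execute the last line. As written, it builds a list in Python by calling the __contains__ method repeatedly, then converts that list to a numpy array.
I want something like x_mask = x[x in interesting_numbers], but that does not work.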
Answer 0 (score: 2)
You can use np.in1d:
np.in1d(x, list(interesting_numbers))
#array([False, True, False, False, False, True, False, True, False,
# False, False, False, False, False, False, False, False, True,
# True, False], dtype=bool)
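NumPy's documentation recommends np.isin over np.in1d for new code. The following is a minimal sketch of the same mask built with np.isin, assuming a NumPy version that provides it (1.13 or later); the variable name selected is only illustrative.

```python
import numpy as np

x = np.arange(20)
interesting_numbers = {1, 5, 7, 17, 18}

# np.isin expects an array-like, so the set is converted to a list first.
# Passing the set directly would be treated as a single object, not a collection.
x_mask = np.isin(x, list(interesting_numbers))

# The boolean mask can then index the original array.
selected = x[x_mask]
print(selected)
```

Unlike np.in1d, np.isin keeps the shape of its first argument, so the same call also works when x is multi-dimensional.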
Timing: when the array x is large, this is considerably faster:
x = np.arange(10000)
interesting_numbers = {1, 5, 7, 17, 18}
%timeit np.in1d(x, list(interesting_numbers))
# 10000 loops, best of 3: 41.1 µs per loop
%timeit x_mask = np.array([xi in interesting_numbers for xi in x])
# 1000 loops, best of 3: 1.44 ms per loop
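The %timeit lines above only run inside IPython. A rough sketch of the same comparison as a plain script, using the standard-library timeit module, might look like this. The helper names mask_listcomp and mask_isin are made up for illustration, np.isin stands in for np.in1d, and the absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

x = np.arange(10000)
interesting_numbers = {1, 5, 7, 17, 18}
candidates = list(interesting_numbers)

def mask_listcomp():
    # One Python-level membership test per element
    return np.array([xi in interesting_numbers for xi in x])

def mask_isin():
    # Vectorized membership test inside NumPy
    return np.isin(x, candidates)

# Both approaches should produce the same mask
same = np.array_equal(mask_listcomp(), mask_isin())

runs = 10
t_listcomp = min(timeit.repeat(mask_listcomp, number=runs, repeat=3)) / runs
t_isin = min(timeit.repeat(mask_isin, number=runs, repeat=3)) / runs
print(f"list comprehension: {t_listcomp * 1e6:.1f} µs per call")
print(f"np.isin:            {t_isin * 1e6:.1f} µs per call")
```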
Answer 1 (score: 1)
Here's one using np.searchsorted -
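The set_membership function timed in this answer is not defined in the text. The following is a hypothetical reconstruction of what a np.searchsorted-based version could look like, not necessarily the author's code: sort the candidate values once, find each element's insertion position by binary search, and check whether the value at that position equals the element.

```python
import numpy as np

def set_membership(x, values):
    """Boolean mask marking which elements of x appear in `values`.

    Hypothetical sketch; the answer's original definition is not shown.
    """
    x = np.asarray(x)
    if len(values) == 0:
        # Nothing can match an empty collection
        return np.zeros(x.shape, dtype=bool)
    # Sort the candidates once so binary search can be used
    vals = np.sort(np.array(list(values)))
    # Leftmost insertion position of every element of x in vals
    idx = np.searchsorted(vals, x)
    # Elements larger than every candidate get position len(vals);
    # point them at slot 0, where the equality check below fails
    idx[idx == vals.size] = 0
    return vals[idx] == x

x = np.arange(20)
mask = set_membership(x, {1, 5, 7, 17, 18})
print(x[mask])
```

Sorting costs O(m log m) for m candidates and each lookup is O(log m), and x itself does not need to be sorted.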
# Setup inputs with random numbers that are not necessarily sorted
In [353]: x = np.random.choice(100000, 10000, replace=False)
In [354]: interesting_numbers = set(np.random.choice(100000, 1000, replace=False))
In [355]: x_mask = np.array([xi in interesting_numbers for xi in x])
# Verify output with set_membership
In [356]: np.allclose(x_mask, set_membership(x, interesting_numbers))
Out[356]: True
# @Psidom's solution
In [357]: %timeit np.in1d(x, list(interesting_numbers))
1000 loops, best of 3: 1.04 ms per loop
In [358]: %timeit set_membership(x, interesting_numbers)
1000 loops, best of 3: 682 µs per loop
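The session above is the runtime test: on these inputs, set_membership took 682 µs per loop, against 1.04 ms for np.in1d.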