Create a mask for a numpy array based on its values' set membership

Time: 2017-04-19 13:53:15

Tags: python numpy indexing

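I want to create a 'mask' index array for an array, based on whether each element of that array is a member of some set. What I want can be achieved as follows: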

import numpy as np

x = np.arange(20)
interesting_numbers = {1, 5, 7, 17, 18}
x_mask = np.array([xi in interesting_numbers for xi in x])

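I'm wondering whether there is a faster way to execute that last line. As written, it builds a list in Python by repeatedly calling the set's __contains__ method, then converts that list to a numpy array.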

I'd like something along the lines of x_mask = x[x in interesting_numbers], but that doesn't work.

2 answers:

Answer 0 (score: 2)

You can use np.in1d:

np.in1d(x, list(interesting_numbers))
#array([False,  True, False, False, False,  True, False,  True, False,
#       False, False, False, False, False, False, False, False,  True,
#        True, False], dtype=bool)
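
For readers on a newer NumPy, a minimal sketch of the same idea with np.isin, which NumPy's documentation recommends over np.in1d for new code. It assumes NumPy 1.13 or later; the names values, selected and grid_mask are illustrative only and are not from the answer.

```python
import numpy as np

x = np.arange(20)
interesting_numbers = {1, 5, 7, 17, 18}

# np.isin expects a sequence or array, not a set: a set passed directly
# is wrapped as a single object element and nothing matches element-wise.
values = np.fromiter(interesting_numbers, dtype=x.dtype)

x_mask = np.isin(x, values)    # boolean mask, same shape as x
selected = x[x_mask]           # the elements of x that are in the set

# Unlike np.in1d, np.isin keeps the shape of a multi-dimensional input.
grid_mask = np.isin(x.reshape(4, 5), values)
```

The mask can be used directly for boolean indexing, as in x[x_mask], which is probably what the x[x in interesting_numbers] attempt in the question was after.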

Timing: when the array x is large, np.in1d is much faster (roughly 35x in the run below):

x = np.arange(10000)
interesting_numbers = {1, 5, 7, 17, 18}

%timeit np.in1d(x, list(interesting_numbers))
# 10000 loops, best of 3: 41.1 µs per loop

%timeit x_mask = np.array([xi in interesting_numbers for xi in x])
# 1000 loops, best of 3: 1.44 ms per loop
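
Outside IPython, where %timeit is not available, a comparison like the one above could be sketched with the standard timeit module. This is only an illustration: the helper names mask_listcomp and mask_isin are made up here, np.isin is swapped in for np.in1d, and the absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

x = np.arange(10000)
interesting_numbers = {1, 5, 7, 17, 18}
values = list(interesting_numbers)

def mask_listcomp():
    # Python-level loop: one set lookup per element of x.
    return np.array([xi in interesting_numbers for xi in x])

def mask_isin():
    # Vectorized membership test done inside NumPy.
    return np.isin(x, values)

# Both approaches should agree before the timings mean anything.
same = np.array_equal(mask_listcomp(), mask_isin())

# Best of 5 repeats of 20 calls each, converted to seconds per call.
t_loop = min(timeit.repeat(mask_listcomp, number=20, repeat=5)) / 20
t_isin = min(timeit.repeat(mask_isin, number=20, repeat=5)) / 20
print(f"list comprehension: {t_loop * 1e6:.1f} µs, np.isin: {t_isin * 1e6:.1f} µs")
```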

Answer 1 (score: 1)

Here's one approach with np.searchsorted (the set_membership function called in the runtime test below):

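The definition of set_membership does not appear in this text. The following is a hedged reconstruction of how such a helper could be written with np.searchsorted, not necessarily the answerer's exact code. It assumes a non-empty set whose values are comparable with the elements of x; the local names b and idx are illustrative.

```python
import numpy as np

def set_membership(x, interesting_numbers):
    # Sorted array of the candidate values (assumed non-empty).
    b = np.sort(np.array(list(interesting_numbers)))
    # For each x[i], the index at which it would be inserted into b.
    idx = np.searchsorted(b, x)
    # Values greater than b.max() get index len(b); redirect them to
    # slot 0, which cannot equal them, so they compare as non-members.
    idx[idx == b.size] = 0
    # x[i] is a member exactly when the value at its slot equals it.
    return b[idx] == x

x = np.arange(20)
mask = set_membership(x, {1, 5, 7, 17, 18})
```

Sorting the set costs O(m log m) and each lookup is a binary search, so the whole test runs in vectorized NumPy code without a Python-level loop over x.
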
# Setup inputs with random numbers that are not necessarily sorted
In [353]: x = np.random.choice(100000, 10000, replace=0)

In [354]: interesting_numbers = set(np.random.choice(100000, 1000, replace=0))

In [355]: x_mask = np.array([xi in interesting_numbers for xi in x])

# Verify output with set_membership
In [356]: np.allclose(x_mask, set_membership(x, interesting_numbers))
Out[356]: True

# @Psidom's solution
In [357]: %timeit np.in1d(x, list(interesting_numbers))
1000 loops, best of 3: 1.04 ms per loop

In [358]: %timeit set_membership(x, interesting_numbers)
1000 loops, best of 3: 682 µs per loop
