I still seem to be struggling with the "in" operator in numpy. Here is the situation:
>>> a = np.random.randint(1, 10, (2, 2, 3))
>>> a
array([[[9, 8, 8],
        [4, 9, 1]],
       [[6, 6, 3],
        [9, 3, 5]]])
I want to get the indices of the triplets whose second element is in (6, 8). The way I intuitively tried was:
>>> a[:, :, 1] in (6, 8)
ValueError: The truth value of an array with more than one element...
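My ultimate goal is to replace the value at those positions with the preceding number multiplied by 2. Using the example above, a should become: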
array([[[9, 18, 8],   # 8 @ pos #2 --> replaced by 9 @ pos #1, times 2
        [4, 9, 1]],
       [[6, 12, 3],   # 6 @ pos #2 --> replaced by 6 @ pos #1, times 2
        [9, 3, 5]]])
Thanks in advance for your advice and time!
Answer 0 (score: 2)
Here is an approach that works for a tuple of arbitrary length. It uses the numpy.in1d function.
import numpy as np
np.random.seed(1)
a = np.random.randint(1, 10, (2, 2, 3))
print(a)
check_tuple = (6, 9, 1)
bool_array = np.in1d(a[:,:,1], check_tuple)
ind = np.where(bool_array)[0]
a0 = a[:,:,0].reshape((len(bool_array), ))
a1 = a[:,:,1].reshape((len(bool_array), ))
a1[ind] = a0[ind] * 2
print(a)
Output:
[[[6 9 6]
  [1 1 2]]
 [[8 7 3]
  [5 6 3]]]
[[[ 6 12  6]
  [ 1  2  2]]
 [[ 8  7  3]
  [ 5 10  3]]]
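As a variation on the approach above, here is a minimal sketch that uses np.isin instead of np.in1d. This is a swapped-in function, not what the answer itself used: np.isin (available since NumPy 1.13) returns a mask with the same shape as its input, so no reshape is needed, and the mask can index the first two axes of a directly. It is applied here to the array and the tuple (6, 8) from the question.

```python
import numpy as np

a = np.array([[[9, 8, 8],
               [4, 9, 1]],
              [[6, 6, 3],
               [9, 3, 5]]])
check_tuple = (6, 8)

# Boolean mask of shape (2, 2): True where the second element is in check_tuple.
mask = np.isin(a[:, :, 1], check_tuple)

# The 2-D mask selects triplets; the trailing index picks the element in each one.
a[mask, 1] = a[mask, 0] * 2
print(a)
```

With the question's data this should reproduce the array the asker wanted, with 18 and 12 in the second position of the first triplet of each block.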
Answer 1 (score: 1)
import numpy as np
a = np.array([[[9, 8, 8],
               [4, 9, 1]],
              [[6, 6, 3],
               [9, 3, 5]]])
ind=(a[:,:,1]<=8) & (a[:,:,1]>=6)
a[ind,1]=a[ind,0]*2
print(a)
yields
[[[ 9 18  8]
  [ 4  9  1]]
 [[ 6 12  3]
  [ 9  3  5]]]
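If you want to test membership in a set that is not a simple range, then I like both mac's idea of using a Python loop and bellamyj's idea of using np.in1d. Which is faster depends on the size of check_tuple:
test.py: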
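import numpy as np
np.random.seed(1)
N = 10
a = np.random.randint(1, 1000, (2, 2, 3))
check_tuple = np.random.randint(1, 1000, N)
def using_in1d(a):
    # Test membership on the flattened second column, then restore the 2-D shape.
    idx = np.in1d(a[:,:,1], check_tuple)
    idx = idx.reshape(a[:,:,1].shape)
    a[idx,1] = a[idx,0] * 2
    return a
def using_in(a):
    # OR together one equality mask per value in check_tuple.
    idx = np.zeros(a[:,:,0].shape, dtype=bool)
    for n in check_tuple:
        idx |= a[:,:,1] == n
    a[idx,1] = a[idx,0] * 2
    return a
# Compare on copies: both functions modify their argument in place.
assert np.allclose(using_in1d(a.copy()), using_in(a.copy()))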
With N = 10, using_in is slightly faster:
% python -m timeit -s'import test' 'test.using_in1d(test.a)'
10000 loops, best of 3: 156 usec per loop
% python -m timeit -s'import test' 'test.using_in(test.a)'
10000 loops, best of 3: 143 usec per loop
With N = 100, using_in1d is much faster:
% python -m timeit -s'import test' 'test.using_in1d(test.a)'
10000 loops, best of 3: 171 usec per loop
% python -m timeit -s'import test' 'test.using_in(test.a)'
1000 loops, best of 3: 1.15 msec per loop
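The same comparison can also be run from a single script with the timeit module. The sketch below is an assumption-laden variant, not the benchmark used above: it substitutes np.isin for np.in1d, passes the lookup values as an argument (the names using_isin, using_loop and values are illustrative), and times each call on a fresh copy of a. Absolute numbers will differ by machine and NumPy version, so no expected timings are shown.

```python
import timeit
import numpy as np

rng = np.random.default_rng(1)
a = rng.integers(1, 1000, (2, 2, 3))

def using_isin(a, values):
    # Vectorized membership test; the mask keeps the shape of a[:, :, 1].
    idx = np.isin(a[:, :, 1], values)
    a[idx, 1] = a[idx, 0] * 2
    return a

def using_loop(a, values):
    # One equality comparison per lookup value, OR-ed into a single mask.
    idx = np.zeros(a.shape[:2], dtype=bool)
    for n in values:
        idx |= a[:, :, 1] == n
    a[idx, 1] = a[idx, 0] * 2
    return a

for n_values in (10, 100):
    values = rng.integers(1, 1000, n_values)
    # Both functions modify their argument, so compare and time on copies.
    assert np.array_equal(using_isin(a.copy(), values),
                          using_loop(a.copy(), values))
    for func in (using_isin, using_loop):
        seconds = timeit.timeit(lambda: func(a.copy(), values), number=1000)
        print(f"N={n_values:4d} {func.__name__:11s} "
              f"{seconds / 1000 * 1e6:8.1f} usec per call")
```

The loop version does work proportional to the number of lookup values, which is consistent with it falling behind as N grows.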
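Answer 2 (score: 1)
There is another approach based on a lookup table, which I learned from one of the CellProfiler developers. First you create a lookup table (LUT) with one entry for every value from 0 up to the largest number in the array. For each possible array value, the LUT holds True or False. For example: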
import numpy as np
# create a large volume image with random numbers
a = np.random.randint(1, 1000, (50, 1000, 1000))
labels_to_find = np.unique(np.random.randint(1, 1000, 500))
# create filter mask LUT
def find_mask_LUT(inputarr, obs):
    # one boolean entry per possible value 0..max(inputarr)
    keep = np.zeros(np.max(inputarr) + 1, bool)
    keep[np.array(obs)] = True
    return keep[inputarr]
# This will return a mask that is the
# same shape as a, with True where a is one of the
# labels we look for, False otherwise
find_mask_LUT(a, labels_to_find)
This is very fast (much faster than np.in1d), and its speed does not depend on the number of objects being searched for.
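To tie the lookup-table idea back to the original question, here is a small sketch applied to the question's array. The helper name find_mask_lut and the sizing of the table are assumptions made for illustration: the table is sized to cover the largest lookup value as well, so a value that never occurs in the array does not raise an IndexError. The technique only makes sense for non-negative integer data, since the values are used directly as indices.

```python
import numpy as np

def find_mask_lut(inputarr, obs):
    """Return a boolean mask, shaped like inputarr, marking values found in obs."""
    obs = np.asarray(obs)
    # Size the table to cover both the data and the lookup values (assumption).
    size = max(int(inputarr.max()), int(obs.max())) + 1
    keep = np.zeros(size, dtype=bool)
    keep[obs] = True
    # Fancy indexing maps every array value to its True/False table entry.
    return keep[inputarr]

a = np.array([[[9, 8, 8],
               [4, 9, 1]],
              [[6, 6, 3],
               [9, 3, 5]]])

mask = find_mask_lut(a[:, :, 1], (6, 8))
a[mask, 1] = a[mask, 0] * 2
print(a)
```

The table costs memory proportional to the largest value, so it suits label images with a bounded range better than arrays holding arbitrarily large integers.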
Answer 3 (score: 0)
Inspired by unutbu's answer, I came up with this possible solution:
>>> l = (8, 6)
>>> idx = np.zeros((2, 2), dtype=bool)
>>> for n in l:
...     idx |= a[:,:,1] == n
...
>>> idx
array([[ True, False],
       [ True, False]], dtype=bool)
>>> a[idx]
array([[9, 8, 8],
       [6, 6, 3]])
However, it requires knowing the dimensions of the array under investigation in advance.
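One way around that limitation, sketched here as a suggestion rather than as part of the original answer, is to take the mask's shape from the slice itself instead of hard-coding (2, 2). The names targets and second are illustrative.

```python
import numpy as np

a = np.array([[[9, 8, 8],
               [4, 9, 1]],
              [[6, 6, 3],
               [9, 3, 5]]])
targets = (8, 6)

# Second element of every triplet, whatever the leading dimensions are.
second = a[..., 1]

# The mask takes its shape from the slice, so nothing is hard-coded.
idx = np.zeros(second.shape, dtype=bool)
for n in targets:
    idx |= second == n

print(idx)
print(a[idx])
```

Because the ellipsis covers all leading axes, the same lines work unchanged for arrays with more or fewer leading dimensions.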