Question

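Suppose I have a 100-element numpy array. I perform some calculation on a subset of this array, perhaps the 20 elements that satisfy some condition. I then pick an index within this subset; how can I (efficiently) recover the corresponding index in the original array? I don't want to run the calculation on every value in a, because it is expensive, so I only want to run it where it is needed (where the condition is met).

Here is some pseudocode to demonstrate what I mean (the 'condition' here is the list comprehension):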

a = np.arange(100)                                 # size = 100
b = some_function(a[[i for i in range(0,100,5)]])  # size = 20
Index = np.argmax(b)

# Index gives the index of the maximum value in b,
# but what I really want is the index of the element
# in a

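Edit

I wasn't very clear, so I've provided a more complete example. I hope this makes my goal clearer. I feel there should be some clever and efficient way to do this without loops or lookups.

Code: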

import numpy as np

def some_function(arr):
   return arr*2.0

a = np.arange(100)*2.                              # size = 100
b = some_function(a[[i for i in range(0,100,5)]])  # size = 20
Index = np.argmax(b)

print(Index)
# Index gives the index of the maximum value in b, but what I really want is
# the index of the element in a

# In this specific case, Index will be 19.  So b[19] is the largest value
# in b.  Now, what I REALLY want is the index in a.  In this case, that would
# be 95 because some_function(a[95]) is what made the largest value in b.
print(b[Index])
print(some_function(a[95]))

# It is important to note that I do NOT want to change a.  I will perform
# several calculations on SOME values of a, then return the indices of 'a' where
# all calculations meet some condition.

Answer 1

I'm not sure I understand your question, so please correct me if I'm wrong.

Suppose you have something like:

a = np.arange(100)
condition = (a % 5 == 0) & (a % 7 == 0)
b = a[condition]
index = np.argmax(b)
# The following should do what you want
a[condition][index]

Or, if you don't want to use a mask:

a = np.arange(100)
b_indices = np.where(a % 5 == 0)
b = a[b_indices]
index = np.argmax(b)
# Get the value of 'a' corresponding to 'index'
a[b_indices][index]

Is this what you want?
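Note that a[condition][index] in the snippets above yields the value stored in a, not its position. To get the position itself, one option is to keep the positions where the condition holds and index into them. This is a minimal sketch, assuming a 1-D array and a boolean mask; some_function is the stand-in from the question, and the "every fifth element" condition is only an illustration:

```python
import numpy as np

def some_function(arr):
    # Stand-in for the expensive computation from the question
    return arr * 2.0

a = np.arange(100) * 2.0
# Illustrative condition: select every fifth position of a
condition = (np.arange(a.size) % 5 == 0)

# Positions in a where the condition holds (20 entries here)
a_positions = np.flatnonzero(condition)

b = some_function(a[condition])   # computed on the subset only
index = np.argmax(b)              # position within b
a_index = a_positions[index]      # matching position within a

print(index, a_index)
```

Because a_positions has the same length and order as b, any index into b can be translated this way, not just the argmax.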

Answer 2

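Normally you would store the indices based on your condition before making any changes to the array. You then use those indices to make the changes.

If a is your array: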

>>> a = np.random.random((10,5))
>>> a
array([[ 0.22481885,  0.80522855,  0.1081426 ,  0.42528799,  0.64471832],
       [ 0.28044374,  0.16202575,  0.4023426 ,  0.25480368,  0.87047212],
       [ 0.84764143,  0.30580141,  0.16324907,  0.20751965,  0.15903343],
       [ 0.55861168,  0.64368466,  0.67676172,  0.67871825,  0.01849056],
       [ 0.90980614,  0.95897292,  0.15649259,  0.39134528,  0.96317126],
       [ 0.20172827,  0.9815932 ,  0.85661944,  0.23273944,  0.86819205],
       [ 0.98363954,  0.00219531,  0.91348196,  0.38197302,  0.16002007],
       [ 0.48069675,  0.46057327,  0.67085243,  0.05212357,  0.44870942],
       [ 0.7031601 ,  0.50889065,  0.30199446,  0.8022497 ,  0.82347358],
       [ 0.57058441,  0.38748261,  0.76947605,  0.48145936,  0.26650583]])

and b is your subarray:

>>> b = a[2:4,2:7]
>>> b
array([[ 0.16324907,  0.20751965,  0.15903343],
       [ 0.67676172,  0.67871825,  0.01849056]])

it can be shown that a still owns the data in b:

>>> b.base
array([[ 0.22481885,  0.80522855,  0.1081426 ,  0.42528799,  0.64471832],
       [ 0.28044374,  0.16202575,  0.4023426 ,  0.25480368,  0.87047212],
       [ 0.84764143,  0.30580141,  0.16324907,  0.20751965,  0.15903343],
       [ 0.55861168,  0.64368466,  0.67676172,  0.67871825,  0.01849056],
       [ 0.90980614,  0.95897292,  0.15649259,  0.39134528,  0.96317126],
       [ 0.20172827,  0.9815932 ,  0.85661944,  0.23273944,  0.86819205],
       [ 0.98363954,  0.00219531,  0.91348196,  0.38197302,  0.16002007],
       [ 0.48069675,  0.46057327,  0.67085243,  0.05212357,  0.44870942],
       [ 0.7031601 ,  0.50889065,  0.30199446,  0.8022497 ,  0.82347358],
       [ 0.57058441,  0.38748261,  0.76947605,  0.48145936,  0.26650583]])

You can make changes to both a and b in two ways:

>>> b+=1
>>> b
array([[ 1.16324907,  1.20751965,  1.15903343],
       [ 1.67676172,  1.67871825,  1.01849056]])
>>> a
array([[ 0.22481885,  0.80522855,  0.1081426 ,  0.42528799,  0.64471832],
       [ 0.28044374,  0.16202575,  0.4023426 ,  0.25480368,  0.87047212],
       [ 0.84764143,  0.30580141,  1.16324907,  1.20751965,  1.15903343],
       [ 0.55861168,  0.64368466,  1.67676172,  1.67871825,  1.01849056],
       [ 0.90980614,  0.95897292,  0.15649259,  0.39134528,  0.96317126],
       [ 0.20172827,  0.9815932 ,  0.85661944,  0.23273944,  0.86819205],
       [ 0.98363954,  0.00219531,  0.91348196,  0.38197302,  0.16002007],
       [ 0.48069675,  0.46057327,  0.67085243,  0.05212357,  0.44870942],
       [ 0.7031601 ,  0.50889065,  0.30199446,  0.8022497 ,  0.82347358],
       [ 0.57058441,  0.38748261,  0.76947605,  0.48145936,  0.26650583]])

Or:

>>> a[2:4,2:7]+=1
>>> a
array([[ 0.22481885,  0.80522855,  0.1081426 ,  0.42528799,  0.64471832],
       [ 0.28044374,  0.16202575,  0.4023426 ,  0.25480368,  0.87047212],
       [ 0.84764143,  0.30580141,  1.16324907,  1.20751965,  1.15903343],
       [ 0.55861168,  0.64368466,  1.67676172,  1.67871825,  1.01849056],
       [ 0.90980614,  0.95897292,  0.15649259,  0.39134528,  0.96317126],
       [ 0.20172827,  0.9815932 ,  0.85661944,  0.23273944,  0.86819205],
       [ 0.98363954,  0.00219531,  0.91348196,  0.38197302,  0.16002007],
       [ 0.48069675,  0.46057327,  0.67085243,  0.05212357,  0.44870942],
       [ 0.7031601 ,  0.50889065,  0.30199446,  0.8022497 ,  0.82347358],
       [ 0.57058441,  0.38748261,  0.76947605,  0.48145936,  0.26650583]])
>>> b
array([[ 1.16324907,  1.20751965,  1.15903343],
       [ 1.67676172,  1.67871825,  1.01849056]])

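The two are equivalent, and neither is more expensive than the other. So as long as you keep the indices that were used to create b from a, you can always look at the changed data in the base array. Often it isn't even necessary to create a subarray when operating on a slice.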
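One caveat worth checking: the write-through behaviour shown above holds for basic slicing, which produces a view. Indexing with a list or index array, as in the question, produces a copy instead. A small sketch of the difference, using a made-up 10x5 array and np.shares_memory to test for a view:

```python
import numpy as np

a = np.arange(50, dtype=float).reshape(10, 5)
a_before = a.copy()

view = a[2:4, 2:5]        # basic slicing: a view onto a's memory
picked = a[[2, 3], 2:5]   # list (fancy) indexing: an independent copy

print(np.shares_memory(a, view))    # view shares a's buffer
print(np.shares_memory(a, picked))  # the copy does not

picked += 1                    # modifies only the copy
a_after_copy_edit = a.copy()   # snapshot: a is still untouched here

view += 1                      # writes through to a[2:4, 2:5]
```

So if the goal is to leave a unchanged, computing on a fancy-indexed subset is safe, while in-place operations on a sliced view are not.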

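Edit

This assumes some_func returns the indices in the subarray where some condition is true.

I think that when a function returns indices and you only want to feed that function the subarray, you still need to store the indices of that subarray and use them to get the base-array indices. For example: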

>>> def some_func(a):
...     return np.where(a>.8)
...
>>> a = np.random.random((10,4))
>>> a
array([[ 0.94495378,  0.55532342,  0.70112911,  0.4385163 ],
       [ 0.12006191,  0.93091941,  0.85617421,  0.50429453],
       [ 0.46246102,  0.89810859,  0.31841396,  0.56627419],
       [ 0.79524739,  0.20768512,  0.39718061,  0.51593312],
       [ 0.08526902,  0.56109783,  0.00560285,  0.18993636],
       [ 0.77943988,  0.96168229,  0.10491335,  0.39681643],
       [ 0.15817781,  0.17227806,  0.17493879,  0.93961027],
       [ 0.05003535,  0.61873245,  0.55165992,  0.85543841],
       [ 0.93542227,  0.68104872,  0.84750821,  0.34979704],
       [ 0.06888627,  0.97947905,  0.08523711,  0.06184216]])
>>> i_off, j_off = 3,2
>>> b = a[i_off:,j_off:]  # b
>>> i = some_func(b)  # indices in b
>>> i
(array([3, 4, 5]), array([1, 1, 0]))
>>> list(map(sum, zip(i,(i_off, j_off))))  # indices in a
[array([6, 7, 8]), array([3, 3, 2])]

Edit 2

This assumes some_func returns a modified copy of the subarray b.

Your example would look something like this:

import numpy as np

def some_function(arr):
   return arr*2.0

a = np.arange(100)*2.                              # size = 100
idx = np.array(range(0,100,5))
b = some_function(a[idx])                          # size = 20
b_idx = np.argmax(b)
a_idx = idx[b_idx]                                 # indices in a translated from indices in b

print(b_idx, a_idx)
print(b[b_idx], a[a_idx])
assert b[b_idx] == 2* a[a_idx]                     # true!

Answer 3

Use an auxiliary array a_index that simply holds the indices of a's elements, so that for example a_index[3,5] = (3,5). You can then get the original index as a_index[condition == True][Index].
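A rough sketch of the auxiliary-array idea, assuming a 2-D array. Building a_index with np.indices and np.stack is one possible construction, not necessarily the one the answer had in mind, and the data and condition are made up for illustration:

```python
import numpy as np

a = np.arange(48).reshape(6, 8)   # illustrative data
condition = (a % 5 == 0)          # illustrative condition

# a_index[i, j] holds the pair [i, j]; shape is (6, 8, 2)
a_index = np.stack(np.indices(a.shape), axis=-1)

b = a[condition] * 2.0            # stand-in for the expensive computation
Index = np.argmax(b)              # position within the 1-D subset b

# Apply the same mask to a_index, then pick the same position
orig = tuple(a_index[condition][Index])

print(orig, a[orig])
```

The mask flattens a and a_index in the same order, which is why the same Index lines up in both. The cost is an extra integer array with ndim times as many entries as a.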

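If you can guarantee that b is a view of a, you can use the memory layout information of the two arrays to find the translation between indices of b and indices of a.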
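One way the memory-layout idea could look in practice is sketched below. It assumes that a is C-contiguous and that b was produced by a unit-step slice of a, so that the translation is a constant offset per axis; other strides would need extra handling. The arrays are made up for illustration:

```python
import numpy as np

a = np.arange(50, dtype=float).reshape(10, 5)
b = a[2:4, 2:5]                     # unit-step slice, so b is a view of a
assert np.shares_memory(a, b)

# Byte distance from a's first element to b's first element
byte_offset = (b.__array_interface__["data"][0]
               - a.__array_interface__["data"][0])

# Assumption: a is C-contiguous, so the element offset maps to (row, col)
i_off, j_off = np.unravel_index(byte_offset // a.itemsize, a.shape)

# Translate an index in b (here the argmax) into an index in a
bi, bj = np.unravel_index(np.argmax(b), b.shape)
ai, aj = bi + i_off, bj + j_off

print((bi, bj), (ai, aj))
```

If the slice offsets are already known when b is created, storing them directly is simpler than recovering them from the buffer addresses.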

Answer 4

Would something like this work?

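import numpy as np

def global_argmax(X, S):   # function name is illustrative only
    # X: data array; S: same-shape array whose entries equal to 1 mark the subset
    mask = S == 1
    ind_local = np.argmax(X[mask])

    # Flat positions, within the full array, of the selected elements
    G = np.ravel_multi_index(np.where(mask), mask.shape)
    ind_global = np.unravel_index(G[ind_local], mask.shape)

    return ind_global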

This returns the global index of the argmax.
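To make the approach concrete, here is a small worked example. X and S are made-up inputs, and np.flatnonzero is swapped in as an equivalent shortcut for the ravel_multi_index / where combination when the mask is read in C order:

```python
import numpy as np

# Made-up inputs: X holds the data, S == 1 marks the eligible entries
X = np.array([[1.0, 9.0, 3.0],
              [7.0, 2.0, 8.0]])
S = np.array([[1, 0, 1],
              [1, 0, 1]])

mask = S == 1
ind_local = np.argmax(X[mask])   # argmax among the selected values only

# Flat positions of the selected entries within the full array
flat_positions = np.flatnonzero(mask)

# Map the local argmax back to a (row, col) index in X
ind_global = np.unravel_index(flat_positions[ind_local], mask.shape)

print(ind_global, X[ind_global])
```

The overall maximum of X (9.0) is excluded by the mask, so the result points at the largest selected value instead.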

Python / Numpy - Get index into main array from subset
