I have some code which calculates missing values in an image, based on neighbouring values in a 2D circular window. It also uses the values from one or more temporally adjacent images at the same locations (i.e. the same 2D window shifted in the third dimension).

For each position that is missing, I need to calculate the value based not necessarily on all the values available in the whole window, but only on the spatially nearest n cells that do have values (in both images / z-axis positions), where n is some value less than the total number of cells in the 2D window.

At the moment, it's much quicker to calculate everything in the window, because my means of sorting to get the nearest n cells with data is the slowest part of the function, as it has to be repeated every time even though the distances in terms of window coordinates do not change. I'm not sure this is necessary and feel I must be able to get the sorted distances once, and then mask them in the process of only selecting available cells.

Here's my code for selecting the data to use within a window of the gap cell location:
import numpy as np

# radius will in reality be ~100
radius = 2
y,x = np.ogrid[-radius:radius+1, -radius:radius+1]
dist = np.sqrt(x**2 + y**2)
circle_template = dist > radius
# this will in reality be a very large 3 dimensional array
# representing daily images with some gaps, indicated by 0s
dataStack = np.zeros((2,5,5))
dataStack[1] = (np.random.random(25) * 100).reshape(dist.shape)
dataStack[0] = (np.random.random(25) * 100).reshape(dist.shape)
testdata = dataStack[1]
alternatedata = dataStack[0]
random_gap_locations = (np.random.random(25) * 30).reshape(dist.shape) > testdata
testdata[random_gap_locations] = 0
testdata[radius, radius] = 0
# in reality we will go through every gap (zero) location in the data
# for each image and for each gap use slicing to get a window of
# size (radius*2+1, radius*2+1) around it from each image, with the
# gap being at the centre i.e.
# testgaplocation = [radius, radius]
# and the variables testdata, alternatedata below will refer to these
# slices
locations_to_exclude = np.logical_or(circle_template,
                                     np.logical_or(testdata == 0,
                                                   alternatedata == 0))
# the places that are inside the circular mask and where both images
# have data
locations_to_include = ~locations_to_exclude
number_available = np.count_nonzero(locations_to_include)
# we only want to do the interpolation calculations from the nearest n
# locations that have data available, n will be ~100 in reality
number_required = 3
available_distances = dist[locations_to_include]
available_data = testdata[locations_to_include]
available_alternates = alternatedata[locations_to_include]
if number_available > number_required:
    # In this case we need to find the closest number_required elements, based
    # on distances recorded in dist, from available_data and available_alternates.
    # Having to repeat this argsort for each gap cell calculation is slow and feels
    # like it should be avoidable
    sortedDistanceIndices = available_distances.argsort(kind='mergesort', axis=None)
    requiredIndices = sortedDistanceIndices[0:number_required]
    selected_data = np.take(available_data, requiredIndices)
    selected_alternates = np.take(available_alternates, requiredIndices)
else:
    # we just use available_data and available_alternates as they are...
    pass
# now do stuff with the selected data to calculate a value for the gap cell
This works, but over half of the total time of the function is taken in the argsort of the masked spatial distance data. (~900 µs of a total 1.4 ms - and this function will run tens of billions of times, so this is an important difference!)

I am sure I must be able to just do this argsort once outside of the function, when the spatial distance window is initially set up, and then include those sort indices in the masking, to get the first howManyToCalculate indices without having to re-do the sort. The answer probably involves putting the various bits we are extracting into a record array - but I can't figure out how, if so. Can anyone see how I can make this part of the process more efficient?
Answer 0 (score: 1)

So you want to do the sort outside of the loop:
sorted_dist_idcs = dist.argsort(kind='mergesort', axis=None)
Then using some variables from the original code, this is what I could come up with, though it still feels like a major roundtrip...
loc_to_incl_sorted = locations_to_include.take(sorted_dist_idcs)
sorted_dist_idcs_to_incl = sorted_dist_idcs[loc_to_incl_sorted]
required_idcs = sorted_dist_idcs_to_incl[:number_required]
selected_data = testdata.take(required_idcs)
selected_alternates = alternatedata.take(required_idcs)
Note that required_idcs refers to locations in testdata, not available_data as in the original code. And in this snippet I used take for the purpose of conveniently indexing the flattened arrays.
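A minimal self-contained check of this idea (variable names follow the question's setup; the random fill and gap mask here are illustrative, and the equivalence relies on mergesort being stable so ties break in the same flattened order either way):

```python
import numpy as np

radius = 2
y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
dist = np.sqrt(x**2 + y**2)
circle_template = dist > radius

rng = np.random.RandomState(0)
testdata = (rng.random_sample(25) * 100).reshape(dist.shape)
alternatedata = (rng.random_sample(25) * 100).reshape(dist.shape)
testdata[rng.random_sample(dist.shape) < 0.3] = 0   # simulated gaps
testdata[radius, radius] = 0                        # the gap being filled

locations_to_include = ~(circle_template | (testdata == 0) | (alternatedata == 0))
number_required = 3

# original per-gap approach: mask first, then argsort the masked distances
available_distances = dist[locations_to_include]
req_orig = available_distances.argsort(kind='mergesort')[:number_required]
selected_orig = testdata[locations_to_include][req_orig]

# proposed approach: argsort the full window once, then mask the sorted indices
sorted_dist_idcs = dist.argsort(kind='mergesort', axis=None)
loc_to_incl_sorted = locations_to_include.take(sorted_dist_idcs)
required_idcs = sorted_dist_idcs[loc_to_incl_sorted][:number_required]
selected_new = testdata.take(required_idcs)

# both routes pick the same nearest cells
assert np.array_equal(selected_orig, selected_new)
```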
Answer 1 (score: 0)
@moarningsun - thanks for the comment and answer. These got me on the right track, but they don't quite work for me when the gap is < radius from the edge of the data: in this case, I use a window around the gap cell which is 'trimmed' to the data bounds. In this situation the indices reflect the 'full' window and thus can't be used to select cells from the bounded window.

Unfortunately I edited that part of my code out when I clarified the original question, but it has turned out to be relevant.
I've now realised that if you use argsort again on the output of argsort, you get the ranks; i.e. the position each item would occupy when the whole array is sorted. We can safely mask these and then take the smallest number_required of them (and do this on a structured array to get the corresponding data at the same time).

This implies another sort within the loop, but in fact we can use a partition rather than a full sort, because all we need are the smallest num_required items. If num_required is substantially less than the number of data items, this is much faster than doing the argsort.
For example, with num_required = 80 and num_available = 15000, the full argsort takes ~900 µs whereas argpartition followed by indexing and slicing to get the first 80 takes ~110 µs. We still need the argsort to get the ranks in the first place (rather than just partitioning on the distances) in order to get the stability of mergesort, and thus the 'right' items when distances are not unique.
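The ranks-then-argpartition idea in miniature (a sketch on the question's small window; the mask positions are arbitrary examples, and the ranks are unique because they come from a permutation, so the partial sort picks exactly the same cells as a full sort would):

```python
import numpy as np

radius = 2
y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
dist = np.sqrt(x**2 + y**2)

# argsort of an argsort yields, for every cell, its position in the
# distance-sorted ordering; mergesort keeps tie-breaking stable
ranks = dist.argsort(axis=None, kind='mergesort').argsort().reshape(dist.shape)

# simulate masking out unavailable cells
available = np.ones(dist.shape, dtype=bool)
available[0, 0] = available[2, 2] = False
masked_ranks = ranks[available]

# take the num_required lowest-ranked survivors via a partial sort
num_required = 3
req = np.argpartition(masked_ranks, num_required)[:num_required]

# argpartition selects the same cells as a full sort, just unordered
chosen = np.sort(masked_ranks[req])
via_full_sort = np.sort(masked_ranks)[:num_required]
assert np.array_equal(chosen, via_full_sort)
```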
With my code as shown below, it now runs in ~610 µs on real data, including the actual calculations that aren't shown here. I'm happy with that now, but there seem to be several other apparently minor factors that can influence the runtime, and which are hard to understand.
For example, putting circle_template into the structured array alongside dist, ranks, and another field not shown here doubles the runtime of the overall function (even if we never access circle_template in the loop!). Even worse, using np.partition on the structured array with order=['ranks'] increases the overall function runtime by almost two orders of magnitude compared with using np.argpartition as shown below!
import numpy as np

# radius will in reality be ~100
radius = 2
y,x = np.ogrid[-radius:radius+1, -radius:radius+1]
dist = np.sqrt(x**2 + y**2)
circle_template = dist > radius
ranks = dist.argsort(axis=None,kind='mergesort').argsort().reshape(dist.shape)
diam = radius * 2 + 1
# putting circle_template in this array too doubles overall function runtime!
# the data fields are float64 ('f8'), matching dataStack defined below
fullWindowArray = np.zeros((diam, diam), dtype=[('ranks', ranks.dtype.str),
                                                ('thisdata', 'f8'),
                                                ('alternatedata', 'f8'),
                                                ('dist', dist.dtype.str)])
fullWindowArray['ranks'] = ranks
fullWindowArray['dist'] = dist
# this will in reality be a very large 3 dimensional array
# representing daily images with some gaps, indicated by 0s
dataStack = np.zeros((2,5,5))
dataStack[1] = (np.random.random(25) * 100).reshape(dist.shape)
dataStack[0] = (np.random.random(25) * 100).reshape(dist.shape)
testdata = dataStack[1]
alternatedata = dataStack[0]
random_gap_locations = (np.random.random(25) * 30).reshape(dist.shape) > testdata
testdata[random_gap_locations] = 0
testdata[radius, radius] = 0
# in reality we will loop here to go through every gap (zero) location in the data
# for each image
gapz, gapy, gapx = 1, radius, radius
desLeft, desRight = gapx - radius, gapx + radius+1
desTop, desBottom = gapy - radius, gapy + radius+1
extentB, extentR = dataStack.shape[1:]
# handle the case where the gap is < search radius from the edge of
# the data. If this is the case, we can't use the full
# diam * diam window
dataL = max(0, desLeft)
maskL = 0 if desLeft >= 0 else abs(dataL - desLeft)
dataT = max(0, desTop)
maskT = 0 if desTop >= 0 else abs(dataT - desTop)
dataR = min(desRight, extentR)
maskR = diam if desRight <= extentR else diam - (desRight - extentR)
dataB = min(desBottom,extentB)
maskB = diam if desBottom <= extentB else diam - (desBottom - extentB)
# get the slice that we will be working within
# ranks and dist are already populated
boundedWindowArray = fullWindowArray[maskT:maskB, maskL:maskR]
boundedWindowArray['alternatedata'] = alternatedata[dataT:dataB, dataL:dataR]
boundedWindowArray['thisdata'] = testdata[dataT:dataB, dataL:dataR]
# circle_template is not a field of the structured array (see comment above),
# so slice it to the bounded window separately
locations_to_exclude = np.logical_or(circle_template[maskT:maskB, maskL:maskR],
                                     np.logical_or(boundedWindowArray['thisdata'] == 0,
                                                   boundedWindowArray['alternatedata'] == 0))
# the places that are inside the circular mask and where both images
# have data
locations_to_include = ~locations_to_exclude
number_available = np.count_nonzero(locations_to_include)
# we only want to do the interpolation calculations from the nearest n
# locations that have data available, n will be ~100 in reality
number_required = 3
data_to_use = boundedWindowArray[locations_to_include]
if number_available > number_required:
    # argpartition seems to be v fast when number_required is
    # substantially < data_to_use.size
    # But partition on the structured array itself with order=['ranks']
    # is almost 2 orders of magnitude slower!
    reqIndices = np.argpartition(data_to_use['ranks'], number_required)[:number_required]
    data_to_use = np.take(data_to_use, reqIndices)
else:
    # we just use available_data and available_alternates as they are...
    pass
# now do stuff with the selected data to calculate a value for the gap cell
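To reproduce the argsort-vs-argpartition gap at roughly the scale quoted above, here is a small timing sketch (sizes match the numbers mentioned; absolute timings will vary by machine, and a plain permutation stands in for the masked rank values):

```python
import numpy as np
import timeit

rng = np.random.RandomState(42)
num_available = 15000
num_required = 80
ranks = rng.permutation(num_available)  # stands in for masked rank values

# time the full sort against the partial sort, 100 repeats each
t_sort = timeit.timeit(
    lambda: ranks.argsort(kind='mergesort')[:num_required], number=100)
t_part = timeit.timeit(
    lambda: np.argpartition(ranks, num_required)[:num_required], number=100)
print('full argsort : %.1f us per call' % (t_sort * 1e4))
print('argpartition : %.1f us per call' % (t_part * 1e4))

# both select the same set of items, just in a different order
full = np.sort(ranks.argsort(kind='mergesort')[:num_required])
part = np.sort(np.argpartition(ranks, num_required)[:num_required])
assert np.array_equal(full, part)
```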