我可能需要帮助:
在一个大的布尔numpy数组(导入的栅格)(2000x2000)上,我尝试只选择大于800个单位的元素。 (总元素数> 1000)
我尝试了一个循环:
labeled_array, num_features = scipy.ndimage.label(my_np_array, structure = None, output = np.int)
print num_features
RasterYSize, RasterXSize = my_np_array.shape
big_zones = np.zeros((RasterYSize, RasterXSize), dtype=np.bool)
print "Iterating in progress"
# Improvement could be needed to reduce the number of loops
for i in range(1, num_features):
zone_array = (labeled_array == i)
zone = np.sum(zone_array)
if zone > 800:
big_zones += zone_array
但我确信有更好的方法可以做到这一点。
答案 0 :(得分:1)
这是一种基于对np.bincount
进行分箱并在big_zones += zone_array
与np.in1d
执行累积ORing的向量化方法 -
from scipy.ndimage import label
# Label with scipy
labeled_array, num_features = label(my_np_array, structure = None, output = np.int)
# Set the threshold
thresh = 800
# Get the binned counts with "np.bincount" and check against threshold
matches = np.bincount(labeled_array.ravel())>thresh
# Get the IDs corresponding to matches and get rid of the starting "0" and
# "num_features", as you won't have those in "range(1, num_features)" either
match_feat_ID = np.nonzero(matches)[0]
valid_match_feat_ID = np.setdiff1d(match_feat_ID,[0,num_features])
# Finally, use "np.in1d" to do ORing operation corresponding to the iterative
# "big_zones += zone_array" operation on the boolean array "big_zones".
# Since "np.in1d" works with 1D arrays only, reshape back to 2D shape
out = np.in1d(labeled_array,valid_match_feat_ID).reshape(labeled_array.shape)
运行时测试并验证输出
功能定义 -
def original_app(labeled_array,num_features,thresh):
big_zones = np.zeros((my_np_array.shape), dtype=np.bool)
for i in range(1, num_features):
zone_array = (labeled_array == i)
zone = np.sum(zone_array)
if zone > thresh:
big_zones += zone_array
return big_zones
def vectorized_app(labeled_array,num_features,thresh):
matches = np.bincount(labeled_array.ravel())>thresh
match_feat_ID = np.nonzero(matches)[0]
valid_match_feat_ID = np.setdiff1d(match_feat_ID,[0,num_features])
return np.in1d(labeled_array,valid_match_feat_ID).reshape(labeled_array.shape)
计时和输出验证 -
In [2]: # Inputs
...: my_np_array = np.random.rand(200,200)>0.5
...: labeled_array, num_features = label(my_np_array, structure = None, output = np.int)
...: thresh = 80
...:
In [3]: out1 = original_app(labeled_array,num_features,thresh)
In [4]: out2 = vectorized_app(labeled_array,num_features,thresh)
In [5]: np.allclose(out1,out2)
Out[5]: True
In [6]: %timeit original_app(labeled_array,num_features,thresh)
1 loops, best of 3: 407 ms per loop
In [7]: %timeit vectorized_app(labeled_array,num_features,thresh)
100 loops, best of 3: 2.5 ms per loop