As the title says, I have an (n, 2) NumPy array that records the start and end indices of a series of segments, for example with n = 6:
import numpy as np
# x records the (start, end) index pairs corresponding to six segments
x = np.array(([0,4], # the 1st seg ranges from index 0 ~ 4
[5,9], # the 2nd seg ranges from index 5 ~ 9, etc.
[10,13],
[15,20],
[23,30],
[31,40]))
Now I want to merge segments that are separated by only a small gap. For example, if consecutive segments are merged whenever the gap between them is no greater than 1, the desired output would be:
y = np.array([[0,13], # The 1st seg's end is close to the 2nd's start, and the
# 2nd seg's end is close to the 3rd's start, so they are combined.
[15,20], # The 4th seg is far from both of its neighbours,
# so it remains untouched.
[23,40]]) # The 5th and 6th segs are close, so they are combined.
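So the output has three segments instead of six. Any suggestions would be appreciated!
Answer 0 (score: 2)
If we can assume that the segments are ordered and that none is wholly contained within a neighbour, then you can do this by identifying where the gap between the end value of one range and the start of the next exceeds your criterion: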
The pieces can then be stitched back together, starting from a mask of where those gaps occur:
start = x[1:, 0] # select columns, ignoring the beginning of the first range
end = x[:-1, 1] # and the end of the final range
mask = start>end+1 # identify where consecutive rows have too great a gap
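The stitching step itself is not shown above. A minimal sketch of one way to do it follows, assuming the segments are sorted and non-overlapping as stated. The names `merged_start`, `merged_end` and `merged` are illustrative and not taken from the answer.

```python
import numpy as np

x = np.array([[0, 4], [5, 9], [10, 13], [15, 20], [23, 30], [31, 40]])

start = x[1:, 0]           # starts, ignoring the first range
end = x[:-1, 1]            # ends, ignoring the final range
mask = start > end + 1     # True where the gap to the next row is too large

# Assumed stitching step: a merged segment begins at the very first start and
# at every start that follows a large gap; it ends just before each large gap
# and at the very last end.
merged_start = np.concatenate(([x[0, 0]], start[mask]))
merged_end = np.concatenate((end[mask], [x[-1, 1]]))
merged = np.column_stack((merged_start, merged_end))
print(merged)
```

For the six sample segments this yields the three rows asked for in the question.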
Answer 1 (score: 2)
Here is a vectorized NumPy solution:
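def merge_boundaries(x):
    # A break occurs wherever the next start is not exactly one past the
    # previous end (assumes sorted, non-overlapping integer segments)
    mask = (x[1:,0] - x[:-1,1])!=1
    idx = np.flatnonzero(mask)           # rows after which a break occurs
    start = np.r_[0,idx+1]               # first row of each merged group
    stop = np.r_[idx, x.shape[0]-1]      # last row of each merged group
    return np.c_[x[start,0], x[stop,1]]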
Sample run:
In [230]: x
Out[230]:
array([[ 0, 4],
[ 5, 9],
[10, 13],
[15, 20],
[23, 30],
[31, 40]])
In [231]: merge_boundaries(x)
Out[231]:
array([[ 0, 13],
[15, 20],
[23, 40]])
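The question treats a gap of 1 as just one example of a threshold, so the same idea can be sketched with the threshold as a parameter. This is a hypothetical variant rather than part of either answer: the function name `merge_segments` and the parameter `max_gap` are assumptions, and the rows are still assumed to be sorted and non-overlapping.

```python
import numpy as np

def merge_segments(x, max_gap=1):
    """Merge consecutive (start, end) rows whose gap is at most max_gap.

    Hypothetical helper; assumes rows are sorted by start and do not overlap.
    """
    x = np.asarray(x)
    if x.shape[0] == 0:
        return x.reshape(0, 2)
    # Indices of rows that are followed by a gap larger than max_gap.
    breaks = np.flatnonzero(x[1:, 0] - x[:-1, 1] > max_gap)
    first = np.r_[0, breaks + 1]           # first row of each merged group
    last = np.r_[breaks, x.shape[0] - 1]   # last row of each merged group
    return np.c_[x[first, 0], x[last, 1]]

x = np.array([[0, 4], [5, 9], [10, 13], [15, 20], [23, 30], [31, 40]])
print(merge_segments(x))             # default threshold of 1
print(merge_segments(x, max_gap=2))  # a looser threshold merges more rows
```

Using `>` rather than `!=` in the comparison means that rows which touch (gap of 0) are also merged, which seems closer to the "no greater than" wording in the question.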