Consider a monotonically increasing numpy array of shape (n,), for example:
import numpy as np
X = np.array([2, 3, 7, 19, 110, 112, 120, 140, 161])
My problem is to efficiently extract every range (i, j) such that:
X[i:j].sum() >= v and X[i:j-1].sum() < v
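I'm not sure about this formalization. In other words, I need "the smallest possible spans that exceed v". I guess another way to put it is "all spans that cross v and are not a subset of another span".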
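To make the condition above concrete, here is a brute-force sketch that enumerates every pair (i, j) with i < j and tests the condition literally. `spans_by_definition` is a hypothetical name, and the sketch assumes non-negative elements, as in the example: then X[i:j].sum() never decreases as j grows, so each start i has at most one valid end j.

```python
import numpy as np

def spans_by_definition(X, v):
    """Brute force: every (i, j), i < j, with X[i:j].sum() >= v and X[i:j-1].sum() < v.

    Assumes non-negative X, so each start i has at most one valid end j.
    """
    n = X.shape[0]
    return [(i, j)
            for i in range(n)
            for j in range(i + 1, n + 1)
            # X[i:i] is empty and sums to 0, so j = i + 1 only checks X[i] >= v
            if X[i:j].sum() >= v and X[i:j - 1].sum() < v]

X = np.array([2, 3, 7, 19, 110, 112, 120, 140, 161])
print(spans_by_definition(X, 10))
# [(0, 3), (1, 3), (2, 4), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)]
```

This is roughly cubic in n, so it is only useful as a reference on small inputs.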
The best I've come up with so far is based on two nested for loops:
def variable_length_spans(X, v):
    n, = X.shape
    for i in range(0, n):
        sum_ = 0
        # Extend the span to the right until its sum reaches v
        for j in range(i, n):
            sum_ += X[j]
            if sum_ >= v:
                yield (i, j+1)
                break
This gives:
list(variable_length_spans(X,10))
[(0, 3), (1, 3), (2, 4), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)]
There must be a more efficient/elegant way, but I can't find it. Any suggestions are warmly appreciated!
F。
Benchmark with 20K random elements (results averaged over 10 runs):
Benchmark with 1M random elements (results averaged over 50 runs):
Answer 0 (score: 1)
This is currently a quadratic algorithm; it can be done in linear time as follows:
def spans(X, v):
    # Assumes non-negative X and v > 0
    n, = X.shape
    i = 0
    total = 0
    for j in range(0, n):
        total += X[j]
        # Shrink the window from the left while it still reaches v
        while total >= v:
            yield (i, j+1)
            total -= X[i]
            i += 1
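A quick randomized check, sketched under the assumptions that the elements are non-negative integers and v > 0, that the sliding-window generator yields exactly the same spans as the nested loops in the question. The names `nested_spans` and `window_spans` are hypothetical; the bodies are Python 3 copies of the two generators. The while loop is what makes the answer linear: `i` only moves forward, so each element is added once and subtracted at most once.

```python
import numpy as np

def nested_spans(X, v):
    """Quadratic reference: copy of the question's nested-loop generator."""
    n = X.shape[0]
    for i in range(n):
        sum_ = 0
        for j in range(i, n):
            sum_ += X[j]
            if sum_ >= v:
                yield (i, j + 1)
                break

def window_spans(X, v):
    """Linear sliding window from the answer (assumes X >= 0 and v > 0)."""
    i = 0
    total = 0
    for j in range(X.shape[0]):
        total += X[j]
        # Shrink from the left while the window [i, j] still reaches v;
        # each start i is yielded once, with its smallest end j + 1.
        while total >= v:
            yield (i, j + 1)
            total -= X[i]
            i += 1

rng = np.random.default_rng(0)
for _ in range(200):
    # Sorted non-negative integers, mimicking the question's monotone input
    X = np.sort(rng.integers(0, 50, size=rng.integers(1, 30)))
    v = int(rng.integers(1, 200))
    assert list(nested_spans(X, v)) == list(window_spans(X, v))
print("nested loops and sliding window agree")
```

With v <= 0 the window can shrink past j and yield spans such as (1, 1), so the linear version relies on a positive threshold.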
Answer 1 (score: 1)
Based on broadcasting:
import numpy as np

# Get cumulative summations
cumsums = X.cumsum()
# Elementwise subtractions between cumsums & its one place shifted version
diffs = cumsums[:,None] - np.append(0,cumsums[:-1])
# Detect cumulative summation span check
mask = diffs >= v
# Get valid mask for later selection purpose
valid = mask.any(0)
# Get first trigger indices
max_idx = np.argmax(mask,0)+1
# Concatenate row indices along with trigger ones for final output
out = np.column_stack((np.arange(max_idx.size),max_idx))[valid]
Sample input and output:
In [212]: X
Out[212]: array([ 2, 3, 7, 19, 110, 112, 120, 140, 161])
In [213]: v
Out[213]: 10
In [214]: out
Out[214]:
array([[0, 3],
       [1, 3],
       [2, 4],
       [3, 4],
       [4, 5],
       [5, 6],
       [6, 7],
       [7, 8],
       [8, 9]])
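The broadcasting answer materializes an n × n `diffs` matrix, so its memory use is quadratic in n. A swapped-in alternative, not taken from either answer: because non-negative X gives a non-decreasing cumulative sum, the smallest valid end for every start can be found at once with `np.searchsorted`, in O(n log n) time and O(n) memory. This is a sketch assuming non-negative X and v > 0; `spans_searchsorted` is a hypothetical name.

```python
import numpy as np

def spans_searchsorted(X, v):
    """For each start i, the smallest end j with X[i:j].sum() >= v.

    Assumes X >= 0 (so X.cumsum() is non-decreasing) and v > 0.
    Returns an (m, 2) array of [i, j] rows, like the broadcasting answer.
    """
    cumsums = X.cumsum()
    # starts[i] = X[:i].sum(), the cumulative sum just before index i
    starts = cumsums - X
    # First index k with cumsums[k] >= starts[i] + v (side='left' means >=)
    k = np.searchsorted(cumsums, starts + v, side='left')
    # k == len(X) means no end reaches v for that start
    valid = k < X.size
    # Row [i, k + 1] is the half-open span covering X[i..k]
    return np.column_stack((np.arange(X.size), k + 1))[valid]

X = np.array([2, 3, 7, 19, 110, 112, 120, 140, 161])
print(spans_searchsorted(X, 10).tolist())
# [[0, 3], [1, 3], [2, 4], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9]]
```

`side='left'` returns the first index whose cumulative sum reaches the target, which is what makes each end minimal; the `valid` mask plays the same role as in the broadcasting version, dropping starts near the end that never reach v.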