Python / NumPy: finding variable-length spans

Asked: 2016-02-04 18:15:37

Tags: python algorithm performance numpy scipy

Consider a monotonically increasing NumPy array of shape (n,).

import numpy as np
X = np.array([2,3,7,19,110,112,120,140,161])

My problem is to efficiently extract every range (i, j) such that:

X[i:j].sum() >= v and X[i:j-1].sum() < v

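I'm not sure about this formalization. In other words, I need the "smallest possible spans exceeding v". I suppose another way to put it is "all spans that exceed v and are not a subset of another span".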
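
As a small illustration of the condition above, here is a sketch that tests one candidate pair (i, j) directly. The helper name is_minimal_span is hypothetical (it is not part of the question), and the sketch assumes 0 <= i < j <= n. Note that the condition is evaluated per start index i, so two accepted spans can share the same end, for example (0, 3) and (1, 3) when v = 10.

```python
import numpy as np

def is_minimal_span(X, i, j, v):
    # Hypothetical helper: True when X[i:j] reaches v but X[i:j-1] does not.
    # Assumes 0 <= i < j <= len(X); for j == i + 1 the shorter slice is empty (sum 0).
    return bool(X[i:j].sum() >= v and X[i:j - 1].sum() < v)

X = np.array([2, 3, 7, 19, 110, 112, 120, 140, 161])

print(is_minimal_span(X, 0, 3, 10))  # 2+3+7 = 12 >= 10 and 2+3 = 5 < 10
print(is_minimal_span(X, 1, 3, 10))  # 3+7 = 10 >= 10 and 3 < 10
print(is_minimal_span(X, 0, 4, 10))  # X[0:3] already sums to 12, so j = 4 is too long
```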

So far, the best I have come up with is based on two nested for loops:

def variable_length_spans(X, v):
    n, = X.shape
    for i in xrange(0, n):
        sum_ = 0
        for j in xrange(i, n):
            sum_ += X[j]
            if sum_ >= v:
                yield (i,j+1)
                break

which gives:

list(variable_length_spans(X,10))
[(0, 3), (1, 3), (2, 4), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)]

There must be a more efficient / more elegant way. However, I can't find one. Any suggestion would be warmly appreciated!

F.

Update #1: timings

With 20K random elements (results averaged over 10 runs):

  • variable_length_spans: 0.009332 sec
  • davis_spans: 0.009259 sec
  • spans_broadcast: 1.896222 sec

With 1M random elements (results averaged over 50 runs):

  • variable_length_spans: 0.528101 sec
  • davis_broadcast: 0.534576 sec

2 answers:

Answer 0: (score: 1)

This is currently a quadratic algorithm, but it can be done in linear time as follows:

def spans(X, v):
    n, = X.shape
    i = 0
    total = 0
    for j in xrange(0, n):
        total += X[j]
        while total >= v:
            yield (i, j+1)
            total -= X[i]
            i += 1
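
For readers on Python 3 (where xrange no longer exists), here is a self-contained sketch that ports both the nested-loop version from the question and the sliding-window version above, then checks that they agree on random inputs. The function names brute_force_spans and two_pointer_spans are hypothetical, and the check assumes strictly positive values and v >= 1; with non-positive v the inner while loop would not terminate cleanly.

```python
import numpy as np

def brute_force_spans(X, v):
    # Reference version: for each start i, scan forward until the running sum reaches v.
    n = len(X)
    for i in range(n):
        total = 0
        for j in range(i, n):
            total += X[j]
            if total >= v:
                yield (i, j + 1)
                break

def two_pointer_spans(X, v):
    # Sliding window: at the top of the while loop, total equals X[i:j+1].sum().
    # Each index enters the window once and leaves it at most once, hence O(n).
    i = 0
    total = 0
    for j in range(len(X)):
        total += X[j]
        while total >= v:
            yield (i, j + 1)
            total -= X[i]
            i += 1

# Randomized agreement check (assumed setup: positive integers, v >= 1).
rng = np.random.default_rng(0)
for _ in range(100):
    data = rng.integers(1, 50, size=30)
    threshold = int(rng.integers(1, 200))
    assert list(two_pointer_spans(data, threshold)) == list(brute_force_spans(data, threshold))
```

The two versions agree because, for non-negative values, dropping X[i] from a window that has only just reached v at index j cannot produce a window that already reached v at an earlier j.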

Answer 1: (score: 1)

A vectorized approach based on broadcasting -

# Get cumulative summations
cumsums = X.cumsum()

# Elementwise subtractions between cumsums & its one place shifted version
diffs = cumsums[:,None] - np.append(0,cumsums[:-1])

# Detect cumulative summation span check
mask = diffs >= v

# Get valid mask for later selection purpose
valid = mask.any(0)

# Get first trigger indices
max_idx = np.argmax(mask,0)+1

# Concatenate row indices along with trigger ones for final output
out = np.column_stack((np.arange(max_idx.size),max_idx))[valid]

Sample input, output -

In [212]: X
Out[212]: array([  2,   3,   7,  19, 110, 112, 120, 140, 161])

In [213]: v
Out[213]: 10

In [214]: out
Out[214]: 
array([[0, 3],
       [1, 3],
       [2, 4],
       [3, 4],
       [4, 5],
       [5, 6],
       [6, 7],
       [7, 8],
       [8, 9]])
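
The broadcast step above builds an n-by-n array, so its memory use grows quadratically with n. As a possible alternative that is not part of either answer, here is a sketch that stays vectorized but only uses 1-D arrays, by binary-searching the cumulative sums with np.searchsorted. The function name spans_searchsorted is hypothetical, and the sketch assumes non-negative values (so the cumulative sums are sorted) and v > 0 (so every span is non-empty).

```python
import numpy as np

def spans_searchsorted(X, v):
    # cs[k] == X[:k].sum(), with a leading 0 so that X[i:j].sum() == cs[j] - cs[i].
    cs = np.concatenate(([0], np.cumsum(X)))
    starts = np.arange(X.size)
    # For each start i, find the smallest j with cs[j] >= cs[i] + v.
    # Assumes cs is non-decreasing, i.e. X has no negative entries.
    ends = np.searchsorted(cs, cs[:-1] + v, side="left")
    # searchsorted returns len(cs) == n + 1 when no such j exists; drop those rows.
    valid = ends <= X.size
    return np.column_stack((starts, ends))[valid]

X = np.array([2, 3, 7, 19, 110, 112, 120, 140, 161])
print(spans_searchsorted(X, 10))
```

This does O(n log n) work rather than the O(n) of the sliding-window loop, but it avoids both the Python-level loop and the n-by-n intermediate.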