Question

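I am looking for a way to compute a cumulative sum with numpy, but without carrying the value forward (that is, setting it to zero instead) when the cumulative sum is very close to zero and negative.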

For example,

a = np.asarray([0, 4999, -5000, 1000])
np.cumsum(a)

returns [0, 4999, -1, 999]

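However, I would like to set the value at index [2] (-1) to zero during the calculation. The problem is that this decision can only be made while the sum is being computed, because the intermediate results are not known a priori.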

The expected array is: [0, 4999, 0, 1000]

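The reason is that I am getting very small values (floats, not integers as in the example) that are caused by floating-point calculations and should really be zero. Computing the cumulative sum compounds those values, which leads to errors.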

Answer 1

The Kahan summation algorithm could solve the problem. Unfortunately, it is not implemented in numpy, which means a custom implementation is required:

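import numpy as np

def kahan_cumsum(x):
    # Work in floating point; an integer input would otherwise truncate the results
    x = np.asarray(x, dtype=float)
    cumulator = np.zeros_like(x)
    compensation = 0.0  # running estimate of the low-order bits lost so far

    cumulator[0] = x[0]
    for i in range(1, len(x)):
        y = x[i] - compensation
        t = cumulator[i - 1] + y
        # (t - previous sum) is what was actually added; subtracting y leaves the rounding error
        compensation = (t - cumulator[i - 1]) - y
        cumulator[i] = t
    return cumulator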

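I have to admit that this is not exactly what the question asked for. (In the example, a value of -1 at the third output of cumsum is correct.) However, I hope it solves the actual problem behind the question, which is related to floating-point precision.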
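To make the effect of the compensation visible, here is a minimal sketch that compares np.cumsum with a compensated running sum on made-up data. The function is restated in a slightly more compact form so the snippet runs on its own, and math.fsum from the standard library serves as the reference for the final total. The test data is an assumption chosen for illustration: each 1e-16 is too small to change 1.0 when added individually.

```python
import math

import numpy as np


def kahan_cumsum(x):
    """Compensated running sum (restated here so the snippet runs alone)."""
    x = np.asarray(x, dtype=np.float64)
    out = np.zeros_like(x)
    total = 0.0
    compensation = 0.0
    for i, value in enumerate(x):
        y = value - compensation
        t = total + y
        compensation = (t - total) - y  # rounding error of this addition
        total = t
        out[i] = total
    return out


# Hypothetical test data: ten increments that are each below half an ulp of 1.0
data = np.array([1.0] + [1e-16] * 10)

naive = np.cumsum(data)[-1]           # plain sequential summation
compensated = kahan_cumsum(data)[-1]  # Kahan summation
reference = math.fsum(data)           # correctly rounded total

print(naive, compensated, reference)
```

The plain cumulative sum never moves away from 1.0, while the compensated total stays close to the math.fsum reference.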

Answer 2

I wonder whether rounding would do what you are asking for:

np.cumsum(np.around(a,-1))
# the -1 means it rounds to the nearest 10

gives

array([   0, 5000,    0, 1000])

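This is not exactly the expected array you gave in the question, but using around (perhaps with the decimals parameter set to 0) might work when you apply it to the floating-point problem.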
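For float data, a variation on this idea is to round the result of the cumulative sum rather than its input. This is a sketch under assumptions: the sample values and the choice of 10 decimals are made up for illustration, and the right precision depends on the scale of the real data. Note that this only cleans up the output; the tiny residue is still carried forward inside the sum itself.

```python
import numpy as np

# Hypothetical float data whose partial sum at index 2 should be exactly zero
a = np.array([0.1, 0.2, -0.3, 1.0])

raw = np.cumsum(a)
# 10 decimals is an arbitrary, assumed precision
rounded = np.around(raw, decimals=10)

print(raw)      # index 2 holds a tiny nonzero residue
print(rounded)  # the residue is rounded away
```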

Answer 3

Probably the best way to go is to write this bit in Cython (name the file cumsum_eps.pyx):

cimport numpy as cnp
import numpy as np

cdef inline _cumsum_eps_f4(float *A, int ndim, int dims[], float *out, float eps):
    cdef float sum
    cdef size_t ofs

    N = 1
    for i in xrange(0, ndim - 1):
        N *= dims[i]
    ofs = 0
    for i in xrange(0, N):
        sum = 0
        for k in xrange(0, dims[ndim-1]):
            sum += A[ofs]
            if abs(sum) < eps:
                sum = 0
            out[ofs] = sum
            ofs += 1

def cumsum_eps_f4(cnp.ndarray[cnp.float32_t, mode='c'] A, shape, float eps):
    cdef cnp.ndarray[cnp.float32_t] _out
    cdef cnp.ndarray[int] _shape  # buffer of C int, matching the int dims[] parameter
    N = np.prod(shape)
    out = np.zeros(N, dtype=np.float32)
    _out = <cnp.ndarray[cnp.float32_t]> out
    # np.intc is the NumPy dtype for C int, so no pointer cast is needed below
    _shape = np.array(shape, dtype=np.intc)
    _cumsum_eps_f4(&A[0], len(shape), &_shape[0], &_out[0], eps)
    return out.reshape(shape)


def cumsum_eps(A, axis=None, eps=np.finfo('float').eps):
    A = np.array(A)
    if axis is None:
        A = np.ravel(A)
    else:
        axes = list(xrange(len(A.shape)))
        axes[axis], axes[-1] = axes[-1], axes[axis]
        A = np.transpose(A, axes)
    if A.dtype == np.float32:
        out = cumsum_eps_f4(np.ravel(np.ascontiguousarray(A)), A.shape, eps)
    else:
        raise ValueError('Unsupported dtype')
    if axis is not None: out = np.transpose(out, axes)
    return out

Then you can compile it like this (Windows, Visual C++ 2008 command line):

\Python27\Scripts\cython.exe cumsum_eps.pyx
cl /c cumsum_eps.c /IC:\Python27\include /IC:\Python27\Lib\site-packages\numpy\core\include
link /dll cumsum_eps.obj C:\Python27\libs\python27.lib /OUT:cumsum_eps.pyd

or like this (gcc; Linux uses the .so extension, Cygwin uses .dll):

cython cumsum_eps.pyx
gcc -c cumsum_eps.c -o cumsum_eps.o -I/usr/include/python2.7 -I/usr/lib/python2.7/site-packages/numpy/core/include
gcc -shared cumsum_eps.o -o cumsum_eps.so -lpython2.7

and use it as follows:

from cumsum_eps import *
import numpy as np
x = np.array([[1,2,3,4], [5,6,7,8]], dtype=np.float32)

>>> print cumsum_eps(x)
[  1.   3.   6.  10.  15.  21.  28.  36.]
>>> print cumsum_eps(x, axis=0)
[[  1.   2.   3.   4.]
 [  6.   8.  10.  12.]]
>>> print cumsum_eps(x, axis=1)
[[  1.   3.   6.  10.]
 [  5.  11.  18.  26.]]
>>> print cumsum_eps(x, axis=0, eps=1)
[[  1.   2.   3.   4.]
 [  6.   8.  10.  12.]]
>>> print cumsum_eps(x, axis=0, eps=2)
[[  0.   2.   3.   4.]
 [  5.   8.  10.  12.]]
>>> print cumsum_eps(x, axis=0, eps=3)
[[  0.   0.   3.   4.]
 [  5.   6.  10.  12.]]
>>> print cumsum_eps(x, axis=0, eps=4)
[[  0.   0.   0.   4.]
 [  5.   6.   7.  12.]]
>>> print cumsum_eps(x, axis=0, eps=8)
[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  8.]]
>>> print cumsum_eps(x, axis=1, eps=3)
[[  0.   0.   3.   7.]
 [  5.  11.  18.  26.]]

And so on. Of course, eps would normally be some small value; the integers here are only for demonstration and ease of typing.

If you need this for double as well, an _f8 variant is trivial to write, and the additional case has to be handled in cumsum_eps().
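For checking the compiled function, or for dtypes it does not handle, the same thresholding rule can be sketched in pure Python. The helper name cumsum_eps_reference is hypothetical and not part of the answer's module; it mirrors the `abs(sum) < eps` test of the inner loop for 1-D input only, works in float64, and is far slower than the Cython version.

```python
import numpy as np


def cumsum_eps_reference(values, eps=np.finfo(np.float64).eps):
    """Running sum that snaps partial sums with |sum| < eps to zero.

    Hypothetical reference helper: flattens its input and works in float64.
    """
    values = np.asarray(values, dtype=np.float64).ravel()
    out = np.empty_like(values)
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if abs(running) < eps:
            running = 0.0  # drop the residue so it is not carried forward
        out[i] = running
    return out


# The integer example from the question, with an assumed threshold of 2
print(cumsum_eps_reference([0, 4999, -5000, 1000], eps=2))

# Hypothetical float data with an assumed threshold of 1e-12
print(cumsum_eps_reference([0.1, 0.2, -0.3, 1.0], eps=1e-12))
```

With a threshold of 2, the first call reproduces the expected array from the question, [0, 4999, 0, 1000].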

Once you are happy with the implementation, you should build it as a proper part of your setup.py (see: Cython setup.py).

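Update #1: If you have good compiler support in your runtime environment, you could try Theano to implement either the compensation algorithm or your original idea: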

import numpy as np
import theano
import theano.tensor as T
from theano.ifelse import ifelse

A=T.vector('A')

sum=T.as_tensor_variable(np.asarray(0, dtype=np.float64))

# scan calls fn with the current sequence element first, then the previous output
res, upd=theano.scan(fn=lambda val, cur_sum: ifelse(T.lt(cur_sum+val, 1.0), np.asarray(0, dtype=np.float64), cur_sum+val), outputs_info=sum, sequences=A)

f=theano.function(inputs=[A], outputs=res)

f([0.9, 2, 3, 4])

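This outputs [0, 2, 5, 9]: the first partial sum (0.9) falls below the 1.0 threshold and is reset to zero, and the sum then accumulates normally. With either Cython or Theano you get performance roughly on par with native code.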

Conditional numpy cumulative sum
