Smoothing a series of weighted values in numpy/pandas

Date: 2017-09-15 01:04:42

Tags: python pandas numpy scipy signal-processing

I have a pandas DataFrame of measurements and corresponding weights:

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.random.randn(1000), 'w': np.random.rand(1000)})

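I'd like to smooth the measurement values (x) while also taking the element-wise weights (w) into account. These are independent of the sliding window's own weights, which I also want to apply (e.g. a triangle window, or something fancier). So, to compute the smoothed value within each window, the function should weight the sliced elements of x not only by the window function (e.g. triangle), but also by the corresponding elements in w.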
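To make the requirement concrete, here is a minimal sketch of the calculation for a single window. The numbers are made up for illustration and are not from the question's data; the point is only that each sample's effective weight is the product of its window weight and its own per-element weight.

```python
import numpy as np

# Hypothetical values for one window of 5 samples (illustration only)
x_win = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # measurements inside the window
w_win = np.array([1.0, 0.0, 2.0, 1.0, 1.0])     # per-element weights from 'w'
triangle = np.array([1/3, 2/3, 1.0, 2/3, 1/3])  # window-function weights

# Effective weight of each sample: window weight times per-element weight
combined = triangle * w_win

# Weighted mean, normalised by the sum of the combined weights
smoothed = np.sum(combined * x_win) / np.sum(combined)

# np.average applies the same normalisation
same = np.average(x_win, weights=combined)
```

Note that a sample with a per-element weight of zero drops out of the window entirely, whatever its position under the triangle.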

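As far as I can tell, pd.rolling_apply won't do this, because it applies the given function to x and w separately. Likewise, pd.rolling_window doesn't take the source DataFrame's element-wise weights into account; the weighted window (e.g. 'triangle') can be user-defined, but it is fixed up front.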

Here's my slow implementation:

def rolling_weighted_triangle(x, w, window_size):
    """Smooth with triangle window, also using per-element weights."""
    # Simplify slicing
    wing = window_size // 2

    # Pad both arrays with mirror-image values at edges
    xp = np.r_[x[wing-1::-1], x, x[:-wing-1:-1]]
    wp = np.r_[w[wing-1::-1], w, w[:-wing-1:-1]]

    # Generate a (triangular) window of weights to slide
    incr = 1. / (wing + 1)
    ramp = np.arange(incr, 1, incr)
    triangle = np.r_[ramp, 1.0, ramp[::-1]]

    # Apply both sets of weights over each window
    slices = (slice(i - wing, i + wing + 1) for i in range(wing, len(x) + wing))
    out = (np.average(xp[slc], weights=triangle * wp[slc]) for slc in slices)
    return np.fromiter(out, x.dtype)

How can I speed this up using numpy/scipy/pandas?

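The DataFrame can already take up a significant portion of RAM (10k to 200M rows), so, for example, allocating a 2D array of window weights for every element up front is too much. I'm trying to minimize the use of temporary arrays, perhaps using np.lib.stride_tricks.as_strided, np.apply_along_axis, or np.convolve, but I haven't found anything that fully replicates the above.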

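Here's the equivalent with a uniform window rather than a triangle (using the get_sliding_window trick from the Stack Overflow answer linked in its docstring). It's close, but not quite there: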

from numpy.lib.stride_tricks import as_strided


def get_sliding_window(a, width):
    """Sliding window over a 2D array.

    Source: https://stackoverflow.com/questions/37447347/dataframe-representation-of-a-rolling-window/41406783#41406783
    """
    # NB: a = df.values or np.vstack([x, y]).T
    s0, s1 = a.strides
    m, n = a.shape
    return as_strided(a,
                     shape=(m-width+1, width, n),
                     strides=(s0, s0, s1))


def rolling_weighted_average(x, w, window_size):
    """Rolling weighted average with a uniform 'boxcar' window."""
    wing = window_size // 2
    window_size = 2 * wing + 1
    xp = np.r_[x[wing-1::-1], x, x[:-wing-1:-1]]
    wp = np.r_[w[wing-1::-1], w, w[:-wing-1:-1]]
    x_w = np.vstack([xp, wp]).T
    wins = get_sliding_window(x_w, window_size)
    # TODO - apply triangle window weights - multiply over wins[,:,1]?
    result = np.average(wins[:,:,0], axis=1, weights=wins[:,:,1])
    return result
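One possible way to fill in the TODO above, offered as a sketch rather than a definitive solution: build strided views over the padded 1-D arrays and let np.einsum contract them against the triangle weights. The function name rolling_weighted_triangle_strided is made up here, and the ramp is built with integer steps (an equivalent construction to the question's np.arange(incr, 1, incr), chosen to avoid floating-point length surprises). It assumes window_size >= 3 and len(x) >= window_size // 2. With einsum's default (unoptimized) evaluation, the three-operand product should be accumulated without building a full n-by-width temporary, though that is worth checking with a memory profiler on real data.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def rolling_weighted_triangle_strided(x, w, window_size):
    """Triangle-window smoothing with per-element weights, via strided views."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.shape[0]
    wing = window_size // 2
    width = 2 * wing + 1

    # Same mirror padding as in the question
    xp = np.r_[x[wing-1::-1], x, x[:-wing-1:-1]]
    wp = np.r_[w[wing-1::-1], w, w[:-wing-1:-1]]

    # Triangular window, e.g. [0.25, 0.5, 0.75, 1, 0.75, 0.5, 0.25] for wing=3
    ramp = np.arange(1, wing + 1) / (wing + 1.0)
    triangle = np.r_[ramp, 1.0, ramp[::-1]]

    # Read-only (n, width) views onto the padded arrays; no data is copied
    x_wins = as_strided(xp, shape=(n, width),
                        strides=(xp.strides[0], xp.strides[0]), writeable=False)
    w_wins = as_strided(wp, shape=(n, width),
                        strides=(wp.strides[0], wp.strides[0]), writeable=False)

    # Weighted sum of x and sum of weights for every window
    num = np.einsum('ij,ij,j->i', x_wins, w_wins, triangle)
    den = np.einsum('ij,j->i', w_wins, triangle)
    return num / den
```

The more obvious np.average(x_wins, axis=1, weights=w_wins * triangle) gives the same numbers, but the product w_wins * triangle is a full n-by-width array, which is exactly the allocation the question is trying to avoid.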

1 Answer:

Answer 0: (score: 1)

You could use convolution here, like so:

def rolling_weighted_triangle_conv(x, w, window_size):
    """Smooth with triangle window, also using per-element weights."""
    # Simplify slicing
    wing = window_size // 2

    # Pad both arrays with mirror-image values at edges
    xp = np.concatenate(( x[wing-1::-1], x, x[:-wing-1:-1] ))
    wp = np.concatenate(( w[wing-1::-1], w, w[:-wing-1:-1] ))

    # Generate a (triangular) window of weights to slide
    incr = 1. / (wing + 1)
    ramp = np.arange(incr, 1, incr)
    triangle = np.r_[ramp, 1.0, ramp[::-1]]

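    # Sliding weighted sum of x (numerator) and sliding sum of weights
    # (denominator); the slice keeps the fully overlapping part of the 'full'
    # convolution, which has len(x) elements when window_size is odd
    D = np.convolve(wp*xp, triangle)[window_size-1:-window_size+1]
    N = np.convolve(wp, triangle)[window_size-1:-window_size+1]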
    return D/N
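Why this works: each smoothed value is a sum of triangle * w * x over the window divided by a sum of triangle * w, and both sums are sliding dot products with a symmetric kernel, i.e. convolutions. As a small variation on the answer (a sketch, not the answerer's code), np.convolve's mode='valid' can replace the manual slice. Since the padded arrays have len(x) + 2 * wing elements and the kernel has 2 * wing + 1, the 'valid' output always has len(x) elements, which also covers an even window_size (rounded up to the next odd width, as in the question's code). The function name below is made up, and the ramp is built with integer steps as an equivalent of np.arange(incr, 1, incr).

```python
import numpy as np


def rolling_weighted_triangle_valid(x, w, window_size):
    """Ratio of two 'valid' convolutions; assumes window_size >= 3."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    wing = window_size // 2

    # Mirror-pad both arrays by 'wing' samples on each side
    xp = np.concatenate((x[wing-1::-1], x, x[:-wing-1:-1]))
    wp = np.concatenate((w[wing-1::-1], w, w[:-wing-1:-1]))

    # Symmetric triangular kernel of length 2 * wing + 1
    ramp = np.arange(1, wing + 1) / (wing + 1.0)
    triangle = np.concatenate((ramp, [1.0], ramp[::-1]))

    # 'valid' keeps only positions where the kernel fully overlaps the input
    num = np.convolve(wp * xp, triangle, mode='valid')
    den = np.convolve(wp, triangle, mode='valid')
    return num / den
```

Because the kernel is symmetric, the flip that convolution applies has no effect; an asymmetric window would need to be reversed first (or np.correlate used instead).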

Runtime test

In [265]: x = np.random.randn(1000)
     ...: w = np.random.rand(1000)
     ...: WSZ = 7
     ...: 

In [266]: out1 = rolling_weighted_triangle(x, w, window_size=WSZ)
     ...: out2 = rolling_weighted_triangle_conv(x, w, window_size=WSZ)
     ...: print(np.allclose(out1, out2))
     ...: 
True

In [267]: %timeit rolling_weighted_triangle(x, w, window_size=WSZ)
     ...: %timeit rolling_weighted_triangle_conv(x, w, window_size=WSZ)
     ...: 
100 loops, best of 3: 10.2 ms per loop
10000 loops, best of 3: 32.9 µs per loop

A 300x+ speedup!
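For the 200M-row case mentioned in the question, the temporaries created around np.convolve (the product wp * xp and the two outputs) are each as long as the input. A possible way to bound them, sketched here under the assumption that one padded copy of x and w is affordable, is to run the same ratio of convolutions chunk by chunk, giving each chunk 2 * wing extra padded samples so its edges see the same neighbours as in a single pass. The function name and the chunk parameter are hypothetical, and no timing has been measured for this version.

```python
import numpy as np


def rolling_weighted_triangle_chunked(x, w, window_size, chunk=10**6):
    """Chunked ratio of 'valid' convolutions; assumes window_size >= 3."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.shape[0]
    wing = window_size // 2

    # Symmetric triangular kernel of length 2 * wing + 1
    ramp = np.arange(1, wing + 1) / (wing + 1.0)
    triangle = np.concatenate((ramp, [1.0], ramp[::-1]))

    # Mirror-pad once (this still makes one full-size copy of each array)
    xp = np.concatenate((x[wing-1::-1], x, x[:-wing-1:-1]))
    wp = np.concatenate((w[wing-1::-1], w, w[:-wing-1:-1]))

    out = np.empty(n)
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # Output i depends on padded samples i .. i + 2 * wing
        xs = xp[start:stop + 2 * wing]
        ws = wp[start:stop + 2 * wing]
        num = np.convolve(ws * xs, triangle, mode='valid')
        den = np.convolve(ws, triangle, mode='valid')
        out[start:stop] = num / den
    return out
```

The chunk size trades peak memory against Python loop overhead; the results should match the single-pass version regardless of where the chunk boundaries fall.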