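Creating a NaN-padded sliding window from a 1D NumPy array

Time: 2016-11-18 18:16:09

Tags: performance numpy scipy time-series vectorization

I have a time series x[0], x[1], ... x[n-1] stored as a 1-dimensional NumPy array. I would like to convert it to the following matrix: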

NaN,        ... , NaN ,   x[0]
NaN,        ... , x[0],   x[1]
.
.
NaN,  x[0], ... , x[n-3],x[n-2]
x[0], x[1], ... , x[n-2],x[n-1]

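I would like to use this matrix to speed up time-series calculations. Is there a function in NumPy or SciPy that does this? (I don't want to use a Python for loop to do it.)

1 answer:

Answer 0: (score: 3)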

Here's one approach with np.lib.stride_tricks.as_strided -

import numpy as np

def nanpad_sliding2D(a):
    L = a.size
    # Prepend L-1 NaNs; concatenating with NaN upcasts integer input to float
    a_ext = np.concatenate((np.full(L - 1, np.nan), a))
    n = a_ext.strides[0]
    strided = np.lib.stride_tricks.as_strided
    # Row and column strides are both one element, so row i is a_ext[i:i+L]
    return strided(a_ext, shape=(L, L), strides=(n, n))

Sample run -

In [41]: a
Out[41]: array([48, 82, 96, 34, 93, 25, 51, 26])

In [42]: nanpad_sliding2D(a)
Out[42]: 
array([[ nan,  nan,  nan,  nan,  nan,  nan,  nan,  48.],
       [ nan,  nan,  nan,  nan,  nan,  nan,  48.,  82.],
       [ nan,  nan,  nan,  nan,  nan,  48.,  82.,  96.],
       [ nan,  nan,  nan,  nan,  48.,  82.,  96.,  34.],
       [ nan,  nan,  nan,  48.,  82.,  96.,  34.,  93.],
       [ nan,  nan,  48.,  82.,  96.,  34.,  93.,  25.],
       [ nan,  48.,  82.,  96.,  34.,  93.,  25.,  51.],
       [ 48.,  82.,  96.,  34.,  93.,  25.,  51.,  26.]])
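The same padded-window layout can also be sketched without computing strides by hand. The version below is an illustration, not part of the original answer: it assumes NumPy 1.20 or newer, where np.lib.stride_tricks.sliding_window_view is available, and the function name nanpad_sliding2D_swv is made up for this example. It also assumes a non-empty 1D input.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def nanpad_sliding2D_swv(a):
    """Hypothetical variant of the strided approach (assumes NumPy >= 1.20)."""
    a = np.asarray(a)
    L = a.size
    # Same padding step as the strided approach: L-1 leading NaNs
    a_ext = np.concatenate((np.full(L - 1, np.nan), a))
    # Windows of length L over 2L-1 elements give an (L, L) read-only view
    return sliding_window_view(a_ext, L)


out = nanpad_sliding2D_swv(np.array([48, 82, 96]))
print(out)
```

The result is still a view into the padded array, but it is read-only by default, which avoids accidental writes through overlapping windows.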

Memory efficiency with strides

As mentioned in the comments by @Eric, this strides-based approach is memory efficient, because the output is simply a view into the NaN-padded 1D version. Let's test this out -

In [158]: a   # Sample 1D input
Out[158]: array([37, 95, 87, 10, 35])

In [159]: L = a.size # Run the posted approach
     ...: a_ext = np.concatenate(( np.full(a.size-1,np.nan) ,a))
     ...: n = a_ext.strides[0]
     ...: strided = np.lib.stride_tricks.as_strided
     ...: out = strided(a_ext, shape=(L,L), strides=(n,n))
     ...: 

In [160]: np.may_share_memory(a_ext,out) # O/p might be a view into extended version
Out[160]: True

Let's confirm that the output is actually a view by assigning values into a_ext and then checking out.

Initial values of a_ext and out:

In [161]: a_ext
Out[161]: array([ nan,  nan,  nan,  nan,  37.,  95.,  87.,  10.,  35.])

In [162]: out
Out[162]: 
array([[ nan,  nan,  nan,  nan,  37.],
       [ nan,  nan,  nan,  37.,  95.],
       [ nan,  nan,  37.,  95.,  87.],
       [ nan,  37.,  95.,  87.,  10.],
       [ 37.,  95.,  87.,  10.,  35.]])

Modify a_ext:

In [163]: a_ext[:] = 100

See the new out:

In [164]: out
Out[164]: 
array([[ 100.,  100.,  100.,  100.,  100.],
       [ 100.,  100.,  100.,  100.,  100.],
       [ 100.,  100.,  100.,  100.,  100.],
       [ 100.,  100.,  100.,  100.,  100.],
       [ 100.,  100.,  100.,  100.,  100.]])

This confirms that it is a view.

Finally, let's test the memory requirements:

In [131]: a_ext.nbytes
Out[131]: 72

In [132]: out.nbytes
Out[132]: 200

So the output, even though it reports 200 bytes, actually occupies only 72 bytes, because it is a view into the extended array, which is 72 bytes in size.
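Because the rows of the view overlap in memory, a write through it would change several entries at once. The sketch below is an illustration rather than part of the original answer: it assumes the writeable keyword of as_strided (available in reasonably recent NumPy releases) to make the view read-only, and uses .copy() when an independent matrix is needed. The variable names view and independent are made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([37, 95, 87, 10, 35])
L = a.size
a_ext = np.concatenate((np.full(L - 1, np.nan), a))
n = a_ext.strides[0]

# writeable=False guards against accidental writes through overlapping rows
view = as_strided(a_ext, shape=(L, L), strides=(n, n), writeable=False)

# An explicit copy owns its own L*L buffer and no longer aliases a_ext
independent = view.copy()

print(np.shares_memory(a_ext, view))         # the view aliases a_ext
print(np.shares_memory(a_ext, independent))  # the copy does not
print(a_ext.nbytes, independent.nbytes)
```

The copy gives up the memory saving: it really does hold L*L elements, whereas the view only needs the 2L-1 elements of a_ext.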

Another approach uses SciPy's toeplitz.
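A minimal sketch of how the toeplitz idea could look, assuming scipy.linalg.toeplitz is available. The function name nanpad_sliding2D_toeplitz is made up for this example, and the construction is one reasonable reading of the approach rather than a confirmed reproduction of the answer's code: the input becomes the first column, the first row is filled with NaN, and the columns are then reversed.

```python
import numpy as np
from scipy.linalg import toeplitz


def nanpad_sliding2D_toeplitz(a):
    """Hypothetical helper: NaN-padded sliding windows via a Toeplitz matrix."""
    a = np.asarray(a)
    # First column is a; the first row is NaN except its first entry,
    # which toeplitz takes from the column (a[0])
    t = toeplitz(a, np.full(a.size, np.nan))
    # Reversing the columns moves the NaNs to the left of each row
    return t[:, ::-1]


out = nanpad_sliding2D_toeplitz(np.array([37, 95, 87]))
print(out)
```

Unlike the strided view, this builds the full matrix in memory, so it is likely the less memory-efficient of the two options for long series.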