An efficient way to generate a 2D array of multiple shifts of a 1D array

Time: 2017-01-07 09:16:58

Tags: python pandas numpy

Consider the array a

import numpy as np
import pandas as pd
np.random.seed([3,1415])
a = np.random.randint(100, size=10)
print(a)
[11 98 74 90 15 55 13 11 13 26]

I want to build an array of increasingly shifted versions of this array. I would like a general way to do this for an arbitrary number of shifts n that is less than the length of the original array. In this case, n equals 5.

1 Answer:

Answer 0 (score: 3)

I thought of two ways to do this.

Using a generator

def multi_shift(a, n):
    yield a
    while n > 1:
        a = np.append(np.nan, a[:-1])
        yield a
        n -= 1

# np.stack needs a sequence, so materialize the generator first
np.stack(list(multi_shift(a, 5))).T
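
To make the generator approach easier to follow, here is a minimal sketch on a toy input. The array `small` and the shift count 3 are made up for illustration; `multi_shift` is repeated from above so the snippet runs on its own. Each yielded array becomes one column of the result after stacking and transposing.

```python
import numpy as np

def multi_shift(a, n):
    # Same generator as above: yield the array, then n - 1 further shifts
    yield a
    while n > 1:
        # Drop the last element and push a NaN in at the front
        a = np.append(np.nan, a[:-1])
        yield a
        n -= 1

small = np.array([1, 2, 3, 4])        # hypothetical toy input
shifts = list(multi_shift(small, 3))  # three arrays: shift 0, 1, 2
for s in shifts:
    print(s)

# Stacking gives shape (3, 4); transposing puts each shift in its own column
result = np.stack(shifts).T
print(result.shape)  # (4, 3)
```

Note that prepending NaN promotes the integer input to float, which is why the final array is a float array.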

Using broadcasting

# Build the index array
rng = np.arange(len(a))
slc = rng[:, None] - rng[:5]

np.where(slc >= 0, a[slc], np.nan)
[[ 11.  nan  nan  nan  nan]
 [ 98.  11.  nan  nan  nan]
 [ 74.  98.  11.  nan  nan]
 [ 90.  74.  98.  11.  nan]
 [ 15.  90.  74.  98.  11.]
 [ 55.  15.  90.  74.  98.]
 [ 13.  55.  15.  90.  74.]
 [ 11.  13.  55.  15.  90.]
 [ 13.  11.  13.  55.  15.]
 [ 26.  13.  11.  13.  55.]]
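
The broadcasting trick is easier to see on a small case. The sketch below uses a made-up four-element array and n = 3 (both are assumptions for illustration) and prints the intermediate index grid, where entry [i, j] is i - j, the source position for row i at shift j.

```python
import numpy as np

a = np.array([10, 20, 30, 40])   # hypothetical toy input
n = 3                            # number of shifted columns

rng = np.arange(len(a))
# A column vector minus a row vector broadcasts to a (len(a), n) grid
# whose entry [i, j] equals i - j.
slc = rng[:, None] - rng[:n]
print(slc)

# Negative entries point before the start of the array. Fancy indexing
# with a[slc] would wrap them around to the end, so np.where replaces
# those positions with NaN instead.
out = np.where(slc >= 0, a[slc], np.nan)
print(out)
```

The wrapped-around values are still computed by `a[slc]` and then discarded, which is part of why this approach does more work than a strided view.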

Timing tests

Divakar's stride functions from this post

from scipy.linalg import toeplitz
from numpy.lib.stride_tricks import as_strided as strided 

def pir1(a, n):
    return np.stack(list(multi_shift(a, n))).T

def pir2(a, n):
    rng = np.arange(len(a))
    slc = rng[:, None] - rng[:n]

    return np.where(slc >= 0, a[slc], np.nan)

# Suggested by @WarrenWeckesser
def toeplitz1(a, n):
    return toeplitz(a, np.array([np.nan] * n))

# from @Divakar
def strided_nan_filled(a, W):
    a_ext = np.concatenate((np.full(W-1, np.nan), a))
    n = a_ext.strides[0]
    out = strided(a_ext, shape=(a.size, W), strides=(n, n))[:,::-1]
    return out

def strided_nan_filled_v2(a, W):
    a_ext = np.concatenate(( np.full(W-1,np.nan) ,a))
    n = a_ext.strides[0]
    return strided(a_ext[W-1:], shape=(a.size,W), strides=(n,-n))
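
As a point of comparison, newer NumPy releases (1.20 and later) ship `sliding_window_view`, which builds the same kind of strided view without hand-computed strides. The helper name `sliding_nan_filled` below is hypothetical and not part of the original answer; it is a sketch of the same pad-then-window idea, checked against the broadcasting result.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sliding_nan_filled(a, W):
    # Hypothetical helper: pad with W - 1 NaNs in front, as in the
    # strided functions above.
    a_ext = np.concatenate((np.full(W - 1, np.nan), a))
    # Window i is a_ext[i:i + W]; reversing each window puts a[i]
    # in column 0 and older values to its right.
    return sliding_window_view(a_ext, W)[:, ::-1]

a = np.array([11, 98, 74, 90, 15, 55, 13, 11, 13, 26])
out = sliding_nan_filled(a, 5)
print(out.shape)  # (10, 5)

# Cross-check against the broadcasting approach
rng = np.arange(len(a))
slc = rng[:, None] - rng[:5]
ref = np.where(slc >= 0, a[slc], np.nan)
np.testing.assert_array_equal(out, ref)  # NaNs compare equal here
```

The result is a read-only view into the padded array, so copy it before modifying it in place.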

Trial

from timeit import timeit

cols = pd.MultiIndex.from_product(
    [['pir1', 'pir2', 'toeplitz1', 'strided_nan_filled'], [10, 100]])
results = pd.DataFrame(index=[100, 1000], columns=cols, dtype=float)

np.random.seed([3,1415])
for i in results.index:
    a = np.random.rand(i)
    for j in results.columns:
        stmt = '{}(a, {})'.format(*j)
        iprt = 'from __main__ import a, {}'.format(j[0])
        results.loc[i, j] = timeit(stmt, iprt, number=100)

results.stack().plot.barh()


Dropping pir1 and toeplitz1
Those clearly took too long. Without question, the stride approach looks like the way to go.

from timeit import timeit

cols = pd.MultiIndex.from_product(
    [['pir2', 'strided_nan_filled', 'strided_nan_filled_v2'], [10, 100]])
results = pd.DataFrame(index=[100, 1000, 10000], columns=cols, dtype=float)

np.random.seed([3,1415])
for i in results.index:
    a = np.random.rand(i)
    for j in results.columns:
        stmt = '{}(a, {})'.format(*j)
        iprt = 'from __main__ import a, {}'.format(j[0])
        results.loc[i, j] = timeit(stmt, iprt, number=100)

results.stack().plot.barh()
