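Given a numpy array of size n and an integer m, I want to generate all sequential m-length subsequences of the array, preferably as a two-dimensional array.
Example: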
>>> subsequences(arange(10), 4)
array([[0, 1, 2, 3, 4, 5, 6],
       [1, 2, 3, 4, 5, 6, 7],
       [2, 3, 4, 5, 6, 7, 8],
       [3, 4, 5, 6, 7, 8, 9]])
The best I have come up with so far is:
from numpy import arange, cumsum, ones, vstack
def subsequences(arr, m):
    n = arr.size
    # Create array of indices, essentially solution for "arange" input
    indices = cumsum(vstack((arange(n - m + 1), ones((m-1, n - m + 1), int))), 0)
    return arr[indices]
Am I missing a better, preferably built-in, way to do this?
Answer 0 (score: 5)
from scipy.linalg import hankel
def subsequences(v, m):
    return hankel(v[:m], v[m-1:])
Answer 1 (score: 4)
This is a very fast and memory-efficient method, because the result is just a "view" into the original array:
from numpy.lib.stride_tricks import as_strided
def subsequences(arr, m):
    n = arr.size - m + 1
    # byte step between consecutive elements (also correct for non-contiguous 1-D input)
    s = arr.strides[0]
    return as_strided(arr, shape=(m, n), strides=(s, s))
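If you need to write to this array, make a copy with np.copy first. Otherwise you will modify the original array, and with it every other entry of the "subsequences" array that refers to the same element.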
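To make the caveat concrete, here is a minimal sketch of the aliasing. The names `base`, `view`, `safe` and `windows` are illustrative only. The last part assumes NumPy 1.20 or later, where `sliding_window_view` is available; it returns the same kind of view, but read-only by default, so an accidental write raises an error instead of silently changing the original.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

def subsequences(arr, m):
    n = arr.size - m + 1
    s = arr.strides[0]
    return as_strided(arr, shape=(m, n), strides=(s, s))

base = np.arange(6)
view = subsequences(base, 3)   # shape (3, 4), shares memory with base
safe = np.copy(view)           # independent copy, taken before any write

view[0, 1] = 99                # writes through to base[1]

assert base[1] == 99           # the original array changed
assert view[1, 0] == 99        # so did the overlapping entry of the view
assert safe[0, 1] == 1         # the copy is unaffected

# Assumption: NumPy >= 1.20. Windows come back as rows, hence the transpose.
windows = sliding_window_view(base, 3).T
assert not windows.flags.writeable
assert np.array_equal(windows, view)
```

If the result only needs to be read, the read-only view is probably the safer default; copy only when the subsequences must be modified independently.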
Answer 2 (score: 2)
You are on the right track.
You can use the following broadcasting trick to create a 2-dim indices array from two 1-dim aranges:
from numpy import arange
arr = arange(7)[::-1]
arr
=> array([6, 5, 4, 3, 2, 1, 0])
n = arr.size
m = 3
indices = arange(m) + arange(n-m+1).reshape(-1, 1)  # broadcasting: (m,) + (n-m+1, 1) -> (n-m+1, m)
indices
=>
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])
arr[indices]
=>
array([[6, 5, 4],
       [5, 4, 3],
       [4, 3, 2],
       [3, 2, 1],
       [2, 1, 0]])
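The session above can be wrapped into a reusable function. This is only a sketch: the name `subsequences_broadcast` is made up for illustration, and the input is assumed to be one-dimensional. Note that this approach puts one subsequence per row, so the result has to be transposed to match the layout shown in the question.

```python
import numpy as np

def subsequences_broadcast(arr, m):
    """Return all length-m windows of a 1-D array, one window per row."""
    arr = np.asarray(arr)
    n = arr.size
    # (n-m+1, 1) column of start offsets + (m,) row of in-window offsets
    indices = np.arange(n - m + 1).reshape(-1, 1) + np.arange(m)
    return arr[indices]  # fancy indexing returns a copy, not a view

rows = subsequences_broadcast(np.arange(10), 4)  # shape (7, 4)
cols = rows.T                                    # shape (4, 7), as in the question
```

Because fancy indexing copies the data, the result can be written to without touching the input, at the cost of the extra memory.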
Answer 3 (score: 0)
from itertools import tee, islice
import collections
import numpy as np

# adapted from https://docs.python.org/2/library/itertools.html
def consumed(iterator, n):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
    return iterator

def subsequences(iterable, b):
    return np.array([list(consumed(it, i))[:b] for i, it in enumerate(tee(iterable, len(iterable) - b + 1))]).T

print(subsequences(np.arange(10), 4))
# A simpler variant that uses plain slicing instead of itertools
import numpy as np

def subsequences(iterable, b):
    return np.array([iterable[i:i + b] for i in range(len(iterable) - b + 1)]).T

print(subsequences(np.arange(10), 4))