对列表中的滑动窗口对进行Pythonic迭代?

时间:2012-10-22 15:24:29

标签: python list iteration itertools list-manipulation

在滑动对中迭代列表的Pythonic最有效的方法是什么?这是一个相关的例子:

>>> l
['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> for x, y in itertools.izip(l, l[1::2]): print x, y
... 
a b
b d
c f

这是成对迭代,但我们如何在滑动对上进行迭代?意味着迭代对:

a b
b c
c d
d e
etc.

是对的迭代,除了每次将1对元素滑动而不是2个元素。感谢。

7 个答案:

答案 0 :(得分:12)

你可以更简单。只需压缩列表,列表偏移一个。

In [4]: zip(l, l[1:])
Out[4]: [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('e', 'f'), ('f', 'g')]

答案 1 :(得分:7)

怎么样:

for x, y in itertools.izip(l, l[1:]): print x, y

答案 2 :(得分:4)

这是一个小型的生成器,我为了类似的场景写了一段时间:

def pairs(items):
    items_iter = iter(items)
    prev = next(items_iter)

    for item in items_iter:
        yield prev, item
        prev = item

答案 3 :(得分:3)

这是一个适用于迭代器/生成器以及列表的任意大小的滑动窗口的函数

def sliding(seq, n):
  return izip(*starmap(islice, izip(tee(seq, n), count(0), repeat(None))))
但是,Nathan的解决方案可能更有效率。

答案 4 :(得分:1)

由于在列表中添加两个后续条目所定义的时间显示在下方,并从最快到最慢排序。

吉尔

In [69]: timeit.repeat("for x,y in itertools.izip(l, l[1::1]): x + y", setup=setup, number=1000)
Out[69]: [1.029047966003418, 0.996290922164917, 0.998831033706665]

Geoff Reedy

In [70]: timeit.repeat("for x,y in sliding(l,2): x+y", setup=setup, number=1000)
Out[70]: [1.2408790588378906, 1.2099130153656006, 1.207326889038086]

Alestanis

In [66]: timeit.repeat("for i in range(0, len(l)-1): l[i] + l[i+1]", setup=setup, number=1000)
Out[66]: [1.3387370109558105, 1.3243639469146729, 1.3245630264282227]

chmullig

In [68]: timeit.repeat("for x,y in zip(l, l[1:]): x+y", setup=setup, number=1000)
Out[68]: [1.4756009578704834, 1.4369518756866455, 1.5067830085754395]
Nathan Villaescusa

In [63]: timeit.repeat("for x,y in pairs(l): x+y", setup=setup, number=1000)
Out[63]: [2.254757881164551, 2.3750967979431152, 2.302199125289917]

sr2222

注意减少的重复次数......

In [60]: timeit.repeat("for x,y in SubsequenceIter(l,2): x+y", setup=setup, number=100)
Out[60]: [1.599524974822998, 1.5634570121765137, 1.608154058456421]

设置代码:

setup="""
from itertools import izip, starmap, islice, tee, count, repeat
l = range(10000)

def sliding(seq, n):
  return izip(*starmap(islice, izip(tee(seq, n), count(0), repeat(None))))

class SubsequenceIter(object):

    def __init__(self, iterable, subsequence_length):

        self.iterator = iter(iterable)
        self.subsequence_length = subsequence_length
        self.subsequence = [0]

    def __iter__(self):

        return self

    def next(self):

        self.subsequence.pop(0)
        while len(self.subsequence) < self.subsequence_length:
            self.subsequence.append(self.iterator.next())
        return self.subsequence

def pairs(items):
    items_iter = iter(items)
    prev = items_iter.next()

    for item in items_iter:
        yield (prev, item)
        prev = item
"""

答案 5 :(得分:0)

不是最有效,但非常灵活:

class SubsequenceIter(object):

    def __init__(self, iterable, subsequence_length):

        self.iterator = iter(iterable)
        self.subsequence_length = subsequence_length
        self.subsequence = [0]

    def __iter__(self):

        return self

    def next(self):

        self.subsequence.pop(0)
        while len(self.subsequence) < self.subsequence_length:
            self.subsequence.append(self.iterator.next())
        return self.subsequence

用法:

for x, y in SubsequenceIter(l, 2):
    print x, y

答案 6 :(得分:0)

无需导入,只要使用var[indexing]提供对象列表或字符串即可,这将起作用 经过python 3.6

测试
# This will create windows with all but 1 overlap
def ngrams_list(a_list, window_size=5, skip_step=1):
    return list(zip(*[a_list[i:] for i in range(0, window_size, skip_step)]))

循环本身以a_list为字母创建(显示为window = 5,OP会希望window=2

['ABCDEFGHIJKLMNOPQRSTUVWXYZ',
 'BCDEFGHIJKLMNOPQRSTUVWXYZ', 
 'CDEFGHIJKLMNOPQRSTUVWXYZ', 
 'DEFGHIJKLMNOPQRSTUVWXYZ',
 'EFGHIJKLMNOPQRSTUVWXYZ']

zip(*result_of_for_loop)将收集所有完整的垂直列作为结果。而且,如果您要重叠的部分少于全部,那么

# You can sample that output to get less overlap:
def sliding_windows_with_overlap(a_list, window_size=5, overlap=2):
    zip_output_as_list = ngrams_list(a_list, window_size)])
    return zip_output_as_list[::overlap+1]

使用overlap=2会跳过以BC开头的列,并选择D

[('A', 'B', 'C', 'D', 'E'),
 ('D', 'E', 'F', 'G', 'H'), 
 ('G', 'H', 'I', 'J', 'K'), 
 ('J', 'K', 'L', 'M', 'N'), 
 ('M', 'N', 'O', 'P', 'Q'), 
 ('P', 'Q', 'R', 'S', 'T'), 
 ('S', 'T', 'U', 'V', 'W'), 
 ('V', 'W', 'X', 'Y', 'Z')]

编辑:看起来就像@chmullig提供的选项一样