Question

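I am trying to understand Python's iterators in the context of the pysam module. Calling the fetch method on an AlignmentFile object returns a proper iterator, iter, over the records in the file file. I can then use various attributes to access each record, for instance its name via query_name: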

import pysam
iter = pysam.AlignmentFile(file, "rb", check_sq=False).fetch(until_eof=True)
for record in iter:
  print(record.query_name)

As it happens, the records come in pairs, so I would like something like:

while True:
  r1 = iter.__next__() 
  r2 = iter.__next__()
  print(r1.query_name)     
  print(r2.query_name)

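Calling __next__() is probably not the right way to go for millions of records, but how can I use a for loop to consume the same iterator two records at a time? I looked at the grouper recipe from itertools and at the SO questions Iterate an iterator by chunks (of n) in Python? [duplicate] (even a duplicate!) and What is the most “pythonic” way to iterate over a list in chunks?, but could not get it to work.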

Answer 1

First of all, don't use the variable name iter, because that's already the name of a builtin function.
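
To see why the name matters, here is a small sketch of what goes wrong once iter is rebound. The function name shadowed is made up for illustration, and the rebinding is kept inside a function so the builtin stays intact elsewhere:

```python
def shadowed():
    # Hypothetical helper: rebinding the name hides the builtin in this scope.
    iter = [1, 2, 3].__iter__()
    try:
        # This now calls the list iterator object, not the builtin iter().
        iter([4, 5, 6])
    except TypeError as exc:
        return str(exc)


message = shadowed()
print(message)
```

Any later code in the same scope that needs the builtin, for example to turn a list into an iterator, would fail in the same way.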

To answer your question, simply use itertools.izip (Python 2) or zip (Python 3) on the iterator.

Your code may look as simple as

for next_1, next_2 in zip(iterator, iterator):
    # stuff
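
This works because both arguments to zip are the same iterator object, so each tuple is built from two consecutive items. A minimal sketch in Python 3, using a plain list as a stand-in for the pysam records (the helper name pairs and the read names are invented for illustration):

```python
def pairs(iterable):
    """Yield consecutive, non-overlapping pairs from any iterable."""
    iterator = iter(iterable)  # one shared iterator object
    # zip pulls from the same iterator twice per tuple,
    # so each tuple holds two consecutive items.
    return zip(iterator, iterator)


# Stand-in for pysam records; these read names are invented.
records = ["read1/1", "read1/2", "read2/1", "read2/2"]

for r1, r2 in pairs(records):
    print(r1, r2)
```

Note that with an odd number of items, zip stops early and the last item is silently dropped.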

Edit: whoops, my original answer was the correct one all along, never mind the itertools recipe.

Edit 2: If you are dealing with iterators that could yield an odd number of objects, consider itertools.izip_longest (named zip_longest in Python 3):

>>> from itertools import izip_longest
>>> iterator = (x for x in (1,2,3))
>>> 
>>> for next_1, next_2 in izip_longest(iterator, iterator):
...     next_1, next_2
... 
(1, 2)
(3, None)
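
For Python 3, the same idea can be sketched as a short script. The variable names are only for illustration:

```python
from itertools import zip_longest

iterator = iter([1, 2, 3])

# The same iterator is passed twice; the missing partner is filled with None
# (pass fillvalue=... to use a different placeholder).
result = list(zip_longest(iterator, iterator))
print(result)
```

With paired records, a None in the second slot is a handy signal that the last record has no mate.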

Python: consuming an iterator pair-wise
