Question

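This question has two parts. Background: I have two large files. Each line of file 1 looks like "AATTGGCCAA", and each line of file 2 looks like "AATTTTCCAA". Each file has 20,000 lines, and I have a piece of Python code that I must run on each pair of lines in turn.

First, how would you get the Python code to run on the same-numbered line of each file, e.g. line 1 of both files? Second, once it has run on line 1, how do you make it move down to line 2 in both files, and so on?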

Answer 1

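File objects are iterators. You can pass them to any function that expects an iterable, and it will work. For your specific use case, you want the built-in zip function, which iterates over several iterables in parallel and yields tuples containing one item from each.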

with open(filename1) as file1, open(filename2) as file2:
    for line1, line2 in zip(file1, file2):
        do_something(line1, line2)

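In Python 3, zip is an iterator, so this works lazily, one pair of lines at a time. If you need to do the same thing in Python 2, you will probably want itertools.izip instead, because the regular zip would read all the data from both files into a list up front.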
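As a minimal sketch of what do_something might look like for the sequences in the question: count_mismatches below is a hypothetical stand-in (not part of the original answer), and io.StringIO objects take the place of the two open files. Note that each line read from a file keeps its trailing newline, and that zip stops as soon as the shorter input runs out.

```python
import io


def count_mismatches(line1, line2):
    """Hypothetical stand-in for do_something: count differing positions."""
    # Lines read from a file still end with "\n", so strip it first.
    a, b = line1.rstrip("\n"), line2.rstrip("\n")
    return sum(ch1 != ch2 for ch1, ch2 in zip(a, b))


# io.StringIO objects stand in for the two open files in this sketch.
file1 = io.StringIO("AATTGGCCAA\nAATTGGCCAA\n")
file2 = io.StringIO("AATTTTCCAA\nAATTGGCCAA\n")

# zip pulls one line from each "file" per iteration.
results = [count_mismatches(l1, l2) for l1, l2 in zip(file1, file2)]
print(results)
```

If the two files might differ in length and the extra lines matter, itertools.zip_longest is one alternative worth considering.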

Answer 2

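File objects are iterators. You can open them and then call next() on the object to get the next line (in Python 2 this could also be written file2.next()). An example:

for line in file1:
    other_line = next(file2)  # advance file2 by one line per line of file1
    do_something(line, other_line)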
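One thing to watch with this approach: if the second file is shorter than the first, asking it for another line raises StopIteration. A hedged variant, assuming you would rather stop quietly, passes a default to next(). The io.StringIO objects and the pairs list below are illustrative only.

```python
import io

# io.StringIO objects stand in for the two open files in this sketch.
file1 = io.StringIO("AATTGGCCAA\nAATTGGCCAA\nAATTGGCCAA\n")
file2 = io.StringIO("AATTTTCCAA\nAATTGGCCAA\n")  # one line shorter

pairs = []
for line in file1:
    # The second argument is returned instead of raising StopIteration
    # once file2 has no more lines.
    other_line = next(file2, None)
    if other_line is None:
        break
    pairs.append((line.rstrip("\n"), other_line.rstrip("\n")))

print(pairs)
```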

Answer 3

The following code uses two Python features:
1. Generator functions
2. File objects treated as iterators

def get_line(file_path):
    # Generator function
    with open(file_path) as file_obj:
        for line in file_obj:
            # Give one line and return control to the calling scope
            yield line

# Generator function will not be executed here
# Instead we get two generator instances
lines_a = get_line(path_to_file_a)
lines_b = get_line(path_to_file_b)
while True:
    try:
        # Now grab one line from each generator
        line_pair = (next(lines_a), next(lines_b))
    except StopIteration:
        # This exception means that we hit EOF in one of the files so exit the loop
        break
    # Runs only when both generators produced a line
    do_something(line_pair)

This assumes your code is wrapped in a do_something(line_pair) function that accepts a tuple of length 2 holding the pair of lines.
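The explicit while/try loop can probably be shortened by handing the two generators to zip, which handles StopIteration itself. The sketch below is one way to try this end to end; write_temp is a hypothetical helper that creates small temporary files, and the sequences are made up for illustration.

```python
import os
import tempfile


def get_line(file_path):
    # Generator function: yields one line at a time from the file
    with open(file_path) as file_obj:
        for line in file_obj:
            yield line


def write_temp(text):
    """Hypothetical helper: write text to a temporary file, return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(text)
        return tmp.name


path_a = write_temp("AATTGGCCAA\nCCGGTTAACC\n")
path_b = write_temp("AATTTTCCAA\nCCGGTTAACC\n")

lines_a, lines_b = get_line(path_a), get_line(path_b)
line_pairs = []
for line_pair in zip(lines_a, lines_b):
    # Each line_pair is a tuple of length 2, as do_something expects
    line_pairs.append(tuple(line.rstrip("\n") for line in line_pair))

# Close the generators explicitly so their files are released,
# then clean up the temporary files.
lines_a.close()
lines_b.close()
os.remove(path_a)
os.remove(path_b)

print(line_pairs)
```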

Answer 4

Here is code that lets you process lines from multiple files in sync:

from contextlib import ExitStack

with ExitStack() as stack:
    files = [stack.enter_context(open(filename)) for filename in filenames]
    for lines in zip(*files):
        do_something(*lines)

For example, with 2 files, it calls do_something(line_from_file1, line_from_file2) for each pair of lines in the given files.
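To see the ExitStack pattern run with more than two files, here is a small self-contained sketch. The file names and contents are invented for the demonstration, and collecting rows into a list stands in for calling do_something.

```python
import os
import tempfile
from contextlib import ExitStack

# Made-up contents for three small files (illustrative only)
contents = ["A1\nA2\n", "B1\nB2\n", "C1\nC2\n"]
rows = []

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create the sample files inside a temporary directory
    filenames = []
    for i, text in enumerate(contents):
        path = os.path.join(tmp_dir, f"file{i}.txt")
        with open(path, "w") as f:
            f.write(text)
        filenames.append(path)

    # ExitStack closes every file it opened when the block exits
    with ExitStack() as stack:
        files = [stack.enter_context(open(name)) for name in filenames]
        for lines in zip(*files):
            # lines holds one line from each file, in file order
            rows.append([line.rstrip("\n") for line in lines])

print(rows)
```

The advantage over nesting several open() calls in one with statement is that the number of files does not have to be known when the code is written.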

Have a Python program read 2 files line by line and run the program on each line
