
Question

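I have a master file and a set of secondary files, but I don't know the names of the secondary files until I look them up in the master file.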

The master file contains two columns, some data and the name of a second file, e.g.

data1_from_master   hidden_file1
data2_from_master   hidden_file2
data3_from_master   hidden_file1
data4_from_master   hidden_file3
data5_from_master   hidden_file1

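What I want to do is create a generator that yields an element from the first column of the master file, paired with the next line from the secondary file named in that row. For example,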

data1_from_master    line1_from_file1
data2_from_master    line1_from_file2
data3_from_master    line2_from_file1
data4_from_master    line1_from_file3
data5_from_master    line3_from_file1

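The number of lines in the master file equals the total number of lines across all the secondary files, so once the master file has been fully traversed, all the secondary files will have been traversed too.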
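Since everything relies on that line-count invariant, it can be worth checking it before streaming. Below is a minimal sketch of such a check. The helper name `check_line_counts` is hypothetical, and it assumes the whitespace-separated layout shown above, with the file name as the last field. It counts how often the master file references each secondary file and compares that count with the file's actual number of lines.

```python
from collections import Counter

def check_line_counts(master_path):
    """Hypothetical helper: return {file_name: (referenced, actual)} for every
    secondary file whose reference count differs from its line count."""
    with open(master_path) as master:
        # Assumes the last whitespace-separated field is the file name.
        references = Counter(line.split()[-1] for line in master if line.strip())
    mismatches = {}
    for name, referenced in references.items():
        with open(name) as f:
            actual = sum(1 for _ in f)  # number of lines in the secondary file
        if referenced != actual:
            mismatches[name] = (referenced, actual)
    return mismatches
```

An empty dict means the invariant holds. The check costs one extra pass over the master file and each secondary file.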

If I only had two files to open and knew their names in advance, I could do something like this:

def zip_two_files(master_file, hidden_file):  # hypothetical wrapper name; yield needs a function
    with open(master_file, 'r') as a, open(hidden_file, 'r') as b:
        for line1, line2 in zip(a, b):
            yield (line1, line2)

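The catch is that I don't know which secondary file to read until I have read the corresponding line of the master file. On top of that, building a single generator out of lines from several different files adds complexity.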

Answer 1

You want ExitStack. It is a helper class from the contextlib module for combining context managers, and it can keep any number of files open inside a single with statement.

from contextlib import ExitStack

def iter_master_file(filename):
    with ExitStack() as stack:
        master = stack.enter_context(open(filename))
        hidden_files = {}

        for line in master:
            # You can parse the lines as you like
            # Here I just assume the last word is a file name
            *data, file = line.split()

            if file not in hidden_files:
                hidden_files[file] = stack.enter_context(open(file))

            yield ' '.join(data), next(hidden_files[file]).strip()
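It is worth checking what happens when the consumer stops before the master file is exhausted. The sketch below makes the cleanup visible by replacing open() with a hypothetical recording context manager `tracked`. The generator `lazy_open` is also hypothetical and mirrors the lazy-opening pattern of iter_master_file. Calling close() on a partially consumed generator raises GeneratorExit at the paused yield, and ExitStack then exits every context it entered, in reverse order.

```python
from contextlib import ExitStack, contextmanager

released = []  # names of resources in the order they were released

@contextmanager
def tracked(name):
    """Hypothetical stand-in for open() that records when it is released."""
    try:
        yield name
    finally:
        released.append(name)

def lazy_open(names):
    # Enter a context only the first time a name is seen, as iter_master_file
    # does, so the number of open resources is not known upfront.
    with ExitStack() as stack:
        seen = {}
        for name in names:
            if name not in seen:
                seen[name] = stack.enter_context(tracked(name))
            yield seen[name]

gen = lazy_open(['a', 'b', 'a', 'c'])
print(next(gen), next(gen))  # a b
gen.close()                  # stop early: GeneratorExit unwinds the ExitStack
print(released)              # ['b', 'a']  ('c' was never opened)
```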

Example

Set up some files for the example.

Files

master.txt

master says hidden1.txt is: hidden1.txt
master says hidden2.txt is: hidden2.txt
master says hidden1.txt is: hidden1.txt
master says hidden2.txt is: hidden2.txt

hidden1.txt

I am hidden file 1 line 1
I am hidden file 1 line 2

hidden2.txt

I am hidden file 2 line 1
I am hidden file 2 line 2

Here is the example in action.

Code

for data, hidden_data in iter_master_file('master.txt'):
    print(data, hidden_data)

Output

master says hidden1.txt is: I am hidden file 1 line 1
master says hidden2.txt is: I am hidden file 2 line 1
master says hidden1.txt is: I am hidden file 1 line 2
master says hidden2.txt is: I am hidden file 2 line 2
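To reproduce this output without creating the files by hand, here is a self-contained sketch. It repeats iter_master_file so the block runs on its own. As an assumption for illustration, it writes the three example files into a temporary directory. The hypothetical helper `run_example` switches into that directory because the master file refers to the secondary files by relative name.

```python
import os
import tempfile
from contextlib import ExitStack

def iter_master_file(filename):
    # Same generator as above, repeated so this block is self-contained.
    with ExitStack() as stack:
        master = stack.enter_context(open(filename))
        hidden_files = {}
        for line in master:
            *data, file = line.split()
            if file not in hidden_files:
                hidden_files[file] = stack.enter_context(open(file))
            yield ' '.join(data), next(hidden_files[file]).strip()

def run_example():
    """Hypothetical helper: write the example files, return the printed lines."""
    files = {
        'master.txt': ('master says hidden1.txt is: hidden1.txt\n'
                       'master says hidden2.txt is: hidden2.txt\n'
                       'master says hidden1.txt is: hidden1.txt\n'
                       'master says hidden2.txt is: hidden2.txt\n'),
        'hidden1.txt': 'I am hidden file 1 line 1\nI am hidden file 1 line 2\n',
        'hidden2.txt': 'I am hidden file 2 line 1\nI am hidden file 2 line 2\n',
    }
    with tempfile.TemporaryDirectory() as tmp:
        old_cwd = os.getcwd()
        os.chdir(tmp)  # the master file uses relative file names
        try:
            for name, text in files.items():
                with open(name, 'w') as f:
                    f.write(text)
            # Exhausting the generator closes every file before cleanup.
            return [f'{data} {hidden}' for data, hidden in iter_master_file('master.txt')]
        finally:
            os.chdir(old_cwd)  # leave the directory before it is deleted

for line in run_example():
    print(line)
```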

Answer 2

You can keep a "cache" of open files and call fileobj.readline() whenever you need a line:

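def read_master_file(master):
    other_files = {}  # cache: file name -> open file object
    try:
        for line in master:
            data, name = line.split()
            if name not in other_files:  # open each secondary file on first use
                other_files[name] = open(name)
            yield data, other_files[name].readline()
    finally:
        # Close every secondary file, even if the consumer stops early.
        for f in other_files.values():
            f.close()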

Use it as:

with open('master') as master:
    for data, line in read_master_file(master):
        pass  # do stuff with data and line

Unfortunately, in this case you have to handle the files without with, because you don't know how many files you will need to open.

You could write a custom context manager that holds the "cache" for this:

def read_master_file(master):
    # OtherFiles: the custom context manager described in the next paragraph
    with OtherFiles() as other_files:
        for line in master:
            data, name = line.split()
            yield data, other_files.get_file(name).readline()

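Here get_file would look up the cache and open the file if it isn't there yet, and the __exit__ method of OtherFiles would close all the opened files.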
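The answer leaves OtherFiles undefined. Below is a minimal sketch of what it could look like, assuming only the two behaviours just described: get_file opens a file the first time its name is requested and returns the cached object afterwards, and __exit__ closes everything that was opened. The attribute name `_files` is an illustrative choice.

```python
class OtherFiles:
    """Sketch of a context manager that caches secondary files by name."""

    def __init__(self):
        self._files = {}  # file name -> open file object

    def __enter__(self):
        return self

    def get_file(self, name):
        # Open the file on first request; reuse the same object afterwards.
        if name not in self._files:
            self._files[name] = open(name)
        return self._files[name]

    def __exit__(self, exc_type, exc, tb):
        # Close every file opened through get_file.
        for f in self._files.values():
            f.close()
        self._files.clear()
        return False  # do not suppress exceptions from the with block
```

With this class defined, the read_master_file version that uses `with OtherFiles() as other_files` runs as written. One limitation: if one close() call raises, the remaining files are not closed. ExitStack handles that case for you.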

But if this is the only place you use it, there really isn't much point.
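The two answers also behave differently when a secondary file runs out early. readline() keeps the trailing newline and returns an empty string once a file is exhausted. If the master file references a secondary file too many times, that version silently yields ''. next() raises StopIteration instead. Since Python 3.7 (PEP 479), a StopIteration escaping a generator body becomes a RuntimeError, so the ExitStack version fails loudly. The sketch below shows both behaviours, using io.StringIO in place of real files. The helper names `read_past_end` and `one_too_many` are hypothetical.

```python
import io

def read_past_end(text):
    """Read one line beyond the end of an in-memory 'file' in two ways."""
    f = io.StringIO(text)
    first = f.readline()       # readline() keeps the trailing newline
    after_end = f.readline()   # ...and returns '' once the file is exhausted

    g = io.StringIO(text)
    next(g)
    try:
        next(g)                # next() signals the end with StopIteration
        raised = False
    except StopIteration:
        raised = True
    return first, after_end, raised

def one_too_many(text):
    """Hypothetical generator that asks for one more line than exists."""
    f = io.StringIO(text)
    yield next(f)
    yield next(f)              # StopIteration here becomes RuntimeError (PEP 479)

print(read_past_end('only line\n'))   # ('only line\n', '', True)
try:
    list(one_too_many('only line\n'))
except RuntimeError as err:
    print(err)                        # generator raised StopIteration
```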

Create a generator from lines of multiple files
