Question

This small script reads a file, tries to match each line against a regular expression, and appends the matching lines to another file:

regex = re.compile(r"<http://dbtropes.org/resource/Film/.*?> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbtropes.org/resource/Main/.*?> \.")

with open("dbtropes-v2.nt", "a") as output, open("dbtropes.nt", "rb") as input:
    for line in input.readlines():
        if re.findall(regex,line):
            output.write(line)

input.close()
output.close()

However, after about 5 minutes the script stops abruptly. The terminal says "Process stopped", and the output file stays empty.

The input file can be downloaded here: http://dbtropes.org/static/dbtropes.zip It is a 4.3 GB N-Triples file.

Is something wrong with my code? Or is it something else? Any hint would be appreciated!

Answer 1

It stopped because it ran out of memory. input.readlines() reads the entire file into memory before returning the list of lines.

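Instead, use input as an iterator. That reads only a small buffered chunk at a time and yields each line as soon as it is available.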

Don't do this:

for line in input.readlines():

Do this:

for line in input:
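A minimal sketch of the difference, using a small temporary file as a stand-in for the real dataset (the file name and contents here are invented for illustration). readlines() builds the complete list up front, whereas iterating over the file object hands back one line at a time:

```python
import os
import tempfile

# Hypothetical sample file standing in for dbtropes.nt.
with tempfile.NamedTemporaryFile("w", suffix=".nt", delete=False) as tmp:
    tmp.write("line 1\nline 2\nline 3\n")
    path = tmp.name

with open(path) as f:
    all_lines = f.readlines()       # every line is held in memory at once

count = 0
with open(path) as f:
    is_own_iterator = iter(f) is f  # a file object is its own iterator
    for line in f:                  # lines are produced one at a time
        count += 1

os.remove(path)
```

With three lines the two approaches are indistinguishable; with a multi-gigabyte file only the second keeps memory use roughly constant.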

Taking everyone's suggestions into account, your program becomes:

import re

regex = re.compile(r"<http://dbtropes.org/resource/Film/.*?> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbtropes.org/resource/Main/.*?> \.")

with open("dbtropes.nt", "rb") as input:
    with open("dbtropes-v2.nt", "a") as output:
        # Iterate over the file object directly so only one line is in memory at a time.
        for line in input:
            if regex.search(line):
                output.write(line)
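One caveat if you run this on Python 3: a file opened with "rb" yields bytes, which cannot be matched against a str pattern or written to a text-mode file. The following is a sketch of one way to keep everything in bytes, not the only option. The function name filter_triples and the two sample triples are made up for the demonstration; only the pattern comes from the question.

```python
import os
import re
import tempfile

# Bytes pattern, so it can be matched against lines read in binary mode.
TYPE_TRIPLE = re.compile(
    rb"<http://dbtropes.org/resource/Film/.*?> "
    rb"<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    rb"<http://dbtropes.org/resource/Main/.*?> \."
)

def filter_triples(src_path, dst_path):
    """Append the lines of src_path that match TYPE_TRIPLE to dst_path.

    Hypothetical helper; returns the number of lines written.
    """
    kept = 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        for line in src:                  # streams the file line by line
            if TYPE_TRIPLE.search(line):
                dst.write(line)
                kept += 1
    return kept

# Tiny demonstration with invented sample data.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "sample.nt")
dst_path = os.path.join(workdir, "sample-v2.nt")
with open(src_path, "wb") as f:
    f.write(b"<http://dbtropes.org/resource/Film/Alien> "
            b"<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
            b"<http://dbtropes.org/resource/Main/Film> .\n")
    f.write(b"<http://dbtropes.org/resource/Main/Foo> "
            b"<http://www.w3.org/2000/01/rdf-schema#label> \"Foo\" .\n")

kept = filter_triples(src_path, dst_path)
```

Staying in binary mode also avoids decoding errors on odd characters in the data, at the cost of having to write the pattern as a bytes literal.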

Answer 2

Use for line in input instead of readlines() to keep it from reading the whole file into memory.

A minor point: if you open the files as context managers, you don't need to close them yourself. You may also find it cleaner like this:

with open("dbtropes-v2.nt", "a") as output:
    with open("dbtropes.nt", "rb") as input:
        for line in input:
            if re.findall(regex, line):
                output.write(line)
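To see that the with statement really does close the file for you, here is a short sketch using a throwaway temporary file (the path and variable names are illustrative only):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as out:
    out.write("hello\n")
    open_inside = not out.closed   # the file is still open inside the block

closed_after = out.closed          # it is closed automatically on exit
```

This is why the explicit input.close() and output.close() calls after the with block in the question are redundant.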

Why does this Python script that writes to a file suddenly stop?
