Question

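I've spent a fair amount of time trying to get the Linux diff and patch tools to work with strings in Python. To do this, I tried using named pipes, since they seemed like the most robust approach. The problem is that this doesn't work for large files.

Example: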

import subprocess

a, b = str1, str2  # ~1MB each string

fname1, fname2 = mkfifos(2)
proc = subprocess.Popen(['diff', fname1, fname2],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)

print('Writing first file.')
with open(fname1, 'w') as f1:
    f1.write(a)
print('Writing second file.')
with open(fname2, 'w') as f2:
    f2.write(b)

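This hangs on the first write. I found that if I use a[:6500] instead, it hangs on the second write, so I think it has something to do with buffering. I tried flushing manually after each write and closing, and using the low-level os.open(f, 'r', 0) with a buffer size of 0, but the problem remains.

I thought about writing in chunks in a loop, but that feels wrong in a high-level language like Python. Any ideas what I'm doing wrong?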

Answer 1

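A named pipe is still a pipe. It has a finite buffer on Linux. You can't write an unlimited amount of output to it unless someone is reading from the other end of the pipe at the same time.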
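To make the finite buffer concrete, here is a minimal sketch that counts how many bytes a pipe accepts before a write would block. The function name pipe_capacity is hypothetical, and it uses an anonymous os.pipe() rather than a named pipe for simplicity, on the assumption that both are buffered the same way on Linux. The write end is put into non-blocking mode so that a full buffer raises BlockingIOError instead of hanging.

```python
import os


def pipe_capacity(chunk_size=4096):
    """Return how many bytes fit into a pipe that nobody reads from."""
    read_fd, write_fd = os.pipe()
    # Non-blocking mode: a full buffer raises BlockingIOError
    # instead of blocking forever, as f1.write(a) does in the question.
    os.set_blocking(write_fd, False)
    total = 0
    try:
        while True:
            total += os.write(write_fd, b'x' * chunk_size)
    except BlockingIOError:
        pass  # the buffer is full
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total


if __name__ == '__main__':
    print('pipe accepted', pipe_capacity(), 'bytes before blocking')
```

On Linux the result is typically 64 KiB by default, far less than a 1 MB string, which is consistent with the write in the question blocking.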

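If f1.write(a) blocks, it means that diff does not read each input file all at once (which seems logical: the purpose of the diff program is to compare files line by line, so reading from the first file won't get far without reading from the second).

To write different data to different places concurrently, you could use threads or asyncio: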

#!/usr/bin/env python3
from subprocess import Popen, PIPE
from threading import Thread

def write_input_async(path, text):
    """Write text to the named pipe at path in a background thread."""
    def writelines():
        # open() blocks until diff opens the other end of the pipe
        with open(path, 'w') as file:
            for line in text.splitlines(keepends=True):
                file.write(line)
    Thread(target=writelines, daemon=True).start()

# a, b are the two strings from the question
with named_pipes(2) as paths, \
     Popen(['diff'] + paths, stdout=PIPE, stderr=PIPE,
           universal_newlines=True) as p:
    for path, text in zip(paths, [a, b]):
        write_input_async(path, text)
    # read stdout/stderr while the writer threads feed the pipes
    output, errors = p.communicate()

Here named_pipes(n) is a context manager, defined separately, that provides the paths of n named pipes.
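The linked definition is not reproduced in this thread, so the following is only a sketch of what such a context manager could look like, not necessarily the original. It assumes a POSIX system (os.mkfifo), creates the pipes inside a fresh temporary directory, and removes that directory on exit. The file name prefix 'named_pipe' is an arbitrary choice.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def named_pipes(n=1):
    """Yield a list of paths to n freshly created named pipes (FIFOs)."""
    dirname = tempfile.mkdtemp()  # private directory, so names cannot clash
    try:
        paths = [os.path.join(dirname, 'named_pipe' + str(i))
                 for i in range(n)]
        for path in paths:
            os.mkfifo(path)
        yield paths
    finally:
        # remove the pipes together with their directory
        shutil.rmtree(dirname)
```

It yields a list so that the paths can be appended directly to the command line, as in ['diff'] + paths.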

Note: unless you call .communicate(), the diff process may hang as soon as any of the stdout/stderr OS pipe buffers fills up.

You could also consider whether difflib.context_diff(a, b) would work in your case.
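As a rough illustration of the difflib route, the sketch below diffs two strings entirely in Python, with no subprocess or pipes involved. The helper name diff_strings and the 'a'/'b' labels are hypothetical. Note that difflib.context_diff expects sequences of lines rather than whole strings, so the inputs are split first.

```python
import difflib


def diff_strings(a, b, fromfile='a', tofile='b'):
    """Return a context diff of two strings as one string."""
    # context_diff works on lists of lines; keep the line endings
    # so the output can be joined back together unchanged.
    a_lines = a.splitlines(keepends=True)
    b_lines = b.splitlines(keepends=True)
    return ''.join(difflib.context_diff(a_lines, b_lines,
                                        fromfile=fromfile, tofile=tofile))


if __name__ == '__main__':
    print(diff_strings('one\ntwo\nthree\n', 'one\n2\nthree\n'))
```

The output format resembles that of diff -c, but it is produced by a different implementation, so it may not be byte-for-byte identical to what the diff tool prints. Whether that matters depends on whether the result is later fed to patch.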

Passing strings as a subprocess's input via multiple named pipes in Python
