Question

I have a raw binary data file, in.dat, that stores four int32 values.

$ xxd in.dat 
00000000: 0100 0000 0200 0000 0300 0000 0400 0000  ................

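I want to read them into an np.ndarray, multiply them by 2, and write them to stdout in the same raw binary format as in.dat. The expected output looks like this: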

$ xxd out.dat 
00000000: 0200 0000 0400 0000 0600 0000 0800 0000  ................

The code looks like this:

#!/usr/bin/env python3

import sys
import numpy as np

if __name__ == '__main__':
    y = np.fromfile(sys.stdin, dtype='int32')
    y *= 2
    sys.stdout.buffer.write(y.astype('int32').tobytes())
    exit(0)

I found that it works with < redirection,

$ python3 test.py <in.dat >out.dat

but it does not work with a pipe (|). Here is the error message:

$ cat in.dat | python3 test.py >out.dat
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    y = np.fromfile(sys.stdin, dtype='int32')
OSError: obtaining file position failed

What am I missing here?

Answer 1

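This happens because stdin is seekable when you redirect it from a file (it is not a TTY or a pipe, it is simply the file handed to the process as FD 0). np.fromfile asks for the current file position, and a pipe cannot provide one. Try invoking the following script with cat foo.txt | python3 test.py and with python3 test.py <foo.txt (assuming foo.txt contains some text):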

import sys

sys.stdin.seek(1)
print(sys.stdin.read())

The former fails with the following error:

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    sys.stdin.seek(1)
io.UnsupportedOperation: underlying stream is not seekable
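
The same distinction can be checked without provoking an exception. The following is only a minimal sketch: is_seekable is a hypothetical helper name, and an in-memory io.BytesIO and an os.pipe() stand in for a redirected file and a shell pipe respectively.

```python
import io
import os


def is_seekable(stream):
    """Return True if the stream reports that seek()/tell() are supported."""
    return stream.seekable()


# Stand-in for a redirected regular file: seekable.
print(is_seekable(io.BytesIO(b"abc")))  # True

# Stand-in for `cat file | script`: the read end of a pipe is not seekable.
r, w = os.pipe()
with os.fdopen(r, "rb") as reader, os.fdopen(w, "wb") as writer:
    print(is_seekable(reader))  # False
```

In a real script the same check would be sys.stdin.buffer.seekable().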

That said, numpy is overkill for what you are trying to do here. You can achieve this easily with a few lines and struct:

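import struct
import sys

FORMAT = '@i'  # native C int (4 bytes on common platforms)
SIZE = struct.calcsize(FORMAT)


def main():
    while True:
        chunk = sys.stdin.buffer.read(SIZE)
        if len(chunk) < SIZE:
            # read() returns b'' at EOF; it does not raise EOFError
            break
        # unpack() always returns a tuple, so take its single element
        (num,) = struct.unpack(FORMAT, chunk)
        sys.stdout.buffer.write(struct.pack(FORMAT, num * 2))


if __name__ == '__main__':
    main()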

Edit: the exit(0) call is not needed either. An exit status of 0 is the default when the script finishes normally.
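
If reading four bytes at a time turns out to be slow, the whole stream can be read once and unpacked in bulk. This is only a sketch under stated assumptions: double_all is a hypothetical name, the format is pinned to little-endian int32 ('<i') to match the xxd dump in the question, and io.BytesIO objects stand in for sys.stdin.buffer and sys.stdout.buffer.

```python
import io
import struct

FORMAT = "<i"  # assumption: little-endian int32, as in the xxd dump


def double_all(src, dst):
    """Read every int32 from src, double it, and write the result to dst."""
    data = src.read()
    # iter_unpack yields one 1-tuple per record; it raises struct.error
    # if len(data) is not a multiple of the record size.
    doubled = [n * 2 for (n,) in struct.iter_unpack(FORMAT, data)]
    dst.write(struct.pack(f"<{len(doubled)}i", *doubled))


src = io.BytesIO(bytes.fromhex("01000000020000000300000004000000"))
dst = io.BytesIO()
double_all(src, dst)
print(dst.getvalue().hex())  # 02000000040000000600000008000000
```

In a script, the call would be double_all(sys.stdin.buffer, sys.stdout.buffer). Note that struct.pack raises struct.error if a doubled value no longer fits in an int32.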

Answer 2

If you use np.frombuffer, it should work both ways:

pipebytes.py

import numpy as np
import sys
print(np.frombuffer(sys.stdin.buffer.read(), dtype=np.int32))

Now:

Juans-MacBook-Pro:temp juan$ xxd testdata.dat
00000000: 0100 0000 0200 0000 0300 0000            ............
Juans-MacBook-Pro:temp juan$ python pipebytes.py < testdata.dat
[1 2 3]
Juans-MacBook-Pro:temp juan$ cat testdata.dat | python pipebytes.py
[1 2 3]
Juans-MacBook-Pro:temp juan$

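That said, I suspect this copies the data: read() has to load the whole stream into a bytes object first, although np.frombuffer itself only creates a view on that buffer.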
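
To complete the task from the question with this approach, keep in mind that an array created by np.frombuffer over a bytes object is read-only, so an in-place y *= 2 would fail. The following is a minimal sketch, not a definitive implementation: double_int32 is a hypothetical name, the dtype is assumed to be little-endian int32 ('<i4') as in the xxd dump, and io.BytesIO objects stand in for sys.stdin.buffer and sys.stdout.buffer.

```python
import io

import numpy as np


def double_int32(src, dst, dtype="<i4"):
    """Read raw integers from src, double them, and write raw bytes to dst."""
    data = src.read()                     # works for pipes and regular files
    y = np.frombuffer(data, dtype=dtype)  # read-only view on the bytes object
    # y * 2 allocates a new, writable array; astype pins the output
    # byte order to the requested dtype before serialising.
    dst.write((y * 2).astype(dtype).tobytes())


src = io.BytesIO(bytes.fromhex("01000000020000000300000004000000"))
dst = io.BytesIO()
double_int32(src, dst)
print(dst.getvalue().hex())  # 02000000040000000600000008000000
```

In a script, the call would be double_int32(sys.stdin.buffer, sys.stdout.buffer), which should behave the same with both < redirection and a pipe.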

Python 3 pipe I/O on np.ndarray with raw binary data fails
