Question: Python child script consumes all of stdin

I've run into some strange behavior with raw_input / readline when running Python scripts from a bash script.

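In short, when all of stdin (one entry per line) is passed to the parent script at once, the bash child scripts take only the input they need, while the Python child script consumes all of stdin and leaves nothing for the next child. Here is a simple example that shows what I mean: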

Parent script (parent.sh)

#!/bin/bash

./child.sh
./child.sh
./child.py
./child.py

Bash child script (child.sh)

#!/bin/bash

read -a INPUT
echo "sh: got input: ${INPUT}"

Python child script (child.py)

#!/usr/bin/python -B

import sys

INPUT = raw_input()
print "py: got input: {}".format(INPUT)

Expected result

./parent.sh <<< $'aa\nbb\ncc\ndd'
>> sh: got input: aa
>> sh: got input: bb
>> py: got input: cc
>> py: got input: dd

Actual result

./parent.sh <<< $'aa\nbb\ncc\ndd\n'
>> sh: got input: aa
>> sh: got input: bb
>> py: got input: cc
>> Traceback (most recent call last):
>>   File "./child.py", line 5, in <module>
>>     INPUT = raw_input()
>> EOFError: EOF when reading a line

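raw_input seems to clear out all remaining lines from stdin. Using sys.stdin.readline instead of raw_input doesn't raise an EOFError, but the input it receives is an empty string instead of the expected 'dd'.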
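The same effect can be reproduced without bash. The sketch below is an illustration under assumptions: it targets Python 3 rather than the question's Python 2, and the helper name run_children is hypothetical. It writes the four lines into an anonymous pipe, then starts two child interpreters that each call sys.stdin.readline() on the same read end.

```python
import os
import subprocess
import sys

# Each child reads one line from stdin and prints its repr(), so an empty
# string (EOF) is visible in the output.
CHILD = "import sys; print(repr(sys.stdin.readline()))"


def run_children(data: bytes, count: int) -> list:
    """Write `data` into a pipe and let `count` children read it in turn."""
    r, w = os.pipe()
    os.write(w, data)  # small payload: fits in the pipe buffer without blocking
    os.close(w)        # closing the write end lets readers see EOF
    outputs = []
    for _ in range(count):
        result = subprocess.run([sys.executable, "-c", CHILD],
                                stdin=r, stdout=subprocess.PIPE, check=True)
        outputs.append(result.stdout.decode().strip())
    os.close(r)
    return outputs


if __name__ == "__main__":
    for out in run_children(b"aa\nbb\ncc\ndd\n", 2):
        print(out)
```

On a typical CPython 3 build the first child prints 'aa\n' and the second prints '', because the first child's buffered stdin pulled all 12 bytes out of the pipe with a single read.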

What's going on here? How can I avoid this behavior so that the last child script receives the expected input?

Edit: To be sure, I added a few more lines to stdin, and the result was the same:

./parent.sh <<< $'aa\nbb\ncc\ndd\nff\nee\n'
>> sh: got input: aa
>> sh: got input: bb
>> py: got input: cc
>> Traceback (most recent call last):
>>   File "./child.py", line 5, in <module>
>>     INPUT = raw_input()
>> EOFError: EOF when reading a line

Answer 1

Here's a simpler way to demonstrate the same problem:

printf "%s\n" foo bar | {
    head -n 1
    head -n 1
}

By all appearances this should print two lines, but bar mysteriously goes missing.

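This happens because "reading a line" is a lie: the UNIX programming model doesn't support it. The read(2) system call returns up to a requested number of bytes and has no notion of lines.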

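Instead, what basically every tool does is consume an entire buffer, split off the first line, and keep the rest of the buffer for the next call. This applies to head, Python's raw_input(), C's fgets(), Java's BufferedReader.readLine(), and just about everything else.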
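To see where the rest of the input goes, here is a minimal in-process sketch (Python 3; the function name buffered_readline_demo is hypothetical). It reads one line through io.BufferedReader, the same kind of buffered layer that sits under sys.stdin in Python 3, and then compares what is still available in the pipe with what is stranded in the reader's private buffer.

```python
import io
import os


def buffered_readline_demo(data: bytes):
    """Read one line through a BufferedReader and report where the rest went."""
    r, w = os.pipe()
    os.write(w, data)
    os.close(w)
    # closefd=False keeps the descriptor usable for os.read() afterwards.
    raw = io.FileIO(r, "r", closefd=False)
    reader = io.BufferedReader(raw)
    line = reader.readline()          # fills the buffer with a whole chunk, returns one line
    left_in_pipe = os.read(r, 1024)   # what a *next process* could still read
    left_in_buffer = reader.read()    # bytes held only in this process's buffer
    os.close(r)
    return line, left_in_pipe, left_in_buffer


if __name__ == "__main__":
    print(buffered_readline_demo(b"foo\nbar\n"))
```

With input b"foo\nbar\n" it returns (b'foo\n', b'', b'bar\n'): the pipe is already empty after one readline(), and bar survives only in the process's buffer, which is discarded when the process exits.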

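Since UNIX counts the whole buffer as consumed, no matter how much of it the program actually ends up using, the rest of the buffer is thrown away when the program exits.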

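Bash, however, works around this: it reads byte by byte until it reaches a newline (on non-seekable input such as a pipe; on a seekable file it can read ahead and then seek back). This is very inefficient, but it lets read consume only a single line from the stream and leave the rest for the next process.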
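The byte-at-a-time strategy can be sketched in Python 3 with os.read(fd, 1). The function name read_line_unbuffered is hypothetical, and this only mimics the approach described above; it is not bash's actual implementation.

```python
import os


def read_line_unbuffered(fd: int) -> bytes:
    """Read one line from `fd` one byte at a time.

    Returns the line including its trailing newline, or b"" at EOF.
    Nothing past the newline is taken from the descriptor.
    """
    chunks = []
    while True:
        byte = os.read(fd, 1)  # one system call per byte: slow, but exact
        if not byte:           # EOF
            break
        chunks.append(byte)
        if byte == b"\n":
            break
    return b"".join(chunks)


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"foo\nbar\n")
    os.close(w)
    print(read_line_unbuffered(r))  # b'foo\n'
    print(os.read(r, 1024))         # b'bar\n' is still in the pipe
    os.close(r)
```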

You can do the same thing in Python by opening a raw, unbuffered reader:

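import sys
import os

# Reopen stdin's file descriptor with buffering disabled (third argument 0),
# so readline() fetches one byte at a time and stops right after the newline.
f = os.fdopen(sys.stdin.fileno(), 'rb', 0)
line = f.readline().rstrip(b'\n').decode()
print("Python read: " + line)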

We can test it the same way:

printf "%s\n" foo bar | {
    python myscript
    python myscript
}

This prints:

Python read: foo
Python read: bar
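The same check can be driven from Python 3, as a sketch: the helper run_twice is hypothetical, and the child code is the unbuffered reader above, which runs unchanged under Python 3 (there os.fdopen(fd, 'rb', 0) returns a raw io.FileIO whose readline() reads one byte per call).

```python
import os
import subprocess
import sys

# The unbuffered reader from the answer, passed to each child via -c.
CHILD = (
    "import os, sys\n"
    "f = os.fdopen(sys.stdin.fileno(), 'rb', 0)\n"
    "print('Python read: ' + f.readline().rstrip(b'\\n').decode())\n"
)


def run_twice(data: bytes) -> list:
    """Run the child twice on one shared pipe, like the printf test."""
    r, w = os.pipe()
    os.write(w, data)
    os.close(w)
    lines = []
    for _ in range(2):
        out = subprocess.run([sys.executable, "-c", CHILD], stdin=r,
                             stdout=subprocess.PIPE, check=True).stdout
        lines.append(out.decode().strip())
    os.close(r)
    return lines


if __name__ == "__main__":
    print(run_twice(b"foo\nbar\n"))
```

run_twice(b"foo\nbar\n") should return ['Python read: foo', 'Python read: bar'], matching the output of the printf test.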

Answer 2

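By default, the Python interpreter buffers standard input. You can disable this with the -u option, at some cost in efficiency. (This applies to Python 2, which the question uses; in Python 3, -u affects only stdout and stderr, and stdin stays buffered.)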

parent.sh

#!/bin/bash

./child.sh
./child.sh
python -u child.py
python -u child.py

Output

./parent.sh <<< $'aa\nbb\ncc\ndd'
sh: got input: aa
sh: got input: bb
py: got input: cc 
py: got input: dd
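Because -u does not help with stdin in Python 3, one alternative there (a sketch, not part of the original answer) is to read from sys.stdin.buffer.raw, the unbuffered io.FileIO underneath sys.stdin, whose readline() fetches one byte per read() call. The function name read_line_raw and its optional raw parameter are hypothetical; the parameter exists only so the function can be exercised on something other than the real stdin.

```python
import sys


def read_line_raw(raw=None) -> str:
    """Read exactly one line without reading past its newline (Python 3).

    `raw` defaults to sys.stdin.buffer.raw, the unbuffered FileIO beneath
    sys.stdin. Its readline() issues one read() per byte, so any remaining
    lines stay in the pipe for the next process. Avoid calling
    sys.stdin.read*() first in the same process: that would fill
    sys.stdin's buffer and over-read.
    """
    if raw is None:
        raw = sys.stdin.buffer.raw
    line = raw.readline()
    if not line:
        # Mirror raw_input()/input(), which raise EOFError at end of input.
        raise EOFError("EOF when reading a line")
    return line.rstrip(b"\n").decode()
```

In a Python 3 version of child.py, the last line would then be print("py: got input: {}".format(read_line_raw())), run without -u.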
