My code:
import sys

def byFreq(pair):
    return pair[1]

def main():
    bigrams = {}
    for line in sys.stdin:
        line = line.lower()
        words = line.split()
        for i in range(len(words)-1):
            bigram = (words[i],words[i+1])
            bigrams[bigram] = bigrams.get(bigram,0) + 1
    bigrams = list(bigrams.items())
    bigrams.sort(key=byFreq, reverse=True)
    for i in range(len(bigrams)):
        bg, count = bigrams[i]
        print("{0:<15}{1:<15}{2:>5}" .format(bg[0], bg[1], count))

if __name__ == "__main__":
    main()
I would like to be able to use my python3 file on the command line, e.g. cat myfile.txt | python3 bigrams.py | head -5
Running my file produces the following output (using the macOS Terminal):
van            de                25
in             de                14
aan            de                10
in             het                9
de             regering           9
Traceback (most recent call last):
  File "bigram.py", line 37, in <module>
    main()
  File "bigram.py", line 33, in main
    print("{0:<15}{1:<15}{2:>5}" .format(bg[0], bg[1], count))
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe
It does print the 5 lines, but it also raises a broken pipe error. This can be worked around with:
import signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
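However, this does not seem like a good way to get rid of the error. Is there another (better) way?
Also, is there a better way to produce the bigrams as output?
I hope someone can help me.
Cheers, Thijmen.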
Answer 0 (score: 0)
You need to redirect stderr to stdout before piping to head to cut off the top lines:
cat myfile.txt | python3 bigrams.py 2>&1 | head -5
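Note that this redirect hides the traceback rather than preventing the error. If you would rather handle the closed pipe inside the script, one option is to catch BrokenPipeError, following the note on SIGPIPE in the Python signal module documentation. This is only a minimal sketch: print_lines is a hypothetical helper name, not part of the script above, and it assumes sys.stdout is a real file descriptor.

```python
import os
import sys

def print_lines(lines):
    """Print each line; return False if the reader closed the pipe early."""
    try:
        for line in lines:
            print(line)
        # Flush inside the try block so a closed pipe is detected here
        # rather than during interpreter shutdown.
        sys.stdout.flush()
    except BrokenPipeError:
        # Python flushes stdout again at exit; pointing it at the null
        # device avoids a second BrokenPipeError at that point.
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        return False
    return True
```

In main() you could build the formatted rows first, pass them to print_lines, and call sys.exit(1) when it returns False, which is the exit status the Python documentation suggests for this case.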
I would suggest passing the input file name and the number of (top) lines to print as command-line arguments:
def main():
    bigrams = {}
    # pass the input filename as the first argument
    ifilename = sys.argv[1]
    # pass the number of lines to print as the second argument
    show_top_n = int(sys.argv[2])
    # iterate over the file directly; the with block closes it afterwards
    with open(ifilename, "r") as infile:
        for line in infile:
            words = line.lower().split()
            for i in range(len(words) - 1):
                bigram = (words[i], words[i+1])
                bigrams[bigram] = bigrams.get(bigram, 0) + 1
    bigrams = list(bigrams.items())
    bigrams.sort(key=byFreq, reverse=True)
    # slicing avoids an IndexError when there are fewer than show_top_n bigrams
    for bg, count in bigrams[:show_top_n]:
        print("{0:<15}{1:<15}{2:>5}".format(bg[0], bg[1], count))
You would run it like this:
python bigrams.py myfile.txt 5
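As for a better way to produce the bigrams, one possibility is collections.Counter from the standard library, which does the counting and the sorting for you. The following is a sketch, not a drop-in replacement: count_bigrams and format_top are hypothetical names, and the output format is copied from the question.

```python
from collections import Counter

def count_bigrams(lines):
    """Count adjacent word pairs within each line, ignoring case."""
    counts = Counter()
    for line in lines:
        words = line.lower().split()
        # zip pairs each word with its successor: (w0, w1), (w1, w2), ...
        counts.update(zip(words, words[1:]))
    return counts

def format_top(counts, n):
    """Return the n most frequent bigrams as formatted rows."""
    return ["{0:<15}{1:<15}{2:>5}".format(first, second, count)
            for (first, second), count in counts.most_common(n)]
```

With sys imported, you could then print the top five with: for row in format_top(count_bigrams(sys.stdin), 5): print(row). Because the script limits the output itself, head is no longer needed and the pipe is not closed early.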