Displaying the bigrams of a (tokenized) text with python3 causes a broken pipe error

Time: 2018-01-03 19:19:57

Tags: python python-3.x error-handling

My code:

import sys

def byFreq(pair):
    return pair[1]

def main():

    bigrams = {}

    for line in sys.stdin:

        line = line.lower()
        words = line.split()

        for i in range (len(words)-1):

            bigram = (words[i],words[i+1])
            bigrams[bigram] = bigrams.get(bigram,0) + 1

    bigrams = list(bigrams.items())
    bigrams.sort(key=byFreq, reverse=True)

    for i in range(len(bigrams)):
        bg, count = bigrams[i]
        print("{0:<15}{1:<15}{2:>5}" .format(bg[0], bg[1], count))


if __name__ == "__main__":
    main()

I want to be able to use my python3 file on the command line, e.g. cat myfile.txt | python3 bigrams.py | head -5

Running my file produces the following output (using the macOS Terminal):

van            de                25
in             de                14
aan            de                10
in             het                9
de             regering           9
Traceback (most recent call last):
  File "bigram.py", line 37, in <module>
    main()
  File "bigram.py", line 33, in main
    print("{0:<15}{1:<15}{2:>5}" .format(bg[0], bg[1], count))
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe

It does print the 5 lines, but a broken pipe error appears as well. This can be solved with:

import signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)

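However, this does not seem like a good way to get rid of the error. Are there other (better) ways?

Also, is there a better way to produce the bigrams as output?

Hope someone can help me.

Cheers, Thijmen.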

1 answer:

Answer 0: (score: 0)

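You need to redirect stderr to stdout before piping the output into head to keep only the top lines. Note that this hides the traceback rather than preventing the error: the message is written into the same pipe that head has already closed.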

cat myfile.txt | python3 bigrams.py 2>&1 | head -5
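If you would rather handle this inside the script, one option is to catch BrokenPipeError yourself. The sketch below follows the pattern described in the Python signal module documentation (its note on SIGPIPE). The helper names write_rows and silence_stdout and the sample rows are hypothetical, not part of the original code.

```python
import os
import sys


def write_rows(rows, out=None):
    """Write formatted (first, second, count) rows to out.

    Returns True on success and False if the reader (e.g. head)
    closed the pipe before everything was written.
    """
    if out is None:
        out = sys.stdout
    try:
        for first, second, count in rows:
            out.write("{0:<15}{1:<15}{2:>5}\n".format(first, second, count))
        # Flush inside the try block so a broken pipe surfaces here
        # and not later, during interpreter shutdown.
        out.flush()
        return True
    except BrokenPipeError:
        return False


def silence_stdout():
    """Point stdout at os.devnull so the flush at exit cannot fail again."""
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, sys.stdout.fileno())


if __name__ == "__main__":
    # Hypothetical sample data standing in for the sorted bigram list.
    sample = [("van", "de", 25), ("in", "de", 14)]
    if not write_rows(sample):
        silence_stdout()
        sys.exit(1)
```

With this in place, the original command (cat myfile.txt | python3 bigrams.py | head -5) should end quietly once head has read its five lines. The script still exits with status 1, which tells callers the output was cut short.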

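Alternatively, I suggest passing the input filename and the number of (top) lines to print to standard output as command-line arguments: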

import sys

def byFreq(pair):
    return pair[1]

def main():
    bigrams = {}
    # pass the input filename as the first argument
    ifilename = sys.argv[1]
    # pass the number of lines to print as the second argument
    show_top_n = int(sys.argv[2])

    # read all lines; the with block closes the file afterwards
    with open(ifilename, "r") as infile:
        lines = infile.readlines()

    for line in lines:
        line = line.lower()
        words = line.split()

        for i in range(len(words) - 1):
            bigram = (words[i], words[i+1])
            bigrams[bigram] = bigrams.get(bigram, 0) + 1

    bigrams = list(bigrams.items())
    bigrams.sort(key=byFreq, reverse=True)

    # slicing avoids an IndexError when there are fewer than show_top_n bigrams
    for bg, count in bigrams[:show_top_n]:
        print("{0:<15}{1:<15}{2:>5}".format(bg[0], bg[1], count))

if __name__ == "__main__":
    main()

You would launch it like this:

python3 bigrams.py myfile.txt 5
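On the second part of the question (a better way to produce the bigrams), a more compact option is collections.Counter from the standard library. This is a minimal sketch, not the only way to do it; the function names count_bigrams and top_bigrams are hypothetical.

```python
from collections import Counter


def count_bigrams(lines):
    """Count adjacent word pairs per line, lowercased and whitespace-split."""
    counts = Counter()
    for line in lines:
        words = line.lower().split()
        # zip(words, words[1:]) pairs each word with its successor
        counts.update(zip(words, words[1:]))
    return counts


def top_bigrams(lines, n):
    """Return the n most frequent bigrams as ((first, second), count) tuples."""
    return count_bigrams(lines).most_common(n)
```

Because count_bigrams accepts any iterable of lines, it works with sys.stdin as well as with an open file, so the cat myfile.txt | python3 bigrams.py style of invocation stays possible. Like the original loop, it does not form bigrams across line boundaries.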