Redirecting print output to a .txt file in Python

Time: 2016-03-23 08:05:25

Tags: python parsing text

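I'm a Python beginner. I have tried many approaches from Stack Overflow answers to this problem, but none of them work with my script. I have this small script, but I can't get its huge result into a .txt file so that I can analyze the data. How can I redirect the print output to a txt file on my computer?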

from nltk.util import ngrams
import collections

with open("text.txt", "rU") as f:
    sixgrams = ngrams(f.read().decode('utf8').split(), 2)

result = collections.Counter(sixgrams)
print result
for item, count in sorted(result.iteritems()):
    if count >= 2:
        print " ".join(item).encode('utf8'), count

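4 answers:

Answer 0 (score: 5)

Just do it on the command line: python script.py > output.txt (use a name other than text.txt, the script's input file: the shell truncates the redirect target before the script starts, so redirecting to text.txt would empty the input).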
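
If changing the command line is not an option, the same effect can be had from inside the script. The following is a minimal sketch for Python 3.4+ using contextlib.redirect_stdout, which is a different technique from the shell redirection above. The report function, the sample data, and the temporary output path are hypothetical stand-ins for the question's loop and Counter.

```python
import contextlib
import os
import tempfile


def report(counts):
    """Print one 'words count' line per entry seen at least twice."""
    for words, count in sorted(counts.items()):
        if count >= 2:
            print(" ".join(words), count)


# Hypothetical sample data standing in for the Counter built in the question.
sample = {("on", "Thursday"): 3, ("Thursday", "morning"): 1}

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), "output.txt")

# While the with block is active, everything print() writes goes to the file.
with open(out_path, "w", encoding="utf-8") as f, contextlib.redirect_stdout(f):
    report(sample)

# Read the file back to check what was written.
with open(out_path, encoding="utf-8") as f:
    captured = f.read()
```

This leaves the printing code untouched, which helps when the print calls live in a function you would rather not modify.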

Answer 1 (score: 4)

The print statement in Python 2.x supports redirection (>> fileobj):

...
with open('output.txt', 'w') as f:
    print >>f, result
    for item, count in sorted(result.iteritems()):
        if count >= 2:
            print >>f, " ".join(item).encode('utf8'), count

In Python 3.x, the print function accepts an optional keyword argument file:

print("....", file=f)

If you do from __future__ import print_function in Python 2.6+, the approach above also works in Python 2.x.
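
Putting the file= form together with the loop from the question might look like the sketch below, written for Python 3. To keep it self-contained it does not import nltk: the bigrams helper is a hypothetical stand-in for ngrams(words, 2), and the sample text and output path are likewise assumptions.

```python
import collections
import os
import tempfile


def bigrams(words):
    """Pair each word with its successor (stand-in for nltk's ngrams(words, 2))."""
    return zip(words, words[1:])


# Hypothetical sample input instead of reading text.txt.
text = "on Thursday morning and on Thursday evening"
result = collections.Counter(bigrams(text.split()))

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), "output.txt")

with open(out_path, "w", encoding="utf-8") as f:
    for item, count in sorted(result.items()):
        if count >= 2:
            # file=f sends this line to the file instead of the console.
            print(" ".join(item), count, file=f)

# Read the file back to check what was written.
with open(out_path, encoding="utf-8") as f:
    written = f.read()
```

Opening the file with an explicit encoding means the .encode('utf8') call from the Python 2 version is no longer needed.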

Answer 2 (score: 1)

Using BufferedWriter you can do it like this:

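import io

pathOut = "output.txt"  # assumed output path; the original leaves pathOut undefined

# Python 2 code, like the question. One consistent name is used for the writer
# (the original mixed "os" and "outs").
outs = io.BufferedWriter(io.FileIO(pathOut, "wb"))
outs.write(str(result) + "\n")  # a Counter must be converted to str first
for item, count in sorted(result.iteritems()):
    if count >= 2:
        # a space separates the words from the count
        outs.write(" ".join(item).encode('utf8') + " " + str(count) + "\n")

outs.flush()
outs.close()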

Answer 3 (score: 0)

As Antti said, you should prefer python3 and leave all that annoying python2 junk behind you. The following script works with both python2 and python3.

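To read and write files, use the open function from the io module, which is compatible with both python2 and python3. Always use the with statement to open resources such as files. The with statement wraps the execution of a block in a Python context manager. File objects implement the context manager protocol and are closed automatically when the with block is left.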
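
The automatic close can be observed directly. This is a small sketch, assuming a scratch file in a temporary directory (the path and the variable names are hypothetical):

```python
import io
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with io.open(path, "w", encoding="utf-8") as out:
    out.write(u"hello\n")
    closed_inside = out.closed  # the file is still open inside the block

closed_after = out.closed  # the file was closed when the block was left

print(closed_inside, closed_after)
```

The file is closed even if the block raises an exception, which is the main reason to prefer with over calling close() by hand.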

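Independently of Python: if you want to read a text file, you need to know its encoding in order to read it correctly (if you are not sure, try utf-8 first). Also, the correct name for the UTF-8 encoding is utf-8, and mode U is deprecated.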
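
To see why the encoding matters, the sketch below writes a line containing the Euro sign as UTF-8 and reads it back twice, once with the right encoding and once with a wrong one. The scratch path and the choice of latin-1 as the wrong encoding are assumptions made for the illustration.

```python
import io
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "euro.txt")

# Write "100 €" as UTF-8; the Euro sign takes three bytes in that encoding.
with io.open(path, "w", encoding="utf-8") as out:
    out.write(u"100 \u20ac\n")

# Reading with the matching encoding returns the original text.
with io.open(path, encoding="utf-8") as src:
    good = src.read()

# Reading with a wrong encoding does not fail here, but yields garbage:
# each of the three bytes is decoded as a separate character.
with io.open(path, encoding="latin-1") as src:
    bad = src.read()
```

Note that the wrong encoding raises no error in this case, so the damage only shows up later when the data is analyzed.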

#!/usr/bin/env python
# -*- coding: utf-8; mode: python -*-

from nltk.util import ngrams
import collections
import io, sys

def main(inFile, outFile):

    with io.open(inFile, encoding="utf-8") as i:
        sixgrams = ngrams(i.read().split(), 2)

    result = collections.Counter(sixgrams)
    templ = "%-10s %s\n"

    with io.open(outFile, "w", encoding="utf-8") as o:

        o.write(templ %  (u"count",  u"words"))
        o.write(templ %  (u"-" * 10, u"-" * 30))

        # Sorting might be expensive. Before sort, filter items you don't want
        # to handle, btw. place *count* in front of the tuple.

        filtered = [ (c, w) for w, c in result.items() if c > 1]
        filtered.sort(reverse=True)

        for count, item in filtered:
            o.write(templ % (count, " ".join(item)))

if __name__ == '__main__':
    sys.exit(main("text.txt", "out_text.txt"))

With the input file text.txt:

At eight o'clock on Thursday morning and Arthur didn't feel very good 
he missed 100 € on Thursday morning. The Euro symbol of 100 € is here
to test the encoding of non ASCII characters, because encoding errors
do occur only on Thursday morning.

I get the following out_text.txt:

count      words
---------- ------------------------------
3          on Thursday
2          Thursday morning.
2          100 €