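I'm a beginner in Python. I've tried many methods from Stack Overflow answers to this question, but none of them work with my script. I have this small script I want to use, but I can't get its huge output into a .txt file so that I can analyze the data. How can I redirect the printed output to a txt file on my computer?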
from nltk.util import ngrams
import collections
with open("text.txt", "rU") as f:
    sixgrams = ngrams(f.read().decode('utf8').split(), 2)
result = collections.Counter(sixgrams)
print result
for item, count in sorted(result.iteritems()):
    if count >= 2:
        print " ".join(item).encode('utf8'), count
Answer 0 (score: 5)
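Just run it from the command line: python script.py > output.txt (pick a name other than the input file text.txt, because the shell truncates the target file before the script starts reading it).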
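If you would rather keep the redirection inside the script, Python 3.4+ offers contextlib.redirect_stdout, which temporarily sends everything print writes to sys.stdout into another file object. A minimal sketch, assuming Python 3; the file name output.txt and the small Counter below are placeholders for the script's real bigram result:

```python
import collections
import contextlib

# Placeholder data standing in for the bigram Counter the script builds.
result = collections.Counter({("on", "Thursday"): 3, ("100", "€"): 2, ("At", "eight"): 1})

with open("output.txt", "w", encoding="utf-8") as f:
    # Every print() inside this block goes to output.txt instead of the console.
    with contextlib.redirect_stdout(f):
        for item, count in sorted(result.items()):
            if count >= 2:
                print(" ".join(item), count)
```

This behaves like the shell's > for the code inside the with block only, so messages printed elsewhere in the script still reach the console.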
Answer 1 (score: 4)
The print statement in Python 2.x supports redirection (>> fileobj):
...
with open('output.txt', 'w') as f:
    print >>f, result
    for item, count in sorted(result.iteritems()):
        if count >= 2:
            print >>f, " ".join(item).encode('utf8'), count
In Python 3.x, the print function accepts an optional keyword argument, file:
print("....", file=f)
If you add from __future__ import print_function in Python 2.6+, the approach above works in Python 2.x as well.
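For comparison, here is a sketch of the question's script ported to Python 3 with print(..., file=out). To keep it self-contained it swaps nltk.util.ngrams for a zip of consecutive words (for n=2 both yield the same word pairs); the helper names bigrams and write_frequent_bigrams and the min_count parameter are hypothetical:

```python
import collections

def bigrams(words):
    # Hypothetical stand-in for nltk.util.ngrams(words, 2): pair each word with the next.
    return zip(words, words[1:])

def write_frequent_bigrams(in_path, out_path, min_count=2):
    # In Python 3, open() decodes the file, so no manual .decode('utf8') is needed.
    with open(in_path, encoding="utf-8") as f:
        result = collections.Counter(bigrams(f.read().split()))
    with open(out_path, "w", encoding="utf-8") as out:
        print(result, file=out)
        for item, count in sorted(result.items()):
            if count >= min_count:
                # print() writes str; the file object encodes it as UTF-8.
                print(" ".join(item), count, file=out)
```

Usage: write_frequent_bigrams("text.txt", "output.txt"). Because the encoding is handled by the file object, the .encode('utf8') calls from the Python 2 version disappear.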
Answer 2 (score: 1)
With a BufferedWriter you can do it like this (Python 2):
import io
pathOut = "output.txt"  # example path of the output file
outs = io.BufferedWriter(io.FileIO(pathOut, "wb"))
outs.write(str(result) + "\n")
for item, count in sorted(result.iteritems()):
    if count >= 2:
        outs.write(" ".join(item).encode('utf8') + " " + str(count) + "\n")
outs.flush()
outs.close()
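In Python 3 a BufferedWriter only accepts bytes, so each line has to be encoded before it is written. Note that open(path, "wb") already returns an io.BufferedWriter, and a with block flushes and closes it for you. A sketch under those assumptions; path_out and the Counter contents are placeholders:

```python
import collections

result = collections.Counter({("100", "€"): 2, ("At", "eight"): 1})  # placeholder data
path_out = "output_bytes.txt"  # hypothetical output path

# open(..., "wb") returns an io.BufferedWriter; leaving the block flushes and closes it.
with open(path_out, "wb") as outs:
    for item, count in sorted(result.items()):
        if count >= 2:
            line = " ".join(item) + " " + str(count) + "\n"
            outs.write(line.encode("utf-8"))  # str must be encoded to bytes first
```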
Answer 3 (score: 0)
As Antti said, you should prefer Python 3 and leave all this annoying Python 2 baggage behind you. The following script works with both Python 2 and Python 3.
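To read and write files, use the open function from the io module, which is compatible with both Python 2 and Python 3. Always use the with statement to open file-like resources. with wraps the execution of a block in a Python context manager. File objects implement the context manager protocol and are closed automatically when the with block is left.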
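A short sketch of both claims, assuming Python 3 (where io.open is the same object as the built-in open); demo.txt is a hypothetical file name:

```python
import io

# In Python 3, io.open is just another name for the built-in open().
same_function = io.open is open

with io.open("demo.txt", "w", encoding="utf-8") as f:
    f.write(u"written inside the with block\n")
    inside = f.closed  # False: the file is still open here

# Leaving the block called the file's __exit__ method, which closed it.
after = f.closed  # True
```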
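Regardless of the Python version, if you want to read a text file you need to know its encoding to read it correctly (if you are not sure, try utf-8 first). Also, the canonical name of the UTF-8 encoding is utf-8, and the U mode is deprecated.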
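To illustrate both points, the sketch below (Python 3, hypothetical file crlf.txt) writes UTF-8 bytes with Windows line endings. Text mode already applies universal newlines (newline=None is the default), which is what mode U used to do, and reading with the wrong encoding silently produces garbled characters:

```python
# Write raw bytes: UTF-8 text with Windows (\r\n) line endings.
with open("crlf.txt", "wb") as f:
    f.write("100 €\r\non Thursday\r\n".encode("utf-8"))

# Correct encoding; the default newline=None translates \r\n into \n.
with open("crlf.txt", encoding="utf-8") as f:
    text = f.read()

# Wrong encoding: latin-1 decodes any byte, so there is no error,
# but the three UTF-8 bytes of € turn into three wrong characters.
with open("crlf.txt", encoding="latin-1") as f:
    garbled = f.read()
```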
#!/usr/bin/env python
# -*- coding: utf-8; mode: python -*-
from nltk.util import ngrams
import collections
import io, sys
def main(inFile, outFile):
    with io.open(inFile, encoding="utf-8") as i:
        sixgrams = ngrams(i.read().split(), 2)  # n=2: word pairs (bigrams)
    result = collections.Counter(sixgrams)
    templ = "%-10s %s\n"
    with io.open(outFile, "w", encoding="utf-8") as o:
        o.write(templ % (u"count", u"words"))
        o.write(templ % (u"-" * 10, u"-" * 30))
        # Sorting might be expensive, so filter out the items you don't want
        # before sorting; also put *count* first in each tuple.
        filtered = [(c, w) for w, c in result.items() if c > 1]
        filtered.sort(reverse=True)
        for count, item in filtered:
            o.write(templ % (count, " ".join(item)))
if __name__ == '__main__':
    sys.exit(main("text.txt", "out_text.txt"))
With the input file text.txt:
At eight o'clock on Thursday morning and Arthur didn't feel very good
he missed 100 € on Thursday morning. The Euro symbol of 100 € is here
to test the encoding of non ASCII characters, because encoding errors
do occur only on Thursday morning.
I get the following out_text.txt:
count words
---------- ------------------------------
3 on Thursday
2 Thursday morning.
2 100 €
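To check this table without installing NLTK, the same filter-then-sort logic can run on the sample text in memory. This sketch is an assumption-based variant, not the answer's script: it swaps nltk.util.ngrams for a zip of consecutive words, returns lines instead of writing a file, and report is a hypothetical helper name:

```python
import collections

SAMPLE = (
    "At eight o'clock on Thursday morning and Arthur didn't feel very good\n"
    "he missed 100 € on Thursday morning. The Euro symbol of 100 € is here\n"
    "to test the encoding of non ASCII characters, because encoding errors\n"
    "do occur only on Thursday morning.\n"
)

def report(text):
    words = text.split()
    result = collections.Counter(zip(words, words[1:]))  # bigram counts
    templ = "%-10s %s"
    lines = [templ % ("count", "words"), templ % ("-" * 10, "-" * 30)]
    # Filter first, then sort (count, words) tuples from highest to lowest.
    filtered = sorted(((c, w) for w, c in result.items() if c > 1), reverse=True)
    for count, item in filtered:
        lines.append(templ % (count, " ".join(item)))
    return lines

for line in report(SAMPLE):
    print(line)
```

Ties are broken by comparing the word tuples, so with reverse=True "Thursday morning." (uppercase T) sorts ahead of "100 €".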