Sorted word frequencies of a book (.txt file)

Time: 2015-11-02 21:49:53

Tags: python frequency words word-frequency

I'm using:

from collections import Counter
wordlist = open('mybook.txt','r').read().split()
c = Counter(wordlist)
print c

# result :
# Counter({'the': 9530, 'to': 5004, 'a': 4203, 'and': 4202, 'was': 4197, 'of': 3912, 'I': 2852, 'that': 2574, ... })

This prints all the words of a book, sorted by frequency.

How can I write this result to a .txt output file?

g = open('wordfreq.txt','w')
g.write(c)   # here it fails 

Here is the desired output, wordfreq.txt:

  the,9530
  to,5004
  a,4203
  and,4202
  was,4197
  ...

3 answers:

Answer 0 (score: 1)

If you want to write it out in sorted order, you can do it like this.

from collections import Counter
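wordlist = open('mybook.txt', 'r').read().split()
word_counts = Counter(wordlist)

write_file = open('wordfreq.txt', 'w')
# items() works on both Python 2 and 3 (iteritems() exists only on Python 2)
for w, c in sorted(word_counts.items(), key=lambda x: x[1], reverse=True):
    write_file.write('{w}, {c}\n'.format(w=w, c=c))
write_file.close()  # make sure the output is flushed to disk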
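
To check the ordering on a small input, the same sort can be wrapped in a helper that returns the lines instead of writing them. This is only a sketch: the name `format_counts` is hypothetical, and breaking ties alphabetically is an assumption added here (the sort above leaves the relative order of equal counts to whatever order the counter yields).

```python
from collections import Counter

def format_counts(word_counts):
    """Return 'word, count' lines, most frequent first (hypothetical helper)."""
    # Sort by descending count; equal counts are ordered alphabetically (an assumption).
    ordered = sorted(word_counts.items(), key=lambda x: (-x[1], x[0]))
    return ['{w}, {c}'.format(w=w, c=c) for w, c in ordered]

lines = format_counts(Counter('the cat and the dog and the bird'.split()))
```

Each returned string can then be written to the output file followed by a newline.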

Answer 1 (score: 0)

I think this may be the help you need: how to print a dictionary in the format you requested. The first four lines are your original code.
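
A minimal sketch of what such code could look like, assuming the goal is one `word,count` line per entry as in the desired output. It wraps the question's read-and-count steps in a hypothetical helper named `write_word_freq`; the loop over `most_common()` is an assumption about how the pairs get formatted, not necessarily this answer's exact code.

```python
from collections import Counter

def write_word_freq(in_path, out_path):
    """Count the words in in_path and write 'word,count' lines (hypothetical helper)."""
    with open(in_path, 'r') as f:
        wordlist = f.read().split()
    c = Counter(wordlist)
    with open(out_path, 'w') as g:
        # write() only accepts a string, so each (word, count) pair is formatted first
        for word, count in c.most_common():
            g.write('{},{}\n'.format(word, count))
    return c
```

Calling `write_word_freq('mybook.txt', 'wordfreq.txt')` would then produce the file from the question.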


Answer 2 (score: 0)

I think this can be done more simply. I also used a context manager (with) to close the files automatically.

from collections import Counter

with open('mybook.txt', 'r') as mybook:
    wordcounts = Counter(mybook.read().split())

with open('wordfreq.txt', 'w') as write_file:
    for item in wordcounts.most_common():
        print('{}, {}'.format(*item), file=write_file)

If the file is particularly large, you can avoid reading it all into memory at once by using:

    wordcounts = Counter(x for line in mybook for x in line.split())
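
Put together, the line-by-line variant might look like the sketch below. The helper name `count_words` is hypothetical; it accepts any iterable of lines, so an open file object can be passed in directly and the text is consumed one line at a time rather than read in full.

```python
from collections import Counter

def count_words(lines):
    """Count whitespace-separated words from an iterable of lines (hypothetical helper)."""
    # The generator yields one word at a time, so the whole text is never held as a single string.
    return Counter(word for line in lines for word in line.split())

# A list of strings stands in for an open file here; with a real book:
#     with open('mybook.txt', 'r') as mybook:
#         wordcounts = count_words(mybook)
wordcounts = count_words(['the cat saw the dog\n', 'the dog ran\n'])
```

The counter itself still holds every distinct word, so the saving comes from not loading the raw text all at once.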