Question

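I'm using matplotlib to do a literary analysis of all the Cantos (chapters) of the Inferno in the Divine Comedy. Most of the analysis I'm doing is word counting, specifically word occurrences. I can split a given text into a list of its component words and then turn that list into a Counter, a dictionary of word counts that prints sorted from the highest count down. Of course, a dictionary has no inherent sort order, and that's where I'm running into trouble.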
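The splitting and counting steps described above can be sketched as follows. This is a minimal illustration, and the sample string is made up rather than taken from a Canto. It shows two details: str.split(' ') leaves words on either side of a line break glued together, while str.split() with no argument does not; and since a Counter is an ordinary mapping, Counter.most_common() is the way to get the counts back as a list sorted by count.

```python
from collections import Counter

# Hypothetical two-line sample standing in for a Canto's text
sample = "The cat sat\nthe dog ran to the cat"

# split(' ') breaks only on single spaces, so the newline glues two words together
naive_tokens = sample.lower().split(' ')
print(naive_tokens)  # ['the', 'cat', 'sat\nthe', 'dog', 'ran', 'to', 'the', 'cat']

# split() with no argument breaks on any run of whitespace, newlines included
words = sample.lower().split()
counts = Counter(words)

# A Counter is a dict subclass; most_common() returns a list of
# (word, count) pairs sorted from the highest count down
print(counts.most_common(2))  # [('the', 3), ('cat', 2)]
```

With split(' '), the word 'the' would be undercounted here, because one occurrence is hidden inside 'sat\nthe'.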

import matplotlib.pyplot as plt
from collections import Counter


def main():
    def read_files(filename):
        # Reads the whole file; the with-block closes it afterwards
        with open(filename, 'r') as file1:
            read1 = file1.read()
        # This converts each character in the file to lowercase to facilitate
        # word analysis
        read1_lower = read1.lower()

        # This converts the text into a list of words, splitting it on any
        # whitespace (spaces and newlines)
        list1 = read1_lower.split()

        # Uses Counter from collections to count how often each word occurs
        x = Counter(list1)
        print(x)
        # Makes a bar plot of every word using matplotlib
        plt.bar(range(len(x)), list(x.values()), align='center')
        plt.xticks(range(len(x)), list(x.keys()))

        plt.show()

    # Called from main(), not from inside read_files()
    print('Canto I:')
    read_files('CantoI.txt')


main()

Take the first Canto as an example. When I print x, it produces a nice dictionary:

>>> 
Canto I:
"Counter({'the': 57, 'and': 43, 'i': 36, 'that': 31, 'to': 24, 'of': 20,     
'me': 19, 'thou': 19, 'was': 16, 'he': 16, 'so': 15, 'which': 14, 'a': 13, 
'with': 13, 'my': 11, 'her': 11,..."

And so on. The Counter can still be plotted, but then every single word gets its own bar, and the whole plot is far too compressed to be useful.

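So here is my question: a bar chart of only the ten most frequent words in the dictionary would be more useful. Is there a way to express that in the 'range' part of plt.bar and plt.xticks, even though the dictionary has no inherent ordering?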
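One possible approach, sketched here as a suggestion rather than the accepted answer, is Counter.most_common(n): it returns a list of (word, count) pairs already sorted from the highest count down, so the top ten can be plotted without needing a sorted dictionary at all. The helper names top_words and plot_top_words are hypothetical.

```python
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs, highest count first."""
    return Counter(text.lower().split()).most_common(n)


def plot_top_words(pairs):
    """Draw one bar per (word, count) pair, keeping the order of the list."""
    # Imported inside the function so top_words() works even without matplotlib
    import matplotlib.pyplot as plt

    words = [word for word, _ in pairs]
    counts = [count for _, count in pairs]
    positions = range(len(pairs))  # at most n bars, so range(10) for the top ten
    plt.bar(positions, counts, align='center')
    plt.xticks(positions, words)
    plt.show()
```

With the question's setup, reading CantoI.txt into a string, passing it to top_words(), and handing the result to plot_top_words() would draw only the ten most frequent words of Canto I. Because the list is already sorted, the bars also appear in descending order.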

Answer 1

Python's OrderedDict (from the collections module) may be what you're looking for.
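A minimal sketch of how an OrderedDict could fit here, assuming the counts come from a Counter as in the question: build it from most_common(n), which is already sorted, and the OrderedDict preserves that order. The function name top_n_ordered and the sample sentence are hypothetical.

```python
from collections import Counter, OrderedDict


def top_n_ordered(counts, n=10):
    """Return an OrderedDict holding the n highest counts, highest first."""
    # most_common(n) is already sorted by count; OrderedDict keeps that order
    return OrderedDict(counts.most_common(n))


counts = Counter("the cat sat the dog ran to the cat".split())
top = top_n_ordered(counts, 2)
print(list(top.keys()))    # ['the', 'cat']
print(list(top.values()))  # [3, 2]
```

In the question's code, list(top.keys()) and list(top.values()) could then take the place of x.keys() and x.values() in the plt.xticks and plt.bar calls, with range(len(top)) as the bar positions. On Python 3.7 and later, a plain dict(counts.most_common(n)) also preserves insertion order.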
