Top 10 most common word lengths in a word list

Time: 2019-11-07 14:43:15

Tags: python

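I am writing a function that returns the 10 most common word lengths in a file named wordlist.txt, which contains words from a to z. I wrote a function (named "value_length") that returns a list of the lengths of every word in a given list. I also applied Counter from the collections module to a dictionary (with the word lengths as keys and the frequencies of those lengths as values) to solve the problem.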

from collections import Counter

def value_length(seq):
    '''This function takes a sequence and returns a list that contains 
    the length of each element
    '''
    value_l = []
    for i in range(len(seq)):
        length = len(seq[i])
        value_l.append(length)
    print(value_l) 

# open the txt file 
fileobj = open("wordlist.txt", "r")
file_content = []

# create a list with length of every single word   
for line in fileobj:
    file_content.append(line)
    wordlist_lengths = value_length(file_content)

# create a dictionary that has the number of occurrence of each length as key
occurrence = {x:file_content.count(x) for x in file_content}
c = Counter(occurrence)
c.most_common(10)  

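However, whenever I run this code I do not get the result I want; I only get the output of the value_length function (that is, an extremely long list with the length of every word). In other words, Python does not seem to evaluate the dictionary. I do not understand what my mistake is.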

2 answers:

Answer 0: (score: 0)

There is no need to store the lengths in a list, nor to use the list's count method; you have already imported Counter, so just use it to do the counting.

from collections import Counter

c = Counter()
for word in seq:  # seq is the list of words, as passed to value_length
    length = len(word)
    c[length] += 1
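
Applied to the file from the question, the same idea might look like the sketch below. The helper name top_word_lengths is hypothetical, and the sketch assumes wordlist.txt holds one word per line. Each line is stripped first, because otherwise the trailing newline would be counted as part of the word's length.

```python
from collections import Counter

def top_word_lengths(lines, n=10):
    """Return the n most common word lengths as (length, count) pairs."""
    counts = Counter()
    for line in lines:
        word = line.strip()  # drop the trailing newline so it is not counted
        if word:             # skip blank lines
            counts[len(word)] += 1
    return counts.most_common(n)

# Small in-memory sample standing in for the lines of wordlist.txt
sample = ["Hi\n", "bye\n", "hello\n", "what\n", "no\n",
          "crazy\n", "why\n", "say\n", "imaginary\n"]
result = top_word_lengths(sample)
print(result)
```

An open file object yields its lines when iterated, so it can be passed in directly, for example inside a with open("wordlist.txt") as f: block.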

Answer 1: (score: 0)

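This code finds the length of each list item and sorts the lengths. You can then simply build a tuple from each length and the number of times that length occurs in the list: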

words = ["Hi", "bye", "hello", "what", "no", "crazy", "why", "say", "imaginary"]

lengths = [len(w) for w in words]
print(lengths)
sortedLengths = sorted(lengths)
print(sortedLengths)

countedLengths = [(w, sortedLengths.count(w)) for w in sortedLengths]
print(countedLengths)

This prints:

[2, 3, 5, 4, 2, 5, 3, 3, 9]
[2, 2, 3, 3, 3, 4, 5, 5, 9]
[(2, 2), (2, 2), (3, 3), (3, 3), (3, 3), (4, 1), (5, 2), (5, 2), (9, 1)]
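
The last list repeats each pair once per occurrence of its length. A minimal sketch of one way to collapse the duplicates and keep only the most frequent lengths follows. The names pairs and top are introduced here for illustration, and the tie-breaking rule (shorter length first) is an assumption, since the question does not specify one.

```python
words = ["Hi", "bye", "hello", "what", "no", "crazy", "why", "say", "imaginary"]

lengths = [len(w) for w in words]

# set() drops the repeated lengths, so each (length, count) pair is built once
pairs = [(n, lengths.count(n)) for n in set(lengths)]

# Sort by count, highest first; ties are broken by the shorter length
top = sorted(pairs, key=lambda p: (-p[1], p[0]))[:10]
print(top)  # [(3, 3), (2, 2), (5, 2), (4, 1), (9, 1)]
```

For a large word list, calling count once per distinct length is still slower than a single pass with Counter, but it stays close to the approach in this answer.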