Python: counting how often words of a given length occur in a file

Time: 2017-05-23 19:00:46

Tags: python-3.x

I want to create a list that prints statistics about a text:

This is my latest attempt so far, and it does not work.

 f = open("input.txt", encoding="utf-8")  
 text = f.read()split()   

 words = []  
 one_l_words = []
 two_l_words = []

 for each in lines:
     words += each.split(" ")

 for each in words:
     if len(each) == 1:
         one_l_word.append(each)

 for each in words:
     if len(each) == 2:
         two_l_word.append(each)

 number_of_1lwords = len(one_l_words)
 number_of_2lwords = len(two_l_words) 


 print(one_l_words) 
 print(two_l_words)

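The first problem is that my code does not work, but in any case I think it is too complicated. I want to count words of every length from 1 to 30, and that should be a simple program.

Basically, the output should be a list like this: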

length | How often a word of this length occurs
2      12415

4 Answers:

Answer 0 (score: 1)

Try the following, which uses a dictionary:

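# Python 3 version; "airline.py" is simply the file this answer was tested on
with open("airline.py") as f:
    words = f.read().split()  # split on any run of whitespace

counts = {}  # maps word length -> number of words of that length
for word in words:
    if len(word) not in counts:
        counts[len(word)] = 1
    else:
        counts[len(word)] += 1

# Convert to a list of (length, count) tuples sorted by length
counts = sorted(counts.items(), key=lambda x: x[0])

print("length\t\tHow often a word of this length occurs")
for length, count in counts:
    print(str(length) + "\t\t" + str(count))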

Sample output:

Length  How often a word of this length occurs
1       21
2       7
3       32
4       4
5       11
6       11
7       5
8       13
9       8
10      14
11      10
12      5
13      12
14      9
15      5
17      3
18      6
19      1
20      1
21      3
22      1
27      1
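
As a possible alternative, the same tally can be sketched with `collections.Counter` from the standard library, which removes the explicit "is the key already there" check. The function name `length_counts` and the sample string are assumptions made for illustration; they do not come from the question.

```python
from collections import Counter


def length_counts(text):
    """Return a Counter mapping word length -> number of words of that length."""
    # split() with no argument splits on any run of whitespace
    return Counter(len(word) for word in text.split())


sample = "a bb cc ddd a"  # hypothetical sample text, not from the question
counts = length_counts(sample)

print("length\tcount")
for length, count in sorted(counts.items()):
    print(f"{length}\t{count}")
```

To use it on a file, read the file's contents into a string first and pass that string to `length_counts`.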

Answer 1 (score: 0)

You can use a dictionary like this:

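dico = {}
for i in range(1, 31):  # initialise lengths 1..30 so no existence check is needed later
    dico[i] = 0

with open("input.txt", encoding="utf-8") as f:  # the with block closes the file automatically
    text = f.read()
    for word in text.split():  # split() also handles newlines and repeated spaces
        if len(word) in dico:  # ignore words longer than 30 characters
            dico[len(word)] += 1

print(dico)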

I hope this helps.
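
One detail that matters with a pre-filled dictionary is how the text is split. `str.split()` with no argument splits on any run of whitespace and drops empty strings, whereas `split(" ")` splits on single spaces only, so it can yield empty strings (length 0) or words with a newline still attached. The sketch below illustrates the difference; the function name `count_lengths`, the `max_len` parameter, and the sample text are assumptions for illustration.

```python
def count_lengths(text, max_len=30):
    """Count words by length, keeping only lengths 1..max_len."""
    # Pre-fill every length with zero so each row appears in the result
    counts = {n: 0 for n in range(1, max_len + 1)}
    for word in text.split():
        if len(word) in counts:  # skip words longer than max_len
            counts[len(word)] += 1
    return counts


text = "a  bb\nccc"  # two spaces and a newline; hypothetical sample

print(text.split(" "))  # ['a', '', 'bb\nccc']
print(text.split())     # ['a', 'bb', 'ccc']
print(count_lengths(text, max_len=3))  # {1: 1, 2: 1, 3: 1}
```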

Answer 2 (score: 0)

collections.defaultdict(int) is a good fit for this case:

import collections


def main():
    counts = collections.defaultdict(int)
    with open('input.txt', 'rt', encoding='utf-8') as file:
        for word in file.read().split():
            counts[len(word)] += 1
    print('length | How often a word of this length occurs')
    for i in sorted(counts.keys()):
        print('%-6d | %d' % (i, counts[i]))


if __name__ == '__main__':
    main()

Answer 3 (score: -1)

#something like this might work
a = 'aAa@!@121'
d = {}
for i in a:
    d[i]=d.get(i,0)+1

print(d)

#d has characters as key and value will be the count of character present in the string
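
The snippet above counts individual characters. A minimal sketch that applies the same `d.get(key, 0) + 1` pattern to word lengths, which is what the question asks for, might look like the following. The function name `word_length_counts` and the sample sentence are assumptions for illustration.

```python
def word_length_counts(text):
    """Map each word length to the number of words of that length."""
    d = {}
    for word in text.split():
        # get() returns 0 the first time a given length is seen
        d[len(word)] = d.get(len(word), 0) + 1
    return d


print(word_length_counts("to be or not to be"))  # {2: 5, 3: 1}
```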