How do I count the words of each different length in Python?

Asked: 2015-10-23 07:24:08

Tags: python

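I have to count the words in a text file.

Some strings may contain a - or start with one, for example good-morning, but the - itself should not be counted.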

filename = input("Please input a file: ")     
openfile = codecs.open(filename,'r',encoding='utf8')

I only know that len() can be used for counting. What command should I use to count the words of each different length while excluding -?

str=[]
for line in filename.readlines():
print('Word length')
str.append(filename)
len(str)
print(len)

How can I get the correct counts?

3 answers:

Answer 0 (score: 0):

Take a look at this. You should be able to follow the logic from the code itself.

import re

randomText = "Enter your text"
wordFrequency = {}
# split() with no argument splits on any whitespace, including newlines,
# so the last word of one line is not glued to the first word of the next
for word in randomText.split():
    # Strip everything except letters and digits, so "-" is not counted
    word = re.sub('[^A-Za-z0-9]+', '', word)
    currentWordLength = len(word)
    if word:
        if currentWordLength not in wordFrequency:
            wordFrequency[currentWordLength] = 1
        else:
            wordFrequency[currentWordLength] += 1
for key in wordFrequency:
    print("{0} --> {1}".format(key, wordFrequency[key]))
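Since the question is about a text file, the same idea can be sketched as a function that takes any iterable of lines (an open file works). The function name `length_frequencies` and the filename in the comment are placeholders for illustration, not part of the original answer.

```python
import re

def length_frequencies(lines):
    """Map word length -> number of words of that length."""
    freq = {}
    for line in lines:
        for word in line.split():
            # Keep letters and digits only, so "-" never adds to the length
            word = re.sub(r'[^A-Za-z0-9]+', '', word)
            if word:  # skip tokens that were only punctuation, e.g. "-"
                freq[len(word)] = freq.get(len(word), 0) + 1
    return freq

# With a real file (hypothetical filename):
# with open("words.txt", encoding="utf8") as f:
#     print(length_frequencies(f))
print(length_frequencies(["good-morning to\n", "- you\n"]))
```

Note that this regex only keeps ASCII letters and digits, so accented or non-Latin characters would be dropped as well.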

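Answer 1 (score: 0):

Since you asked for a hint: you need a dict to keep track of the lengths. You can use the dict's setdefault method, which adds the key if it does not exist yet: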

d = {}

for word in words:
    d.setdefault(len(word), 0)  # Create the key if it doesn't exist
                                # And set it to 0
    d[len(word)] += 1  # Add one word

You end up with a dict whose keys are the word lengths and whose values are the counts, for example:

{1: 123, 2: 232, 3: 175 ... }

If you don't want to count -, you can remove it before measuring the length:

clean_word = word.replace("-", "")  # Replace - with nothing
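Putting the two snippets above together, a minimal sketch could look like the following. The function name `count_word_lengths` and the sample string are made up for illustration, and splitting on whitespace is an assumption about what counts as a word.

```python
def count_word_lengths(text):
    """Return a dict mapping word length -> number of words of that length."""
    d = {}
    for word in text.split():
        clean_word = word.replace("-", "")  # Replace - with nothing
        if not clean_word:  # a lone "-" leaves nothing to count
            continue
        d.setdefault(len(clean_word), 0)  # Create the key if it doesn't exist
        d[len(clean_word)] += 1           # Add one word
    return d

sample = "good-morning - the end-"
print(count_word_lengths(sample))
```

Other punctuation such as commas and periods still counts toward the length here, because only - is removed.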

Answer 2 (score: 0):

A crazy, unreadable lambda version :)

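from collections import Counter
# Named text rather than input, to avoid shadowing the built-in input()
text = 'here are some words\nblah, blah, good-morning. -the end-'
# In Python 3, filter() returns an iterator, so join it back into a str before split()
print(Counter(map(len, ''.join(filter(lambda c: c.isalpha() or c.isspace(), text)).split())))

Output:

Counter({4: 4, 3: 3, 5: 1, 11: 1})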
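For comparison, the one-liner can be unrolled into named steps. This is only a sketch of the same idea for Python 3; the function name `word_length_counts` is invented here.

```python
from collections import Counter

def word_length_counts(text):
    # Keep letters and whitespace only, so "good-morning." becomes "goodmorning"
    cleaned = "".join(c for c in text if c.isalpha() or c.isspace())
    # Count how many words there are of each length
    return Counter(len(word) for word in cleaned.split())

text = 'here are some words\nblah, blah, good-morning. -the end-'
counts = word_length_counts(text)
for length in sorted(counts):
    print(length, counts[length])
```

Sorting the keys before printing gives a stable, length-ordered listing instead of relying on the Counter's display order.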