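I have to count the words in a text file. Some strings may contain "-" or start with "-", for example good-morning, but the "-" itself should not be counted.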
import codecs

filename = input("Please input a file: ")
openfile = codecs.open(filename, 'r', encoding='utf8')
I only know how to count words with len(). What command should I use to count words of different lengths and exclude "-"?
words = []
for line in openfile.readlines():
    words.extend(line.split())  # collect the whitespace-separated words
print('Word length')
print(len(words))  # total number of words, not a count per length
How can I get the correct count?
Answer 0: (score: 0)
Check this out. The code should be easy enough to follow.
import re

randomText = "Enter your text"
wordFrequency = {}
# split() with no argument splits on any whitespace, including newlines
for word in randomText.split():
    # Strip everything except letters and digits, so "-" is not counted
    word = re.sub('[^A-Za-z0-9]+', '', word)
    if word:
        currentWordLength = len(word)
        wordFrequency[currentWordLength] = wordFrequency.get(currentWordLength, 0) + 1
for key in wordFrequency:
    print("{0} --> {1}".format(key, wordFrequency[key]))
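To run the same count on a file rather than a hard-coded string, something like the following might work. This is only a sketch: the function name count_word_lengths and the file name in the comment are made up for illustration, and the file is assumed to be UTF-8 as in the question.

```python
import re


def count_word_lengths(lines):
    """Map each word length to the number of words of that length.

    `lines` is any iterable of strings, for example an open text file.
    """
    word_frequency = {}
    for line in lines:
        for word in line.split():
            # Drop "-" and other punctuation before measuring the word
            word = re.sub('[^A-Za-z0-9]+', '', word)
            if word:
                word_frequency[len(word)] = word_frequency.get(len(word), 0) + 1
    return word_frequency


# Usage with a file (the file name is only an example):
# with open('words.txt', encoding='utf8') as f:
#     print(count_word_lengths(f))
print(count_word_lengths(['good-morning -the end-\n', 'hello - world\n']))
# prints {11: 1, 3: 2, 5: 2}
```

Note that the pattern keeps ASCII letters and digits only, so accented or other non-ASCII letters would be stripped as well.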
Answer 1: (score: 0)
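Since you asked for a hint: you need a dict to keep track of the lengths. You can use the setdefault method that dict provides, which adds a new key if that key does not exist yet: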
d = {}
for word in words:
    d.setdefault(len(word), 0)  # Create the key if it doesn't exist
                                # And set it to 0
    d[len(word)] += 1           # Add one word
You end up with a dict whose keys are the word lengths and whose values are the counts, for example:
{1: 123, 2: 232, 3: 175 ... }
If you don't want to count "-", you can remove it before counting:
clean_word = word.replace("-", "") # Replace - with nothing
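Putting the two pieces together could look roughly like this. The function name length_counts and the sample sentence are only for illustration, and skipping words that consist of nothing but "-" is an assumption about what you want.

```python
def length_counts(words):
    """Count words by length, ignoring "-" characters."""
    d = {}
    for word in words:
        clean_word = word.replace("-", "")  # Replace - with nothing
        if not clean_word:                  # The word was only dashes: skip it
            continue
        d.setdefault(len(clean_word), 0)    # Create the key if it doesn't exist
        d[len(clean_word)] += 1             # Add one word
    return d


print(length_counts("good-morning -the end- - ok".split()))
# prints {11: 1, 3: 2, 2: 1}
```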
Answer 2: (score: 0)
A crazy, unreadable lambda version :)
from collections import Counter
text = 'here are some words\nblah, blah, good-morning. -the end-'
# In Python 3, filter() returns an iterator, so join it back into a string before split()
print(Counter(map(len, ''.join(filter(lambda c: c.isalpha() or c.isspace(), text)).split())))
Output:
Counter({4: 4, 3: 3, 5: 1, 11: 1})
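For comparison, a more readable variant that should give the same counts might look like the sketch below. The function name word_length_counter is made up for illustration. It splits into words first and then keeps only the letters of each word.

```python
from collections import Counter


def word_length_counter(text):
    """Return a Counter mapping word length to number of words."""
    lengths = []
    for word in text.split():
        # Keep letters only, so "-" and punctuation do not add to the length
        letters = ''.join(c for c in word if c.isalpha())
        if letters:
            lengths.append(len(letters))
    return Counter(lengths)


counts = word_length_counter('here are some words\nblah, blah, good-morning. -the end-')
for length, count in sorted(counts.items()):
    print(length, count)
```

Sorting the items prints the lengths in ascending order, which is usually easier to read than the Counter's own representation.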