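I am trying to get this Python code to strip the punctuation attached to words and count the unique words. For some reason it still counts "hello." and "hello" as separate words. Any help would be greatly appreciated.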
def word_distribution(words):
    word_dict = {}
    words = words.lower()
    words = words.split()
    for word in words:
        if ord('a') <= ord(word[-1]) <= ord('z'):
            pass
        elif ord('A') <= ord(word[-1]) <= ord('Z'):
            pass
        else:
            word[:-1]
    word_dict = {word:words.count(word)+1 for word in set(words)}
    return(word_dict)
Answer 0 (score: 1)
I don't know why you are adding 1 to the count.
def word_distribution(words):
    word_dict = {}
    words = words.lower().split()
    # Note: this loop changes nothing, since both branches only pass.
    for word in words:
        if ord('a') <= ord(word[-1]) <= ord('z'):
            pass
        elif ord('A') <= ord(word[-1]) <= ord('Z'):
            pass
    word_dict = {word:words.count(word) for word in set(words)}
    return(word_dict)
{'hello': 2, 'my': 1, 'name': 1, 'is': 1}
Edit:
As brianpck pointed out:
def word_distribution(words):
    word_dict = {}
    words = words.lower().split()
    word_dict = {word:words.count(word) for word in set(words)}
    return(word_dict)
also gives the same result.
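For comparison, the same tally can be sketched with `collections.Counter` from the standard library, which counts in a single pass instead of calling `list.count` once per unique word. The name `word_distribution_counter` is made up for this illustration, and like the versions in this answer it does not strip punctuation.

```python
from collections import Counter


def word_distribution_counter(text):
    """Count lowercase, whitespace-separated words in one pass."""
    # Counter is a dict subclass; dict() converts it to a plain dictionary.
    return dict(Counter(text.lower().split()))


print(word_distribution_counter('Hello My Name Is hello'))
```

With a trailing period in the input, "hello." and "hello" would still be counted as two different words here.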
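Answer 1 (score: 1)
You are making this too complicated. As Sohier Dane mentioned in the comments, you can use the approach from the other post to strip the punctuation and simplify the script: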
import string
def word_distribution(words):
    # Python 2 only: str.translate(None, chars) deletes every character in chars.
    words = words.translate(None, string.punctuation).lower()
    d = {}
    for w in words.split():
        if w not in d:
            d[w] = 1
        else:
            d[w] += 1
    return d
Result:
>>> x='Hello My Name Is hello.'
>>> print word_distribution(x)
{'is': 1, 'my': 1, 'hello': 2, 'name': 1}
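The `translate(None, ...)` call in this answer works only on a Python 2 `str`. On Python 3, a rough equivalent builds a deletion table with `str.maketrans`. This is a sketch that assumes ASCII punctuation as defined by `string.punctuation`; the function name `word_distribution_py3` is hypothetical.

```python
import string


def word_distribution_py3(text):
    # Map every ASCII punctuation character to None so translate() deletes it.
    table = str.maketrans('', '', string.punctuation)
    counts = {}
    for w in text.translate(table).lower().split():
        # dict.get supplies 0 the first time a word is seen.
        counts[w] = counts.get(w, 0) + 1
    return counts


print(word_distribution_py3('Hello My Name Is hello.'))
```

Note that this deletes punctuation everywhere, not only at the end of a word, so "don't" becomes "dont".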
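Answer 2 (score: 1)
There are certainly better ways to achieve your goal, but this answer fixes your code.
Strings are immutable and lists are mutable. Nowhere in your code is the list being modified. In addition, the expression word[:-1]
has no effect, because you never assign its result to anything: slicing returns a new string and leaves the original untouched.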
def word_distribution(words):
    word_dict = {}
    words = words.lower()
    words = words.split()
    # enumerate gives the position of each word, so the list can be updated in place
    for index, word in enumerate(words):
        if ord('a') <= ord(word[-1]) <= ord('z'):
            pass
        elif ord('A') <= ord(word[-1]) <= ord('Z'):
            pass
        else:
            # drop the trailing non-letter character and store the result back in the list
            word = word[:-1]
            words[index] = word
    word_dict = {word:words.count(word) for word in set(words)}
    return(word_dict)
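The immutability point can be checked in a few lines. The sketch below also shows a variant that strips trailing punctuation with `str.rstrip` instead of testing the last character by hand. `word_distribution_rstrip` is a hypothetical name, and using `string.punctuation` as the set of characters to strip is an assumption.

```python
import string

word = 'hello.'
word[:-1]            # builds the new string 'hello' and throws it away
assert word == 'hello.'
word = word[:-1]     # rebinding the name keeps the result
assert word == 'hello'


def word_distribution_rstrip(text):
    # Strip any run of trailing ASCII punctuation from each word and
    # drop tokens that consisted of punctuation only.
    stripped = (w.rstrip(string.punctuation) for w in text.lower().split())
    words = [w for w in stripped if w]
    return {w: words.count(w) for w in set(words)}


print(word_distribution_rstrip('Hello My Name Is hello.'))
```

Unlike removing exactly one character, `rstrip` handles endings such as "..." or "?!" and leaves words that end in a digit unchanged.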