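I'm having a problem with my Python program. I'm trying to write a word counter, which is an Exercism exercise.
My program has to pass 13 tests, all of which are different strings containing spaces, special characters, digits, and so on.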
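I used to have a problem because I replaced every character that was not a letter or a digit with a space. That broke words like "don't", because they were split into two strings, don and t. To fix this, I added an if condition that excludes the single ' mark from the replacement.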
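The problem described above can be reproduced in isolation. This small sketch mimics the first version of the program (replace everything that is neither a letter nor a digit with a space) on a single contraction:

```python
# The first version replaced every character that is neither a letter
# nor a digit (the apostrophe included) with a space.
word = "don't"
cleaned = "".join(c if c.isalpha() or c.isdigit() else " " for c in word)
tokens = cleaned.split()
print(tokens)  # ['don', 't']
```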
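However, one of the strings I have to test is "Joe can't tell between 'large' and large.". The problem is that, because I exclude ' marks, large and 'large' are treated as two different things here, even though they are the same word. How can I tell my program to "erase" the quote marks surrounding a word?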
Here is my code. I added two test cases: the string above, and another string with a single ' mark that should not be removed:
def word_count(phrase):
    count = {}
    for c in phrase:
        if not c.isalpha() and not c.isdigit() and c != "'":
            phrase = phrase.replace(c, " ")
    for word in phrase.lower().split():
        if word not in count:
            count[word] = 1
        else:
            count[word] += 1
    return count
print(word_count("Joe can't tell between 'large' and large."))
print(word_count("Don't delete that single quote!"))
Thanks for your help.
Answer 0 (score: 2)
After splitting, use .strip() to remove the ' marks from the start and end of each word (see https://python-reference.readthedocs.io/en/latest/docs/str/strip.html):
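Why strip() fits here: given a string of characters, it removes any of them from both ends only, so wrapping quotes disappear while an apostrophe in the middle of a word survives. A short demonstration using only the built-in str.strip:

```python
# str.strip(chars) removes any of the listed characters from both ends
# of the string; characters in the middle are left alone.
wrapped = "'large'".strip("'")  # quotes on both ends are removed
inner = "can't".strip("'")      # the inner apostrophe is kept
other = "large.".strip("'")     # only ' is stripped, so the period stays
print(wrapped, inner, other)
```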
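def word_count(phrase):
    count = {}
    # Replace everything except letters, digits and apostrophes with spaces.
    for c in phrase:
        if not c.isalpha() and not c.isdigit() and c != "'":
            phrase = phrase.replace(c, " ")
    for word in phrase.lower().split():
        # Remove quote marks wrapping the word ('large' -> large),
        # but keep apostrophes inside it (can't stays can't).
        word = word.strip("'")
        if not word:
            continue  # the token consisted only of quote marks
        if word not in count:
            count[word] = 1
        else:
            count[word] += 1
    return count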
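Answer 1 (score: 2)
The string module has some useful text constants; the one that matters for you is punctuation. The collections module holds Counter, a specialized dictionary class for counting: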
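A quick look at the two building blocks this answer relies on: string.punctuation is an ordinary string of ASCII punctuation characters (the apostrophe included), and Counter tallies the items of any iterable, returning 0 for keys it has never seen:

```python
from collections import Counter
from string import punctuation

# string.punctuation is a plain str of ASCII punctuation characters.
print("'" in punctuation)  # True
print("!" in punctuation)  # True

# Counter counts each item of the iterable it is given.
counts = Counter("large and large".split())
print(counts["large"])    # 2
print(counts["missing"])  # 0, missing keys default to zero
```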
from collections import Counter
from string import punctuation
# lookup in a set is fastest
ps = set(punctuation)  # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
def cleanSplitString(s):
    """Removes all punctuation from the string s and returns the split words."""
    return ''.join([m for m in s if m not in ps]).lower().split()
def word_count(sentence):
    return dict(Counter(cleanSplitString(sentence)))  # return a "normal" dict
print(word_count("Joe can't tell between 'large' and large."))
print(word_count("Don't delete that single quote!"))
Output:
{'joe': 1, 'cant': 1, 'tell': 1, 'between': 1, 'large': 2, 'and': 1}
{'dont': 1, 'delete': 1, 'that': 1, 'single': 1, 'quote': 1}
If you want to keep punctuation inside words, use this function in place of cleanSplitString:
def cleanSplitString_2(s):
    """Strips punctuation from the start and end of each word, keeping it if it is inside."""
    return [w.strip(punctuation) for w in s.lower().split()]
Output:
{'joe': 1, "can't": 1, 'tell': 1, 'between': 1, 'large': 2, 'and': 1}
{"don't": 1, 'delete': 1, 'that': 1, 'single': 1, 'quote': 1}
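The answer does not show how cleanSplitString_2 plugs into word_count. A minimal sketch, assuming it simply takes the place of cleanSplitString, follows; the helper name split_words is hypothetical, and the filter that drops tokens made only of punctuation (for example a lone "-") is an addition that is not part of the original answer.

```python
from collections import Counter
from string import punctuation


def split_words(s):
    """Hypothetical helper: cleanSplitString_2 plus an empty-token filter.

    Lowercases s, splits on whitespace, and strips punctuation only from
    the ends of each word, so inner apostrophes (can't, don't) survive.
    """
    words = (w.strip(punctuation) for w in s.lower().split())
    # Drop tokens that consisted only of punctuation and became "".
    return [w for w in words if w]


def word_count(sentence):
    """Return a plain dict mapping each word to its number of occurrences."""
    return dict(Counter(split_words(sentence)))


print(word_count("Joe can't tell between 'large' and large."))
print(word_count("Don't delete that single quote!"))
```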