Question

I have a problem with my Python program. I'm trying to write a word counter, which is an exercise from Exercism.

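My program has to pass 13 tests, all of which use different strings containing spaces, characters, numbers, etc. I used to have a problem because I replaced every character that was not a letter or a digit with a space. That broke words like "don't", because they got split into two strings, don and t. To fix this, I added an if statement that excludes single ' marks from the replacement.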

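However, one of the strings I have to test is "Joe can't tell between 'large' and large.". The problem is that, because I excluded ' marks, large and 'large' are counted as two different things here, even though they are the same word. How do I tell my program to "erase" the quotes surrounding a word?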

Here is my code. I added two test cases: the string above, and another string containing a single ' mark that should not be removed:

def word_count(phrase):
    count = {}
    for c in phrase:
        if not c.isalpha() and not c.isdigit() and c != "'":
            phrase = phrase.replace(c, " ")
    for word in phrase.lower().split():
        if word not in count:
            count[word] = 1
        else:
            count[word] += 1
    return count

print(word_count("Joe can't tell between 'large' and large."))
print(word_count("Don't delete that single quote!"))

Thanks for your help.

Answer 1

After splitting, call .strip("'") on each word so that ' marks are removed only from its first and last characters - https://python-reference.readthedocs.io/en/latest/docs/str/strip.html

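def word_count(phrase):
    count = {}
    # Replace everything except letters, digits and ' with a space.
    for c in phrase:
        if not c.isalpha() and not c.isdigit() and c != "'":
            phrase = phrase.replace(c, " ")
    for word in phrase.lower().split():
        # Remove ' marks only from the start and end of the word,
        # so 'large' becomes large while can't stays intact.
        word = word.strip("'")
        if word not in count:
            count[word] = 1
        else:
            count[word] += 1
    return count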
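One edge case worth checking, assuming the test strings may contain a quote mark standing on its own (for example "hello ' world"): stripping ' from a token made only of quotes leaves an empty string, which would then be counted as a word. A minimal sketch of the same strip-based approach that skips such tokens (the name word_count_skip_empty is hypothetical, not part of the answer above):

```python
def word_count_skip_empty(phrase):
    """Strip-based word counter that ignores tokens made only of ' marks."""
    count = {}
    # Replace every character that is not a letter, digit or ' with a space.
    cleaned = "".join(c if c.isalpha() or c.isdigit() or c == "'" else " "
                      for c in phrase)
    for word in cleaned.lower().split():
        # Remove ' marks only at the start and end of the token.
        word = word.strip("'")
        if not word:
            # The token was nothing but quotes, e.g. a lone "'".
            continue
        count[word] = count.get(word, 0) + 1
    return count


print(word_count_skip_empty("hello ' world"))  # {'hello': 1, 'world': 1}
print(word_count_skip_empty("Joe can't tell between 'large' and large."))
```

Building the cleaned string with a single join also avoids calling replace once per character.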

Answer 2

The string module has some useful text constants; the one that matters for you is punctuation. The collections module holds Counter, a specialized dictionary class for counting:

from collections import Counter
from string import punctuation

# Membership tests on a set are fast (O(1) on average).
ps = set(punctuation)  # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

def cleanSplitString(s):
    """Removes all punctuation from the string s and returns the lowercased, split words."""
    return ''.join([m for m in s if m not in ps]).lower().split()

def word_count(sentence):
    return dict(Counter(cleanSplitString(sentence)))  # return a "normal" dict

print(word_count("Joe can't tell between 'large' and large."))
print(word_count("Don't delete that single quote!"))

Output:

{'joe': 1, 'cant': 1, 'tell': 1, 'between': 1, 'large': 2, 'and': 1}
{'dont': 1, 'delete': 1, 'that': 1, 'single': 1, 'quote': 1}
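The output above drops the apostrophe from "can't" and "don't" because ' is itself one of the characters in string.punctuation. A small sketch of the two building blocks on their own:

```python
from collections import Counter
from string import punctuation

# The apostrophe is part of string.punctuation, which is why
# cleanSplitString turns "can't" into "cant".
print("'" in punctuation)  # True

# Counter tallies hashable items; dict() turns it back into a plain dict.
words = ["large", "and", "large"]
print(dict(Counter(words)))  # {'large': 2, 'and': 1}
```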

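If you want to keep punctuation that appears inside words, use this function in place of cleanSplitString in word_count: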

def cleanSplitString_2(s):
    """Strips punctuation from the start and end of each word, keeps it if inside."""
    return [w.strip(punctuation) for w in s.lower().split()]

Output:

{'joe': 1, "can't": 1, 'tell': 1, 'between': 1, 'large': 2, 'and': 1}
{"don't": 1, 'delete': 1, 'that': 1, 'single': 1, 'quote': 1}
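The answer does not show the counting function wired to cleanSplitString_2, so here is a self-contained sketch. The name word_count_2 and the `if w` filter are additions for illustration: the filter drops tokens such as a lone "-", which strip(punctuation) reduces to an empty string.

```python
from collections import Counter
from string import punctuation


def cleanSplitString_2(s):
    """Strips punctuation from the start and end of each word, keeps it if inside."""
    return [w.strip(punctuation) for w in s.lower().split()]


def word_count_2(sentence):
    # Hypothetical name; skip tokens that were made only of punctuation.
    return dict(Counter(w for w in cleanSplitString_2(sentence) if w))


print(word_count_2("Joe can't tell between 'large' and large."))
print(word_count_2("Don't delete that single quote!"))
```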

Read up on strip()
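Both answers rely on the same property of str.strip(chars): it removes any combination of the given characters from both ends of the string and never touches the middle. A few quick checks:

```python
# Any mix of the given characters is removed from both ends.
print("'large'".strip("'"))      # large
print("...wow!?".strip("!?."))   # wow

# Characters in the middle are never touched.
print("can't".strip("'"))        # can't

# The argument is a set of characters, not a substring to remove.
print("abcxyzcba".strip("abc"))  # xyz
```

lstrip() and rstrip() take the same argument but work on only one end.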

Is a Python word counter sensitive to whether a word is wrapped in quotes?
