Question

def count_words(text, words):
    count = 0
    text.split()
    print text.split()
    for w in words:
        count += 1
    return count


if __name__ == '__main__':
    #These unit tests are only for self-checking and not necessary for auto-testing
    assert count_words(u"How aresjfhdskfhskd you?", {u"how", u"are", u"you", u"hello"}) == 3, "Example"
    assert count_words(u"Bananas, give me bananas!!!", {u"banana", u"bananas"}) == 2, "BANANAS!"
    assert count_words(u"Lorem ipsum dolor sit amet, consectetuer adipiscing elit.",
                   {u"sum", u"hamlet", u"infinity", u"anything"}) == 1, "Weird text"

OK, I have the following problem: the count comes out as 4, and I have no idea how that is possible.

The result is 4, but it should be 3.

Answer 1

Your code does not make much sense. For instance, you write lines like:

print text.split(9

where you open a bracket but never close it.

Furthermore, your algorithm:

for w in words:
    count += 1
return count

does not make much sense either: it simply counts the number of elements in words and never looks at text.
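A quick way to see why the original function returns 4: the loop body never consults text, so the result is just the size of words. Below is a minimal Python 3 sketch of that behaviour; the name count_words_original is made up here for illustration.

```python
def count_words_original(text, words):
    # Mirrors the loop from the question.
    count = 0
    text.split()  # the resulting list is discarded, so this line has no effect
    for w in words:
        count += 1  # runs once per element of words, regardless of text
    return count


words = {"how", "are", "you", "hello"}
print(count_words_original("How aresjfhdskfhskd you?", words))  # 4
print(count_words_original("", words) == len(words))            # True
```

Even with an empty text the function still returns 4, which confirms that the count depends only on how many entries words has.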

What you are looking for is:

def count_words(text, words):
    count = 0
    for w in words:
        if w in text:
            count += 1
    return count

So you add a constraint (this version is case-sensitive):

if w in text

which checks whether text contains the word w as a substring.

This gives:

>>> count_words(u"How aresjfhdskfhskd you?", {u"how", u"are", u"you", u"hello"})
2

because "how" is not the same as "How", so "how" is not counted.
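The two properties of the in operator that matter here, substring matching and case sensitivity, can be checked directly. This is a small Python 3 illustration using the first test string:

```python
text = "How aresjfhdskfhskd you?"

print("are" in text)          # True: "are" is a substring of "aresjfhdskfhskd"
print("how" in text)          # False: the text contains "How" with a capital H
print("how" in text.lower())  # True once the text is lowercased
```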

If the search should be case-insensitive, you can use:

def count_words(text, words):
    count = 0
    text = text.lower()
    for w in words:
        w = w.lower()
        if w in text:
            count += 1
    return count

This returns exactly what the test cases expect (using Python 3):

>>> count_words(u"How aresjfhdskfhskd you?", {u"how", u"are", u"you", u"hello"})
3
>>> count_words(u"Bananas, give me bananas!!!", {u"banana", u"bananas"})
2
>>> count_words(u"Lorem ipsum dolor sit amet, consectetuer adipiscing elit.",{u"sum", u"hamlet", u"infinity", u"anything"})
1

Answer 2

def count_words(text, words, case_insensitive=False):
    """Returns the number of space-delimited words in `text` that
    appear in some iterable `words`"""

    if case_insensitive:
        text = text.lower()
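        # Build a set instead of a lazy map(): in Python 3 a map object is a
        # one-shot iterator that repeated `in` tests would consume.
        words = {w.lower() for w in words}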
    return sum(1 for word in text.split() if word in words)

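Using a generator expression like this is a very idiomatic way to structure this function. It essentially yields a 1 for each word in text.split() that is also in words, then returns the sum of those integers.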
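To illustrate how the two answers differ, here is a Python 3 sketch that compares whole-token matching (the approach in this answer) with substring matching (the approach in Answer 1), both written as a sum over a generator expression. The names count_tokens and count_substrings are made up for this illustration. Note that splitting on whitespace does not strip punctuation, so the token "you?" is not equal to "you".

```python
def count_tokens(text, words):
    # Count whitespace-delimited tokens of text that exactly equal an entry in words.
    wanted = {w.lower() for w in words}
    return sum(1 for token in text.lower().split() if token in wanted)


def count_substrings(text, words):
    # Count entries of words that occur anywhere in text, ignoring case.
    lowered = text.lower()
    return sum(1 for w in words if w.lower() in lowered)


text = "How aresjfhdskfhskd you?"
words = {"how", "are", "you", "hello"}
print(count_tokens(text, words))      # 1: only "how" matches; "you?" keeps its "?"
print(count_substrings(text, words))  # 3: "how", "are" and "you" occur as substrings
```

Only the substring version gives the 3 that the first assert in the question expects, so which one is appropriate depends on whether embedded matches such as "are" inside "aresjfhdskfhskd" should count.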

Python: identifying words in a list
