Python: identifying words from a list in a text

Date: 2014-12-28 18:19:44

Tags: python

def count_words(text, words):
    count = 0
    text.split()
    print text.split()
    for w in words:
        count += 1
    return count


if __name__ == '__main__':
    #These unit tests are only for self-checking and not necessary for auto-testing
    assert count_words(u"How aresjfhdskfhskd you?", {u"how", u"are", u"you", u"hello"}) == 3, "Example"
    assert count_words(u"Bananas, give me bananas!!!", {u"banana", u"bananas"}) == 2, "BANANAS!"
    assert count_words(u"Lorem ipsum dolor sit amet, consectetuer adipiscing elit.",
                   {u"sum", u"hamlet", u"infinity", u"anything"}) == 1, "Weird text"

OK, I have the following problem: the count comes out as 4 (I have no idea how that is possible).

The result is 4, but it should be 3.

2 answers:

Answer 0 (score: 3)

Your code doesn't make much sense. For one thing, you write lines like:

print text.split(9

where you open a parenthesis and never close it.

Furthermore, your algorithm:

for w in words:
    count += 1
return count

doesn't make much sense either: it never looks at text and simply counts the number of elements in words.
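This is also why the first test case yields 4. As a minimal sketch (the name `count_words_original` is made up here purely for illustration), the loop from the question behaves exactly like `len(words)`:

```python
def count_words_original(text, words):
    # Reproduces the loop from the question: `text` is never consulted.
    count = 0
    for w in words:
        count += 1
    return count


words = {"how", "are", "you", "hello"}
# The loop increments once per element of `words`, so the result is len(words).
result = count_words_original("How aresjfhdskfhskd you?", words)
print(result == len(words))  # True
```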

The function you are looking for is:

def count_words(text, words):
    count = 0
    for w in words:
        if w in text:
            count += 1
    return count

So you add the constraint (assuming the search is case-sensitive):

if w in text

which checks whether text contains the word w as a substring.

This gives:

>>> count_words(u"How aresjfhdskfhskd you?", {u"how", u"are", u"you", u"hello"})
2
This is because "how" and "How" are different strings, so "how" is not counted.
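The case-sensitivity of the substring test can be checked directly. A small sketch, using only the built-in `in` operator and `str.lower()` on the first test string:

```python
text = "How aresjfhdskfhskd you?"

# `in` on strings is a case-sensitive substring test.
print("how" in text)          # False: the text contains "How", not "how"
print("are" in text)          # True: matches inside "aresjfhdskfhskd"
print("how" in text.lower())  # True once the text is lowercased
```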

If the search should be case-insensitive, you can use:

def count_words(text, words):
    count = 0
    text = text.lower()
    for w in words:
        w = w.lower()
        if w in text:
            count += 1
    return count

which returns exactly what the test cases expect (using python3):

>>> count_words(u"How aresjfhdskfhskd you?", {u"how", u"are", u"you", u"hello"})
3
>>> count_words(u"Bananas, give me bananas!!!", {u"banana", u"bananas"})
2
>>> count_words(u"Lorem ipsum dolor sit amet, consectetuer adipiscing elit.",{u"sum", u"hamlet", u"infinity", u"anything"})
1

Answer 1 (score: 1)

def count_words(text, words, case_insensitive=False):
    """Returns the number of space-delimited words in `text` that
    appear in some iterable `words`"""

    if case_insensitive:
        text = text.lower()
        # Build a set instead of calling map(): in Python 3 map() returns a
        # one-shot iterator, so repeated `in` tests would consume it.
        words = {w.lower() for w in words}
    return sum(1 for word in text.split() if word in words)

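Using a generator expression like this is a very idiomatic way to write this function. It basically produces a 1 for each word in text.split() that also appears in words, then returns the sum of those 1s as an int.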
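Note that this answer matches whole space-delimited tokens, whereas Answer 0 matches substrings, so the two can give different counts on the same input. A rough sketch of the difference, where `count_substrings` and `count_tokens` are hypothetical names introduced only for this comparison:

```python
def count_substrings(text, words):
    # Substring matching, in the spirit of Answer 0 (case-insensitive variant).
    text = text.lower()
    return sum(1 for w in words if w.lower() in text)


def count_tokens(text, words):
    # Whole-token matching, in the spirit of Answer 1 (case-insensitive).
    lowered = {w.lower() for w in words}
    return sum(1 for token in text.lower().split() if token in lowered)


text = "How aresjfhdskfhskd you?"
words = {"how", "are", "you", "hello"}

print(count_substrings(text, words))  # 3
print(count_tokens(text, words))      # 1: "you?" and "aresjfhdskfhskd" are not in words
```

Since the asserts in the question expect 3 for this input, substring matching appears to be what the exercise wants.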