Question

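Problem statement: From the complete set of text6, filter the words that have the first letter in upper case and all other letters in lower case. Store the result in the variable title_words. Print the number of words present in title_words.

I have tried every approach I could think of, but I can't see where I am going wrong.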

import nltk
from nltk.book import text6
title_words = 0
for item in set(text6):
    if item[0].isupper() and item[1:].islower():
        title_words += 1
print(title_words)

I have also tried it this way:

title_words = 0
for item in text6:
    if item[0].isupper() and item[1:].islower():
        title_words += 1
print(title_words)

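I am not sure which count is expected; whichever count I print, it does not pass the challenge. Please let me know if I am doing anything wrong in this code.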

Answer 1

Try using a regular expression:

>>> import re
>>> from nltk.book import text6
>>>
>>> text = ' '.join(set(text6))
>>> title_words = re.findall(r'([A-Z]{1}[a-z]+)', text)
>>> len(title_words)
461
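One caveat with searching the joined string: re.findall matches anywhere in the text, so a mixed-case token can contribute fragments. The sketch below uses a small hypothetical token list (not the real text6) and compares that approach with re.fullmatch applied per token, which requires the whole token to fit the pattern.

```python
import re

# Hypothetical sample tokens standing in for text6 (not the real corpus).
tokens = ["Arthur", "KING", "the", "Camelot", "McDonald", "I", "Arthur"]

# Approach from the answer above: join the distinct tokens, then search
# anywhere in the resulting string.
joined = " ".join(set(tokens))
loose = re.findall(r"[A-Z][a-z]+", joined)

# Stricter sketch: the whole token must be one capital followed by
# one or more lowercase letters.
pattern = re.compile(r"[A-Z][a-z]+")
strict = [w for w in set(tokens) if pattern.fullmatch(w)]

print(sorted(loose))   # ['Arthur', 'Camelot', 'Donald', 'Mc']
print(sorted(strict))  # ['Arthur', 'Camelot']
```

Here "McDonald" is split into "Mc" and "Donald" by the loose search, while the per-token match rejects it. Note that both patterns require at least two characters, so a single capital such as "I" is excluded either way.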

Answer 2

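I think the problem lies in set(text6). I suggest iterating over text6.tokens instead.

Update: explanation

The code you provided is correct.

The problem is that the text can contain the same word many times. Calling set(words) collapses those repeats and reduces the total number of words available, so you start from an incomplete dataset.

The other answers are not necessarily wrong in how they check whether a word is valid, but they iterate over the same reduced dataset.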
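To illustrate the difference, here is a minimal sketch on a hypothetical token list (not the real text6); the helper name is_title_word is introduced only for this example. The same check gives a different count depending on whether duplicates are collapsed first.

```python
# Hypothetical tokens; "Arthur" appears twice on purpose.
tokens = ["Arthur", "rode", "to", "Camelot", "with", "Arthur", "KING"]

def is_title_word(word):
    # First letter uppercase, every remaining letter lowercase.
    return word[0].isupper() and word[1:].islower()

# Every occurrence in the text, duplicates included.
all_occurrences = [w for w in tokens if is_title_word(w)]

# Distinct words only, because set() drops repeats.
distinct_words = [w for w in set(tokens) if is_title_word(w)]

print(len(all_occurrences))  # 3
print(len(distinct_words))   # 2
```

Which of the two counts the challenge expects depends on how "the complete set of text6" is read, so it may be worth trying both.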

Answer 3

One of the suggestions above worked for me. Sample code below.

from nltk.book import text6

# Keep single capital letters ("I") as well as words like "Arthur".
title_words = [word for word in text6 if (len(word)==1 and word[0].isupper()) or (word[0].isupper() and word[1:].islower()) ]
print(len(title_words))

Answer 4

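The question says: "Store the result in the variable title_words. Print the number of words present in title_words."

The result of filtering a list of elements is a list of elements of the same type. In your case, filtering the list text6 (assuming it is a list of strings) yields a (smaller) list of strings. Your title_words variable should be this filtered list, not a count of strings; the number of strings is simply the length of the list.

The question is also ambiguous about whether the capitalised words should be filtered out (i.e. removed from the smaller list) or filtered for (i.e. kept in the list), so try both and see whether you have misread it.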
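Both readings can be sketched on a hypothetical token list (not the real text6); is_title_word is an illustrative helper name, not part of the challenge. In each case title_words would be a list, and the count is its length.

```python
from itertools import filterfalse

# Hypothetical tokens standing in for text6.
tokens = ["Arthur", "the", "Camelot", "KING", "grail"]

def is_title_word(word):
    # First letter uppercase, every remaining letter lowercase.
    return word[0].isupper() and word[1:].islower()

# Reading 1: "filter for" the capitalised words (keep them).
kept = list(filter(is_title_word, tokens))

# Reading 2: "filter out" the capitalised words (remove them).
removed = list(filterfalse(is_title_word, tokens))

print(kept, len(kept))        # ['Arthur', 'Camelot'] 2
print(removed, len(removed))  # ['the', 'KING', 'grail'] 3
```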

Answer 5

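There are 50 singleton elements (elements of length 1) in text6, but your code will not accept words such as "I" or "W". Is that intended, or does the challenge require a minimum word length of 2?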
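The reason is that for a one-letter word, word[1:] is the empty string, and "".islower() returns False. A small sketch comparing the strict check from the question with a lenient variant that accepts single capitals (the sample words are chosen for illustration):

```python
results = {}
for word in ["I", "W", "Arthur", "KING"]:
    # Check from the question: fails for one-letter words because
    # word[1:] is "" and "".islower() is False.
    strict = word[0].isupper() and word[1:].islower()

    # Lenient variant: a single capital letter also counts.
    lenient = word[0].isupper() and (len(word) == 1 or word[1:].islower())

    results[word] = (strict, lenient)

print(results["I"])       # (False, True)
print(results["Arthur"])  # (True, True)
print(results["KING"])    # (False, False)
```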

Answer 6

Just make a few changes to match what the question asks for.

from nltk.book import text6
title_words = []
for item in set(text6):
    if item[0].isupper() and item[1:].islower():
        title_words.append(item)
print(len(title_words))

Answer 7

Try this:

value

How to find words whose first letter is uppercase and whose other letters are lowercase
