Question

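I have a text document and want to build a dictionary (DICT) from it. The dictionary must contain only the words that start with a capital letter. (It does not matter whether a word is at the beginning of a sentence.)

This is what I have so far. By the way, I have to solve this with a for loop and the split function: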

DICT = {}

for line in lines: # lines is the text without line breaks 
    words = line.split(" ")
    for word in words:
        if word in DICT:
            DICT[word] += 1
        else:
            DICT[word] = 1

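But I think this just puts every word in my text into the dictionary.

How do I select only the words that start with a capital letter?
How can I verify that I built the dictionary correctly?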

Answer 1

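Use the str.isupper() method to test whether a string is uppercase. You can use indexing to select just the first character.

So, to test whether the first character is uppercase, use: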

if word[0].isupper():
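
As a quick illustration of that test, here is a minimal sketch. The helper name starts_with_capital is hypothetical, and the slice word[:1] is my own addition: splitting on a single space can produce empty strings, and indexing an empty string with word[0] raises IndexError.

```python
def starts_with_capital(word):
    """Return True if word is non-empty and its first character is uppercase."""
    # word[:1] is "" for an empty string, and "".isupper() is False,
    # so this never raises IndexError.
    return word[:1].isupper()

print(starts_with_capital("Python"))  # True
print(starts_with_capital("python"))  # False
print(starts_with_capital("3D"))      # False: a digit is not an uppercase letter
print(starts_with_capital(""))        # False
```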

If you want a fast and pythonic approach, use a collections.Counter() object to do the counting, and split on all whitespace to get rid of newlines:

from collections import Counter

counts = Counter()

for line in lines: # lines is the text without line breaks 
    counts.update(word for word in line.split() if word[0].isupper())

Here, line.split() without arguments splits on runs of whitespace and drops any whitespace (including newlines) at the start and end of the line.
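
To make the difference concrete, here is a small sketch comparing the two ways of splitting. The sample line is made up for illustration.

```python
line = "  The quick  Brown fox\n"  # hypothetical sample line

# Splitting on a single space keeps empty strings and the trailing newline.
by_space = line.split(" ")
print(by_space)       # ['', '', 'The', 'quick', '', 'Brown', 'fox\n']

# Splitting with no argument treats any run of whitespace as one separator.
by_whitespace = line.split()
print(by_whitespace)  # ['The', 'quick', 'Brown', 'fox']
```

The empty strings in the first result are why word[0] can fail with IndexError after split(" ") but not after split().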

Answer 2

from itertools import groupby
s = "QWE asd ZXc vvQ QWE"
# extract all the words with capital first letter
caps = [word for word in s.split() if word[0].isupper()]  # split() never yields empty strings
# group and count them
caps_counts = {word: len(list(group)) for word, group in groupby(sorted(caps))}

print(caps_counts)

groupby may be less efficient than a manual loop, because it needs a sorted iterable and sorting is O(N log N), compared with O(N) for the manual loop. But this variant is more "pythonic".
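
For comparison, here is a sketch that puts the sorted groupby version next to a single-pass count. Using collections.Counter for the O(N) side is my choice of illustration; a hand-written loop over a dict behaves the same way.

```python
from collections import Counter
from itertools import groupby

s = "QWE asd ZXc vvQ QWE"
caps = [word for word in s.split() if word[0].isupper()]

# O(N log N): groupby only merges adjacent equal items, so sort first.
by_groupby = {word: len(list(group)) for word, group in groupby(sorted(caps))}

# O(N): one pass with a hash-based counter, no sorting needed.
by_counter = dict(Counter(caps))

print(by_groupby)                # {'QWE': 2, 'ZXc': 1}
print(by_counter == by_groupby)  # True
```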

Answer 3

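You can use the isupper function mentioned above to check whether a word starts with a capital letter, and put that check in front of the if/else statement.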

if word[0].isupper():
    if word in DICT:
        DICT[word] += 1
    else:
        DICT[word] = 1

To verify this, you can use any:

any(word[0].islower() for word in DICT.keys())

This should return False. You can assert it if you like.

To make things nicer, you can use defaultdict:
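
A possible way to turn that check into assertions is sketched below. The sample DICT and text are made up for illustration. Note that all(... isupper()) is slightly stricter than the any(... islower()) check: it also rejects keys that start with a digit or punctuation.

```python
DICT = {"Alice": 2, "Bob": 1, "Paris": 3}  # hypothetical result to verify
text = "Alice met Bob in Paris . Alice likes Paris and Paris likes her"

# Every key must start with an uppercase character.
assert all(word[:1].isupper() for word in DICT), "key without a leading capital"

# Cross-check: the counts should add up to the number of capitalized words.
expected_total = sum(1 for word in text.split() if word[0].isupper())
assert sum(DICT.values()) == expected_total

print("dictionary looks consistent")
```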

from collections import defaultdict

DICT = defaultdict(int)  # missing keys start at 0
for line in lines:
    words = line.split(" ")
    for word in words:
        # skip empty strings produced by consecutive spaces
        if word and word[0].isupper():
            DICT[word] += 1
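
The reason no membership test is needed with defaultdict(int) is that a missing key is created with the value int(), which is 0, the first time it is accessed. A short sketch with a made-up key:

```python
from collections import defaultdict

counts = defaultdict(int)
print("Paris" in counts)  # False: nothing has been stored yet

counts["Paris"] += 1      # missing key starts at 0, then becomes 1
counts["Paris"] += 1

print(dict(counts))       # {'Paris': 2}
```

This also means that a condition like `word in counts` placed before the increment would never be true on an empty defaultdict, so nothing would ever be counted.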
