Question

I have the following string:

S = "to be or not to be, that is the question?"

I would like to create a dictionary that looks like this:

{'question': 4, 'is': 1, 'be,': 1, 'or': 1, 'the': 1, 'that': 1, 'be': 1, 'to': 1, 'not': 1}

Next to each word I want the number of vowels in that word, not the number of times the word itself occurs. So far I have:

{x:y for x in S.split() for y in [sum(1 for char in word if char.lower() in set('aeiou')) for word in S.split()]}

Output:

{'or': 4, 'the': 4, 'question?': 4, 'be,': 4, 'that': 4, 'to': 4, 'be': 4, 'is': 4, 'not': 4}

How can I get a dictionary from the string in which each value is the number of vowels in the corresponding word?

Answer 1

The number of vowels in each word next to the word, not the count of the word itself?

>>> s = "to be or not to be, that is the question"

First, remove the punctuation (this two-argument form of translate works only in Python 2):

>>> new_s = s.translate(None, ',?!.')
>>> new_s
'to be or not to be that is the question'
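For readers on Python 3, where str.translate takes a single mapping argument, the same step might look like the sketch below. The variable name delete_punctuation is only illustrative and is not part of the original answer.

```python
s = "to be or not to be, that is the question?"

# With three arguments, str.maketrans builds a table whose third
# argument lists the characters to delete.
delete_punctuation = str.maketrans('', '', ',?!.')

new_s = s.translate(delete_punctuation)
print(new_s)  # to be or not to be that is the question
```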

Then split on whitespace:

>>> split = new_s.split()
>>> split
['to', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']

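Now count the vowels in a dictionary comprehension. Note that each word is counted only once, because there is no extra nested loop: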

>>> vowel_count = {i: sum(c.lower() in 'aeiou' for c in i) for i in split}
>>> vowel_count
{'be': 1, 'that': 1, 'is': 1, 'question': 4, 'to': 1, 'not': 1, 'the': 1, 'or': 1}
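To see why the attempt in the question gives 4 for every word, it may help to run both versions side by side. In the rough sketch below, the helper count_vowels and the names broken and fixed are illustrative assumptions, not part of the original posts. In the question's comprehension the inner `for y in [...]` loops over every count for each key, so the last count (4, for 'question?') overwrites all the earlier ones.

```python
VOWELS = set('aeiou')

def count_vowels(word):
    """Return the number of vowels in a single word (case-insensitive)."""
    return sum(1 for char in word if char.lower() in VOWELS)

S = "to be or not to be, that is the question?"
words = S.split()

# Shape of the attempt from the question: for each key x, y runs through
# ALL the counts, so every key ends up with the final count in the list.
broken = {x: y for x in words for y in [count_vowels(w) for w in words]}

# One loop only: each word is paired with its own vowel count.
fixed = {word: count_vowels(word) for word in words}

print(broken['to'], fixed['to'])  # 4 1
```

Punctuation is left attached here ('be,' and 'question?' remain keys), so this isolates the looping bug only; stripping punctuation is a separate step.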

Answer 2

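You can use re (the regular expression module) to find all valid words (\w+, which excludes spaces and commas) and use Counter to check how often each word occurs: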

import re
from collections import Counter

s = "tell me what I tell you, to you"
# print(...) with a single argument works in both Python 2 and Python 3
print(Counter(re.findall(r'\w+', s)))

Output

Counter({'you': 2, 'tell': 2, 'me': 1, 'what': 1, 'I': 1, 'to': 1})
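Counter reports how often each word occurs. If the goal is the number of vowels per word, as in the question, the same \w+ tokenization could be combined with a second pattern for the vowels. This is only a sketch, assuming that only a, e, i, o, u count as vowels:

```python
import re

s = "to be or not to be, that is the question?"

# \w+ picks out the words and drops the comma and question mark.
words = re.findall(r'\w+', s)

# For each word, count how many characters match the vowel class.
vowel_count = {
    word: len(re.findall(r'[aeiou]', word, flags=re.IGNORECASE))
    for word in words
}

print(vowel_count['question'])  # 4
```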
