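How can I create a dictionary from a sentence so that each key-value pair consists only of consecutive words, where the 'key' word has length x and the 'value' word has length x+1? The function should only consider the words that come before any ending punctuation mark (period, question mark, colon, exclamation mark).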
For example, one of the key-value pairs for the sentence below would be
{'best': ('balls', 'coach')}.
Example sentence:
"The best balls team is the NYCC State Warriors since they have the A+++ players and best coach and they should win most games this break."
Answer 0 (score: 0)
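You can use split() to get the words from the string. Then iterate over the words and check whether each word is exactly one character shorter than the word that follows it. To store several values under a single key, you can use collections.defaultdict.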
from collections import defaultdict
from pprint import pprint

s = "The best balls team is the NYCC State Warriors since they have the A+++ players and best coach and they should win most games this break."
result = defaultdict(list)
# Strip the ending punctuation mark (. ? : !), then split on whitespace.
words = s.rstrip('.?:!').split()
# Compare each word with the word that directly follows it.
for idx, word in enumerate(words[:-1]):
    if len(word) + 1 == len(words[idx + 1]):
        result[word].append(words[idx + 1])
pprint(result)
Output:
defaultdict(<class 'list'>,
            {'NYCC': ['State'],
             'The': ['best'],
             'and': ['best', 'they'],
             'best': ['balls', 'coach'],
             'is': ['the'],
             'most': ['games'],
             'the': ['NYCC', 'A+++'],
             'this': ['break'],
             'win': ['most']})
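The same idea can be wrapped in a reusable function. The following is a minimal sketch, not the answerer's code: the name consecutive_length_pairs and the end_marks parameter are hypothetical, and it assumes "ending punctuation" means marks at the very end of the sentence. It pairs each word with its successor using zip() and returns tuples, which is the shape shown in the question.

```python
from collections import defaultdict


def consecutive_length_pairs(sentence, end_marks=".?:!"):
    """Map each word to the following words that are exactly one character longer.

    Hypothetical helper: the name and the end_marks default are illustrative.
    """
    # Remove surrounding whitespace and trailing end punctuation, then split.
    words = sentence.strip().rstrip(end_marks).split()
    pairs = defaultdict(list)
    # zip(words, words[1:]) yields every word together with its successor.
    for current, following in zip(words, words[1:]):
        if len(following) == len(current) + 1:
            pairs[current].append(following)
    # Convert to a plain dict of tuples, as in the question's example.
    return {key: tuple(values) for key, values in pairs.items()}


sentence = ("The best balls team is the NYCC State Warriors since they have "
            "the A+++ players and best coach and they should win most games "
            "this break!")
print(consecutive_length_pairs(sentence)["best"])  # ('balls', 'coach')
```

Because the last word is compared after the end mark has been stripped, 'break' is still found as the value for 'this' when the sentence ends with '!' or '?'.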
Answer 1 (score: 0)
You can do this with a regular expression.
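import re

sentence = 'The best balls team is the NYCC State Warriors since they have the A+++ players and best coach and they should win most games this break.'
result = {}
# Unique words: runs of characters that are neither spaces nor punctuation.
words = set(re.findall(r'[^ .,!?;]+', sentence))
for word in words:
    # Match `word` as a whole word, followed by a word exactly one character longer.
    # The lookarounds keep 'is' from matching inside 'this' and allow a match at the end of the string.
    re_consec = r'(?<!\S){}\s+([^ .,!?;]{{{}}})(?=[\s.,!?;]|$)'.format(re.escape(word), len(word) + 1)
    matches = re.findall(re_consec, sentence)
    if matches:
        result[word] = matches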
This gives the following result:
{'NYCC': ['State'],
 'The': ['best'],
 'and': ['best', 'they'],
 'best': ['balls', 'coach'],
 'is': ['the'],
 'most': ['games'],
 'the': ['NYCC', 'A+++'],
 'this': ['break'],
 'win': ['most']}
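The approach above builds and runs one pattern per unique word. As a possible alternative, here is a hedged sketch that tokenizes the sentence once with a single regular expression and then compares adjacent tokens. The names TOKEN and pairs_single_pass are hypothetical, and the token pattern is an assumption (the same separators as above, plus the colon and any whitespace). Note one behavioral difference: because punctuation is discarded during tokenization, two words separated by a comma are treated as consecutive here, whereas the per-word pattern requires whitespace directly after the key.

```python
import re
from collections import defaultdict

# Assumption: a token is any run of characters that is not whitespace
# and not one of the punctuation marks . , ! ? ; :
TOKEN = re.compile(r"[^\s.,!?;:]+")


def pairs_single_pass(sentence):
    """Tokenize once, then pair each token with its successor (hypothetical helper)."""
    tokens = TOKEN.findall(sentence)
    result = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        # Keep the pair only when the successor is one character longer.
        if len(following) == len(current) + 1:
            result[current].append(following)
    return dict(result)


sentence = ('The best balls team is the NYCC State Warriors since they have '
            'the A+++ players and best coach and they should win most games '
            'this break.')
result = pairs_single_pass(sentence)
print(result['and'])  # ['best', 'they']
```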