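Creating a dictionary from consecutive word lengths

Time: 2018-04-30 07:25:53

Tags: python

How can I create a dictionary from a sentence so that each key/value pair consists only of consecutive words, where the 'key' word has length x and the 'value' word has length x+1? The function should only consider the words in the sentence that come before any ending punctuation mark (period, question mark, colon, exclamation mark).

For example, a key/value pair for the following sentence would be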

{'best' : ('balls', 'coach')}.

Sample sentence:

"The best balls team is the NYCC State Warriors since they have the A+++ players and best coach and they should win most games this break."

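2 answers:

Answer 0 (score: 0)

You can use split() to get the words from the string. Then iterate over the words and check whether each word is followed by a word that is exactly one character longer. To store several values under a single key, you can use collections.defaultdict with list as the default factory.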

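from collections import defaultdict
from pprint import pprint

s = "The best balls team is the NYCC State Warriors since they have the A+++ players and best coach and they should win most games this break."
result = defaultdict(list)

# Strip trailing end punctuation (. ? : !) before splitting on whitespace.
words = s.rstrip('.?:!').split()
# Compare each word with the word that follows it.
for word, next_word in zip(words, words[1:]):
    if len(word) + 1 == len(next_word):
        result[word].append(next_word)

# pprint sorts the keys and wraps the output over several lines.
pprint(result)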

Output:

defaultdict(<class 'list'>,
            {'NYCC': ['State'],
             'The': ['best'],
             'and': ['best', 'they'],
             'best': ['balls', 'coach'],
             'is': ['the'],
             'most': ['games'],
             'the': ['NYCC', 'A+++'],
             'this': ['break'],
             'win': ['most']})
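The question also asks for a function that ignores everything after the ending punctuation and shows tuples as values. A minimal sketch of that variant follows. The function name `consecutive_length_dict` is hypothetical, and cutting the sentence at the first `.`, `?`, `:` or `!` is an assumption about what "before any ending punctuation" means (it would also cut at an abbreviation or a decimal point).

```python
import re
from collections import defaultdict


def consecutive_length_dict(sentence):
    """Map each word to the following words that are exactly one character longer."""
    # Assumption: keep only the text before the first ending punctuation mark (. ? : !).
    body = re.split(r'[.?:!]', sentence, maxsplit=1)[0]
    words = body.split()
    pairs = defaultdict(list)
    # zip(words, words[1:]) yields every pair of neighbouring words.
    for current, following in zip(words, words[1:]):
        if len(following) == len(current) + 1:
            pairs[current].append(following)
    # Convert to a plain dict of tuples, the shape shown in the question.
    return {key: tuple(values) for key, values in pairs.items()}


print(consecutive_length_dict("The best balls team is great! But not the best coach."))
# {'The': ('best',), 'best': ('balls',)}
```

Only the words before the exclamation mark are considered here, so the second 'best coach' pair is not picked up.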

Answer 1 (score: 0)

You can do this with a regular expression.

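import re

sentence = 'The best balls team is the NYCC State Warriors since they have the A+++ players and best coach and they should win most games this break.'

# Characters that separate words: whitespace plus . , ! ? ; :
punct = r'\s.,!?;:'
result = {}
# Collect the distinct words of the sentence.
words = set(re.findall(r'[^{}]+'.format(punct), sentence))
for word in words:
    # Match `word` as a whole word, followed by a word exactly one character longer.
    re_consec = r'(?<![^{p}]){w}\s+([^{p}]{{{n}}})(?![^{p}])'.format(p=punct, w=re.escape(word), n=len(word) + 1)
    matches = re.findall(re_consec, sentence)
    if matches:
        result[word] = matches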

This gives the following result (keys shown in sorted order):

{'NYCC': ['State'],
 'The': ['best'],
 'and': ['best', 'they'],
 'best': ['balls', 'coach'],
 'is': ['the'],
 'most': ['games'],
 'the': ['NYCC', 'A+++'],
 'this': ['break'],
 'win': ['most']}
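As a possible alternative to building one pattern per word, the sentence can be tokenised once with a single regular expression and the neighbouring tokens compared directly. This is only a sketch: the names `TOKEN` and `consecutive_by_tokens` are hypothetical, and the delimiter set (whitespace plus `. , ! ? ; :`) is an assumption.

```python
import re
from collections import defaultdict

# Assumed delimiter set: whitespace plus . , ! ? ; :
TOKEN = re.compile(r'[^\s.,!?;:]+')


def consecutive_by_tokens(sentence):
    """Pair each token with following tokens that are exactly one character longer."""
    tokens = TOKEN.findall(sentence)
    result = defaultdict(list)
    # Walk over neighbouring tokens in a single pass.
    for current, following in zip(tokens, tokens[1:]):
        if len(following) == len(current) + 1:
            result[current].append(following)
    return dict(result)


sentence = 'The best balls team is the NYCC State Warriors since they have the A+++ players and best coach and they should win most games this break.'
print(consecutive_by_tokens(sentence)['best'])
# ['balls', 'coach']
```

Unlike a pattern that requires whitespace between the two words, this sketch also pairs tokens that are separated by punctuation, so it behaves differently on text with commas or several sentences.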