Python filter: return tuples of (matched keyword, string) from a list of strings

Time: 2018-05-09 16:20:09

Tags: python regex python-3.x nlp

I have a long list of 1.5 million sentences and a similarly long list of words that I am looking for in those sentences. For example:

list_of_words = ['Turin', 'Milan']
list_of_sents = ['This is a sent about turin.', 'This is a sent about manufacturing.']

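I have the following function, which quickly identifies the sentences that contain a keyword. Computation time matters a lot here, so I would like to avoid for loops if I can: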

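import re

def find_keyword_comments(test_comments, test_keywords):
    # Build one alternation pattern covering every keyword
    keywords = '|'.join(test_keywords)
    word = re.compile(r"^.*\b({})\b.*$".format(keywords), re.I)
    # filter keeps only the comments for which word.match returns a match
    newlist = filter(word.match, test_comments)
    final = list(newlist)
    return final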

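Instead of returning a list of the strings that contain a keyword, I would like it to return a list of tuples, each holding the matched word and the string it was found in. So it currently returns: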

final = ['This is a sent about turin.']

I would like it to return:

final = [('Turin', 'This is a sent about turin.')]

Is there a syntax feature that I am misusing or forgetting?

2 Answers:

Answer 0: (score: 0)

You can take each keyword and find all the comments that contain that word:

import re
def find_keyword_comments(test_comments, test_keywords):
    # For each keyword, collect every comment that contains it as a whole word
    return [(word, [c for c in test_comments
                    if re.search(r'\b{}\b'.format(word), c, flags=re.I)])
            for word in test_keywords]

list_of_words = ['Turin', 'Milan']
list_of_sents = ['This is a sent about turin.', 'This is a sent about manufacturing.']
print(find_keyword_comments(list_of_sents, list_of_words))

Output:

[('Turin', ['This is a sent about turin.']), ('Milan', [])]
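
If a flat list of (keyword, sentence) tuples is wanted, as in the question, one possible sketch is to compile a single alternation pattern and map the matched text back to the keyword's original spelling. The names `find_keyword_pairs` and `canonical` are illustrative and not from the original post. This version reports only the first keyword found in each sentence and assumes the keywords are plain text, so they are escaped.

```python
import re

def find_keyword_pairs(sentences, keywords):
    # Hypothetical helper: lowercase keyword -> original spelling
    canonical = {k.lower(): k for k in keywords}
    # One pattern for all keywords; re.escape treats them as literal text
    pattern = re.compile(
        r"\b({})\b".format("|".join(map(re.escape, keywords))), re.I
    )
    pairs = []
    for sent in sentences:
        m = pattern.search(sent)  # leftmost keyword match, or None
        if m:
            found = m.group(1)
            # Fall back to the matched text if the lookup misses
            pairs.append((canonical.get(found.lower(), found), sent))
    return pairs

list_of_words = ['Turin', 'Milan']
list_of_sents = ['This is a sent about turin.', 'This is a sent about manufacturing.']
result = find_keyword_pairs(list_of_sents, list_of_words)
print(result)
# [('Turin', 'This is a sent about turin.')]
```

The loop is still explicit, but each sentence is scanned once rather than once per keyword.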

Answer 1: (score: 0)

Here is one way:

import re

list_of_words = ['Turin', 'Milan']
list_of_sents = ['This is a sent about turin.', 'This is a sent about manufacturing.']

print([[x, y] for x in list_of_words for y in list_of_sents if re.search(r'\b{}\b'.format(x.lower()), y.lower())])
# [['Turin', 'This is a sent about turin.']]
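
With lists of the size described in the question, the comprehension above tests every keyword against every sentence. A possible alternative, assuming every keyword is a single word made only of word characters (no spaces or punctuation), is to tokenize each sentence once and look each token up in a dict. The names `TOKEN` and `find_pairs_by_token` are illustrative, and whether this is actually faster on the real data would need to be measured.

```python
import re

# Assumption: keywords are single words, so \w+ tokens line up with \b boundaries
TOKEN = re.compile(r"\w+")

def find_pairs_by_token(sentences, keywords):
    # Hypothetical helper: lowercase keyword -> original spelling
    lookup = {k.lower(): k for k in keywords}
    pairs = []
    for sent in sentences:
        seen = set()  # avoid reporting the same keyword twice per sentence
        for token in TOKEN.findall(sent.lower()):
            if token in lookup and token not in seen:
                seen.add(token)
                pairs.append((lookup[token], sent))
    return pairs

list_of_words = ['Turin', 'Milan']
list_of_sents = ['This is a sent about turin.', 'This is a sent about manufacturing.']
pairs = find_pairs_by_token(list_of_sents, list_of_words)
print(pairs)
# [('Turin', 'This is a sent about turin.')]
```

Unlike a single regex search, this reports every distinct keyword that appears in a sentence, in order of first appearance.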