Question

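I have a long list of about 1.5 million sentences and a similarly long list of words that I am looking for in those sentences. For example: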

list_of_words = ['Turin', 'Milan']
list_of_sents = ['This is a sent about turin.', 'This is a sent about manufacturing.']

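I have the following function, which quickly identifies the sentences that contain any of the keywords. Computation time matters a lot, so I would like to avoid for loops if possible: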

import re

def find_keyword_comments(test_comments, test_keywords):
    keywords = '|'.join(test_keywords)
    word = re.compile(r"^.*\b({})\b.*$".format(keywords), re.I)
    newlist = filter(word.match, test_comments)
    final = list(newlist)
    return final

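Instead of returning a list of the strings that contain a keyword, I would like it to return a list of tuples, each holding the matched keyword and the string that contains it. It currently returns: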

final = ['This is a sent about turin.']

I would like it to return:

final = [('Turin', 'This is a sent about turin.')]

Is there some syntax feature that I am misusing or forgetting?

Answer 1

You can take each keyword and find all the comments that contain it:

import re
def find_keyword_comments(test_comments, test_keywords):
    # Pair each keyword with every comment containing it as a whole word (case-insensitive).
    return [(word, [c for c in test_comments if re.search(r'\b{}\b'.format(word), c, flags=re.I)])
            for word in test_keywords]

list_of_words = ['Turin', 'Milan']
list_of_sents = ['This is a sent about turin.', 'This is a sent about manufacturing.']
print(find_keyword_comments(list_of_sents, list_of_words))

Output:

[('Turin', ['This is a sent about turin.']), ('Milan', [])]
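
The same idea can be sketched with each pattern compiled once per keyword and the keyword passed through re.escape, so that keywords containing regex metacharacters are matched literally. The function name find_keyword_comments_precompiled is made up for this illustration; the output shape is the same as above, and this is only a sketch, not a benchmarked improvement.

```python
import re

def find_keyword_comments_precompiled(test_comments, test_keywords):
    results = []
    for word in test_keywords:
        # Compile once per keyword; re.escape treats the keyword as literal text.
        pattern = re.compile(r'\b{}\b'.format(re.escape(word)), flags=re.I)
        matches = [c for c in test_comments if pattern.search(c)]
        results.append((word, matches))
    return results

list_of_words = ['Turin', 'Milan']
list_of_sents = ['This is a sent about turin.', 'This is a sent about manufacturing.']
print(find_keyword_comments_precompiled(list_of_sents, list_of_words))
```

Note that this still scans every sentence once per keyword, so the total work grows with the number of keywords times the number of sentences.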

Answer 2

Here is one way:

import re

list_of_words = ['Turin', 'Milan']
list_of_sents = ['This is a sent about turin.', 'This is a sent about manufacturing.']

print([[x, y] for x in list_of_words for y in list_of_sents if re.search(r'\b{}\b'.format(x.lower()), y.lower())])
# [['Turin', 'This is a sent about turin.']]
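
Since the question mentions that computation time matters, here is a hedged sketch of a variant that scans each sentence only once, using a single compiled alternation of all keywords and returning the tuples the question asks for. The function name find_keyword_pairs and the canonical lookup dict are assumptions made for this illustration, and it assumes the keywords begin and end with word characters so that \b behaves as expected.

```python
import re

def find_keyword_pairs(sentences, keywords):
    # Map each lowercased keyword back to its original spelling.
    canonical = {k.lower(): k for k in keywords}
    # Escape keywords and put longer ones first so they win in the alternation.
    alternation = '|'.join(
        re.escape(k) for k in sorted(canonical, key=len, reverse=True)
    )
    pattern = re.compile(r'\b({})\b'.format(alternation), flags=re.I)

    pairs = []
    for sentence in sentences:
        seen = set()  # report each keyword at most once per sentence
        for m in pattern.finditer(sentence):
            key = m.group(1).lower()
            if key in canonical and key not in seen:
                seen.add(key)
                pairs.append((canonical[key], sentence))
    return pairs

list_of_words = ['Turin', 'Milan']
list_of_sents = ['This is a sent about turin.', 'This is a sent about manufacturing.']
print(find_keyword_pairs(list_of_sents, list_of_words))
# [('Turin', 'This is a sent about turin.')]
```

With a keyword list as long as the one described in the question, one very large alternation may itself be slow to compile and match, so it would be worth timing this against the other approaches on real data.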

Python filter: return tuples from a list of strings that contain a target string
