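I have a long list of about 1.5 million sentences and a similarly long list of words that I am looking for in those sentences. For example: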
list_of_words = ['Turin', 'Milan']
list_of_sents = ['This is a sent about turin.', 'This is a sent about manufacturing.']
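I have the following function, which quickly identifies the sentences that contain any of the keywords. Computation time matters a lot here, so I would like to avoid for loops if possible: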
import re

def find_keyword_comments(test_comments, test_keywords):
    # Build one alternation pattern out of all keywords.
    keywords = '|'.join(test_keywords)
    word = re.compile(r"^.*\b({})\b.*$".format(keywords), re.I)
    # Keep only the comments that contain at least one keyword.
    return list(filter(word.match, test_comments))
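Instead of returning a list of the strings that contain a keyword, I would like it to return a list of tuples, each holding the matched word and the string that contains it. So it currently returns: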
final = ['This is a sent about turin.']
I would like it to return:
final = [('Turin', 'This is a sent about turin.')]
Is there a syntax feature that I am misusing or forgetting?
Answer 0 (score: 0)
You can take each keyword and find all the comments that contain it:
import re
def find_keyword_comments(test_comments, test_keywords):
    # For each keyword, collect the comments that contain it as a whole word.
    return [
        (word, [c for c in test_comments if re.search(r'\b{}\b'.format(word), c, flags=re.I)])
        for word in test_keywords
    ]
list_of_words = ['Turin', 'Milan']
list_of_sents = ['This is a sent about turin.', 'This is a sent about manufacturing.']
print(find_keyword_comments(list_of_sents, list_of_words))
Output:
[('Turin', ['This is a sent about turin.']), ('Milan', [])]
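Note that this groups the comments under each keyword, which is not quite the flat list of tuples asked for in the question. A minimal sketch of flattening it, assuming the grouped structure shown in the output above (`flatten_matches` is a name made up for this example):

```python
def flatten_matches(grouped):
    # Turn [(word, [comment, ...]), ...] into [(word, comment), ...].
    # Keywords with no matching comments simply drop out.
    return [(word, comment) for word, comments in grouped for comment in comments]

grouped = [('Turin', ['This is a sent about turin.']), ('Milan', [])]
print(flatten_matches(grouped))
# [('Turin', 'This is a sent about turin.')]
```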
Answer 1 (score: 0)
Here is one way:
import re
list_of_words = ['Turin', 'Milan']
list_of_sents = ['This is a sent about turin.', 'This is a sent about manufacturing.']
print([[x, y] for x in list_of_words for y in list_of_sents if re.search(r'\b{}\b'.format(x.lower()), y.lower())])
# [['Turin', 'This is a sent about turin.']]
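Pairing every keyword with every sentence means each sentence is scanned once per keyword, which may get slow with lists of the size described in the question. A rough sketch of an alternative that builds one combined pattern, scans each sentence once, and maps the matched text back to the keyword as spelled in the list. The names `find_keyword_pairs` and `canonical` are made up for this example, it reports only the first keyword found in each sentence, and whether it is actually faster with a very large keyword list would need to be measured:

```python
import re

def find_keyword_pairs(comments, keywords):
    if not keywords:
        return []
    # Map each lowercased keyword back to its original spelling.
    canonical = {k.lower(): k for k in keywords}
    # re.escape guards against keywords containing regex metacharacters.
    pattern = re.compile(
        r"\b({})\b".format("|".join(re.escape(k) for k in keywords)),
        re.I,
    )
    pairs = []
    for comment in comments:
        match = pattern.search(comment)
        if match:
            found = match.group(1)
            # Fall back to the matched text if the lookup misses.
            pairs.append((canonical.get(found.lower(), found), comment))
    return pairs

list_of_words = ['Turin', 'Milan']
list_of_sents = ['This is a sent about turin.', 'This is a sent about manufacturing.']
print(find_keyword_pairs(list_of_sents, list_of_words))
# [('Turin', 'This is a sent about turin.')]
```

Unlike the list comprehension above, this returns tuples rather than two-element lists, matching the shape requested in the question.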