Suppose I have a lot of keywords, for example:
myfile_list: "{{ lookup('file', myfile).splitlines() }}"
I have a PDF document from which I have parsed out the full text, and now I want to get the sentences that match the bag of words.
Let's say the bag of words is:
['profit low', 'loss increased', 'profit lowered']
This should match the sentence 'The profit in the month of November lowered from 5% to 3%.'
What is the best way to solve this in Python?
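As a starting point, here is a minimal sketch of one way the matching could work, using only the standard library. The variable names text and phrases, and the naive sentence split on '.', are assumptions for illustration rather than part of the question.
text = 'The profit in the month of November lowered from 5% to 3%. Sales were flat.'
phrases = ['profit low', 'loss increased', 'profit lowered']

# naive sentence split; real PDF text may need a proper sentence tokenizer
sentences = [s.strip() for s in text.split('.') if s.strip()]

matching = []
for sentence in sentences:
    tokens = sentence.split()
    for phrase in phrases:
        # a phrase matches if every one of its words appears as a token of the sentence
        if all(word in tokens for word in phrase.split()):
            matching.append(sentence)
            break

print(matching)  # ['The profit in the month of November lowered from 5% to 3%']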
Answer 0 (score: 0)
If you want to check whether each element of the word list appears in the long sentence:
sentence = 'The profit in the month of November lowered from 5% to 3%.'
words = ['profit','month','5%']
for element in words:
    if element in sentence:
        # do something with it
        print(element)
If you want something cleaner, you can collect the matched words into a list with this one-liner:
sentence = 'The profit in the month of November lowered from 5% to 3%.'
words = ['profit','month','5%']
# collect the matched words in one line
matched_words = [word for word in words if word in sentence]
print(matched_words)
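With the example inputs above this prints ['profit', 'month', '5%'], since all three words occur in the sentence.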
If each element of the list contains several space-separated words, you can handle that with the split() method:
sentence = 'The profit in the month of November lowered from 5% to 3%.'
words = ['profit low','month high','5% 3%']
single_words = []
for w in words:
    # break each multi-word entry into its individual words
    for part in w.split(' '):
        single_words.append(part)
# collect the matched words in one line
matched_words = [word for word in single_words if word in sentence]
print(matched_words)
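Note that the word in sentence test is a substring check, so 'low' also matches inside 'lowered'. If whole-word matching is wanted, a word-boundary regex is one possible refinement; this sketch is an addition for illustration, not part of the original answer.
import re

sentence = 'The profit in the month of November lowered from 5% to 3%.'
single_words = ['profit', 'low', 'loss', 'increased', 'lowered']

# re.escape guards any regex metacharacters in a keyword; \b restricts the match to whole words
matched_words = [w for w in single_words
                 if re.search(r'\b' + re.escape(w) + r'\b', sentence)]
print(matched_words)  # ['profit', 'lowered'] -- 'low' no longer matches inside 'lowered'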
Answer 1 (score: 0)
# input
checking_words = ['profit low', 'loss increased', 'profit lowered']
checking_string = 'The profit in the month of November lowered from 5% to 3%.'
trans_check_words = checking_string.split()
# output
for word_bug in [st.split() for st in checking_words]:
    # both words of the pair must appear as separate tokens of the sentence
    if word_bug[0] in trans_check_words and word_bug[1] in trans_check_words:
        print(word_bug)
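With the inputs above, only 'profit lowered' has both of its words present as separate tokens of the sentence, so this prints ['profit', 'lowered']; 'profit low' fails because the token in the sentence is 'lowered', not 'low'.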
Answer 2 (score: 0)
You can try the following:
Join the bag of words into one sentence:
bag_of_words = ['profit low', 'loss increased', 'profit lowered']
bag_of_word_sent = ' '.join(bag_of_words)
Then take the list of sentences:
list_sents = ['The profit in the month of November lowered from 5% to 3%.']
Then use the Levenshtein distance:
import distance
for sent in list_sents:
    dist = distance.levenshtein(bag_of_word_sent, sent)
    if dist > len(bag_of_word_sent):
        # do something
        print(dist)
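Keep in mind that a smaller Levenshtein distance means the strings are more similar, so the dist > len(bag_of_word_sent) condition as written fires for the least similar sentences; depending on the goal, the comparison may need to be reversed or tied to a different threshold. For a dependency-free alternative, the standard library's difflib gives a similarity ratio; this sketch is an illustration, not part of the original answer.
from difflib import SequenceMatcher

bag_of_word_sent = 'profit low loss increased profit lowered'
sentence = 'The profit in the month of November lowered from 5% to 3%.'

# ratio() returns a similarity score between 0 and 1; higher means more similar
score = SequenceMatcher(None, bag_of_word_sent, sentence).ratio()
print(round(score, 3))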