I'm using the following code to extract sentences from a file (each sentence should contain some or all of the search keywords):
search_keywords=['mother','sing','song']
with open('text.txt', 'r') as in_file:
    text = in_file.read()
sentences = text.split(".")
for sentence in sentences:
    if (all(map(lambda word: word in sentence, search_keywords))):
        print(sentence)
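The problem with the code above is that if even one of the search keywords does not appear in a sentence, that sentence is not printed. I want code that prints sentences containing some or all of the search keywords. It would be even better if the code could also search for phrases and extract the corresponding sentences.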
Answer 0 (score: 4)
It looks like you want to count how many of the search_keywords appear in each sentence. You can do that as follows:
sentences = "My name is sing song. I am a mother. I am happy. You sing like my mother".split(".")
search_keywords=['mother','sing','song']
for sentence in sentences:
    print("{} key words in sentence:".format(sum(1 for word in search_keywords if word in sentence)))
    print(sentence + "\n")
# Outputs:
#2 key words in sentence:
#My name is sing song
#
#1 key words in sentence:
# I am a mother
#
#0 key words in sentence:
# I am happy
#
#2 key words in sentence:
# You sing like my mother
Or, if you only want the sentence(s) that match the most search_keywords, you can build a dictionary and keep the entries with the maximum count:
dct = {}
for sentence in sentences:
    dct[sentence] = sum(1 for word in search_keywords if word in sentence)
best_count = max(dct.values())  # compute the maximum once, not once per item
best_sentences = [key for key, value in dct.items() if value == best_count]
print("\n".join(best_sentences))
# Outputs:
#My name is sing song
# You sing like my mother
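If you want every sentence ordered by how many keywords it contains, rather than only the top ones, a minimal sketch is to sort on the same substring count. The helper name keyword_count is hypothetical, and the sample text is the one from this answer:

```python
# Rank every sentence by how many search keywords it contains,
# using plain substring matching like the counting code above.
search_keywords = ['mother', 'sing', 'song']
text = "My name is sing song. I am a mother. I am happy. You sing like my mother"

# Strip surrounding spaces and drop empty pieces left by split(".").
sentences = [s.strip() for s in text.split(".") if s.strip()]

def keyword_count(sentence):
    """Number of search keywords that occur as substrings of the sentence."""
    return sum(1 for word in search_keywords if word in sentence)

# sorted() is stable even with reverse=True, so sentences with equal
# counts keep their original order.
ranked = sorted(sentences, key=keyword_count, reverse=True)
for sentence in ranked:
    print(keyword_count(sentence), sentence)
```

This prints the two 2-keyword sentences first, then "I am a mother" (1) and "I am happy" (0).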
Answer 1 (score: 0)
So you want to find the sentences that contain at least one keyword. You can use any() instead of all().
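A minimal sketch of the any() version wrapped in a reusable function. The name sentences_with_any_keyword is hypothetical, and matching is plain substring matching, as in the question's code:

```python
def sentences_with_any_keyword(text, keywords):
    """Return the sentences (split on '.') that contain at least one keyword.

    Uses substring matching, so 'sing' would also match 'singing'.
    """
    sentences = [s.strip() for s in text.split(".")]
    # Skip empty pieces and keep sentences where any keyword occurs.
    return [s for s in sentences if s and any(word in s for word in keywords)]

text = "My name is sing song. I am a mother. I am happy. You sing like my mother"
print(sentences_with_any_keyword(text, ['mother', 'sing', 'song']))
```

To use it on the file from the question, read the text with open('text.txt') first and pass it in as text.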
Edit: if you want to find the sentences that contain the most keywords:
sent_words = []
for sentence in sentences:
    sent_words.append(set(sentence.split()))
num_keywords = [len(sent & set(search_keywords)) for sent in sent_words]
# Find only one sentence
ind = num_keywords.index(max(num_keywords))
# Find all sentences with that number of keywords
ind = [i for i, x in enumerate(num_keywords) if x == max(num_keywords)]
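To see the set-intersection approach end to end, here is a self-contained run on the sample text from the first answer (the variable names first_best and all_best are illustrative). Because it intersects sets of whitespace-separated words, it matches whole words only: "sing" does not match "singing", but a word with attached punctuation such as "mother," would not match either.

```python
search_keywords = ['mother', 'sing', 'song']
text = "My name is sing song. I am a mother. I am happy. You sing like my mother"
sentences = text.split(".")

# One set of whitespace-separated words per sentence.
sent_words = [set(sentence.split()) for sentence in sentences]
keyword_set = set(search_keywords)
num_keywords = [len(words & keyword_set) for words in sent_words]
best = max(num_keywords)

first_best = num_keywords.index(best)  # index of the first top-scoring sentence
all_best = [i for i, n in enumerate(num_keywords) if n == best]  # all of them

print(num_keywords)  # [2, 1, 0, 2]
print([sentences[i].strip() for i in all_best])
```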
Answer 2 (score: 0)
If I understand correctly, you should use any() instead of all().
for sentence in sentences:
    if any(map(lambda word: word in sentence, search_keywords)):
        print(sentence)
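The question also asks about searching for phrases. Substring matching with `in` already works for a phrase such as "my mother", but it also matches inside longer words. A minimal sketch, assuming you want whole-word/whole-phrase, case-insensitive matches, builds one regular expression per keyword with word boundaries. The function name find_sentences is hypothetical:

```python
import re

def find_sentences(text, keywords):
    """Return (sentence, matched_keywords) for sentences containing any keyword.

    Keywords may be single words or multi-word phrases. Each keyword is
    matched as a whole word or phrase (word boundaries on both ends),
    ignoring case.
    """
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    results = []
    for sentence in (s.strip() for s in text.split(".")):
        matched = [kw for kw, pat in patterns.items() if pat.search(sentence)]
        if matched:
            results.append((sentence, matched))
    return results

text = "My name is sing song. I am a mother. Singing is fun. You sing like my mother"
for sentence, matched in find_sentences(text, ["sing", "my mother"]):
    print(matched, sentence)
```

Here "Singing is fun" is not returned, because "sing" is only matched as a whole word; drop the `\b` anchors if partial-word matches are acceptable.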