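I'm trying to work out the best way to approach this problem.
Basically, I want to take some text and compare it against a keyword.
Of course, I could do this: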
keyword = 'python 3.5'
title = 'python 3.5 is a programming language'
if keyword in title:
    print('match')
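However, that only works when the words appear next to each other and in the same order. If the title text happened to be: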
title = 'my favourite version of python is 3.5!'
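that would not match.
So I tried splitting the keyword with .split() and then checking whether both items of the resulting list are in the title variable, but I haven't found an efficient way to make it work.
If anyone knows a good way to do this, I'd really appreciate it.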
Answer 0 (score: 0)
This will do the job:
keyword = 'python 3.5'
title = 'python 3.5 is a programming language'
s = set(keyword.split(" "))
m = set(title.split(" "))
# All keyword words are in the title when the intersection is as large as s.
if len(s & m) == len(s):
    print(True)
This assumes you don't care about duplicates. That is, you consider
keyword = 'python 3.5 python'
title = 'python 3.5 is a programming language'
to be a pair in which all the keywords are indeed in the title.
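If duplicates do matter, one possible sketch is to count words instead of using sets. The function name contains_with_multiplicity is made up for illustration and is not part of the answer above; it only assumes that words are separated by whitespace.

```python
from collections import Counter

def contains_with_multiplicity(title, keyword):
    """Hypothetical helper: every keyword word must occur in the title
    at least as many times as it occurs in the keyword."""
    needed = Counter(keyword.split())
    available = Counter(title.split())
    # Counter subtraction keeps only positive counts, so an empty result
    # means no keyword word is missing from the title.
    return not (needed - available)

title = 'python 3.5 is a programming language'
print(contains_with_multiplicity(title, 'python 3.5'))         # True
print(contains_with_multiplicity(title, 'python 3.5 python'))  # False
```

Here the second call fails because 'python' is requested twice but appears only once in the title.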
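Answer 1 (score: 0)
So you need to find each word of the key phrase in the title, in order. Try this: search for each word in turn, and continue the search in the remainder of the title.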
key_phrase = 'python 3.5'
title_list = ['python 3.5 is a programming language',
              'my favourite version of python is 3.5!']
key_word = key_phrase.split()
for title in title_list:
    remain = title.split()
    found = True
    for word in key_word:
        if word in remain:
            # Keep searching only in the words after this match.
            pos = remain.index(word)
            remain = remain[pos+1:]
        else:
            found = False
            break
    print(title, "\tfound=", found)
Output:
python 3.5 is a programming language found= True
my favourite version of python is 3.5! found= False
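Note that the second title is rejected because split() yields '3.5!' rather than '3.5'. If trailing punctuation should be ignored (an assumption, the question does not say), the same in-order search could be sketched as a function that strips punctuation from each word first. The name contains_in_order is hypothetical.

```python
import string

def contains_in_order(title, key_phrase):
    """Hypothetical helper: find each key word in order, ignoring
    punctuation at the start or end of the title's words."""
    remain = [w.strip(string.punctuation) for w in title.split()]
    for word in key_phrase.split():
        if word not in remain:
            return False
        # Continue the search after the position of this match.
        remain = remain[remain.index(word) + 1:]
    return True

print(contains_in_order('my favourite version of python is 3.5!', 'python 3.5'))  # True
print(contains_in_order('my favourite version of python is 3.5!', '3.5 python'))  # False
```

The second call returns False because the search is still order-sensitive: nothing follows '3.5' in the title.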
Answer 2 (score: 0)
You can do it like this... if you want to customize how strict the match has to be.
keyword = 'python 3.5'
title = 'my favourite version of python is 3.5!'
precision = 100 # 100% precision (both python and 3.5 must exist in title)
if len([x for x in set(keyword.split(' ')) if x in title]) >= round(len(set(keyword.split(' ')))*(precision/100)):
    print('Yes')
else:
    print('No')
Output:
Yes
If you change title to:
title = 'my favourite version of python is 3.4!'
the output is No.
But... with a small change to precision:
precision = 50
the output is Yes.
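The threshold idea may be easier to reuse as a function. This is only a sketch: the name matches_with_precision is invented here, and it swaps round() for math.ceil(), because Python's round() rounds halves to the nearest even number (round(0.5) is 0), which would let a single keyword at 50% pass with zero hits.

```python
import math

def matches_with_precision(title, keyword, precision=100):
    """Hypothetical helper: True if at least `precision` percent of the
    distinct keyword words occur in the title (substring test)."""
    words = set(keyword.split())
    hits = sum(1 for w in words if w in title)
    # Round the required number of hits up, not to the nearest even.
    required = math.ceil(len(words) * precision / 100)
    return hits >= required

print(matches_with_precision('my favourite version of python is 3.5!', 'python 3.5'))      # True
print(matches_with_precision('my favourite version of python is 3.4!', 'python 3.5'))      # False
print(matches_with_precision('my favourite version of python is 3.4!', 'python 3.5', 50))  # True
```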
Answer 3 (score: 0)
I think you need all():
title = 'my favourite version of python is 3.5!'
keyword = 'python 3.5'
print(all(n in title for n in keyword.split()))
keyword = 'hello 3.5'
print(all(n in title for n in keyword.split()))
keyword = 'hello world'
print(all(n in title for n in keyword.split()))
keyword = 'python 2.0'
print(all(n in title for n in keyword.split()))
Result:
True
False
False
False
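One thing to keep in mind: `n in title` is a substring test on the whole string, so a keyword word can match inside a longer word. A small sketch of the difference, using a made-up title and a hypothetical helper contains_all_words that tests against the list of title words instead:

```python
def contains_all_words(title, keyword):
    """Hypothetical whole-word variant: compare against the title's words."""
    title_words = title.split()
    return all(word in title_words for word in keyword.split())

title = 'python 13.55 is not out yet'
keyword = 'python 3.5'

# Substring test: '3.5' occurs inside '13.55'.
print(all(n in title for n in keyword.split()))  # True
# Whole-word test: '3.5' is not one of the title's words.
print(contains_all_words(title, keyword))        # False
```

The trade-off is that with a plain split() a word such as '3.5!' no longer equals '3.5', so the whole-word variant needs punctuation handled separately.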
Answer 4 (score: 0)
A short one-liner using the built-in all() and str.split() functions:
keyword = 'python 3.5'
title = 'my favourite version of python is 3.5!'
print(all(i in title for i in keyword.split()))
Output:
True
Answer 5 (score: 0)
You don't want to compare lists (it's slow); you should compare sets. As a bonus, issubset is already defined:
title = 'python 3.5 is a programming language'

def contains_all_keywords(sentence, keywords):
    # Compare sets of whitespace-separated words.
    keywords = set(keywords.split())
    return keywords.issubset(set(sentence.split()))

print(contains_all_keywords(title, 'python 3.5'))
# True
print(contains_all_keywords(title, '3.5 python'))
# True
print(contains_all_keywords(title, 'python 2.7'))
# False
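With a plain split(), the title from the question ('my favourite version of python is 3.5!') would not match, because its last word is '3.5!'. A possible variant, assuming case and surrounding punctuation should be ignored, tokenizes with a regular expression first. The names tokenize and contains_all_keywords_normalized, and the pattern itself, are illustrative choices rather than part of the answer.

```python
import re

def tokenize(text):
    """Hypothetical tokenizer: lowercase words, keeping inner dots
    so that a version number such as '3.5' stays one token."""
    return set(re.findall(r'\w+(?:\.\w+)*', text.casefold()))

def contains_all_keywords_normalized(sentence, keywords):
    return tokenize(keywords).issubset(tokenize(sentence))

title = 'My favourite version of Python is 3.5!'
print(contains_all_keywords_normalized(title, '3.5 python'))  # True
print(contains_all_keywords_normalized(title, 'python 2.7'))  # False
```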