Comparing a list against keywords

Posted: 2017-03-21 21:56:49

Tags: python python-3.x

I'm trying to work out the best way to approach this.

Basically, I want to take some text and compare it against a keyword.

Of course, I could do this:

keyword = 'python 3.5'
title = 'python 3.5 is a programming language'
if keyword in title:
    print('match')  # runs only when 'python 3.5' appears as one contiguous substring

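However, the keyword then has to appear exactly as written: same order, words right next to each other. If the title text happened to be: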

title = 'my favourite version of python is 3.5!'

that wouldn't work.
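A quick check makes the limitation concrete: `in` on two strings looks for one contiguous substring, so the keyword only matches when its words are adjacent and in the same order.

```python
keyword = 'python 3.5'
titles = [
    'python 3.5 is a programming language',
    'my favourite version of python is 3.5!',
]

# `keyword in title` searches for the whole string 'python 3.5' as a single substring
for title in titles:
    print(keyword in title, '|', title)
```

The first title prints `True`; the second prints `False`, even though both words occur in it.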

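So I tried splitting the keyword with .split() and then checking whether both items of the resulting list are in the title variable, but I couldn't find an efficient way to do it.

If anyone knows a good way to do this, I'd really appreciate it.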

6 Answers

Answer 0 (score: 0)

This will do the job:

keyword = 'python 3.5'
title = 'python 3.5 is a programming language'
s = set(keyword.split(" "))
m = set(title.split(" "))
# every keyword word is in the title exactly when the intersection keeps all of s
if len(s.intersection(m)) == len(s):
    print(True)

This assumes you don't care about duplicates. That is, you would consider

keyword = 'python 3.5 python'
title = 'python 3.5 is a programming language'

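to be a pair in which all the keywords are in the title, even though 'python' appears twice in the keyword and only once in the title.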
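If duplicates should count (so that 'python 3.5 python' only matches a title containing 'python' at least twice), one option is to compare word counts instead of sets. A minimal sketch using `collections.Counter`; the helper name `contains_with_counts` is hypothetical:

```python
from collections import Counter

def contains_with_counts(keyword, title):
    """Return True if every keyword word occurs in the title at least as often as in the keyword."""
    needed = Counter(keyword.split())
    available = Counter(title.split())
    # Counter subtraction keeps only positive counts, so an empty result means nothing is missing
    return not (needed - available)

title = 'python 3.5 is a programming language'
print(contains_with_counts('python 3.5', title))         # True
print(contains_with_counts('python 3.5 python', title))  # False: 'python' appears only once
```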

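Answer 1 (score: 0)

So you need to find each word of the key phrase in the title, in order. Try this: search for each word in turn, and after each match keep searching only in the rest of the title.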

key_phrase = 'python 3.5'
title_list = ['python 3.5 is a programming language',
              'my favourite version of python is 3.5!']

key_word = key_phrase.split()

for title in title_list:
    remain = title.split()
    found = True
    for word in key_word:
        if word in remain:
            # continue searching only after the position of this match
            pos = remain.index(word)
            remain = remain[pos+1:]
        else:
            found = False
            break

    print(title, "\tfound=", found)

Output:

python 3.5 is a programming language    found= True
my favourite version of python is 3.5!  found= False
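Note that the second title is reported as found= False because `title.split()` produces the token '3.5!', which is not equal to '3.5'; 'python' and '3.5' do appear there in order. A minimal sketch that strips leading and trailing punctuation from each token before running the same in-order search (the helper names `tokens` and `found_in_order` are hypothetical):

```python
import string

def tokens(text):
    # split on whitespace, then strip leading/trailing punctuation so '3.5!' becomes '3.5'
    return [w.strip(string.punctuation) for w in text.split()]

def found_in_order(key_phrase, title):
    remain = tokens(title)
    for word in key_phrase.split():
        if word not in remain:
            return False
        # continue searching only after the position where this word was found
        remain = remain[remain.index(word) + 1:]
    return True

title = 'my favourite version of python is 3.5!'
print(found_in_order('python 3.5', title))  # True
print(found_in_order('3.5 python', title))  # False: 'python' does not follow '3.5'
```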

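Answer 2 (score: 0)

You can do it like this if you want to control how strict the match is, i.e. what percentage of the keyword words must appear in the title.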

keyword = 'python 3.5'
title = 'my favourite version of python is 3.5!'
precision = 100  # 100% precision (both python and 3.5 must exist in title)
words = set(keyword.split(' '))
matches = [x for x in words if x in title]  # substring check against the title
if len(matches) >= round(len(words) * (precision / 100)):
    print('Yes')
else:
    print('No')

Output:

Yes

If you change title to:

title = 'my favourite version of python is 3.4!'

the output is 'No'. But with a small change to precision:

precision = 50

the output is 'Yes', because one of the two keyword words (50%) is enough.
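The same threshold test can be wrapped in a function so that precision becomes a parameter. A sketch; `matches_with_precision` is a hypothetical name, and, like the answer, it uses a substring check (`w in title`), so '3.5' also matches inside '3.5!':

```python
def matches_with_precision(keyword, title, precision=100):
    """Return True if at least `precision` percent of the distinct keyword words occur in title."""
    words = set(keyword.split())
    hits = sum(1 for w in words if w in title)  # substring test, as in the answer above
    return hits >= round(len(words) * precision / 100)

title = 'my favourite version of python is 3.4!'
print(matches_with_precision('python 3.5', title, 100))  # False
print(matches_with_precision('python 3.5', title, 50))   # True
```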

Answer 3 (score: 0)

I think you need all():

title = 'my favourite version of python is 3.5!'

keyword = 'python 3.5'
print(all(n in title for n in keyword.split()))  # every word must occur in title

keyword = 'hello 3.5'
print(all(n in title for n in keyword.split()))

keyword = 'hello world'
print(all(n in title for n in keyword.split()))

keyword = 'python 2.0'
print(all(n in title for n in keyword.split()))

Result:

True
False
False
False
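Because `n in title` is a substring test, a keyword such as 'py' would also match 'python'. If whole words are required, one option is a regular expression with word boundaries. A minimal sketch; `contains_all_words` is a hypothetical name, and `\b` only behaves as expected when each keyword starts and ends with a letter, digit, or underscore:

```python
import re

def contains_all_words(title, keyword):
    # re.escape makes the '.' in '3.5' literal; \b requires a word boundary on each side
    return all(re.search(r'\b' + re.escape(word) + r'\b', title) for word in keyword.split())

title = 'my favourite version of python is 3.5!'
print(contains_all_words(title, 'python 3.5'))  # True
print(contains_all_words(title, 'py 3.5'))      # False: 'py' is only part of 'python'
```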

Answer 4 (score: 0)

A short one-liner using the built-in all() and str.split():

keyword = 'python 3.5'
title = 'my favourite version of python is 3.5!'

print(all(i in title for i in keyword.split()))

Output:

True
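`any()` and `all()` are easy to mix up here: `all()` requires every keyword word to be present, while `any()` is satisfied by a single one. A short comparison on the same title:

```python
keyword = 'python 2.0'
title = 'my favourite version of python is 3.5!'

words = keyword.split()
print(any(w in title for w in words))  # True: 'python' is present
print(all(w in title for w in words))  # False: '2.0' is missing
```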

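Answer 5 (score: 0)

You don't want to compare lists (a membership test on a list scans every element, which is slow); compare sets instead, which use hashing. As a bonus, set.issubset() is already built in: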

title = 'python 3.5 is a programming language'

def contains_all_keywords(sentence, keywords):
    # True when every word of `keywords` is also a word of `sentence`
    keywords = set(keywords.split())
    return keywords.issubset(sentence.split())

print(contains_all_keywords(title, 'python 3.5'))
# True
print(contains_all_keywords(title, '3.5 python'))
# True
print(contains_all_keywords(title, 'python 2.7'))
# False
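When both sides are sets, `issubset()` can also be written with the `<=` operator. If the comparison should ignore case as well, one option is to normalize both strings with `str.casefold()` first. A sketch; `contains_all_keywords_ci` is a hypothetical name:

```python
def contains_all_keywords_ci(sentence, keywords):
    # casefold() makes the comparison case-insensitive; <= on two sets means "is a subset of"
    return set(keywords.casefold().split()) <= set(sentence.casefold().split())

title = 'Python 3.5 is a programming language'
print(contains_all_keywords_ci(title, 'python 3.5'))  # True
print(contains_all_keywords_ci(title, 'PYTHON 2.7'))  # False
```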