I want to find similar articles in a Django database by comparing tags, which are stored in a list of dictionaries:
myarticle = {'pk': 17, 'tags': [0, 1, 0, 1, 0]}
allarticles = [{'pk': 1, 'tags': [0, 0, 0, 1, 0]},
               {'pk': 2, 'tags': [0, 1, 0, 1, 0]},
               {'pk': 3, 'tags': [1, 1, 0, 0, 0]},
               {'pk': 4, 'tags': [1, 0, 1, 0, 1]},
               {'pk': 5, 'tags': [0, 0, 0, 0, 1]}]
The most convenient result would be a list of pk values ranked by how many tags match those of the input myarticle. Expected result:
result = [2, 1, 3, 5, 4]
Answer 0 (score: 6)
You can use sorted:
myarticle = {'pk': 17, 'tags': [0, 1, 0, 1, 0]}
allarticles = [{'pk': 1, 'tags': [0, 0, 0, 1, 0]},
               {'pk': 2, 'tags': [0, 1, 0, 1, 0]},
               {'pk': 3, 'tags': [1, 1, 0, 0, 0]},
               {'pk': 4, 'tags': [1, 0, 1, 0, 1]},
               {'pk': 5, 'tags': [0, 0, 0, 0, 1]}]
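# Sort by the number of positions where the tags agree, best match first
new_articles = sorted(allarticles, key=lambda x: sum(a == b for a, b in zip(myarticle['tags'], x['tags'])), reverse=True)
# Keep only the primary keys
final_results = [i['pk'] for i in new_articles]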
Output:
[2, 1, 3, 5, 4]
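To see why the ranking comes out this way, it can help to look at the per-article scores that the key function produces. The following is a minimal sketch; the helper name match_count and the scores dictionary are introduced here for illustration only and are not part of the answer. Note that the comparison counts every position where the two lists agree, so a tag that both articles lack (0 and 0) counts as a match as well.

```python
myarticle = {'pk': 17, 'tags': [0, 1, 0, 1, 0]}
allarticles = [{'pk': 1, 'tags': [0, 0, 0, 1, 0]},
               {'pk': 2, 'tags': [0, 1, 0, 1, 0]},
               {'pk': 3, 'tags': [1, 1, 0, 0, 0]},
               {'pk': 4, 'tags': [1, 0, 1, 0, 1]},
               {'pk': 5, 'tags': [0, 0, 0, 0, 1]}]

def match_count(tags_a, tags_b):
    """Hypothetical helper: number of positions where both tag lists agree."""
    return sum(a == b for a, b in zip(tags_a, tags_b))

# Score of every article against myarticle, keyed by pk
scores = {a['pk']: match_count(myarticle['tags'], a['tags']) for a in allarticles}
print(scores)  # {1: 4, 2: 5, 3: 3, 4: 0, 5: 2}

# Ranking the pks by descending score gives the same list as above
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # [2, 1, 3, 5, 4]
```

Because sorted is stable, articles with equal scores keep their original relative order, even with reverse=True.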
Answer 1 (score: 2)
You can use the third-party NumPy library for a vectorized solution via numpy.argsort.
For larger input lists this should be more efficient:
import numpy as np

# Repeat the sample data to get a larger input for timing
allarticles = allarticles*10000

def jp(myarticle, allarticles):
    # Count matching positions per article, then order indices by descending count
    arr = np.argsort((np.array([d['tags'] for d in allarticles]) == myarticle['tags']).sum(1))[::-1]
    return [allarticles[i]['pk'] for i in arr]

def ajax(myarticle, allarticles):
    new_articles = sorted(allarticles, key=lambda x: sum(a == b for a, b in zip(myarticle['tags'], x['tags'])), reverse=True)
    return [i['pk'] for i in new_articles]

# Timings measured in IPython
%timeit jp(myarticle, allarticles)    # 49.3 ms per loop
%timeit ajax(myarticle, allarticles)  # 112 ms per loop
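One difference between the two functions is how ties are ordered: reversing an ascending argsort also reverses the order of articles with equal scores, whereas sorted keeps them in input order. If that matters, a possible variant is to sort the negated scores with a stable sort. This is a sketch under that assumption; the function name rank_by_matches is made up for this example, and its timing has not been measured here.

```python
import numpy as np

def rank_by_matches(myarticle, allarticles):
    """Hypothetical variant of jp() that keeps input order among ties."""
    tags = np.array([d['tags'] for d in allarticles])
    # Number of positions where each article agrees with myarticle
    scores = (tags == np.array(myarticle['tags'])).sum(axis=1)
    # Negate so the highest score sorts first; kind='stable' preserves tie order
    order = np.argsort(-scores, kind='stable')
    return [allarticles[i]['pk'] for i in order]

myarticle = {'pk': 17, 'tags': [0, 1, 0, 1, 0]}
allarticles = [{'pk': 1, 'tags': [0, 0, 0, 1, 0]},
               {'pk': 2, 'tags': [0, 1, 0, 1, 0]},
               {'pk': 3, 'tags': [1, 1, 0, 0, 0]},
               {'pk': 4, 'tags': [1, 0, 1, 0, 1]},
               {'pk': 5, 'tags': [0, 0, 0, 0, 1]}]

result = rank_by_matches(myarticle, allarticles)
print(result)  # [2, 1, 3, 5, 4]
```

On the five sample articles all scores are distinct, so this returns the same list as both functions above; the tie handling only shows up on inputs with equal scores.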