Take this dictionary:
{'local': {'count': 7,
           'dining-and-nightlife': {'count': 1,
                                    'bar-clubs': {'count': 1}
                                    },
           'activities-events': {'count': 6,
                                 'outdoor-adventures': {'count': 4},
                                 'life-skill-classes': {'count': 2}
                                 }
           }}
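How can I determine the most relevant matches (with 30% leeway)? For example, activities-events has a count of 6, so 6/7 ≈ 86%, and its child outdoor-adventures has a count of 4 out of 6 (≈ 67%). So the most relevant category is outdoor-adventures.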
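The arithmetic above can be sketched as a small helper that divides each child's count by its parent's count. This is only an illustration of the percentages, not a full solution; `child_shares` is a hypothetical name, and the sketch assumes every category is a dict with a `'count'` key.

```python
def child_shares(node):
    """Map each child category to its count divided by the parent's count."""
    total = float(node['count'])
    return {name: child['count'] / total
            for name, child in node.items()
            if name != 'count'}


# The 'local' subtree from the dictionary above.
local = {'count': 7,
         'dining-and-nightlife': {'count': 1, 'bar-clubs': {'count': 1}},
         'activities-events': {'count': 6,
                               'outdoor-adventures': {'count': 4},
                               'life-skill-classes': {'count': 2}}}

top = child_shares(local)                       # shares of the 7 in 'local'
sub = child_shares(local['activities-events'])  # shares of the 6 in 'activities-events'

# Keep only the top-level categories that reach the 30% cutoff.
cutoff = 0.3
relevant_top = [name for name, share in top.items() if share >= cutoff]
```

Here `top['activities-events']` is 6/7 and `top['dining-and-nightlife']` is 1/7, so only activities-events reaches the cutoff at the top level.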
In this example:
{'local': {'count': 11,
           'dining-and-nightlife': {'count': 4,
                                    'bar-clubs': {'count': 4}
                                    },
           'activities-events': {'count': 6,
                                 'outdoor-adventures': {'count': 4},
                                 'life-skill-classes': {'count': 2}
                                 }
           }}
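Both dining-and-nightlife (4/11 ≈ 36%) with bar-clubs (100%) and activities-events (6/11 ≈ 55%) with outdoor-adventures (≈ 67%) are relevant.
I would like the percentage cutoff to be set by
cutoff = 0.3
The idea here is to determine which categories are most relevant by removing the smaller matches (those below 30% of their parent's count).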
@F.J answered this question below, but now I would like to update the counts in the tree.
Initial output:
{'local': {'activities-events': {'count': 6,
                                 'life-skill-classes': {'count': 2},
                                 'outdoor-adventures': {'count': 4}},
           'count': 11,
           'dining-and-nightlife': {'bar-clubs': {'count': 4}, 'count': 4}}}
Output after updating the counts:
{'local': {'activities-events': {'count': 6,
                                 'life-skill-classes': {'count': 2},
                                 'outdoor-adventures': {'count': 4}},
           'count': 10,
           'dining-and-nightlife': {'bar-clubs': {'count': 4}, 'count': 4}}}
Answer 0 (score: 1)
The following should work. Note that it modifies the input dictionary:
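def keep_most_relevant(d, cutoff=0.3):
    # Iterate over a copy of the items so keys can be deleted safely.
    for k, v in list(d.items()):
        if k == 'count':
            continue
        # Drop a child whose count is below cutoff * the parent's count.
        if 'count' in d and v['count'] < d['count'] * cutoff:
            del d[k]
        else:
            # Pass cutoff along so a non-default value applies at every level.
            keep_most_relevant(v, cutoff)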
Examples:
>>> import pprint
>>> d1 = {'local': {'count': 7, 'dining-and-nightlife': {'count': 1, 'bar-clubs': {'count': 1}}, 'activities-events': {'count': 6, 'outdoor-adventures': {'count': 4}, 'life-skill-classes': {'count': 2}}}}
>>> keep_most_relevant(d1)
>>> pprint.pprint(d1)
{'local': {'activities-events': {'count': 6,
                                 'life-skill-classes': {'count': 2},
                                 'outdoor-adventures': {'count': 4}},
           'count': 7}}
>>> d2 = {'local': {'count': 11, 'dining-and-nightlife': {'count': 4, 'bar-clubs': {'count': 4}}, 'activities-events': {'count': 6, 'outdoor-adventures': {'count': 4}, 'life-skill-classes': {'count': 2}}}}
>>> keep_most_relevant(d2)
>>> pprint.pprint(d2)
{'local': {'activities-events': {'count': 6,
                                 'life-skill-classes': {'count': 2},
                                 'outdoor-adventures': {'count': 4}},
           'count': 11,
           'dining-and-nightlife': {'bar-clubs': {'count': 4}, 'count': 4}}}
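For the follow-up about updating the counts, one possible reading of the desired output (11 becoming 10) is that each parent's count should equal the sum of its remaining children's counts, while leaf categories keep their own count. The sketch below assumes that reading, which the question does not spell out. `recount` is a hypothetical helper name, and like the function above it modifies the dictionary in place.

```python
def recount(node):
    """Recompute counts bottom-up in place and return this node's count.

    Assumption: a parent's count is the sum of its children's counts,
    and a leaf (a dict with only a 'count' key) keeps its own count.
    """
    children = [v for k, v in node.items() if k != 'count']
    if not children:
        return node.get('count', 0)
    total = sum(recount(child) for child in children)
    # Only overwrite an existing count; the outermost dict has none.
    if 'count' in node:
        node['count'] = total
    return total


# The "initial output" tree from the question.
tree = {'local': {'activities-events': {'count': 6,
                                        'life-skill-classes': {'count': 2},
                                        'outdoor-adventures': {'count': 4}},
                  'count': 11,
                  'dining-and-nightlife': {'bar-clubs': {'count': 4},
                                           'count': 4}}}
recount(tree)
# tree['local']['count'] is now 10 (6 + 4)
```

Calling `recount` after pruning would also shrink a parent whose child was removed, which may or may not be what is wanted if later cutoff checks are run against the new totals.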
Answer 1 (score: 0)
def matches(match, cutoff):
    # Yield (name, score) for each child that meets the cutoff,
    # followed by the highest-scoring match found beneath that child.
    total = float(match['count'])
    for k in match:
        if k == 'count':
            continue
        score = match[k]['count'] / total
        if score >= cutoff:
            yield (k, score)
            m = list(matches(match[k], cutoff))
            if m:
                yield max(m, key=lambda pair: pair[1])

def best_matches(d, cutoff):
    # The outermost dict has no 'count', so start from each of its values.
    for k in d:
        for m in matches(d[k], cutoff):
            yield m

>>> d = {'local': {'count': 7,
...                'dining-and-nightlife': {'count': 1,
...                                         'bar-clubs': {'count': 1}
...                                         },
...                'activities-events': {'count': 6,
...                                      'outdoor-adventures': {'count': 4},
...                                      'life-skill-classes': {'count': 2}
...                                      }
...                }}
>>> print(list(best_matches(d, 0.3)))
[('activities-events', 0.8571428571428571), ('outdoor-adventures', 0.6666666666666666)]
>>> d = {'local': {'count': 11,
...                'dining-and-nightlife': {'count': 4,
...                                         'bar-clubs': {'count': 4}
...                                         },
...                'activities-events': {'count': 6,
...                                      'outdoor-adventures': {'count': 4},
...                                      'life-skill-classes': {'count': 2}
...                                      }
...                }}
>>> print(list(best_matches(d, 0.3)))
[('dining-and-nightlife', 0.36363636363636365), ('bar-clubs', 1.0), ('activities-events', 0.5454545454545454), ('outdoor-adventures', 0.6666666666666666)]