Question

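I have a dictionary whose keys and values are both tuples: each key is (queryID, sentence) and each value is (score, documentID). In both tuples the first item is a number and the second is a string.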

d={(1,'bla bla'):(10,'doc1'),(1,'yada yada'):(20,'doc2'),(2,'bla bla'):(30,'doc1'),(2,'more of the same'):(40,'doc3')}

I have grouped this dict by query ID and sorted it by score, so for each query ID the items are ordered by score.

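What I want to do is get the top k items from the sorted dict for each query ID. So if I have 100 items for queryID = 1 and the same number for queryID = 2, I want the first k items in the sorted dict for each of them. How can this be done?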

This is the part of my code that builds the sorted dict:

sorted_dict = collections.OrderedDict(sorted(sen_dict.items(), key=lambda x: (-int(x[0][0]), x[1][0]), reverse=True))
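To make the resulting order concrete, here is a small sketch that runs the same sort on the sample dict from the question. It assumes sen_dict has the same shape as d, which the question does not state explicitly. Because the query ID is negated and the whole sort is reversed, query IDs come out ascending and scores descending within each ID.

```python
import collections

# Sample data from the question, used here as a stand-in for sen_dict (assumption).
sen_dict = {
    (1, 'bla bla'): (10, 'doc1'),
    (1, 'yada yada'): (20, 'doc2'),
    (2, 'bla bla'): (30, 'doc1'),
    (2, 'more of the same'): (40, 'doc3'),
}

# Sort key is (-queryID, score); reversing the sort yields
# ascending queryID and descending score inside each queryID.
sorted_dict = collections.OrderedDict(
    sorted(sen_dict.items(), key=lambda x: (-int(x[0][0]), x[1][0]), reverse=True)
)

for key, value in sorted_dict.items():
    print(key, value)
# (1, 'yada yada') (20, 'doc2')
# (1, 'bla bla') (10, 'doc1')
# (2, 'more of the same') (40, 'doc3')
# (2, 'bla bla') (30, 'doc1')
```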

Answer 1

You can loop over the dictionary and append to a results list. If the qID increases linearly by 1, I think something like this should work.

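results = []
i = 1
currentResult = None

# Assumes the dict is iterated grouped by query ID, with IDs increasing by 1
for key in d:
    if key[0] != i:
        # A new query ID starts: store the previous ID's best result and reset
        results.append(currentResult)
        currentResult = None
        i += 1
    # Keep the highest-scoring (score, documentID) seen so far for this query ID
    if currentResult is None or d[key][0] > currentResult[0]:
        currentResult = d[key]
results.append(currentResult)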

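This only works when there is always exactly one highest-scoring item, but it can easily be extended to handle several items with the same score.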

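results = []
i = 1
currentResults = []
currentScore = 0  # assumes all scores are positive

for key in d:
    if key[0] != i:
        # A new query ID starts: store the previous ID's top items and reset
        results.append(currentResults)
        currentResults = []
        currentScore = 0
        i += 1
    if currentScore == d[key][0]:
        # Tie with the current best score: keep this item as well
        currentResults.append(d[key])
    elif currentScore < d[key][0]:
        # New best score: discard the earlier items
        currentResults = [d[key]]
        currentScore = d[key][0]
results.append(currentResults)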

I think something like this should work.
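If the query IDs are not consecutive, or the dict is not grouped and sorted beforehand, a different approach may be easier. The following is only a sketch of one alternative, not the method described above: it groups the values by query ID and uses heapq.nlargest to pick the k highest scores. The function name top_k_unsorted is made up for illustration.

```python
import collections
import heapq

d = {
    (1, 'bla bla'): (10, 'doc1'),
    (1, 'yada yada'): (20, 'doc2'),
    (2, 'bla bla'): (30, 'doc1'),
    (2, 'more of the same'): (40, 'doc3'),
}

def top_k_unsorted(d, k):
    """Hypothetical helper: top k (score, documentID) values per query ID."""
    groups = collections.defaultdict(list)
    # Collect every value under its query ID; the sentence is not needed here.
    for (query_id, _sentence), value in d.items():
        groups[query_id].append(value)
    # nlargest returns the k values with the highest score, best first.
    return {
        query_id: heapq.nlargest(k, values, key=lambda v: v[0])
        for query_id, values in groups.items()
    }

print(top_k_unsorted(d, 1))
# {1: [(20, 'doc2')], 2: [(40, 'doc3')]}
```

This does not depend on iteration order, so it works on the plain dict without sorting it first.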

Answer 2

This uses your sorted_dict variable to get the top K highest-scoring values associated with a given query ID.

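k = 2  # how many top values you want to see
query_id = 1  # the queryID to look up
# sorted_dict is already ordered by score within each queryID, so the first k matches are the top k
topK = [val for key, val in sorted_dict.items() if key[0] == query_id][0:k]
print(topK)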
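To get the top k for every query ID in one pass rather than one ID at a time, something along these lines might work. It is a sketch that assumes sorted_dict is grouped by query ID and sorted by descending score inside each group, as in the question; the helper name top_k_per_query is made up for illustration.

```python
import collections
import itertools

d = {
    (1, 'bla bla'): (10, 'doc1'),
    (1, 'yada yada'): (20, 'doc2'),
    (2, 'bla bla'): (30, 'doc1'),
    (2, 'more of the same'): (40, 'doc3'),
}
# Same sort as in the question: ascending queryID, descending score.
sorted_dict = collections.OrderedDict(
    sorted(d.items(), key=lambda x: (-int(x[0][0]), x[1][0]), reverse=True)
)

def top_k_per_query(sorted_items, k):
    """Hypothetical helper: first k values for each query ID, in sorted order."""
    result = collections.OrderedDict()
    # groupby only merges adjacent items, so the input must already be grouped by queryID.
    for query_id, group in itertools.groupby(sorted_items.items(), key=lambda kv: kv[0][0]):
        result[query_id] = [val for _key, val in itertools.islice(group, k)]
    return result

print(dict(top_k_per_query(sorted_dict, 1)))
# {1: [(20, 'doc2')], 2: [(40, 'doc3')]}
```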

Using tuples as keys, get the top items in a sorted dictionary for each sub-key
