I'm working with documents that have a sparse vector attached, like this:
```javascript
{
  "_id" : ObjectId("..."),
  "vec" : [
    {
      "dim" : 1,
      "weight" : 8
    },
    {
      "dim" : 3,
      "weight" : 3
    }
  ]
}
```
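I'm trying to compute the normalized dot product between an input vector in the same format and every document in the collection. I can do it with the rather cumbersome aggregation query below, but I'd like to know whether there is a more efficient way.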
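Written out, with q the input vector and v a document's vector (weights indexed by `dim`, with dims that are absent treated as weight 0), the normalized dot product is the cosine similarity:

$$
\operatorname{score}(q, v) = \frac{\sum_{d} q_d \, v_d}{\sqrt{\sum_{d} v_d^{2}} \; \lVert q \rVert},
\qquad
\lVert q \rVert = \sqrt{\sum_{d} q_d^{2}}
$$

Only dims present in both vectors contribute to the numerator, while the document norm runs over all of the document's dims. For the sample document, $\sum_d v_d^2 = 8^2 + 3^2 = 73$; for the input vector in the pipeline (dim 2 with weight 5, dim 5 with weight 2), $\lVert q \rVert = \sqrt{29}$, and since the two vectors share no dims the score is 0.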
```javascript
[
  {$unwind: "$vec"},
  {$project: {
    squareWeight: {$multiply: ["$vec.weight", "$vec.weight"]}, // for the norm
    dim: "$vec.dim",
    weight: "$vec.weight",
    inputVec: {$literal: [{dim: 2, weight: 5}, {dim: 5, weight: 2}]} // input vector
  }},
  {$project: {
    dim: 1,
    squareWeight: 1,
    scores: {
      $map: { // multiply each input element with this vector element's weight
        input: "$inputVec",
        as: "input",
        in: {$cond: [
          {$eq: ["$$input.dim", "$dim"]},
          {$multiply: ["$$input.weight", "$weight"]},
          0
        ]} // in
      } // map
    } // scores
  }}, // project
  // one document per (vector element, input element); remember which input element
  {$unwind: {path: "$scores", includeArrayIndex: "inputIdx"}},
  {$project: {
    scores: 1,
    squareWeight: {
      // keep the squared weight only on the first copy created by the unwind,
      // so each vector element is counted exactly once in the norm
      $cond: [{$eq: ["$inputIdx", 0]}, "$squareWeight", 0]
    }
  }},
  {$group: {
    _id: "$_id",
    score: {$sum: "$scores"},
    squareSum: {$sum: "$squareWeight"}
  }}
]
```
From that I can compute the normalized score as `score / (sqrt(squareSum) * ||inputVec||)`.
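To sanity-check the aggregation output, here is a minimal client-side sketch of the same formula in plain JavaScript. It assumes each vector is an array of `{dim, weight}` objects with no repeated dims; `normalizedDot` is a hypothetical helper name, not part of any MongoDB API.

```javascript
// Client-side reference for score / (sqrt(squareSum) * ||inputVec||),
// using the same sparse {dim, weight} format as the documents.
// Assumes dims are unique within each vector.
function normalizedDot(docVec, inputVec) {
  // Index the input weights by dim so each lookup is O(1).
  const inputByDim = new Map(inputVec.map(e => [e.dim, e.weight]));

  let score = 0;     // dot product over the dims both vectors share
  let squareSum = 0; // squared norm of the document vector (all of its dims)
  for (const { dim, weight } of docVec) {
    squareSum += weight * weight;
    if (inputByDim.has(dim)) score += weight * inputByDim.get(dim);
  }

  const inputNorm = Math.sqrt(inputVec.reduce((s, e) => s + e.weight * e.weight, 0));
  const denom = Math.sqrt(squareSum) * inputNorm;
  // Treat an all-zero vector as similarity 0 instead of dividing by zero.
  return denom === 0 ? 0 : score / denom;
}
```

For the sample document and the input vector from the pipeline the vectors share no dims, so this returns 0; for `[{dim: 1, weight: 3}, {dim: 2, weight: 4}]` against `[{dim: 2, weight: 1}]` it returns 0.8, i.e. 4 / (5 · 1).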
This doesn't feel like the most efficient approach, so I'm looking for improvements.
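One possible direction, sketched here as an untested, unbenchmarked assumption rather than a definitive answer: compute both sums per document inside a single `$project` with `$map`, `$filter` and array-valued `$sum`, so no `$unwind`/`$group` round trip is needed. This needs MongoDB 3.2 or later for those expression forms and for `$sqrt`. `buildScorePipeline` is a hypothetical helper name, the leading `$match` is optional (it drops documents that share no dim with the input, which would score 0 anyway), and the input vector is assumed to have a non-zero norm.

```javascript
// Hypothetical helper: builds a pipeline that scores every document against
// inputVec without $unwind (assumes MongoDB 3.2+ and a non-zero input norm).
function buildScorePipeline(inputVec) {
  // ||inputVec|| is constant for the query, so compute it once on the client.
  const inputNorm = Math.sqrt(inputVec.reduce((s, e) => s + e.weight * e.weight, 0));

  return [
    // Optional: keep only documents sharing at least one dim with the input.
    // With an index on "vec.dim" this stage can use the index.
    { $match: { "vec.dim": { $in: inputVec.map(e => e.dim) } } },
    { $project: {
      // Dot product: each document weight times the weight of the input
      // element with the same dim (an empty $filter result sums to 0).
      score: { $sum: { $map: {
        input: "$vec",
        as: "v",
        in: { $multiply: [
          "$$v.weight",
          { $sum: { $map: {
            input: { $filter: {
              input: { $literal: inputVec },
              as: "q",
              cond: { $eq: ["$$q.dim", "$$v.dim"] }
            } },
            as: "q",
            in: "$$q.weight"
          } } }
        ] }
      } } },
      // Squared norm of the document vector, each element counted exactly once.
      squareSum: { $sum: { $map: {
        input: "$vec",
        as: "v",
        in: { $multiply: ["$$v.weight", "$$v.weight"] }
      } } }
    } },
    // Normalize; a zero document norm yields 0 instead of a division error.
    { $project: {
      score: { $cond: [
        { $eq: ["$squareSum", 0] },
        0,
        { $divide: ["$score", { $multiply: [{ $sqrt: "$squareSum" }, inputNorm] }] }
      ] }
    } }
  ];
}
```

In the mongo shell this would be used as `db.docs.aggregate(buildScorePipeline([{dim: 2, weight: 5}, {dim: 5, weight: 2}]))`, where `docs` is a placeholder collection name. The per-document work is still proportional to (vector length × input length) because of the nested `$filter`, but the pipeline no longer multiplies each document into that many intermediate documents and regroups them.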
Thanks.