Sparse vector dot product with mongo aggregation

Asked: 2015-02-07 08:34:51

Tags: mongodb cosine-similarity

I am working with documents that have a sparse vector attached, like this:

{
 "_id" : ObjectId,
 "vec" : [ 
    {
        "dim" : 1,
        "weight" : 8
    }, 
    {
        "dim" : 3,
        "weight" : 3
    }
  ]
}

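I am trying to get the normalized dot product between an input vector in the same format and every document in the collection. I can do it with the rather cumbersome aggregation query below, but I am wondering whether there is a more efficient way.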

[
  {$unwind: "$vec"},
  {$project: {
    squareWeight: {$multiply: ["$vec.weight","$vec.weight"]}, //for the norm
    dim: "$vec.dim",
    weight: "$vec.weight",
    inputVec: {$literal:[{dim:2,weight: 5},{dim:5, weight:2}]} //input vector
  }},
  {$project: {
    dim: 1,
    squareWeight: 1,
    scores: {
      $map: { //multiplying each input element with the vector weight
        input: "$inputVec",
        as: "input",
        in: {$cond: [
          {$eq: ["$$input.dim","$dim"]},
          {$multiply: ["$$input.weight", "$weight"]},
          0
        ]}  //in
      }  //map
    }  //scores
  }},  //project
  {$unwind: "$scores"},
  {$project: {
    scores :1,
    squareWeight: {
      $cond: [{$eq: ["$scores", 0]}, 0, "$squareWeight"] //to avoid multiple counting
    }
  }},
  {$group: {
    _id: "$_id",
    score: {$sum: "$scores"},
    squareSum: {$sum: "$squareWeight"}
  }}
]
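
For cross-checking what the pipeline returns, here is a minimal plain-JavaScript sketch of the same quantity computed client-side. The function names (`dot`, `norm`, `cosineSimilarity`) are hypothetical and not part of any MongoDB API, and the sketch assumes each `dim` appears at most once per vector. Note that `norm` sums over every entry of a vector, which is the usual definition. The pipeline's `squareSum` appears to accumulate only the entries whose dimension matched the input, so the two values can differ.

```javascript
// Reference implementation of the normalized dot product (cosine similarity)
// for sparse vectors stored as [{dim, weight}, ...].
// Assumption: each dim occurs at most once within a vector.

function dot(a, b) {
  // Index b by dimension so each lookup is constant time.
  const byDim = new Map(b.map(e => [e.dim, e.weight]));
  let sum = 0;
  for (const e of a) {
    if (byDim.has(e.dim)) {
      sum += e.weight * byDim.get(e.dim);
    }
  }
  return sum;
}

function norm(v) {
  // Euclidean norm over all entries of the sparse vector.
  return Math.sqrt(v.reduce((s, e) => s + e.weight * e.weight, 0));
}

function cosineSimilarity(a, b) {
  const denom = norm(a) * norm(b);
  // Return 0 for an empty or all-zero vector instead of dividing by zero.
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// The sample document's vector and an illustrative input vector.
const docVec = [{dim: 1, weight: 8}, {dim: 3, weight: 3}];
const inputVec = [{dim: 1, weight: 2}, {dim: 3, weight: 1}];
console.log(cosineSimilarity(docVec, inputVec));
```

Running this over a handful of documents fetched with a plain `find()` gives values to compare against the aggregation output.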

I can now compute the normalized result as score / (sqrt(squareSum) * ||inputVec||).
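
As a sketch of that last step, the snippet below takes rows shaped like the `$group` output (`_id`, `score`, `squareSum`) and applies the formula client-side. `normalizeResults` is a hypothetical helper name, the row shape is assumed from the pipeline above, and the zero-denominator guard is an added assumption about how empty matches should be treated.

```javascript
// Hypothetical helper: convert per-document {score, squareSum} rows into
// normalized scores. ||inputVec|| is computed once and reused for every row.
function normalizeResults(rows, inputVec) {
  const inputNorm = Math.sqrt(
    inputVec.reduce((s, e) => s + e.weight * e.weight, 0)
  );
  return rows.map(r => {
    const denom = Math.sqrt(r.squareSum) * inputNorm;
    // Assumption: report 0 when the denominator is 0 (no usable weights).
    return { _id: r._id, similarity: denom === 0 ? 0 : r.score / denom };
  });
}

// Illustrative rows, shaped like the $group stage output.
const rows = [
  { _id: "a", score: 19, squareSum: 73 },
  { _id: "b", score: 0, squareSum: 0 }
];
const queryVec = [{dim: 1, weight: 2}, {dim: 3, weight: 1}];
console.log(normalizeResults(rows, queryVec));
```

Since ||inputVec|| is the same for every document, it only changes the scale of the scores, not their ranking.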

This doesn't feel like the most efficient approach, so I am looking for improvements.

Thanks.

0 answers:

No answers