Question

I have the following users collection:

[{
    "_id": 1,
    "adds": ["111", "222", "333", "111"]
}, {
    "_id": 2,
    "adds": ["555", "666", "777", "555"]
}, {
    "_id": 3,
    "adds": ["888", "999", "000", "888"]
}]

I need to find the duplicates in the adds array.

The expected output should be:

[{
    "_id": 1,
    "adds": ["111"]
}, {
    "_id": 2,
    "adds": [ "555"]
}, {
    "_id": 3,
    "adds": ["888"]
}]

I have tried many operators, such as $setUnion and $setDifference, but none of them worked.

Please help!

Answer 1

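You can use $range to generate an array of numbers from 1 up to, but not including, n, where n is the $size of adds. You can then iterate over those numbers and check whether the element of adds at index (fetched with $arrayElemAt) also occurs somewhere before index; if it does, it should be treated as a duplicate. $indexOfArray performs that check when you pass 0 and index as the search range.

After that, you only need $project and $map to replace the indexes with the actual elements. You can also add $setUnion so that a value repeated more than twice appears only once in the final result set.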

db.users.aggregate([
    {
        $addFields: {
            duplicates: {
                $filter: {
                    input: { $range: [ 1, { $size: "$adds" } ] },
                    as: "index",
                    cond: {
                        $ne: [ { $indexOfArray: [ "$adds", { $arrayElemAt: [ "$adds", "$$index" ]  }, 0, "$$index" ] }, -1 ]
                    }
                }
            }
        }
    },
    {
        $project: {
            _id: 1,
            adds: {
                $setUnion: [ { $map: { input: "$duplicates", as: "d", in: { $arrayElemAt: [ "$adds", "$$d" ] } } }, [] ]
            }
        }
    }
])

This prints:

{ "_id" : 1, "adds" : [ "111" ] }
{ "_id" : 2, "adds" : [ "555" ] }
{ "_id" : 3, "adds" : [ "888" ] }
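
For readers who want to trace the index logic outside MongoDB, here is a minimal plain-JavaScript sketch of the same idea. The function name findDuplicatesByIndex is hypothetical and is not part of the answer; the code only mirrors the pipeline step by step and makes no claim about how the server evaluates it.

```javascript
// Plain-JavaScript analogue of the pipeline above (illustration only, not MongoDB code).
// findDuplicatesByIndex is a hypothetical helper name.
function findDuplicatesByIndex(adds) {
  const duplicateIndexes = [];
  // Like $range: [1, size]; index 0 can never repeat an earlier element.
  for (let index = 1; index < adds.length; index++) {
    // Like $indexOfArray with search range [0, index): look only at earlier positions.
    const firstSeen = adds.slice(0, index).indexOf(adds[index]);
    if (firstSeen !== -1) {
      duplicateIndexes.push(index);
    }
  }
  // Like $map followed by $setUnion: map indexes back to values and deduplicate.
  return [...new Set(duplicateIndexes.map((i) => adds[i]))];
}

console.log(findDuplicatesByIndex(["111", "222", "333", "111"])); // [ '111' ]
```

Note that $setUnion does not guarantee element order, whereas the Set used here keeps insertion order.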

Answer 2

Here is another version that you might want to compare in terms of performance:

db.users.aggregate([{
  $project:{
    "adds":{
      $reduce:{
        "input":{$range:[0,{$size:"$adds"}]}, // loop variable from 0 to the max. index of the $adds array
      //"input":{$range:[0,{$subtract:[{$size:"$adds"},1]}]}, // this would be enough (the last element has nothing after it) but looks more complicated
        "initialValue":[],
        "in":{
            $let:{
              "vars":{
                "curr": { $arrayElemAt: [ "$adds", "$$this"] } // the element we're looking at
              },
              "in":{
                // if there is another identical element after the current one then we have a duplicate
                $cond:[
                  {$ne:[{$indexOfArray:["$adds","$$curr",{$add:["$$this",1]}]},-1]},
                  {$setUnion:["$$value",["$$curr"]]}, // combine duplicates found so far with new duplicate
                  "$$value" // continue with current value
                ]
              }
            }
        }
      }
    }
  }
}])

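The logic is based on a loop variable obtained through the $range operator. This loop variable gives sequential access to the adds array. For each item we look at, we check whether there is another identical item after the current index. If there is, we have a duplicate; otherwise we do not.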
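
The look-ahead check described above can be sketched in plain JavaScript as follows. This is an illustration under the assumption that Array.prototype.reduce and indexOf stand in for $reduce and $indexOfArray; the name findDuplicatesByLookahead is hypothetical.

```javascript
// Plain-JavaScript analogue of the $reduce pipeline (illustration only).
// findDuplicatesByLookahead is a hypothetical helper name.
function findDuplicatesByLookahead(adds) {
  return adds.reduce((found, curr, i) => {
    // Like $indexOfArray with a start index of i + 1: search only after the current position.
    const hasLaterCopy = adds.indexOf(curr, i + 1) !== -1;
    // Like $setUnion: record curr only if it is not already in the accumulator.
    if (hasLaterCopy && !found.includes(curr)) {
      return [...found, curr];
    }
    return found;
  }, []);
}

console.log(findDuplicatesByLookahead(["555", "666", "777", "555"])); // [ '555' ]
```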

Answer 3

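You can try the following aggregation. The idea is to collect the distinct values, iterate over them, and count how many times each one occurs in the adds array; if a value occurs more than once it is kept, otherwise it is ignored.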

db.users.aggregate([{
  "$project":{
    "adds":{
      "$reduce":{
        "input":{"$setUnion":["$adds",[]]}, // distinct values of adds
        "initialValue":[],
        "in":{
          "$concatArrays":[
            "$$value",
            {"$let":{
              "vars":{
                // all occurrences of the current distinct value in adds
                "match":{
                  "$filter":{"input":"$adds","as":"a","cond":{"$eq":["$$a","$$this"]}}
                }},
                "in":{
                  // keep the value only if it occurs more than once
                  "$cond":[{"$gt":[{"$size":"$$match"},1]},["$$this"],[]]
                }
            }}
          ]
        }
      }
    }
  }
}])
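
As a rough plain-JavaScript sketch of the counting approach above: take the distinct values, then keep those that occur more than once. The name findDuplicatesByCount is hypothetical, and a Set is assumed as a stand-in for the $setUnion deduplication.

```javascript
// Plain-JavaScript analogue of the distinct-then-count pipeline (illustration only).
// findDuplicatesByCount is a hypothetical helper name.
function findDuplicatesByCount(adds) {
  // Like $setUnion: ["$adds", []]: the distinct values of the array.
  const distinct = [...new Set(adds)];
  // Like $filter + $size > 1: keep a value only if it occurs more than once.
  return distinct.filter(
    (value) => adds.filter((a) => a === value).length > 1
  );
}

console.log(findDuplicatesByCount(["888", "999", "000", "888"])); // [ '888' ]
```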

Find duplicates in an array without $unwind
