Remove the biggest difference from a group

Asked: 2017-06-07 20:52:48

Tags: mongodb mongodb-query aggregation-framework

Suppose I have a set of objects that share the same description within each group but have slightly different amounts:

[
    {
        "_id": "101",
        "description": "DD from my employer1",
        "amount": 1000.33
    },
    {
        "_id": "102",
        "description": "DD from my employer1",
        "amount": 1000.34
    },
    {
        "_id": "103",
        "description": "DD from my employer1",
        "amount": 999.35
    },
    {
        "_id": "104",
        "description": "DD from my employer1",
        "amount": 5000.00
    },
    {
        "_id": "105",
        "description": "DD from my employer2",
        "amount": 2000.01
    },
    {
        "_id": "106",
        "description": "DD from my employer2",
        "amount": 1999.33
    },
    {
        "_id": "107",
        "description": "DD from my employer2",
        "amount": 1999.33
    }
]

I can group them using the following pipeline:

[
    {
        "$group": {
            "_id": {
                "$subtract": [
                    {
                        "$trunc": "$amount"
                    },
                    {
                        "$mod": [
                            {
                                "$trunc": "$amount"
                            },
                            10
                        ]
                    }
                ]
            },
            "results": {
                "$push": "$_id"
            }
        }
    },
    {
        "$redact": {
            "$cond": {
                "if": {
                    "$gt": [
                        {
                            "$size": "$results"
                        },
                        1
                    ]
                },
                "then": "$$KEEP",
                "else": "$$PRUNE"
            }
        }
    },
    {
        "$unwind": "$results"
    },
    {
        "$group": {
            "_id": null,
            "results": {
                "$push": "$results"
            }
        }
    }
]
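To make the effect of that grouping key concrete, here is a minimal plain-JavaScript sketch of the same arithmetic (truncate the amount, then round down to the nearest 10), applied to the sample amounts. The helper name `bucketKey` is hypothetical and only used for illustration; it is not a MongoDB function.

```javascript
// Sample (_id, amount) pairs taken from the documents above.
const samples = [
  ["101", 1000.33], ["102", 1000.34], ["103", 999.35], ["104", 5000.00],
  ["105", 2000.01], ["106", 1999.33], ["107", 1999.33],
];

// Hypothetical helper mirroring the $group key:
// trunc(amount) - (trunc(amount) mod 10)
function bucketKey(amount) {
  const whole = Math.trunc(amount);
  return whole - (whole % 10);
}

// Group the _ids by bucket, like the first $group stage.
const buckets = new Map();
for (const [id, amount] of samples) {
  const key = bucketKey(amount);
  if (!buckets.has(key)) buckets.set(key, []);
  buckets.get(key).push(id);
}

// Keep only buckets with more than one member, like the $redact stage.
const kept = [...buckets.values()].filter((ids) => ids.length > 1).flat();
console.log(kept); // [ '101', '102', '106', '107' ]
```

Under this bucketing, 999.35 falls into the 990 bucket and 2000.01 into the 2000 bucket, so "103" and "105" end up alone in their buckets and are pruned along with "104".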

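Is there a way to include all the amounts in each group (_ids 101, 102 and 103, plus 105, 106 and 107) even though they differ slightly, while excluding the bonus amount, which in the sample above is _id 104?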

I am looking for a simple array output containing only the _ids.

The result I am looking for:

{ "result": [ "101", "102", "103", "105", "106", "107" ] }

1 Answer:

Answer 0 (score: 0)

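I think this is somewhat subjective and depends on the actual data, but if the goal is just to drop a notable "positive" difference from the "average" payment, then this is the algorithm that mostly applies: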

db.collection.aggregate([
  { "$group": {
    "_id": "$description",
    "avg": { "$avg": "$amount" },
    "docs": { "$push": { "_id": "$_id", "amount": "$amount" } }
  }},
  { "$addFields": {
    "docs": {
      "$filter": {
        "input": "$docs",
        "as": "doc",
        "cond": {
          "$gt": [ "$avg", { "$subtract": [ "$$doc.amount", "$avg" ] } ]
        }
      }
    }
  }},
  { "$unwind": "$docs" },
  { "$group": {
    "_id": null,
    "results": { "$push": "$docs._id" }
  }}
])

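With the data you supplied, this excludes the "104" amount, because its difference from the average amount for "employer1" is greater than the average itself. That is what a large "upward" variation looks like: the filter keeps a document only when its amount minus the group average is less than the group average, which means the amount is less than twice the average.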
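The arithmetic behind that exclusion can be checked outside MongoDB. The following is a minimal plain-JavaScript sketch that mirrors the pipeline's logic on the sample documents: group by description, compute the average, keep a document only when `avg > amount - avg`. The function name `filterOutliers` is hypothetical and not part of any MongoDB API.

```javascript
// Sample documents from the question.
const docs = [
  { _id: "101", description: "DD from my employer1", amount: 1000.33 },
  { _id: "102", description: "DD from my employer1", amount: 1000.34 },
  { _id: "103", description: "DD from my employer1", amount: 999.35 },
  { _id: "104", description: "DD from my employer1", amount: 5000.00 },
  { _id: "105", description: "DD from my employer2", amount: 2000.01 },
  { _id: "106", description: "DD from my employer2", amount: 1999.33 },
  { _id: "107", description: "DD from my employer2", amount: 1999.33 },
];

// Hypothetical helper: same condition as the $filter stage.
function filterOutliers(documents) {
  // Group documents by description, like the first $group stage.
  const groups = new Map();
  for (const doc of documents) {
    if (!groups.has(doc.description)) groups.set(doc.description, []);
    groups.get(doc.description).push(doc);
  }

  const results = [];
  for (const members of groups.values()) {
    // Average amount for this description, like { "$avg": "$amount" }.
    const avg = members.reduce((sum, d) => sum + d.amount, 0) / members.length;
    for (const d of members) {
      // Keep the document unless it exceeds the average by more than the average.
      if (avg > d.amount - avg) results.push(d._id);
    }
  }
  return results;
}

const results = filterOutliers(docs);
console.log(results); // [ '101', '102', '103', '105', '106', '107' ]
```

For "employer1" the average is about 2000.005, so "104" differs from it by about 2999.995 and is dropped; every other document sits below or very near its group average and is kept. Note that this sketch preserves insertion order, whereas `$group` in MongoDB does not guarantee the order of its output, so the ordering of `results` from the real pipeline may differ.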

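As with all "grouping" approaches that rely on building arrays inside the grouped document, you need to be careful in real-world scenarios not to exceed the BSON document size limit (16 MB).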