Removing duplicate documents from a base document set before aggregating

Time: 2018-12-19 03:00:44

Tags: mongodb

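The script below produces a single record. I think this is caused by a problem with the FIRST $group I set up, which is meant to pick the $first version of each document (based on the criteria listed in that group).

I think it is picking the $first company and connection and the first transaction_date, which gives 6 documents that match those criteria, and then going on to $unwind, $match and $group by month to $sum the values.

If I try to remove $first from any of the fields, it complains that the value is not an accumulator.

How do I get the first group to simply walk the documents in sorted order and exclude the earlier duplicate versions of documents whose _id.company, _id.connection and _id.transaction_date are the same?

I believe my approach is right, but I am making a schoolboy error somewhere.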
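One way to express that intent is to make the duplicate criteria themselves the group key, so that $first only ever competes between versions of the same transaction. The following is a minimal sketch under assumptions, not a confirmed fix: the `dedupeStages` name, the `revision` field, and the sample documents are hypothetical, and `object_creation_date` is assumed to be a number that increases with newer versions. The plain-JavaScript function only emulates what `$sort` followed by `$group` with `$first` would do, so the idea can be checked without a database.

```javascript
// Hypothetical replacement for the first $group: the group key is the compound
// identity of a transaction rather than the transaction reference alone.
const dedupeStages = [
  { $sort: { "_id.company": 1, "_id.connection": 1,
             "_id.transaction_date": 1, "object_creation_date": -1 } },
  { $group: {
      _id: { company: "$_id.company",
             connection: "$_id.connection",
             transaction_date: "$_id.transaction_date" },
      // $$ROOT keeps the whole winning document, so no per-field $first is needed
      doc: { $first: "$$ROOT" }
  } },
  // Restore the original document shape (needs MongoDB 3.4 or later)
  { $replaceRoot: { newRoot: "$doc" } }
];

// Plain-JavaScript emulation of the stages above, for illustration only.
function keepNewestVersion(docs) {
  // Newest version first, mirroring object_creation_date: -1
  const sorted = [...docs].sort(
    (a, b) => b.object_creation_date - a.object_creation_date
  );
  const winners = new Map();
  for (const doc of sorted) {
    const key = JSON.stringify([
      doc._id.company, doc._id.connection, doc._id.transaction_date
    ]);
    // The first document seen for a key is its newest version
    if (!winners.has(key)) winners.set(key, doc);
  }
  return [...winners.values()];
}

// Hypothetical sample data: the first two documents are versions of one transaction.
const sampleDocs = [
  { _id: { company: "A", connection: "c1", transaction_date: 20160105120000, revision: 1 },
    object_creation_date: 20160105120500 },
  { _id: { company: "A", connection: "c1", transaction_date: 20160105120000, revision: 2 },
    object_creation_date: 20160106080000 },
  { _id: { company: "A", connection: "c1", transaction_date: 20160210090000, revision: 1 },
    object_creation_date: 20160210090100 }
];

const deduped = keepNewestVersion(sampleDocs);
console.log(deduped.map(d => d._id));
```

If this shape is used, the later stages would read fields from their original paths (for example `$_id.connection` and `$line_items`) instead of the top-level names created by the per-field $first version. Adding `_id.transaction_reference` to the group key may also be needed if two distinct transactions can share a company, connection and date.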

db.getCollection("9SP_Data").aggregate([

// find documents of a type and within a number range
{"$match" : {"_id.object_category" : "revenue-transaction"
        ,"_id.transaction_date": {
            $gte: 20160101000000,
            $lt: 20170101000000
            },
}},


// sort so that, where duplicates exist, the newest document is listed first
{$sort : { "_id.connection":1,
       "_id.company":1,
       "_id.transaction_reference":1,
       "_id.transaction_date":-1,
       "object_creation_date": -1 }},

{$group : { _id: "$_id.transaction_reference",
            "company" : {$first : "$_id.company"},
            "connection" : {$first : "$_id.connection"},
            "transaction_date" : {$first : "$_id.transaction_date"},
            "object_category" : {$first : "$_id.object_category"},
            "object_origin_category" : {$first : "$_id.object_origin_category"},
            "object_origin_type" : {$first : "$_id.object_origin_type"},
            // field path fixed: the "$" belongs at the start of the path
            "object_origin" : {$first : "$_id.object_origin"},
            "transaction_status" : {$first: "$_id.transaction_status"},
            "line_items" : {$first: "$line_items"}}},

    {"$unwind" :  "$line_items"},
    {"$match"  :  {"line_items.item_category":"sales-revenue"}},
    {"$group" : {
       "_id":
           {
            "company" : "$connection",
            "sum_by_date":  {$trunc:{ $divide: ["$transaction_date", 100000000 ]}},
            //  10000000000 - by year
            //  100000000 - by month 
            //  1000000 - by date 
            //  10000 - by hour 
            //  100 - by minute 
            "category" : "$line_items.item_category",
            "origin_category" : "$object_origin_category",
            "object_origin_type" : "$object_origin_type",
            "object_origin" : "$object_origin"
           },
        "metric_value"  : { $sum: "$line_items.item_net_total_value" },

        // count number of documents (I think this is counting line_items but I need number of distinct documents by _id.transaction_reference)
        "metric_volume":{$sum:1}}
},

// format the output to include the following values
{$project : {
    "_id.company"               : "$_id.company",
    "_id.metric_name"           : {$literal : "revenue"},
    "_id.metric_category"       : {$literal : "sales"},
    "_id.metric_type"           : {$literal : "month"},
    "_id.metric_lookup"         : "$_id.sum_by_date",
    // after the previous $group these three values live under _id
    "_id.object_origin_category": "$_id.origin_category",
    "_id.object_origin_type"    : "$_id.object_origin_type",
    "_id.object_origin"         : "$_id.object_origin",
    "metric_value"              : "$metric_value",
    "metric_volume"             : "$metric_volume"
    }}
])
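On the metric_volume comment in the script: after $unwind, `{$sum: 1}` counts line items, so a transaction with two matching line items is counted twice. A possible way to count distinct transactions instead is to collect the transaction keys with `$addToSet` and take the `$size` of that set in the final $project. The sketch below is an assumption-laden illustration: `transaction_refs`, `summarise` and the sample rows are hypothetical names and data, and it assumes `_id` after the first $group identifies exactly one transaction. The helper emulates the monthly bucketing (`$trunc` of the date divided by 100000000) and the distinct count in plain JavaScript.

```javascript
// Hypothetical additions to the second $group and the final $project.
const volumeAccumulator = { transaction_refs: { $addToSet: "$_id" } };
const volumeProjection = { metric_volume: { $size: "$transaction_refs" } };

// Plain-JavaScript emulation over rows that have already been unwound,
// one row per line item. Illustration only.
function summarise(rows) {
  const groups = new Map();
  for (const row of rows) {
    // 20160115093000 / 100000000 = 201601.15..., truncated to 201601 (year + month)
    const month = Math.trunc(row.transaction_date / 100000000);
    if (!groups.has(month)) {
      groups.set(month, { metric_value: 0, refs: new Set() });
    }
    const group = groups.get(month);
    group.metric_value += row.item_net_total_value;
    // A Set ignores repeats, like $addToSet
    group.refs.add(row.transaction_reference);
  }
  return [...groups].map(([month, group]) => ({
    month,
    metric_value: group.metric_value,
    metric_volume: group.refs.size // distinct transactions, like $size
  }));
}

// Hypothetical unwound rows: T1 contributes two line items in January 2016.
const sampleRows = [
  { transaction_reference: "T1", transaction_date: 20160115093000, item_net_total_value: 10 },
  { transaction_reference: "T1", transaction_date: 20160115093000, item_net_total_value: 5 },
  { transaction_reference: "T2", transaction_date: 20160120110000, item_net_total_value: 7 },
  { transaction_reference: "T3", transaction_date: 20160203120000, item_net_total_value: 3 }
];

const monthly = summarise(sampleRows);
console.log(monthly);
```

With `{$sum: 1}` the January bucket in this sample would report 3, while the set-based count reports 2 transactions.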

0 answers:

No answers