Max and group by in MongoDB

Date: 2017-04-24 10:02:24

Tags: sql-server mongodb mongodb-query aggregation-framework

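First of all, we have only just migrated from SQL Server to MongoDB. I have a collection containing the fields TFN and Impressions. I need to convert a SQL query to mongo but am stuck for the moment.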

The scenario is that I need to select the top 5 impressions from the collection, grouped by TFN.

Select Top 5 a.TFN, a.MaxImpression as MaxCount from ( 
  Select TFN, Max(Impressions) MaxImpression 
  from tblData 
  Where TFN in (Select TFN From @tmpTFNList) and TrendDate between @StartDate AND @EndDate
  Group by TFN 
  ) a

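This is the query in SQL Server. I need to achieve the same scenario using MongoDB. So far I have gone through mongo's aggregate and group functions, but I have not been able to get the same output as the SQL.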

Note: I am not able to link the Max clause with the Group by in MongoDB.

Here is the implementation I have tried:

db.getCollection("_core.data").aggregate([
       { 
           $match: 
           {
               $and: [
                   {
                       "TFN": 
                       {
                           $in: tfns 

                       }

                   } ,
                   { 
                       "TrendDate": 
                       {
                           $gte : 20170421,
                           $lte: 20170421

                       }
                   }]
           }
        }, 
        {
            $group: 
            {
               _id:"Impressions", 
               Impression: {
                   $max : "$Impressions"
               }
            }  
        }
    ])

Second attempt:

db.getCollection("_core.adwordsPull.static").group({
    key: { TFN: 1,  Impressions: 1 },
    cond: { TFN:  {
                               $in: tfns 

                           },
                       { 
                           "TrendDate": 
                           {
                               $gte : 20170421,
                               $lte: 20170421

                           }
                       } },
    reduce: function( curr, result ) {

                result.total += curr.Impression;
             },
    initial: { total : 0 }
})

What is wrong with these approaches, and how can I correct them?

Edit 1: Sample data

TFN Impression  TrendDate
84251456    12  20170424
84251456    15  20170424
84251456    18  20170424
84251456    19  20170424
84251456    22  20170424
84251456    23  20170423
84251456    24  20170423

84251455    25  20170423
84251455    30  20170423
84251455    35  20170424
84251455    24  20170423
84251455    22  20170423
84251455    21  20170424
84251455    22  20170424

Expected output:

TFN  MaxCount
84251456    22
84251455    35

1 Answer:

Answer 0 (score: 2)

To get the desired result, start by breaking down the SQL query, beginning with the subquery:

Select *
from tblData 
Where TFN in (Select TFN From @tmpTFNList) and TrendDate between @StartDate AND @EndDate

The equivalent mongo query is as follows:

db.getCollection("_core.data").aggregate([
    {
        "$match": {
            "TFN": { "$in": tmpTFNList },
            "TrendDate": {
                "$gte": startDate,
                "$lte": endDate
            }
        }
    }
])

The $group equivalent of

Select TFN, Max(Impressions) MaxImpression 
from tblData 
Where TFN in (Select TFN From @tmpTFNList) and TrendDate between @StartDate AND @EndDate
Group by TFN 

is as follows:

db.getCollection("_core.data").aggregate([
    {
        "$match": {
            "TFN": { "$in": tmpTFNList },
            "TrendDate": {
                "$gte": startDate,
                "$lte": endDate
            }
        }
    },
    {
        "$group": {
            "_id": "$TFN",
            "MaxImpression": { "$max": "$Impression" }
        }
    }
])
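As a rough illustration of why the `_id` expression matters, the sketch below emulates `$group` with `$max` in plain JavaScript over a few rows taken from the sample data. `groupMax` is a hypothetical helper written for this example, not a MongoDB API. It shows that a string literal such as `"Impressions"` puts every document into one group, while the field path `"$TFN"` produces one group per TFN value.

```javascript
// Plain-JavaScript emulation of
//   { $group: { _id: <key>, MaxImpression: { $max: "$Impression" } } }
// groupMax is a hypothetical helper for illustration, not a MongoDB API.
function groupMax(docs, keyFn) {
    const groups = new Map();
    for (const doc of docs) {
        const key = keyFn(doc);
        const current = groups.get(key);
        // Keep the largest Impression seen so far for this key.
        if (current === undefined || doc.Impression > current) {
            groups.set(key, doc.Impression);
        }
    }
    return Array.from(groups, ([_id, MaxImpression]) => ({ _id, MaxImpression }));
}

// A few rows from the sample data in the question.
const docs = [
    { TFN: 84251456, Impression: 12 },
    { TFN: 84251456, Impression: 24 },
    { TFN: 84251455, Impression: 35 },
    { TFN: 84251455, Impression: 21 }
];

// _id: "Impressions" is a string literal, so every document gets the same key.
const oneGroup = groupMax(docs, () => "Impressions");

// _id: "$TFN" is a field path, so the key is each document's TFN value.
const perTfn = groupMax(docs, (doc) => doc.TFN);

console.log(oneGroup);
console.log(perTfn);
```

The single-group result corresponds to the first attempt in the question, where `_id:"Impressions"` lacks the `$` prefix and is therefore treated as a constant.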

The top 5 query

Select Top 5 a.TFN, a.MaxImpression as MaxCount from ( 
    Select TFN, Max(Impressions) MaxImpression 
    from tblData 
    Where TFN in (Select TFN From @tmpTFNList) 
        and TrendDate between @StartDate AND @EndDate
    Group by TFN 
) a

is made possible with the $limit operator, with the fields selected through the $project stage:

db.getCollection("_core.data").aggregate([
    { /* WHERE TFN in list AND TrendDate between DATES */
        "$match": {
            "TFN": { "$in": tmpTFNList },
            "TrendDate": {
                "$gte": startDate,
                "$lte": endDate
            }
        }
    },
    { /* GROUP BY TFN */
        "$group": {
            "_id": "$TFN",
            "MaxImpression": { "$max": "$Impression" }
        }
    },
    { "$limit": 5 }, /* TOP 5 */
    { /* SELECT a.MaxImpression as MaxCount */
        "$project": {
            "TFN": "$_id",
            "_id": 0,
            "MaxCount": "$MaxImpression"
        }
    }
])
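Neither the SQL `TOP 5` without an `ORDER BY` nor `$limit` without a preceding `$sort` guarantees which five groups come back. If "top" is meant as "highest MaxImpression" (an assumption, since the original query does not say so), a `$sort` stage can be placed before `$limit`. The sketch below only builds the pipeline array; `buildTopNPipeline` is a hypothetical helper name.

```javascript
// buildTopNPipeline is a hypothetical helper; the stage operators themselves
// ($match, $group, $sort, $limit, $project) are standard aggregation stages.
function buildTopNPipeline(tfnList, startDate, endDate, n) {
    return [
        { /* WHERE TFN in list AND TrendDate between DATES */
            "$match": {
                "TFN": { "$in": tfnList },
                "TrendDate": { "$gte": startDate, "$lte": endDate }
            }
        },
        { /* GROUP BY TFN */
            "$group": {
                "_id": "$TFN",
                "MaxImpression": { "$max": "$Impression" }
            }
        },
        // Assumption: "top" means the groups with the highest MaxImpression.
        { "$sort": { "MaxImpression": -1 } },
        { "$limit": n },
        { /* SELECT a.TFN, a.MaxImpression as MaxCount */
            "$project": { "_id": 0, "TFN": "$_id", "MaxCount": "$MaxImpression" }
        }
    ];
}

const pipeline = buildTopNPipeline([84251456, 84251455], 20170423, 20170424, 5);

// In the mongo shell this could be passed along as:
//   db.getCollection("_core.data").aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```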

UPDATE

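To get the desired result from the sample in the edit, you need an extra $sort pipeline stage before the $group, where you sort the documents by the TrendDate and Impression fields, both in descending order.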

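You will then have to use the $first accumulator operator within the $group pipeline stage to get the maximum impression, because the documents arrive in the pipeline as an ordered stream.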

Consider running the revised aggregate operation as:

db.getCollection('collection').aggregate([
    { 
        "$match": {
            "TFN": { "$in": tmpTFNList },
            "TrendDate": {
                "$gte": startDate,
                "$lte": endDate
            }
        }
    },
    { "$sort": { "TrendDate": -1, "Impression": -1 } },
    {  
        "$group": {
            "_id": "$TFN",
            "MaxImpression": { "$first": "$Impression" }
        }
    },
    { "$limit": 5 }, 
    {   
        "$project": {
            "TFN": "$_id",
            "_id": 0,
            "MaxCount": "$MaxImpression"
        }
    }
])
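To check this against the sample data from the question without a database, the sort-then-first logic can be emulated in plain JavaScript. This is only a sketch of the semantics, not how MongoDB executes the pipeline. It also shows why `$max` alone does not match the expected output: the overall maximum for TFN 84251456 is 24 (on 20170423), whereas the maximum on the latest date is 22.

```javascript
// Sample rows from the question: [TFN, Impression, TrendDate].
const sample = [
    [84251456, 12, 20170424], [84251456, 15, 20170424], [84251456, 18, 20170424],
    [84251456, 19, 20170424], [84251456, 22, 20170424], [84251456, 23, 20170423],
    [84251456, 24, 20170423],
    [84251455, 25, 20170423], [84251455, 30, 20170423], [84251455, 35, 20170424],
    [84251455, 24, 20170423], [84251455, 22, 20170423], [84251455, 21, 20170424],
    [84251455, 22, 20170424]
].map(([TFN, Impression, TrendDate]) => ({ TFN, Impression, TrendDate }));

// Emulates { $sort: { TrendDate: -1, Impression: -1 } }.
const sorted = [...sample].sort(
    (a, b) => b.TrendDate - a.TrendDate || b.Impression - a.Impression
);

// Emulates $group by TFN with { $first: "$Impression" }:
// keep the first Impression seen for each TFN in the sorted stream.
const firstPerTfn = new Map();
for (const doc of sorted) {
    if (!firstPerTfn.has(doc.TFN)) {
        firstPerTfn.set(doc.TFN, doc.Impression);
    }
}

// Emulates the $project stage that renames the fields.
const result = Array.from(firstPerTfn, ([TFN, MaxCount]) => ({ TFN, MaxCount }));

// For comparison: the plain maximum over all dates for one TFN.
const plainMax456 = Math.max(
    ...sample.filter((d) => d.TFN === 84251456).map((d) => d.Impression)
);

console.log(result);
console.log(plainMax456);
```

The order of the groups in `result` depends on this emulation's iteration order; the order in which `$group` emits its documents should not be relied upon.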

Sample Output

/* 1 */
{
    "TFN" : 84251456,
    "MaxCount" : 22
}

/* 2 */
{
    "TFN" : 84251455,
    "MaxCount" : 35
}