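First of all, we are just migrating from SQL Server to MongoDB.
I have a collection containing the fields TFN and Impressions. I need to convert a SQL query to Mongo, but I am stuck at the moment.
The scenario is that I need to select the top 5 impressions from the collection, grouped by TFN: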
Select Top 5 a.TFN, a.MaxImpression as MaxCount from (
Select TFN, Max(Impressions) MaxImpression
from tblData
Where TFN in (Select TFN From @tmpTFNList) and TrendDate between @StartDate AND @EndDate
Group by TFN
) a
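This is the query in SQL Server. I need to achieve the same scenario using MongoDB. So far I have gone through MongoDB's aggregate and group functions, but I have not been able to produce the same output as the SQL query.
Note: I am unable to establish the link between the Max clause and Group by in MongoDB.
Here is the implementation I have tried: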
db.getCollection("_core.data").aggregate([
{
$match:
{
$and: [
{
"TFN":
{
$in: tfns
}
} ,
{
"TrendDate":
{
$gte : 20170421,
$lte: 20170421
}
}]
}
},
{
$group:
{
_id:"Impressions",
Impression: {
$max : "$Impressions"
}
}
}
])
Second attempt:
db.getCollection("_core.adwordsPull.static").group({
key: { TFN: 1, Impressions: 1 },
cond: { TFN: {
$in: tfns
},
{
"TrendDate":
{
$gte : 20170421,
$lte: 20170421
}
} },
reduce: function( curr, result ) {
result.total += curr.Impression;
},
initial: { total : 0 }
})
What is wrong with these approaches, and how can I correct them?
Edit 1: sample data
TFN Impression TrendDate
84251456 12 20170424
84251456 15 20170424
84251456 18 20170424
84251456 19 20170424
84251456 22 20170424
84251456 23 20170423
84251456 24 20170423
84251455 25 20170423
84251455 30 20170423
84251455 35 20170424
84251455 24 20170423
84251455 22 20170423
84251455 21 20170424
84251455 22 20170424
Expected output:
TFN MaxCount
84251456 22
84251455 35
Answer 0 (score: 2)
To get the desired result, start by breaking down the SQL query, beginning with the subquery:
Select *
from tblData
Where TFN in (Select TFN From @tmpTFNList) and TrendDate between @StartDate AND @EndDate
The equivalent Mongo query is as follows:
db.getCollection("_core.data").aggregate([
{
"$match": {
"TFN": { "$in": tmpTFNList },
"TrendDate": {
"$gte": startDate,
"$lte": endDate
}
}
}
])
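To make the effect of this $match stage concrete, here is a minimal plain-JavaScript sketch of the same predicate applied to a few rows from the question's sample data. It does not talk to MongoDB. The values of tmpTFNList, startDate and endDate, and the extra row with TFN 11111111, are assumptions made up for illustration.

```javascript
// A few rows from the question's sample data, plus one hypothetical row
// whose TFN is deliberately not in the list.
const rows = [
  { TFN: 84251456, Impression: 22, TrendDate: 20170424 },
  { TFN: 84251456, Impression: 24, TrendDate: 20170423 },
  { TFN: 84251455, Impression: 35, TrendDate: 20170424 },
  { TFN: 11111111, Impression: 99, TrendDate: 20170424 }, // hypothetical
];

// Assumed filter values, standing in for @tmpTFNList, @StartDate, @EndDate.
const tmpTFNList = [84251456, 84251455];
const startDate = 20170424;
const endDate = 20170424;

// Same condition as the $match stage:
// TFN in list AND startDate <= TrendDate <= endDate (both bounds inclusive,
// like SQL BETWEEN).
const matched = rows.filter(
  (r) =>
    tmpTFNList.includes(r.TFN) &&
    r.TrendDate >= startDate &&
    r.TrendDate <= endDate
);

console.log(matched);
```

Only the two rows dated 20170424 whose TFN is in the list survive the filter.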
The $group aggregation equivalent of
Select TFN, Max(Impressions) MaxImpression
from tblData
Where TFN in (Select TFN From @tmpTFNList) and TrendDate between @StartDate AND @EndDate
Group by TFN
is as follows:
db.getCollection("_core.data").aggregate([
{
"$match": {
"TFN": { "$in": tmpTFNList },
"TrendDate": {
"$gte": startDate,
"$lte": endDate
}
}
},
{
"$group": {
"_id": "$TFN",
"MaxImpression": { "$max": "$Impression" }
}
}
])
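As a rough check of what grouping by TFN with $max produces, the following plain-JavaScript sketch applies the same logic to the sample data from the question. It runs without MongoDB and assumes the date range covers both 20170423 and 20170424, so no rows are filtered out. The names docs, maxByTfn and grouped are illustrative only.

```javascript
// Sample data from the question as [TFN, Impression, TrendDate] tuples.
const docs = [
  [84251456, 12, 20170424], [84251456, 15, 20170424], [84251456, 18, 20170424],
  [84251456, 19, 20170424], [84251456, 22, 20170424], [84251456, 23, 20170423],
  [84251456, 24, 20170423], [84251455, 25, 20170423], [84251455, 30, 20170423],
  [84251455, 35, 20170424], [84251455, 24, 20170423], [84251455, 22, 20170423],
  [84251455, 21, 20170424], [84251455, 22, 20170424],
].map(([TFN, Impression, TrendDate]) => ({ TFN, Impression, TrendDate }));

// Emulates { $group: { _id: "$TFN", MaxImpression: { $max: "$Impression" } } }
const maxByTfn = new Map();
for (const { TFN, Impression } of docs) {
  const current = maxByTfn.get(TFN);
  if (current === undefined || Impression > current) {
    maxByTfn.set(TFN, Impression);
  }
}

const grouped = [...maxByTfn].map(([TFN, MaxImpression]) => ({
  _id: TFN,
  MaxImpression,
}));

console.log(grouped);
```

With both dates included, the maximum for TFN 84251456 is 24 rather than the 22 shown in the question's expected output, so a plain $max only matches that output if the date range is limited to 20170424.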
The Top 5 query
Select Top 5 a.TFN, a.MaxImpression as MaxCount from (
Select TFN, Max(Impressions) MaxImpression
from tblData
Where TFN in (Select TFN From @tmpTFNList)
and TrendDate between @StartDate AND @EndDate
Group by TFN
) a
can be implemented with the $limit operator, with the fields selected through a $project stage:
db.getCollection("_core.data").aggregate([
{ /* WHERE TFN in list AND TrendDate between DATES */
"$match": {
"TFN": { "$in": tmpTFNList },
"TrendDate": {
"$gte": startDate,
"$lte": endDate
}
}
},
{ /* GROUP BY TFN */
"$group": {
"_id": "$TFN",
"MaxImpression": { "$max": "$Impression" }
}
},
{ "$limit": 5 }, /* TOP 5 */
{ /* SELECT a.MaxImpression as MaxCount */
"$project": {
"TFN": "$_id",
"_id": 0,
"MaxCount": "$MaxImpression"
}
}
])
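To get the desired result from the sample in the question's edit, you need an extra $sort pipeline stage before the $group, where you sort the documents by the TrendDate and Impression fields, both in descending order.
Then you have to use the $first accumulator operator within the $group pipeline stage to get the maximum impression, since you will have an ordered stream of documents in the pipeline. Because TrendDate is sorted first, this returns the highest Impression on the most recent TrendDate for each TFN.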
Consider running the revised aggregation operation as:
db.getCollection('collection').aggregate([
{
"$match": {
"TFN": { "$in": tmpTFNList },
"TrendDate": {
"$gte": startDate,
"$lte": endDate
}
}
},
{ "$sort": { "TrendDate": -1, "Impression": -1 } },
{
"$group": {
"_id": "$TFN",
"MaxImpression": { "$first": "$Impression" }
}
},
{ "$limit": 5 },
{
"$project": {
"TFN": "$_id",
"_id": 0,
"MaxCount": "$MaxImpression"
}
}
])
Sample output
/* 1 */
{
"TFN" : 84251456,
"MaxCount" : 22
}
/* 2 */
{
"TFN" : 84251455,
"MaxCount" : 35
}
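To sanity-check the sort-then-first logic without a MongoDB instance, here is a minimal plain-JavaScript sketch that emulates the $sort, $group with $first, $limit and $project stages on the question's sample data. It is only an approximation of the pipeline's behaviour, and the names docs, sorted, firstByTfn and result are illustrative.

```javascript
// Sample data from the question as [TFN, Impression, TrendDate] tuples.
const docs = [
  [84251456, 12, 20170424], [84251456, 15, 20170424], [84251456, 18, 20170424],
  [84251456, 19, 20170424], [84251456, 22, 20170424], [84251456, 23, 20170423],
  [84251456, 24, 20170423], [84251455, 25, 20170423], [84251455, 30, 20170423],
  [84251455, 35, 20170424], [84251455, 24, 20170423], [84251455, 22, 20170423],
  [84251455, 21, 20170424], [84251455, 22, 20170424],
].map(([TFN, Impression, TrendDate]) => ({ TFN, Impression, TrendDate }));

// Emulates { $sort: { TrendDate: -1, Impression: -1 } }
const sorted = [...docs].sort(
  (a, b) => b.TrendDate - a.TrendDate || b.Impression - a.Impression
);

// Emulates $group with $first: keep the first Impression seen per TFN.
const firstByTfn = new Map();
for (const d of sorted) {
  if (!firstByTfn.has(d.TFN)) {
    firstByTfn.set(d.TFN, d.Impression);
  }
}

// Emulates { $limit: 5 } followed by the $project renaming.
const result = [...firstByTfn]
  .slice(0, 5)
  .map(([TFN, MaxCount]) => ({ TFN, MaxCount }));

console.log(result);
```

On the sample data this yields a MaxCount of 22 for TFN 84251456 and 35 for TFN 84251455, matching the expected output. The order of the groups in this sketch is incidental. In MongoDB, $group does not guarantee the order of its output documents, so a $sort on MaxImpression before $limit may be worth adding if a specific top 5 is needed.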