Below is my MongoDB 3.0 query. It takes a long time to run (more than 4 seconds), even though the dataset has only 4.3 million documents:
db.getCollection('TestingCollection').aggregate([
  { $match: {
      myDate: { $gte: new Date(949384052490) },
      $and: [
        {
          myDate: { $lte: new Date(1448257684431) },
          $and: [ { myId: 10 } ]
        }
      ],
      type: { $ne: "Contractor" }
  }},
  { $project: {
      retailerName: 1,
      unitSold: 1,
      year: { $year: [ "$myDate" ] },
      currency: 1,
      totalSales: { $multiply: [ "$unitSold", "$itemPrice" ] }
  }},
  { $group: {
      _id: {
        retailerName: "$retailerName",
        year: "$year",
        currency: "$currency"
      },
      netSales: { $sum: "$revenue" },
      netUnitSold: { $sum: "$unitSold" },
      totalSales: { $sum: "$totalSales" }
  }}
])
Compound index fields:
(myDate: 1, retailerName: 1, type: 1, myId: 1).
The same query with type: { $eq: "Contractor" } takes only a few milliseconds to execute.
Please tell me where I am going wrong.
Answer (score: 2)
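Your "range selection" is not specified correctly, and your use of $and is wrong. In fact only the "last" argument is being considered, so the query is just looking for dates "greater than" the lower bound where myId equals 10, which of course is not what you want.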
Here is the correct query syntax for $match:
{ "$match": {
    "myDate": {
      "$gte": new Date(949384052490),
      "$lte": new Date(1448257684431)
    },
    "myId": 10,
    "type": { "$ne": "Contractor" }
}}
There is no need for $and at all, since all the arguments of a MongoDB query document are already AND conditions.
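To make the semantics of the corrected $match concrete, here is a minimal in-memory sketch in plain Java. The Sale class and the sample values are hypothetical stand-ins for documents in TestingCollection, not the author's code; the sketch only shows that every key of the filter must hold at once, with both date bounds applied to the same field.

```java
import java.time.Instant;

final class MatchSketch {
    // Hypothetical in-memory stand-in for one document in TestingCollection.
    static final class Sale {
        final Instant myDate;
        final int myId;
        final String type;

        Sale(Instant myDate, int myId, String type) {
            this.myDate = myDate;
            this.myId = myId;
            this.type = type;
        }
    }

    // Bounds taken from the query's new Date(...) calls (epoch milliseconds).
    static final Instant FROM = Instant.ofEpochMilli(949384052490L);
    static final Instant TO = Instant.ofEpochMilli(1448257684431L);

    // Every condition must hold: the keys of a query document are
    // implicitly ANDed, so no explicit $and is needed.
    static boolean matches(Sale s) {
        return !s.myDate.isBefore(FROM)        // "$gte" (inclusive)
            && !s.myDate.isAfter(TO)           // "$lte" (inclusive)
            && s.myId == 10                    // "myId": 10
            && !"Contractor".equals(s.type);   // "$ne": "Contractor" (also true when type is null)
    }

    public static void main(String[] args) {
        Sale inRange = new Sale(Instant.parse("2010-06-01T00:00:00Z"), 10, "Retail");
        Sale contractor = new Sale(Instant.parse("2010-06-01T00:00:00Z"), 10, "Contractor");
        Sale tooLate = new Sale(Instant.parse("2016-01-01T00:00:00Z"), 10, "Retail");
        System.out.println(FROM + " .. " + TO);
        // prints 2000-02-01T05:47:32.490Z .. 2015-11-23T05:48:04.431Z
        System.out.println(matches(inRange) + " " + matches(contractor) + " " + matches(tooLate));
        // prints true false false
    }
}
```

The null case mirrors the fact that $ne also matches documents in which the field is missing.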
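You should also consider merging the $project and $group stages: when they appear one right after the other, that usually means they can be combined. At the very least it is more efficient.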
But of course most of the time is spent in the initial $match, which selects the wrong results anyway.
The optimal pipeline stage combining $group and $project:
{ "$group": {
    "_id": {
      "retailerName": "$retailerName",
      "year": { "$year": "$myDate" },
      "currency": "$currency"
    },
    "netSales": { "$sum": "$revenue" },
    "netUnitSold": { "$sum": "$unitSold" },
    "totalSales": { "$sum":
      { "$multiply": [ "$unitSold", "$itemPrice" ] }
    }
}}
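For readers who want to check what the combined stage computes, here is an in-memory equivalent in plain Java. The Row, Key and Totals classes and the sample data are hypothetical, and the sketch assumes the grouped fields are always present; it mirrors the composite _id and the three $sum accumulators, including the per-document $multiply inside the last one.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class GroupSketch {
    // Hypothetical input row holding only the fields the $group stage reads.
    static final class Row {
        final String retailerName;
        final Instant myDate;
        final String currency;
        final double revenue;
        final long unitSold;
        final double itemPrice;

        Row(String retailerName, Instant myDate, String currency,
            double revenue, long unitSold, double itemPrice) {
            this.retailerName = retailerName;
            this.myDate = myDate;
            this.currency = currency;
            this.revenue = revenue;
            this.unitSold = unitSold;
            this.itemPrice = itemPrice;
        }
    }

    // Composite _id: { retailerName, year: { $year: "$myDate" }, currency }.
    static final class Key {
        final String retailerName;
        final int year;
        final String currency;

        Key(String retailerName, int year, String currency) {
            this.retailerName = retailerName;
            this.year = year;
            this.currency = currency;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return year == k.year
                && Objects.equals(retailerName, k.retailerName)
                && Objects.equals(currency, k.currency);
        }

        @Override public int hashCode() {
            return Objects.hash(retailerName, year, currency);
        }
    }

    // One accumulator per output field of the $group stage.
    static final class Totals {
        double netSales;
        long netUnitSold;
        double totalSales;
    }

    static Map<Key, Totals> group(List<Row> rows) {
        Map<Key, Totals> out = new LinkedHashMap<>();
        for (Row r : rows) {
            // In MongoDB 3.0, $year extracts the year in UTC.
            int year = r.myDate.atZone(ZoneOffset.UTC).getYear();
            Totals t = out.computeIfAbsent(new Key(r.retailerName, year, r.currency), k -> new Totals());
            t.netSales += r.revenue;                  // $sum: "$revenue"
            t.netUnitSold += r.unitSold;              // $sum: "$unitSold"
            t.totalSales += r.unitSold * r.itemPrice; // $sum: { $multiply: [ "$unitSold", "$itemPrice" ] }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("Acme", Instant.parse("2014-03-01T00:00:00Z"), "USD", 10.0, 4, 2.5),
            new Row("Acme", Instant.parse("2014-07-15T00:00:00Z"), "USD", 3.0, 2, 1.5),
            new Row("Acme", Instant.parse("2015-01-10T00:00:00Z"), "USD", 7.0, 1, 7.0));
        // prints "Acme 2014 USD -> 13.0 6 13.0" then "Acme 2015 USD -> 7.0 1 7.0"
        group(rows).forEach((k, t) -> System.out.println(
            k.retailerName + " " + k.year + " " + k.currency + " -> "
                + t.netSales + " " + t.netUnitSold + " " + t.totalSales));
    }
}
```

Computing unitSold * itemPrice inside the accumulator loop is exactly what removes the need for a separate $project pass.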
So the whole pipeline is now just $match followed by $group.
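If you are using spring-mongo, there are currently limitations on the supported operators when $group combines a compound key with calculated values in the accumulators, but you can work around them. As for the $and statement, that is really a syntax issue and not a bug in spring-mongo.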
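First, set up a custom class for the "group" stage of the aggregation pipeline:
import com.mongodb.DBObject;
import org.springframework.data.mongodb.core.aggregation.AggregationOperation;
import org.springframework.data.mongodb.core.aggregation.AggregationOperationContext;

// Wraps a raw stage document so it can be passed to newAggregation(...)
// alongside the typed operations.
public class CustomGroupOperation implements AggregationOperation {
    private final DBObject operation;

    public CustomGroupOperation(DBObject operation) {
        this.operation = operation;
    }

    @Override
    public DBObject toDBObject(AggregationOperationContext context) {
        // Let the context apply its field mapping to the raw document.
        return context.getMappedObject(operation);
    }
}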
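Then use that class to build the pipeline:
import static org.springframework.data.mongodb.core.aggregation.Aggregation.match;
import static org.springframework.data.mongodb.core.aggregation.Aggregation.newAggregation;

import com.mongodb.BasicDBObject;
import java.util.Arrays;
import java.util.Date;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.query.Criteria;

Aggregation aggregation = newAggregation(
    match(
        Criteria.where("myDate")
            .gte(new Date(949384052490L))
            .lte(new Date(1448257684431L))
            .and("myId").is(10)
            .and("type").ne("Contractor")
    ),
    new CustomGroupOperation(
        new BasicDBObject(
            "$group", new BasicDBObject(
                "_id", new BasicDBObject(
                    "retailerName", "$retailerName"
                ).append(
                    "year", new BasicDBObject("$year", "$myDate")
                ).append(
                    "currency", "$currency"
                )
            ).append(
                "netSales", new BasicDBObject("$sum", "$revenue")
            ).append(
                "netUnitSold", new BasicDBObject("$sum", "$unitSold")
            ).append(
                // $multiply is an expression, not an accumulator, so it
                // must be wrapped in $sum, just like in the shell version.
                "totalSales", new BasicDBObject(
                    "$sum", new BasicDBObject(
                        "$multiply", Arrays.asList("$unitSold", "$itemPrice")
                    )
                )
            )
        )
    )
);
This produces a serialized pipeline like the following:
[
  { "$match" : {
      "myDate" : {
        "$gte" : { "$date" : "2000-02-01T05:47:32.490Z" },
        "$lte" : { "$date" : "2015-11-23T05:48:04.431Z" }
      },
      "myId" : 10,
      "type" : { "$ne" : "Contractor" }
  }},
  { "$group" : {
      "_id" : {
        "retailerName" : "$retailerName",
        "year" : { "$year" : "$myDate" },
        "currency" : "$currency"
      },
      "netSales" : { "$sum" : "$revenue" },
      "netUnitSold" : { "$sum" : "$unitSold" },
      "totalSales" : { "$sum" : { "$multiply" : [ "$unitSold" , "$itemPrice" ] } }
  }}
]
This is exactly the same as the shell example given above.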
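The one detail that keeps the Java-built stage equivalent to the shell stage is that $multiply sits inside a $sum accumulator. As a dependency-free illustration, the sketch below assembles the same $group document with plain LinkedHashMap instances standing in for BasicDBObject; the doc and toJson helpers are hypothetical and are not part of the MongoDB driver (toJson handles only maps, lists and strings, and its spacing differs from the driver's output).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupDocSketch {
    // Hypothetical helper standing in for new BasicDBObject(key, value).
    static Map<String, Object> doc(String key, Object value) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put(key, value);
        return m;
    }

    // Hypothetical minimal JSON writer: maps, lists and strings only.
    static String toJson(Object o) {
        if (o instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) o).entrySet()) {
                if (!first) sb.append(", ");
                first = false;
                sb.append('"').append(e.getKey()).append("\": ").append(toJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        if (o instanceof List) {
            List<?> list = (List<?>) o;
            StringBuilder sb = new StringBuilder("[");
            for (int i = 0; i < list.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(toJson(list.get(i)));
            }
            return sb.append(']').toString();
        }
        return "\"" + o + "\""; // every leaf in this stage is a string
    }

    static Map<String, Object> groupStage() {
        Map<String, Object> id = doc("retailerName", "$retailerName");
        id.put("year", doc("$year", "$myDate"));
        id.put("currency", "$currency");

        Map<String, Object> group = doc("_id", id);
        group.put("netSales", doc("$sum", "$revenue"));
        group.put("netUnitSold", doc("$sum", "$unitSold"));
        // The accumulator ($sum) wraps the per-document expression ($multiply).
        group.put("totalSales",
            doc("$sum", doc("$multiply", Arrays.asList("$unitSold", "$itemPrice"))));
        return doc("$group", group);
    }

    public static void main(String[] args) {
        System.out.println(toJson(groupStage()));
    }
}
```

Because LinkedHashMap keeps insertion order, the printed keys appear in the same order as in the serialized pipeline.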