我有一个这样的集合,但有更多的数据。
{
_id: ObjectId("db759d014f70743495ef1000"),
tracked_item_origin: "winword",
tracked_item_type: "Software",
machine_user: "mmm.mmm",
organization_id: ObjectId("a91864df4f7074b33b020000"),
group_id: ObjectId("20ea74df4f7074b33b520000"),
tracked_item_id: ObjectId("1a050df94f70748419140000"),
tracked_item_name: "Word",
duration: 9540,
}
{
_id: ObjectId("2b769d014f70743495fa1000"),
tracked_item_origin: "http://www.facebook.com",
tracked_item_type: "Site",
machine_user: "gabriel.mello",
organization_id: ObjectId("a91864df4f7074b33b020000"),
group_id: ObjectId("3f6a64df4f7074b33b040000"),
tracked_item_id: ObjectId("6f3466df4f7074b33b080000"),
tracked_item_name: "Facebook",
duration: 7920,
}
我做了一个聚合,请返回这样的分组数据:
{"_id"=>{"tracked_item_type"=>"Site", "tracked_item_name"=>"Twitter"}, "duration"=>288540},
{"_id"=>{"tracked_item_type"=>"Site", "tracked_item_name"=>"ANoticia"}, "duration"=>237300},
{"_id"=>{"tracked_item_type"=>"Site", "tracked_item_name"=>"Facebook"}, "duration"=>203460},
{"_id"=>{"tracked_item_type"=>"Software", "tracked_item_name"=>"Word"}, "duration"=>269760},
{"_id"=>{"tracked_item_type"=>"Software", "tracked_item_name"=>"Excel"}, "duration"=>204240}
简单聚合代码:
AgentCollector.collection.aggregate(
{'$match' => {group_id: '20ea74df4f7074b33b520000'}},
{'$group' => {
_id: {tracked_item_type: '$tracked_item_type', tracked_item_name: '$tracked_item_name'},
duration: {'$sum' => '$duration'}
}},
{'$sort' => {
'_id.tracked_item_type' => 1,
duration: -1
}}
)
有一种方法可以通过tracked_item_type
键限制只有2个项目吗?防爆。 2个站点和2个软件。
答案 0 :(得分:2)
由于你的问题目前尚不清楚,我真的希望你的意思是你要指定两个Site
键和2个Software
键,因为这是一个很好而简单的答案,你可以添加到你的$匹配阶段如:
{$match: {
group_id: "20ea74df4f7074b33b520000",
tracked_item_name: {$in: ['Twitter', 'Facebook', 'Word', 'Excel' ] }
}},
我们都欢呼和快乐;)
但是,如果您的问题更加恶魔般,例如,根据持续时间从结果中获得前2个Sites
和Software
条目,那么我们非常感谢您产生憎恶< /强>
您的里程数可能因您实际想做的事情而有所不同,或者您的结果是否会因为大小而爆炸。但这是作为你所处的目的的一个例子:
db.collection.aggregate([
// Match items first to reduce the set
{$match: {group_id: "20ea74df4f7074b33b520000" }},
// Group on the types and "sum" of duration
{$group: {
_id: {
tracked_item_type: "$tracked_item_type",
tracked_item_name: "$tracked_item_name"
},
duration: {$sum: "$duration"}
}},
// Sort by type and duration descending
{$sort: { "_id.tracked_item_type": 1, duration: -1 }},
/* The fun part */
// Re-shape results to "sites" and "software" arrays
{$group: {
_id: null,
sites: {$push:
{$cond: [
{$eq: ["$_id.tracked_item_type", "Site" ]},
{ _id: "$_id", duration: "$duration" },
null
]}
},
software: {$push:
{$cond: [
{$eq: ["$_id.tracked_item_type", "Software" ]},
{ _id: "$_id", duration: "$duration" },
null
]}
}
}},
// Remove the null values for "software"
{$unwind: "$software"},
{$match: { software: {$ne: null} }},
{$group: {
_id: "$_id",
software: {$push: "$software"},
sites: {$first: "$sites"}
}},
// Remove the null values for "sites"
{$unwind: "$sites"},
{$match: { sites: {$ne: null} }},
{$group: {
_id: "$_id",
software: {$first: "$software"},
sites: {$push: "$sites"}
}},
// Project out software and limit to the *top* 2 results
{$unwind: "$software"},
{$project: {
_id: 0,
_id: { _id: "$software._id", duration: "$software.duration" },
sites: "$sites"
}},
{$limit : 2},
// Project sites, grouping multiple software per key, requires a sort
// then limit the *top* 2 results
{$unwind: "$sites"},
{$group: {
_id: { _id: "$sites._id", duration: "$sites.duration" },
software: {$push: "$_id" }
}},
{$sort: { "_id.duration": -1 }},
{$limit: 2}
])
现在导致的结果是* 并不完全那些理想的结果,但是它可以以编程方式使用,并且比在循环中过滤先前的结果更好。 (我的测试数据)
{
"result" : [
{
"_id" : {
"_id" : {
"tracked_item_type" : "Site",
"tracked_item_name" : "Digital Blasphemy"
},
"duration" : 8000
},
"software" : [
{
"_id" : {
"tracked_item_type" : "Software",
"tracked_item_name" : "Word"
},
"duration" : 9540
},
{
"_id" : {
"tracked_item_type" : "Software",
"tracked_item_name" : "Notepad"
},
"duration" : 4000
}
]
},
{
"_id" : {
"_id" : {
"tracked_item_type" : "Site",
"tracked_item_name" : "Facebook"
},
"duration" : 7920
},
"software" : [
{
"_id" : {
"tracked_item_type" : "Software",
"tracked_item_name" : "Word"
},
"duration" : 9540
},
{
"_id" : {
"tracked_item_type" : "Software",
"tracked_item_name" : "Notepad"
},
"duration" : 4000
}
]
}
],
"ok" : 1
}
所以你看到你得到了数组中的前2 Sites
,其中每个都嵌入了前2个Software
项。聚合本身,无法进一步明确这一点,因为我们需要重新合并我们拆分的项目才能做到这一点,而且还没有我们可以用来执行此操作的运算符动作。
但这很有趣。它不是全部完成的方式,但大多数的方式,并将其变成4文档响应将是相对简单的代码。但是我的头已经疼了。