In my Rails 3.2 project I use MongoDB (Mongoid) with map/reduce to group some results, for example:
def count_and_group_by(context)
  raise "No #{context} attribute" unless %w(action browser country).include? context

  # Emit a count of 1 for every value of the requested attribute
  map = %Q{
    function() {
      var key = this.#{context};
      var value = {count: 1};
      emit(key, value);
    }
  }

  # Sum the counts emitted for each key
  reduce = %Q{
    function(key, values) {
      var reducedValue = {count: 0};
      values.forEach(function(value) {
        reducedValue.count += value.count;
      });
      return reducedValue;
    }
  }

  map_reduce = self.map_reduce(map, reduce).out(inline: true)
  Hash[map_reduce.map { |v| [v["_id"], v["value"]["count"].to_i] }]
end
When I call something like MyClass.count_and_group_by("action"), I get a result in the following format:
{"change_password"=>31, "invalid_ip"=>32, "login_failure"=>74, "login_success"=>63, "logout"=>34}
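What I currently do when I want results grouped by several attributes, say action, browser and city, is make a separate call for each attribute: MyClass.count_and_group_by("action"), MyClass.count_and_group_by("browser"), MyClass.count_and_group_by("city").
Is there a way to emit multiple keys at once, so that I can group everything in one pass and get a result like this: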
{"action" => {
   "change_password"=>31,
   "invalid_ip"=>32,
   "login_failure"=>74,
   "login_success"=>63,
   "logout"=>34},
 "browser" => {},
 "city" => {}}
Any help would be highly appreciated.
Cheers
Answer 0 (score: 3)
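That should generally be possible, but for this type of operation you will actually get much better performance from the aggregation framework. There is currently no "aggregate" method on classes defined with Mongoid, but there is a .collection accessor that exposes the underlying driver object, so you can call .aggregate() from there: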
result = self.collection.aggregate([
  # Include each field and an array for "type" in all documents
  { "$project" => {
    "action" => 1,
    "browser" => 1,
    "country" => 1,
    "type" => { "$const" => [ "action", "browser", "country" ] }
  }},
  # Unwind that "type" array
  { "$unwind" => "$type" },
  # Group by "type" and the values of each field which matches
  { "$group" => {
    "_id" => {
      "type" => "$type",
      "value" => {
        "$cond" => [
          { "$eq" => [ "$type", "action" ] },
          "$action",
          { "$cond" => [
            { "$eq" => [ "$type", "browser" ] },
            "$browser",
            "$country"
          ]}
        ]
      }
    },
    "count" => { "$sum" => 1 }
  }},
  # Just in case all fields were not present in all documents
  { "$match" => { "_id.value" => { "$ne" => nil } } },
  # Group to a single document with each "type" as the keys
  { "$group" => {
    "_id" => nil,
    "action" => {
      "$addToSet" => {
        "$cond" => [
          { "$eq" => [ "$_id.type", "action" ] },
          { "value" => "$_id.value", "count" => "$count" },
          nil
        ]
      }
    },
    "browser" => {
      "$addToSet" => {
        "$cond" => [
          { "$eq" => [ "$_id.type", "browser" ] },
          { "value" => "$_id.value", "count" => "$count" },
          nil
        ]
      }
    },
    "country" => {
      "$addToSet" => {
        "$cond" => [
          { "$eq" => [ "$_id.type", "country" ] },
          { "value" => "$_id.value", "count" => "$count" },
          nil
        ]
      }
    }
  }},
  # Filter out any null values from the conditional allocation
  { "$project" => {
    "action" => { "$setDifference" => [ "$action", [nil] ] },
    "browser" => { "$setDifference" => [ "$browser", [nil] ] },
    "country" => { "$setDifference" => [ "$country", [nil] ] }
  }}
])
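This uses the $setDifference operator introduced in MongoDB 2.6 to filter any null values out of the result arrays. The same thing can be done in earlier versions with little impact on processing; it just takes a few more steps: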
result = self.collection.aggregate([
  # Include each field and an array for "type" in all documents
  { "$project" => {
    "action" => 1,
    "browser" => 1,
    "country" => 1,
    "type" => { "$const" => [ "action", "browser", "country" ] }
  }},
  # Unwind that "type" array
  { "$unwind" => "$type" },
  # Group by "type" and the values of each field which matches
  { "$group" => {
    "_id" => {
      "type" => "$type",
      "value" => {
        "$cond" => [
          { "$eq" => [ "$type", "action" ] },
          "$action",
          { "$cond" => [
            { "$eq" => [ "$type", "browser" ] },
            "$browser",
            "$country"
          ]}
        ]
      }
    },
    "count" => { "$sum" => 1 }
  }},
  # Just in case all fields were not present in all documents
  { "$match" => { "_id.value" => { "$ne" => nil } } },
  # Group to a single document with each "type" as the keys
  { "$group" => {
    "_id" => nil,
    "action" => {
      "$addToSet" => {
        "$cond" => [
          { "$eq" => [ "$_id.type", "action" ] },
          { "value" => "$_id.value", "count" => "$count" },
          nil
        ]
      }
    },
    "browser" => {
      "$addToSet" => {
        "$cond" => [
          { "$eq" => [ "$_id.type", "browser" ] },
          { "value" => "$_id.value", "count" => "$count" },
          nil
        ]
      }
    },
    "country" => {
      "$addToSet" => {
        "$cond" => [
          { "$eq" => [ "$_id.type", "country" ] },
          { "value" => "$_id.value", "count" => "$count" },
          nil
        ]
      }
    }
  }},
  # Filter out any null values from the conditional allocation,
  # one array at a time: unwind it, drop the nulls, then rebuild it
  { "$unwind" => "$country" },
  { "$match" => { "country" => { "$ne" => nil } } },
  { "$group" => {
    "_id" => "$_id",
    "action" => { "$first" => "$action" },
    "browser" => { "$first" => "$browser" },
    "country" => { "$push" => "$country" }
  }},
  { "$unwind" => "$browser" },
  { "$match" => { "browser" => { "$ne" => nil } } },
  { "$group" => {
    "_id" => "$_id",
    "action" => { "$first" => "$action" },
    "browser" => { "$push" => "$browser" },
    "country" => { "$first" => "$country" }
  }},
  { "$unwind" => "$action" },
  { "$match" => { "action" => { "$ne" => nil } } },
  { "$group" => {
    "_id" => "$_id",
    "action" => { "$push" => "$action" },
    "browser" => { "$first" => "$browser" },
    "country" => { "$first" => "$country" }
  }}
])
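The output is slightly different from the key/value form, but it can easily be manipulated into the same shape with post-processing much like what you are already doing. So with input like this: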
{ "action" : "change_password", "browser" : "ie", "country" : "US" }
{ "action" : "change_password", "browser" : "ie", "country" : "UK" }
{ "action" : "change_password", "browser" : "chrome", "country" : "AU" }
you get a result like this:
{
"_id" : null,
"action" : [
{
"value" : "change_password",
"count" : 3
}
],
"browser" : [
{
"value" : "ie",
"count" : 2
},
{
"value" : "chrome",
"count" : 1
}
],
"country" : [
{
"value" : "US",
"count" : 1
},
{
"value" : "UK",
"count" : 1
},
{
"value" : "AU",
"count" : 1
}
]
}
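As a rough sketch of that post-processing step: assuming `result` is the array returned by the `.aggregate()` call (a single document with string keys, shaped like the one above), the value/count pairs can be folded into the nested hash asked for in the question. The helper name `counts_by_field` is made up for illustration and is not part of Mongoid or the driver.

```ruby
# Hypothetical helper: turn the aggregation output into
# { "action" => { "change_password" => 3 }, "browser" => { ... }, ... }.
def counts_by_field(result, fields = %w(action browser country))
  doc = result.first || {}
  fields.each_with_object({}) do |field, grouped|
    # Each entry looks like { "value" => "ie", "count" => 2 }
    pairs = (doc[field] || []).map { |entry| [entry["value"], entry["count"].to_i] }
    grouped[field] = Hash[pairs]
  end
end

# Sample document copied from the output shown above.
sample = [{
  "_id" => nil,
  "action" => [{ "value" => "change_password", "count" => 3 }],
  "browser" => [{ "value" => "ie", "count" => 2 },
                { "value" => "chrome", "count" => 1 }],
  "country" => [{ "value" => "US", "count" => 1 },
                { "value" => "UK", "count" => 1 },
                { "value" => "AU", "count" => 1 }]
}]

grouped = counts_by_field(sample)
```

Fields missing from the document simply come back as empty hashes, which matches the `"browser" => {}` placeholders in the question.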
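So the output differs somewhat from what mapReduce gives you, but then again the mapReduce output is "not exactly" the format you want either. Being implemented in native code, the aggregation framework runs much faster.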
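Since the stages in the MongoDB 2.6 pipeline repeat the same shape for every field, they could also be generated from a list of field names. The following is only a sketch of that idea: the method names (`value_selector`, `collectors`, `grouping_pipeline`) are invented here, and it has not been run against a live database.

```ruby
# Nested $cond that picks the field named by "$type";
# the last field acts as the final fallback, as in the pipeline above.
def value_selector(fields)
  *init, last = fields
  init.reverse.inject("$#{last}") do |fallback, field|
    { "$cond" => [{ "$eq" => ["$type", field] }, "$#{field}", fallback] }
  end
end

# One $addToSet accumulator per field; non-matching types contribute nil.
def collectors(fields)
  fields.each_with_object({ "_id" => nil }) do |field, group|
    group[field] = {
      "$addToSet" => {
        "$cond" => [
          { "$eq" => ["$_id.type", field] },
          { "value" => "$_id.value", "count" => "$count" },
          nil
        ]
      }
    }
  end
end

# Assemble the six stages of the $setDifference variant.
def grouping_pipeline(fields)
  projection = Hash[fields.map { |f| [f, 1] }]
  cleanup = Hash[fields.map { |f| [f, { "$setDifference" => ["$#{f}", [nil]] }] }]
  [
    { "$project" => projection.merge("type" => { "$const" => fields }) },
    { "$unwind" => "$type" },
    { "$group" => {
      "_id" => { "type" => "$type", "value" => value_selector(fields) },
      "count" => { "$sum" => 1 }
    }},
    { "$match" => { "_id.value" => { "$ne" => nil } } },
    { "$group" => collectors(fields) },
    { "$project" => cleanup }
  ]
end

pipeline = grouping_pipeline(%w(action browser country))
```

The resulting array would then be handed to `.aggregate()` on the collection in place of the hand-written stages. How that call is spelled depends on the Mongoid and driver versions in use.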