Question

在mongo 2.6文档中，请参见下面的几个

nms:PRIMARY> db.checkpointstest4.find()
{ "_id" : 1, "cpu" : [ 100, 20, 60 ], "hostname" : "host1" }
{ "_id" : 2, "cpu" : [ 40, 30, 80 ], "hostname" : "host1" }

我需要找到每个主机IE的平均cpu（每个cpu数组索引）基于上面两个，host1的平均值为[70,25,70]，因为cpu[0]是100+40=70等

当我有3个数组元素而不是两个数组元素时，我迷失了，请参阅mongodb aggregate average of array elements

最后，下面对我有用：

var map = function () {
    for (var idx = 0; idx < this.cpu.length; idx++) {
        var mapped = {
            idx: idx,
            val: this.cpu[idx]
        };
        emit(this.hostname, {"cpu": mapped});
    }
};

var reduce = function (key, values) {

    var cpu = []; var sum = [0,0,0]; cnt = [0,0,0];
    values.forEach(function (value) {        
        sum[value.cpu.idx] += value.cpu.val;
        cnt[value.cpu.idx] +=1;       
        cpu[value.cpu.idx] = sum[value.cpu.idx]/cnt[value.cpu.idx]
    });   
    return {"cpu": cpu};
};

db.checkpointstest4.mapReduce(map, reduce, {out: "checkpointstest4_result"});

Answer 1

在显示db.test.aggregate( {$unwind: {path:"$cpu", includeArrayIndex:"index"}}, {$group: {_id:{h:"$hostname",i:"$index"}, cpu:{$avg:"$cpu"}}}, {$sort:{"_id.i":1}}, {$group:{_id:"$_id.h", cpu:{$push:"$cpu"}}} ) // Make a row for each array element with an index field added. {$unwind: {path:"$cpu", includeArrayIndex:"index"}}, // Group by hostname+index, calculate average for each group. {$group: {_id:{h:"$hostname",i:"$index"}, cpu:{$avg:"$cpu"}}}, // Sort by index (to get the array in the next step sorted correctly) {$sort:{"_id.i":1}}, // Group by host, pushing the averages into an array in order. {$group:{_id:"$_id.h", cpu:{$push:"$cpu"}}}的MongoDB 3.2中，您可以执行此操作;

.panel-body ul { text-align: center; }

Answer 2

从MongoDB 3.2开始，includeArrayIndex可用$unwind提及升级将是您的最佳选择。

如果你不能这样做，那么你总是可以用mapReduce处理：

db.checkpointstest4.mapReduce(
    function() {
        var mapped = this.cpu.map(function(val) {
            return { "val": val, "cnt": 1 };
        });
        emit(this.hostname,{ "cpu": mapped });
    },
    function(key,values) {
        var cpu = [];

        values.forEach(function(value) {
            value.cpu.forEach(function(item,idx) {
                if ( cpu[idx] == undefined )
                    cpu[idx] = { "val": 0, "cnt": 0 };
                cpu[idx].val += item.val;
                cpu[idx].cnt += item.cnt
            });
        });
        return { "cpu": cpu };
    },
    {
        "out": { "inline": 1 },
        "finalize": function(key,value) {
            return { 
                "cpu": value.cpu.map(function(cpu) {
                    return cpu.val / cpu.cnt;
                 })
            };
        }
    }
)

所以＆＃34;映射器中的步骤是＆＃34;函数将数组内容转换为包含＆＃34;值＆＃34;的对象数组;从元素和＆＃34;计数＆＃34;供以后参考作为＆＃34;减少＆＃34;功能。您需要这与减速器如何使用它一致，并且必须获得获得平均值所需的总体计数。

＆＃34; reducer＆＃34;本身你基本上为＆＃34;值＆＃34;的每个位置的数组内容求和。和＃34;计数＆＃34;。这很重要，因为＆＃34;减少＆＃34;在整个约简过程中可以多次调用函数，将其输出作为＆＃34;输入＆＃34;在随后的电话中。这就是为什么mapper和reducer都以这种格式工作的原因。

通过最终减少的结果，调用finalize函数来简单地查看每个求和的值＃34;和＆＃34;计数＆＃34;并除以计数以返回平均值。

里程可能会因现代聚合管道处理或实际上这个mapReduce进程是否会发挥最佳效果而有所不同，主要取决于数据。以规定的方式使用$unwind肯定会增加要分析的文档数量，从而产生开销。相反，虽然JavaScript处理与聚合框架中的本机运算符相比通常会更慢，但是这里的文档处理开销会减少，因为这样可以保留数组。

如果升级到3.2不是一个选项，那么我给出的建议是使用这个，但即使是一个选项，然后至少对你的数据和预期增长进行基准测试，看看哪个最适合你。

返回

{
        "results" : [
                {
                        "_id" : "host1",
                        "value" : {
                                "cpu" : [
                                        70,
                                        25,
                                        70
                                ]
                        }
                }
        ],
        "timeMillis" : 38,
        "counts" : {
                "input" : 2,
                "emit" : 2,
                "reduce" : 1,
                "output" : 1
        },
        "ok" : 1
}

如何在大小超过2的mongo数组中找到聚合？

2 个答案: