Question

I'm trying to use Mongoose to count the occurrences of strings in an array across my collection. My "schema" looks like this:

var ThingSchema = new Schema({
  tokens: [ String ]
});

My goal is to get the top 10 "tokens" in the "Thing" collection, where each document can contain multiple values. For example:

var documentOne = {
    _id: ObjectId('50ff1299a6177ef9160007fa')
  , tokens: [ 'foo' ]
}

var documentTwo = {
    _id: ObjectId('50ff1299a6177ef9160007fb')
  , tokens: [ 'foo', 'bar' ]
}

var documentThree = {
    _id: ObjectId('50ff1299a6177ef9160007fc')
  , tokens: [ 'foo', 'bar', 'baz' ]
}

var documentFour = {
    _id: ObjectId('50ff1299a6177ef9160007fd')
  , tokens: [ 'foo', 'baz' ]
}

...would give me the following result:

[ foo: 4, bar: 2, baz: 2 ]

I've considered using MapReduce and Aggregate for this, but I'm not sure which is the better option.

Answer 1

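Aha, I found a solution. MongoDB's aggregate framework lets us run a series of tasks on a collection. Of particular note is $unwind, which breaks an array in a document into separate documents, one per element, so they can be grouped/counted en masse.

MongooseJS exposes this on the model. Using the example above, it looks like this: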

Thing.aggregate([
    { $match: { /* Optional query to filter documents before counting. */ } }
  , { $project: { tokens: 1 } } /* pass only the tokens field on to the next stage */
  , { $unwind: '$tokens' } /* emit one document per array element so each token can be counted */
  , { $group: { /* group the unwound documents */
          _id: { token: '$tokens' } /* using the token value as the group key */
        , count: { $sum: 1 } /* add 1 for each document in the group */
      }
    }
], function(err, topTopics) {
  if (err) return console.error(err);
  console.log(topTopics);
  // e.g. [ { _id: { token: 'foo' }, count: 4 }, { _id: { token: 'bar' }, count: 2 }, { _id: { token: 'baz' }, count: 2 } ]
  // (order is not guaranteed without a $sort stage)
});

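In preliminary tests over roughly 200,000 records, it was noticeably faster than MapReduce, so it will likely scale better, but that's only after a cursory look. YMMV.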
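
The question asks for the top 10 tokens, and a $group stage on its own returns counts in no particular order. A minimal sketch of one way to rank them is to follow the grouping with $sort and $limit stages. The names topTokensPipeline, countTokens and sample are made up for illustration, and the sketch groups on the plain token string rather than on a { token: ... } object. The plain-JavaScript helper is not part of Mongoose; it only mirrors what the stages compute so the expected shape can be checked without a database.

```javascript
// Illustrative pipeline (an assumption, not the original answer's code):
// count tokens, then rank them and keep the 10 most frequent.
const topTokensPipeline = [
  { $unwind: '$tokens' },                               // one document per array element
  { $group: { _id: '$tokens', count: { $sum: 1 } } },   // count documents per token
  { $sort: { count: -1, _id: 1 } },                     // highest count first, ties by token name
  { $limit: 10 }                                        // keep only the top 10
];

// Hypothetical helper: the same computation on in-memory documents,
// handy for checking what the pipeline is expected to return.
function countTokens(docs, limit = 10) {
  const counts = new Map();
  for (const doc of docs) {
    for (const token of doc.tokens || []) {
      counts.set(token, (counts.get(token) || 0) + 1);
    }
  }
  return Array.from(counts, ([token, count]) => ({ _id: token, count }))
    .sort((a, b) => b.count - a.count || (a._id < b._id ? -1 : a._id > b._id ? 1 : 0))
    .slice(0, limit);
}

// The four example documents from the question, without their _id fields.
const sample = [
  { tokens: ['foo'] },
  { tokens: ['foo', 'bar'] },
  { tokens: ['foo', 'bar', 'baz'] },
  { tokens: ['foo', 'baz'] }
];

console.log(countTokens(sample));
// [ { _id: 'foo', count: 4 }, { _id: 'bar', count: 2 }, { _id: 'baz', count: 2 } ]
```

With a Mongoose model, the same array could be passed as Thing.aggregate(topTokensPipeline).exec(), which returns a promise for the ranked results.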

Mongoose / MongoDB: counting elements in an array
