Question

I'm caching counts of referenced documents and finding that my approach is too slow.

Assume a simple one-to-many model in which posts have comments. I insert something like this:

db.posts.insert( { _id:"foo", ncomments:0 } );
db.posts.insert( { _id:"bar", ncomments:0 } );
db.posts.insert( { _id:"baz", ncomments:0 } );

db.comments.insert( { post_id:"foo", comment:"First comment" } );
db.comments.insert( { post_id:"foo", comment:"Second comment" } );
db.comments.insert( { post_id:"bar", comment:"Another comment" } );

Now, to rebuild all the ncomments fields, I do this:

db.posts.find().forEach( function(post){
    var n = db.comments.find( { post_id: post._id } ).count();
    db.posts.update( { _id: post._id }, { $set : { ncomments: n } } );
} );

This works fine until the collections get large; it then takes about one second per 1,000 documents.
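To illustrate where the time goes, here is a minimal sketch in plain JavaScript with no MongoDB involved. The arrays are hypothetical in-memory stand-ins for the two collections, and `rebuildPerPost` is an illustrative name, not a shell function. It mirrors the loop above and tallies how many separate operations the loop would issue.

```javascript
// Hypothetical in-memory stand-ins for db.posts and db.comments.
const loopPosts = [{ _id: "foo" }, { _id: "bar" }, { _id: "baz" }];
const loopComments = [
  { post_id: "foo", comment: "First comment" },
  { post_id: "foo", comment: "Second comment" },
  { post_id: "bar", comment: "Another comment" },
];

// Mirrors the forEach loop: one count "query" plus one "update" per post.
function rebuildPerPost(posts, comments) {
  let operations = 0;
  const result = {};
  for (const post of posts) {
    // Stands in for db.comments.find({ post_id: post._id }).count()
    const n = comments.filter((c) => c.post_id === post._id).length;
    operations += 1;
    // Stands in for db.posts.update(..., { $set: { ncomments: n } })
    result[post._id] = n;
    operations += 1;
  }
  return { result, operations };
}

const perPost = rebuildPerPost(loopPosts, loopComments);
console.log(perPost.operations); // grows as 2 operations per post
```

Against a real server each of those operations is presumably a separate round trip, which would explain why the total time grows in step with the number of posts.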

Is there a faster way to achieve this, perhaps without the iterative script approach?

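I'm not asking how I should structure my data, nor how I should invalidate smaller parts of the cache on the fly. I'm asking whether there is a better way to achieve this in this specific case.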

Answer 1

I have managed to speed this process up by more than ten times.

// allow application to reduce subset of posts if possible
var query = {};

// index all comment counts by post
var counts = {};
db.comments.aggregate( [
    { $match: query },
    { $group : { 
        '_id' : '$post_id', 
        'num' : { $sum: 1 } 
    } }
] ).forEach( function( group ){
    counts[ group._id ] = group.num;
} );

// for all posts (including those without comments)
// collect a multi-update batch
var updates = db.posts.initializeUnorderedBulkOp();
db.posts.find( query, { _id:1 } ).forEach( function( post ){
    updates.find( { _id: post._id } ).update( { $set: {
        ncomments: counts[ post._id ] || 0
    } } );
} );

// execute all updates with a loose write concern for speed
updates.execute( { w: 0, j: false } );
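
The two phases above can also be sketched without a database, which may make the logic easier to check in isolation. This is a plain JavaScript illustration under stated assumptions: the arrays are hypothetical stand-ins for the collections, and `countByPost` and `buildUpdates` are illustrative names, not MongoDB APIs.

```javascript
// Hypothetical in-memory stand-ins for db.posts and db.comments.
const samplePosts = [{ _id: "foo" }, { _id: "bar" }, { _id: "baz" }];
const sampleComments = [
  { post_id: "foo", comment: "First comment" },
  { post_id: "foo", comment: "Second comment" },
  { post_id: "bar", comment: "Another comment" },
];

// Phase 1: a single pass over the comments, analogous to the
// $group stage with { $sum: 1 } keyed on post_id.
function countByPost(comments) {
  const counts = {};
  for (const c of comments) {
    counts[c.post_id] = (counts[c.post_id] || 0) + 1;
  }
  return counts;
}

// Phase 2: one update description per post, defaulting to 0 for
// posts that have no comments (they never appear in the counts).
function buildUpdates(posts, counts) {
  return posts.map((post) => ({
    filter: { _id: post._id },
    update: { $set: { ncomments: counts[post._id] || 0 } },
  }));
}

const plannedUpdates = buildUpdates(samplePosts, countByPost(sampleComments));
console.log(JSON.stringify(plannedUpdates, null, 2));
```

The point of the design is that the comments are read once and the updates are sent as one batch, instead of one query and one update per post. Note that with `w: 0` the writes are unacknowledged, so failed updates would not be reported back to the script.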

I'm still open to better answers before I accept my own.
