Question

What is the fastest and safest strategy for adding a new field to over 100 million MongoDB documents?

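Background

Using MongoDB 3.0 in a 3-node replica set.
We are adding a new field (post_hour) that is derived from another field (post_time) in the same document. The post_hour field is post_time truncated to the hour.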
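
To make the truncation concrete, here is a minimal sketch of it in plain JavaScript, which also runs in the mongo shell. It assumes post_time is stored as a Date, which the question does not actually state, and the name truncateToHour is made up for illustration.

```javascript
// Hypothetical helper: truncate a Date to the start of its hour (UTC).
// Assumption: post_time is a Date; adapt this if it is a string or a number.
var MS_PER_HOUR = 60 * 60 * 1000;

function truncateToHour(postTime) {
    // Round the millisecond timestamp down to a whole number of hours
    return new Date(Math.floor(postTime.getTime() / MS_PER_HOUR) * MS_PER_HOUR);
}

var example = truncateToHour(new Date("2015-06-01T14:37:52.123Z"));
// example.toISOString() is "2015-06-01T14:00:00.000Z"
```

The arithmetic works on the UTC timestamp, so the result does not depend on the time zone of the machine running the script.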

Answer 1

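I faced a similar scenario: I had written a script to update around 25 million documents, and updating all of them took a very long time. To improve performance, I inserted the updated documents one by one into a new collection and then renamed the new collection. This approach helped because I was inserting documents rather than updating them (an 'insert' operation is faster than an 'update' operation).

Below is a sample script (I have not tested it):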

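/* Returns post_time truncated to the hour.
   Assumes post_time is stored as a Date; adjust if it is a string or a number. */
function convertPostTimeToPostHour(postTime) {
    var postHour = new Date(postTime.getTime());
    postHour.setUTCMinutes(0, 0, 0); // zero the minutes, seconds and milliseconds
    return postHour;
}

var chunkSize = 1000;
var totalCount = db.persons.count();
var chunkCount = Math.ceil(totalCount / chunkSize);
var offset = 0;
for (var index = 0; index < chunkCount; index++) {
    // Sort by _id so that skip/limit pages through the collection in a stable order
    var personList = db.persons.find().sort({ _id: 1 }).skip(offset).limit(chunkSize);
    personList.forEach(function (person) {
        person.post_hour = convertPostTimeToPostHour(person.post_time);
        db.personsNew.insert(person); // insert the updated document into the new collection
    });
    offset += chunkSize;
}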

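Once the script above has run, the new collection 'personsNew' will contain the updated records with the 'post_hour' field set.

If the existing collection has any indexes, you need to recreate them on the new collection.

After the indexes are created, you can rename the collection 'persons' to 'personsOld' and 'personsNew' to 'persons'.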
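
The index re-creation and rename steps could be sketched roughly as follows. This is an untested illustration rather than part of the original script: the function name swapInNewCollection is made up, the database handle is passed in explicitly, and only the unique and sparse index options are carried over.

```javascript
// Sketch only: copy index definitions from the live collection to the new one,
// then swap the collection names. `db` is the mongo shell database handle.
function swapInNewCollection(db, liveName, newName, backupName) {
    var live = db.getCollection(liveName);
    var fresh = db.getCollection(newName);

    live.getIndexes().forEach(function (index) {
        if (index.name === "_id_") {
            return; // the _id index is created automatically
        }
        // Carry over the common options; extend this for TTL and other options
        var options = { name: index.name };
        if (index.unique) { options.unique = true; }
        if (index.sparse) { options.sparse = true; }
        fresh.createIndex(index.key, options);
    });

    live.renameCollection(backupName); // e.g. persons -> personsOld
    fresh.renameCollection(liveName);  // e.g. personsNew -> persons
}

// In the mongo shell:
// swapInNewCollection(db, "persons", "personsNew", "personsOld");
```

Note that there is a short window between the two renames in which 'persons' does not exist, and that writes made to 'persons' while the copy is running are not carried over, so this is best done while the application is not writing to the collection.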

Answer 2

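Using snapshot() prevents duplicates in the query result (which could otherwise appear because we are increasing the size of the documents); it can be removed if it causes any trouble.

Please find the mongo shell script below, where 'a1' is the collection name: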

var documentLimit = 1000;

// Count the documents that do not have the new field yet
var docCount = db.a1.find({
        post_hour : {
            $exists : false
        }
    }).count();

var chunks = Math.ceil(docCount / documentLimit);

for (var i = 0; i < chunks; i++) {
    db.a1.find({
        post_hour : {
            $exists : false
        }
    }).snapshot()
      .limit(documentLimit)
      .forEach(function (doc) {
        doc.post_hour = 12; // put your transformation here
        // db.a1.save(doc); // uncomment this line to save data
                            // you can also specify write concern here
        printjson(doc);     // comment out this line to avoid polluting the shell output
                            // this is just for test purposes
    });
}

You can experiment with the parameters, but since bulk processing is carried out in blocks of 1000 records, this value looks optimal.
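
As a variation on the script above (not what the answer itself uses), each block can be written with $set through the shell's bulk API instead of save(), so that only the new field is written and concurrent changes to other fields are not overwritten. The sketch below is untested against a real deployment; addPostHourInBatches and toPostHour are hypothetical names, and the "majority" write concern is just one possible choice.

```javascript
// Sketch only. `coll` is a mongo shell collection (e.g. db.a1) and
// `toPostHour` is a hypothetical function mapping post_time to post_hour.
function addPostHourInBatches(coll, toPostHour, batchSize) {
    var updated = 0;
    while (true) {
        // Fetch the next block of documents that still lack post_hour
        var docs = coll.find({ post_hour: { $exists: false } }, { post_time: 1 })
                       .limit(batchSize)
                       .toArray();
        if (docs.length === 0) {
            break; // nothing left to update
        }
        var bulk = coll.initializeUnorderedBulkOp();
        docs.forEach(function (doc) {
            bulk.find({ _id: doc._id })
                .updateOne({ $set: { post_hour: toPostHour(doc.post_time) } });
        });
        bulk.execute({ w: "majority" }); // wait for a majority of the replica set
        updated += docs.length;
    }
    return updated;
}

// In the mongo shell:
// addPostHourInBatches(db.a1, function (t) { return 12; }, 1000);
```

Because every pass re-queries for documents without post_hour, the script can be stopped and restarted safely; toPostHour has to return a value for every document, otherwise the same documents would be selected again.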
