Question

I have a MongoDB database, used from Node.js through mongoose, that involves nested/embedded documents like the following:

"people" : [
    {"name" : "james", "_id": ObjectId("randomrandom1")},
    {"name" : "arianna","_id": ObjectId("randomrandom2")},
    {"name" : "kyle","_id": ObjectId("randomrandom3")}
]

I need to change the structure so that I have a separate 'person' document, and people will contain an array of the ObjectIds of the persons:

"people" : [{type:mongoose.Schema.Types.ObjectId, ref: 'Person'}]

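and each 'person' document will contain the information of james, arianna and kyle, so that I can populate them when needed.

I need to change the database structure while keeping the documents that have already been entered. Is there a way to achieve this?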

Answer 1

Suppose my documents are in a collection named coll, like this:

{
    "_id" : ObjectId("56b47c7a088d9fa3e1aa77a0"),
    "people" : [
        {
            "name" : "james",
            "_id" : ObjectId("56b47c7a088d9fa3e1aa779d")
        },
        {
            "name" : "arianna",
            "_id" : ObjectId("56b47c7a088d9fa3e1aa779e")
        },
        {
            "name" : "kyle",
            "_id" : ObjectId("56b47c7a088d9fa3e1aa779f")
        }
    ]
}

Now I can use aggregate to store all the _id values in another collection:

db.coll.aggregate([
    {
        $project: {
            _id : 0,
            'people._id' : 1
        }
    },
    {
        $out : 'somecoll'
    }
])

This stores all the ids in another collection named somecoll, as follows:

{
    "_id" : ObjectId("56b47de8b47e47b58b64f312"),
    "people" : [
        {
            "_id" : ObjectId("56b47c7a088d9fa3e1aa779d")
        },
        {
            "_id" : ObjectId("56b47c7a088d9fa3e1aa779e")
        },
        {
            "_id" : ObjectId("56b47c7a088d9fa3e1aa779f")
        }
    ]
}

Answer 2

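For improved performance, especially when dealing with large collections, take advantage of the Bulk() API to update the collection efficiently in bulk, since you will be sending the operations to the server in batches (for example, a batch size of 1000). This gives much better performance because you are not sending every request to the server separately, only one in every 1000 requests, which makes your updates more efficient and quicker.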
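To make the saving concrete, here is a minimal sketch of the batching arithmetic in plain JavaScript. The helper name toBatches and the placeholder operations are assumptions for illustration only; nothing here talks to a server.

```javascript
// Hypothetical helper: split a list of pending operations into batches.
function toBatches(ops, batchSize = 1000) {
  const batches = [];
  for (let i = 0; i < ops.length; i += batchSize) {
    batches.push(ops.slice(i, i + batchSize));
  }
  return batches;
}

// 2500 placeholder operations become 3 round trips instead of 2500.
const pendingOps = Array.from({ length: 2500 }, (_, i) => ({ n: i }));
const batches = toBatches(pendingOps);

console.log(batches.length);               // 3
console.log(batches.map((b) => b.length)); // [ 1000, 1000, 500 ]
```

Each batch would correspond to one execute() call, so the number of requests is the operation count divided by the batch size, rounded up.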

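To change the database structure, the general algorithm here is to "loop" over the find() results of the collection, building and processing the updates while having access to the current document's information. In general you want to make this structural change in bulk, and your updates will be based on information already contained in a field (in your case, the people array).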
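As a rough sketch of the per-document step, the function below derives the update from the document's own people array. The name buildReferenceUpdate is made up for this example, and plain strings stand in for ObjectId values so that it runs without a database.

```javascript
// Hypothetical helper: derive the update for one document from the
// embedded "people" array that the document already holds.
function buildReferenceUpdate(doc) {
  const peopleIds = doc.people.map((p) => p._id);
  return {
    filter: { _id: doc._id },
    update: { $set: { people: peopleIds } },
  };
}

// Plain strings stand in for ObjectId values (assumption for the sketch).
const sample = {
  _id: "doc1",
  people: [
    { name: "james", _id: "randomrandom1" },
    { name: "arianna", _id: "randomrandom2" },
    { name: "kyle", _id: "randomrandom3" },
  ],
};

const op = buildReferenceUpdate(sample);
console.log(JSON.stringify(op, null, 2));
```

The filter/update pair has the same shape that an updateOne operation expects, so it can be queued in a batch instead of being sent on its own.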

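To insert the new documents into the person collection, you get the documents by running an aggregation on the old collection that groups the denormalized people array by its _id key and, for each grouped document, returns the _id and name fields in the result. Use this results array to insert the documents into the new collection, looping with the Bulk API write operation's insert() method as the "safest" form of doing this without running all of the code on the server.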
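For intuition about what that grouping returns, the sketch below emulates the unwind-then-group-by-_id step on in-memory documents. It is not how the server executes the pipeline; groupPeopleById is a hypothetical name and strings stand in for ObjectId values.

```javascript
// Hypothetical in-memory emulation of: $unwind people, then $group by
// people._id keeping the first name seen for each id.
function groupPeopleById(docs) {
  const byId = new Map();
  for (const doc of docs) {
    for (const person of doc.people || []) {   // plays the role of $unwind
      const key = String(person._id);
      if (!byId.has(key)) {                    // plays the role of $first
        byId.set(key, { _id: person._id, name: person.name });
      }
    }
  }
  return [...byId.values()];
}

// Two parent documents that share one person (ids are string stand-ins).
const oldDocs = [
  { _id: "a", people: [{ _id: "p1", name: "james" }, { _id: "p2", name: "arianna" }] },
  { _id: "b", people: [{ _id: "p2", name: "arianna" }, { _id: "p3", name: "kyle" }] },
];

const personDocs = groupPeopleById(oldDocs);
console.log(personDocs);
```

Grouping by _id matters when the same person is embedded in several parent documents: it yields one person document per id, so the inserts do not collide on duplicate keys.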

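Since the aggregate() method returns a cursor, you can use its forEach() method to iterate it and access each document, setting up the bulk operations in batches that are then sent to the server efficiently through the API.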

The following examples demonstrate this approach, both on the server and in the application. The first uses the Bulk() API, available in MongoDB versions >= 2.6 and < 3.2.

Server side (mongo shell):

// Bulk insert new documents into the person collection
var bulkInsertOp = db.person.initializeUnorderedBulkOp(), // bulk operations on the new person collection
    pipeline = [
        { "$unwind": "$people" },
        { "$group": { "_id": "$people._id", "name": { "$first": "$people.name" } } }
    ],
    counter = 0, // keeps track of the batch insert size
    cursor = db.collection.aggregate(pipeline); // person documents via aggregation on the old collection

cursor.forEach(function (doc) {
    bulkInsertOp.insert(doc); // insert the aggregated document into the new person collection
    counter++;
    if (counter % 1000 == 0) { // execute the bulk insert in batches of 1000
        bulkInsertOp.execute();
        bulkInsertOp = db.person.initializeUnorderedBulkOp();
    }
});

// flush whatever is left in the last, incomplete batch
if (counter % 1000 != 0) { bulkInsertOp.execute(); }

// Bulk update the old collection so that the people array holds only references
var bulkUpdateOp = db.collection.initializeUnorderedBulkOp(), // bulk operations on the old collection
    count = 0, // keeps track of the batch update size
    cur = db.collection.find({}); // all documents from the old collection

cur.forEach(function (doc) {
    var peopleIds = doc.people.map(function (p) { return p._id; }); // array of person ids for referencing
    bulkUpdateOp.find({ "_id": doc._id }).updateOne({ "$set": { "people": peopleIds } });
    count++; // count queued updates so they are flushed every 1000
    if (count % 1000 == 0) {
        bulkUpdateOp.execute();
        bulkUpdateOp = db.collection.initializeUnorderedBulkOp();
    }
});

if (count % 1000 != 0) { bulkUpdateOp.execute(); }

The next example applies to the newer MongoDB version 3.2, which has since deprecated the Bulk API and provides a newer set of APIs using bulkWrite().

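It uses the same cursors as above, but instead of iterating the result, it creates the array of bulk operations with the cursor's map() method:

var pipeline = [
        { "$unwind": "$people" },
        { "$group": { "_id": "$people._id", "name": { "$first": "$people.name" } } }
    ],
    cursor = db.collection.aggregate(pipeline),
    // one insertOne operation per aggregated person document
    bulkInsertOps = cursor.map(function (doc) {
        return { "insertOne": { "document": doc } };
    }),
    cur = db.collection.find({}),
    // one updateOne operation per document of the old collection
    bulkUpdateOps = cur.map(function (doc) {
        var peopleIds = doc.people.map(function (p) { return p._id; });
        return {
            "updateOne": {
                "filter": { "_id": doc._id },
                "update": { "$set": { "people": peopleIds } }
            }
        };
    });

db.person.bulkWrite(bulkInsertOps, { "ordered": true });
db.collection.bulkWrite(bulkUpdateOps, { "ordered": true });

Mongoose implementation

There are various ways of implementing this on the client side. You can use the query stream to "plug in" to other Node streams such as http responses and write streams, so everything "just works" out of the box with the bulk API.

In Mongoose, you can do the looping by accessing the underlying collection object from the base driver, but make sure the database connection is open before trying to access the Bulk() API methods. This ensures that a Node.js Db instance is present and that a Collection() object can be obtained. Once you use the .collection accessor on the model, you can use the Bulk() API, available in Mongoose versions ~3.8.8, ~3.8.22 and 4.x, which support MongoDB server versions >= 2.6 and < 3.2:

Client side: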

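In the above, the Stream API breaks up the aggregation results so that one document is processed at a time, which lets you build up the inserts in batches and send them to the server in bulk, rather than loading everything at once.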
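A minimal sketch of this one-document-at-a-time batching, not the answer's own listing: it assumes the source is any iterable or async iterable of documents (a driver cursor or stream would play that role) and that the target exposes a bulkWrite(ops, options) method, as the driver collection behind a Mongoose model's .collection accessor does. The names insertInBatches and fakeCollection are hypothetical, and the fake collection only records batch sizes so that the sketch runs without a database.

```javascript
// Sketch only: consume documents one at a time, queue insertOne operations,
// and hand them to the target in batches of `batchSize`.
async function insertInBatches(source, target, batchSize = 1000) {
  let ops = [];
  let sent = 0;
  for await (const doc of source) {
    ops.push({ insertOne: { document: doc } });
    if (ops.length === batchSize) {
      await target.bulkWrite(ops, { ordered: false });
      sent += ops.length;
      ops = [];
    }
  }
  if (ops.length > 0) { // flush the last, incomplete batch
    await target.bulkWrite(ops, { ordered: false });
    sent += ops.length;
  }
  return sent;
}

// Stand-in for a real collection (assumption): records each batch size.
const fakeCollection = {
  calls: [],
  async bulkWrite(ops) {
    this.calls.push(ops.length);
  },
};

const people = Array.from({ length: 2500 }, (_, i) => ({ _id: i, name: "person" + i }));
const done = insertInBatches(people, fakeCollection, 1000).then((n) => {
  console.log(n, fakeCollection.calls);
  return n;
});
```

Only one batch of operations is held in memory at a time, which is the point of streaming the results instead of collecting them all first.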

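The Bulk() API then queues up many operations at a time before actually sending them to the server. So in the case above, the writes are only sent to the server for processing in batches of 1000 entries. You can choose anything up to the 16MB BSON limit, but keep it manageable.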

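On top of the operations being processed in bulk, the async library acts as an additional limiter that ensures that essentially no more than the limit of documents are in process at any time. The limit prevents expensive "execute" calls from piling up by making operations wait rather than queuing up more work.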
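The limiting idea can be sketched with plain promises. This is not the async library's API, only an illustration of the same principle; runLimited and the timer-based tasks are assumptions made up for the example.

```javascript
// Hypothetical limiter: run task functions with at most `limit` in flight.
async function runLimited(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;   // index of the next task to start
  let active = 0; // tasks currently in flight
  let peak = 0;   // highest number of tasks seen in flight at once

  async function worker() {
    while (next < tasks.length) {
      const i = next++;
      active++;
      peak = Math.max(peak, active);
      results[i] = await tasks[i](); // wait here instead of starting more work
      active--;
    }
  }

  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return { results, peak };
}

// Five placeholder tasks standing in for "execute" calls.
const tasks = [0, 1, 2, 3, 4].map((i) => () =>
  new Promise((resolve) => setTimeout(() => resolve(i * 10), 5))
);

const finished = runLimited(tasks, 2);
finished.then(({ results, peak }) => console.log(results, peak));
```

A new task starts only when a worker becomes free, so the number of pending calls never exceeds the limit.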

Is there a way to change the MongoDB structure (from nested/embedded documents to a list of object references) while keeping the data?
