你能做什么

Question

我有一个shell脚本，它在集合上创建一个游标，然后用另一个集合中的数据更新每个docunment 当我在本地数据库上运行它时，它在大约15秒内完成，但在托管数据库上运行超过45分钟。

db.col1.find().forEach(function(doc) {
    db.col2.findAndModify(
        {
            query: {attr1: doc.attr1},
            update: { $push: {attr2: doc.attr2},
            upsert: true
        });
    });

因此，为了处理这个脚本，客户端和服务器之间显然存在网络开销。有没有办法保持处理所有服务器端？我查看了服务器端的javascript，但是根据我读到的here，这不是推荐的做法。

Answer 1

在本地，您几乎没有网络开销。无干扰，无路由器，无交换机，无带宽限制。此外，在大多数情况下，您的大容量存储，无论是SSD还是HDD，或多或少闲置（除非您倾向于在开发时玩游戏。）因此，当需要大量IO功能的操作启动时，它就可用。 / p>

当您从本地shell对服务器运行脚本时，会发生以下情况。

db.col1.find().forEach整个集合将从未知媒体中读取（很可能是可以在许多实例之间共享可用IO的HDD）。然后这些文件将被转移到您的本地shell。与localhost的连接相比，每个文档检索都会通过几十个跃点进行路由，每个跃点都会增加很少的延迟。大概相当一些文件，这加起来。不要忘记完整的文档是通过网络发送的，因为您没有使用投影来限制返回到attr1和attr2的字段。外部带宽当然比与localhost的连接慢。
db.col2.findAndModify对于每个文档，都会进行查询。同样，共享IO可能会破坏性能。
{ query: {attr1: doc.attr1}, update: { $push: {attr2: doc.attr2}, upsert: true}顺便说一句，您确定attr1被编入索引吗？即使它是，它也不确定索引当前是否在RAM中。我们正在谈论一个共享实例，对吗？而且很可能你的写操作必须等到它们被mongod处理，根据默认的写入问题，数据必须成功应用到内存数据集中在确认之前，如果将大量的操作发送到共享实例，则可能是您的操作数量为1亿，其中一个在队列中。并且第二次添加网络延迟，因为传输到本地shell的值需要被发回。

你能做什么

首先，确保你

使用projection

将返回值限制为您需要的值

db.col1.find({},{ "_id":0, "attr1":1, "attr2":1 })

确保您已attr1已编入索引
```
db.col2.ensureIndex( { "attr1":1 } )
```

使用bulk operations。它们执行得更快，代价是在出现问题时减少反馈。

// We can use unordered here, because the operations
// each apply to only a single document
var bulk = db.col2.initializeUnorderedBulkOp()

// A counter we will use for intermediate commits
// We do the intermediate commits in order to keep RAM usage low
var counter = 0

// We limit the result to the values we need
db.col1.find({}.{"_id":0, "attr1":1, "attr2":1 }).forEach(
  function(doc){

    // Find the matching document
    // Update exactly that
    // and if it does not exist, create it
    bulk
      .find({"attr1": doc.attr1})
      .updateOne({ $push: {"attr2": doc.attr2})
      .upsert()

    counter++

    // We have queued 1k operations and can commit them
    // MongoDB would split the bulk ops in batches of 1k operations anyway
    if( counter%1000 == 0 ){
      bulk.execute()
      print("Operations committed: "+counter)
      // Initialize a new batch of operations
      bulk = db.col2.initializeUnorderedBulkOp()
    }

  }
)
// Execute the remaining operations not committed yet.
bulk.execute()
print("Operations committed: "+counter)

Mongodb服务器端与客户端处理

1 个答案:

你能做什么