Question

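I need to filter a large MongoDB collection (3,500,000 documents today, more tomorrow...) and transfer part of its content to an empty collection. Here is my naive ES6 approach: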


await col_target.drop();
const cursor = await col_source.find();

while (await cursor.hasNext()) {
  const doc = await cursor.next();
  
  // the filter is an array of regular expressions
  if (!regex.map(_ => new RegExp(_, 'imu').test(doc.rawJson.text)).reduce((a, b) => a || b)) continue;
  
  await col_target.insertOne(prepareTweet(doc));
}

await db.close();


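I feel this is not optimal, because the find and insert operations should be parallelized. But I really don't know how to do that. Could someone advise me on how to improve my code?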

Answer 1

You should do this in a single query; it will be faster:

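// Match on the text sub-field that the question tests (doc.rawJson.text).
// MongoDB has no "g" regex option, so only "i" and "m" are kept here.
db.col_source.aggregate([
    {$match: {"rawJson.text": /someRegex/im}},
    // $out writes the matching documents to col_target, replacing its contents
    {$out: "col_target"}
])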
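Since the question has an array of patterns rather than a single regex, here is a minimal sketch of how the same pipeline might be built from Node.js. The helper name buildPipeline is hypothetical, and I am assuming the patterns are plain strings that should be matched against rawJson.text with the i and m options (the u flag from the question is left out).

```javascript
// Build an aggregation pipeline that keeps documents whose rawJson.text
// matches at least one of the given patterns and writes them to `target`.
// `buildPipeline` is an illustrative helper, not part of the MongoDB driver.
function buildPipeline(patterns, target) {
  if (patterns.length === 0) {
    // MongoDB rejects an empty $or array, so fail early with a clear message.
    throw new Error('at least one pattern is required');
  }
  return [
    {
      $match: {
        $or: patterns.map((pattern) => ({
          'rawJson.text': { $regex: pattern, $options: 'im' },
        })),
      },
    },
    // $out replaces the target collection, so no prior drop() is needed.
    { $out: target },
  ];
}

// Hypothetical usage with the Node.js driver (needs a live connection).
// The cursor is lazy, so toArray() is what actually runs the pipeline:
//   await col_source.aggregate(buildPipeline(regex, 'col_target')).toArray();
```

With this, the filtering and the copy both happen inside the server, and no document has to travel to the Node.js process and back.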

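Of course, don't forget to create a text index on the rawJson.text field. Note, though, that a text index serves $text searches; a $regex match such as the one above does not use it, so the index only pays off if the filter can be rewritten as a $text search.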
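One thing the pipeline above does not cover is the prepareTweet step from the question. If that transformation has to stay in JavaScript, a possible middle ground is to keep the cursor loop but write in batches instead of one insertOne per document. The sketch below is only an illustration under that assumption: transferInBatches and its parameters are hypothetical names, and the batch size of 1000 is an arbitrary default.

```javascript
// Read from any async iterable (a driver cursor qualifies), keep matching
// documents, transform them, and hand them to `insertBatch` in groups of
// `batchSize`. Returns the number of documents written.
async function transferInBatches(source, matches, transform, insertBatch, batchSize = 1000) {
  let batch = [];
  let total = 0;
  for await (const doc of source) {
    if (!matches(doc)) continue;
    batch.push(transform(doc));
    if (batch.length >= batchSize) {
      await insertBatch(batch);
      total += batch.length;
      batch = [];
    }
  }
  // Flush whatever is left over after the cursor is exhausted.
  if (batch.length > 0) {
    await insertBatch(batch);
    total += batch.length;
  }
  return total;
}

// Hypothetical wiring with the names from the question:
//   const compiled = regex.map((r) => new RegExp(r, 'imu')); // compile once
//   await transferInBatches(
//     col_source.find(),
//     (doc) => compiled.some((re) => re.test(doc.rawJson.text)),
//     prepareTweet,
//     (docs) => col_target.insertMany(docs, { ordered: false }),
//   );
```

Compiling the regular expressions once, outside the loop, also avoids rebuilding them for every document as the original code does.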

Best way (in nodejs) to transfer filtered documents from a Mongo collection to another one
