Question

我有一个mongodb查询，可以获取大约50,000个大文档这对我的RAM很大，因此计算机速度变慢。

现在我想通过mongodb结果逐块迭代我想获取前1000个文档并获取它们然后接下来的1000个文档。

这是处理数据量大小的最佳方法吗？或者我应该更好地接受一个接一个？

我尝试了以下内容：

MongoCursor<MyDocument> items = collection.Find(query);
long count = items.Count();
int stepSize = 1000;

for (int i = 0; i < Math.Ceiling((double)count / stepSize); i++)
{
    List<MyDocument> list = items.SetSkip(i * stepSize).SetLimit(stepSize).ToList();
    // process the 1000 ...
}

但是这段代码没有用。我收到以下错误：

一旦MongoCursor对象被冻结，就无法修改。

如何解决此问题？

Answer 1

你可以使用这种方法

        var count = collection.Find(new BsonDocument()).Count();
        var stepSize = 1000;

        for (int i = 0; i < Math.Ceiling((double)count / stepSize); i++)
        {
            // put your query here      \/\/\/\/\/\/
            var list = collection.Find(new BsonDocument()).Skip(i * stepSize).Limit(stepSize).ToList();
            // process the 1000 ...
        }

因为您有兴趣获得和处理司机2.2.4

Answer 2

感谢profesor79的提示。

现在我能够以块状方式获取它，所以我开始了一个意外结果的测量。

结果集有39,500份文件

首次测试：步数为1000份文件：

int stepSize = 1000;
long start = DateTime.Now.Ticks;
for (int i = 0; i < Math.Ceiling((double)count / stepSize); i++)
{
    List<MyDocument> list = collection.Find(query).SetSkip(i * stepSize).SetLimit(stepSize).ToList();
}
long end = DateTime.Now.Ticks;
Debug.WriteLine("Step 1000 --> " + new TimeSpan(end - start));
// Step 1000 --> 00:00:41.1731168

第二次测试：步长2000文件：

int stepSize = 2000;
long start = DateTime.Now.Ticks;
for (int i = 0; i < Math.Ceiling((double)count / stepSize); i++)
{
    List<MyDocument> list = collection.Find(query).SetSkip(i * stepSize).SetLimit(stepSize).ToList();
}
long end = DateTime.Now.Ticks;
Debug.WriteLine("Step 2000 --> " + new TimeSpan(end - start));
// Step 2000 --> 00:00:42.1772173

第三次测试：5000步的步长：

int stepSize = 5000;
long start = DateTime.Now.Ticks;
for (int i = 0; i < Math.Ceiling((double)count / stepSize); i++)
{
    List<MyDocument> list = collection.Find(query).SetSkip(i * stepSize).SetLimit(stepSize).ToList();
}
long end = DateTime.Now.Ticks;
Debug.WriteLine("Step 5000 --> " + new TimeSpan(end - start));
// Step 5000 --> 00:00:40.9530949

上次测试：步长1文档：

long start = DateTime.Now.Ticks;
foreach (MyDocument item in collection.Find(query))
{

}
long end = DateTime.Now.Ticks;
Debug.WriteLine("Step 1 --> " + new TimeSpan(end - start));
// Step 1 --> 00:00:39.6329629

所以看起来使用积木毫无意义一切都差不多快。
我再次运行测试时收到了类似的时间。

唯一使用大量RAM的是这样做：

collection.Find(query)).ToList();

ToList已缓存在RAM中。

希望它能帮助其他人。

通过mongodb结果逐块迭代

2 个答案: