Question

I'm trying to move data from a Microsoft SQL database into Elasticsearch. I use EF 6 to generate the models (Code First from database) and NEST to serialize the objects into Elasticsearch.

With lazy loading it works, but it is incredibly slow (too slow to be usable). If I switch to eager loading by adding the following lines:

public MyContext() : base("name=MyContext")
{
    this.Configuration.LazyLoadingEnabled = false;
}

and serialize like this:

ElasticClient client = new ElasticClient(settings);

var allObjects = context.objects
    .Include("item1")
    .Include("item2")
    .Include("item2.item1")
    .Include("item2.item1.item");

client.IndexMany(allObjects);

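I end up with a System.OutOfMemoryException before the serialization even starts (so it happens while the data is still being loaded). I have about 2.5 GB of RAM available, and we're talking about roughly 110,000 items in the database.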

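I tried sorting the data and then using Skip and Take to serialize only a certain number of objects at a time, but I only managed to insert 60,000 objects into Elasticsearch before running out of memory. It looks like the garbage collector doesn't free enough memory, even though I call it explicitly after inserting a certain number of objects into Elasticsearch.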

Is there a way to eager load a specific number of objects? Or is there another way to serialize large datasets?

Answer 1

In hindsight, it was a silly mistake. I managed to achieve my goal by doing this:

int numberOfObjects;

// Count once up front so the loop knows how many pages to request.
using (var context = new MyContext())
{
    numberOfObjects = context.objects.Count();
}

for (int i = 0; i < numberOfObjects; i += 10000)
{
    // A fresh context per batch: disposing it releases the entities it tracked.
    using (var context = new MyContext())
    {
        // Skip/Take needs a stable order, hence the OrderBy on ID.
        var batch = context.objects.OrderBy(s => s.ID)
            .Skip(i)
            .Take(10000)
            .Include("item1")
            .Include("item2")
            .Include("item2.item1")
            .Include("item2.item1.item");

        client.IndexMany(batch);
    }
}

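This lets the garbage collector do its job, because the context is created and disposed inside the for loop: each batch gets its own context, so the entities tracked by the previous one become unreachable once it is disposed. I don't know whether there is a faster way; I was able to insert about 100,000 objects into Elasticsearch in roughly 400 seconds.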
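The paging arithmetic of the loop above can be checked on its own, without a database. The following is only a minimal sketch under stated assumptions: it uses an in-memory list of IDs in place of the EF query, and `BatchDemo` and `Pages` are hypothetical names that appear nowhere in EF or NEST. It shows that stepping the offset by the batch size and applying Skip/Take to an ordered sequence visits every item exactly once, including a shorter final page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchDemo
{
    // Hypothetical helper: yields consecutive pages of an already ordered
    // list, using the same Skip/Take pattern as the loop in the answer.
    public static IEnumerable<List<int>> Pages(IReadOnlyList<int> orderedIds, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        for (int offset = 0; offset < orderedIds.Count; offset += batchSize)
        {
            // In the real loop, this is where a fresh context would be
            // created, queried, and disposed before the next iteration.
            yield return orderedIds.Skip(offset).Take(batchSize).ToList();
        }
    }

    public static void Main()
    {
        // 25 stand-in IDs, paged 10 at a time: pages of 10, 10, and 5.
        var ids = Enumerable.Range(1, 25).ToList();
        foreach (var page in Pages(ids, 10))
        {
            Console.WriteLine($"{page.Count} items, first ID {page[0]}");
        }
    }
}
```

Because each page is materialized with ToList() inside the iteration, nothing from an earlier page is referenced once the caller moves on, which mirrors why a per-batch context keeps memory bounded.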

C# serializing large datasets
