Paginating over all indexed documents in Elasticsearch

Time: 2019-12-09 15:09:35

Tags: c# elasticsearch nest

I am trying to use naive code like this:

var pageSize = 100;
var startPosition = 0;

do
{

    var searchResponse = client.Search<Bla>(s => s
        .Index(indexName)
        .Query(q => q.MatchAll()
        ).From(startPosition).Size(pageSize)
    );

    startPosition = startPosition + pageSize;

} while (true);

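The goal is to page through all indexed documents. I suspect this will overload the server, because the requests come too frequently. I could slow it down by sleeping for a few milliseconds between requests, but I don't think that is best practice either.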

I know there is also the concept of scrolling. How would I use it when I want to act on the results of each page?

PS:

static void Main(string[] args)
{
    var indexName = "document";
    var client = GetClient(indexName);
    var pageSize = 1000;

    var numberOfSlices = 4;

    var scrollObserver = client.ScrollAll<Document>("1m", numberOfSlices, s => s
        .MaxDegreeOfParallelism(numberOfSlices)
        .Search(search => search
            .Index(indexName).MatchAll()
            .Size(pageSize)
        )
    ).Wait(TimeSpan.FromMinutes(60), r =>
    {
        // do something with documents from a given response.
        var documents = r.SearchResponse.Documents.ToList();

        Console.WriteLine(documents[0].Id);
    });
}

I am familiar with the observer pattern, but I am not sure what exactly these components mean:

"1m", numberOfSlices, TimeSpan.FromMinutes(60)

1 Answer:

Answer 0: (score: 0)

Something along these lines seems to work:

const string indexName = "bla";
var client = GetClient(indexName);
// Scroll keep-alive, passed to every scroll request.
const int scrollTimeout = 1000;

// The first request opens the scroll and returns the first page (100 documents).
var initialResponse = client.Search<Document>(scr => scr
    .Index(indexName)
    .From(0)
    .Take(100)
    .MatchAll()
    .Scroll(scrollTimeout));

var results = new List<Document>();

// ServerError can be null (for example when only the scroll id is missing), so guard the lookup.
if (!initialResponse.IsValid || string.IsNullOrEmpty(initialResponse.ScrollId))
    throw new Exception(initialResponse.ServerError?.Error?.Reason ?? "Initial scroll request failed.");

if (initialResponse.Documents.Any())
    results.AddRange(initialResponse.Documents);

var scrollid = initialResponse.ScrollId;
bool isScrollSetHasData = true;
while (isScrollSetHasData)
{
    // Each call returns the next page for the given scroll id.
    var loopingResponse = client.Scroll<Document>(scrollTimeout, scrollid);

    if (loopingResponse.IsValid)
    {
        results.AddRange(loopingResponse.Documents);
        scrollid = loopingResponse.ScrollId;
    }
    // An empty page means the scroll is exhausted.
    isScrollSetHasData = loopingResponse.Documents.Any();

    // do some amazing stuff
}

// Release the server-side scroll context once finished.
client.ClearScroll(new ClearScrollRequest(scrollid));
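
Since the question asks how to act on each page rather than collect everything into one list, the loop above could be reshaped into a lazy iterator, roughly as sketched below. This is only a sketch: `ScrollPaging`, `Page<T>`, `Pages` and the `start` / `next` / `clear` delegates are hypothetical names, not part of NEST, and the demo uses an in-memory fake in place of a cluster so that it runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScrollPaging
{
    // One page of hits plus the scroll id needed to request the next page.
    public sealed class Page<T>
    {
        public Page(string scrollId, IReadOnlyList<T> documents)
        {
            ScrollId = scrollId;
            Documents = documents;
        }

        public string ScrollId { get; }
        public IReadOnlyList<T> Documents { get; }
    }

    // Lazily yields non-empty pages until an empty page arrives,
    // then clears the scroll (also when the caller stops enumerating early).
    public static IEnumerable<IReadOnlyList<T>> Pages<T>(
        Func<Page<T>> start,          // opens the scroll, returns the first page
        Func<string, Page<T>> next,   // fetches the next page for a scroll id
        Action<string> clear)         // releases the scroll
    {
        var page = start();
        var scrollId = page.ScrollId;
        try
        {
            while (page.Documents.Count > 0)
            {
                yield return page.Documents;
                page = next(scrollId);
                scrollId = page.ScrollId ?? scrollId;
            }
        }
        finally
        {
            if (!string.IsNullOrEmpty(scrollId)) clear(scrollId);
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        // In-memory stand-in for the index: five documents, two per page.
        var data = Enumerable.Range(1, 5).ToList();
        const int pageSize = 2;
        var position = 0;
        var cleared = false;

        ScrollPaging.Page<int> Fetch()
        {
            var docs = data.Skip(position).Take(pageSize).ToList();
            position += docs.Count;
            return new ScrollPaging.Page<int>("fake-scroll-id", docs);
        }

        foreach (var page in ScrollPaging.Pages<int>(Fetch, _ => Fetch(), _ => { cleared = true; }))
        {
            // Act on one page at a time here.
            Console.WriteLine(string.Join(",", page));
        }

        Console.WriteLine("scroll cleared: " + cleared);
    }
}
```

Against a real cluster, `start` would presumably wrap the `client.Search<Document>(...)` call with `.Scroll(scrollTimeout)`, `next` the `client.Scroll<Document>(scrollTimeout, scrollId)` call, and `clear` the `client.ClearScroll(...)` call from the answer above. The `foreach` body then plays the role of "do some amazing stuff", and memory use stays bounded by the page size rather than by the size of the index.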