Scroll example in the ElasticSearch NEST API

Asked: 2015-07-09 20:42:08

Tags: elasticsearch nest

I am using the .From() and .Size() methods to retrieve all the documents from an Elasticsearch result.

Here is an example:

ISearchResponse<dynamic> bResponse = ObjElasticClient.Search<dynamic>(s => s.From(0).Size(25000).Index("accounts").AllTypes().Query(Query));

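Recently I came across Elasticsearch's scroll feature. It looks like a better fit than From() and Size() when the goal is specifically to fetch a large amount of data.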

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html

I am looking for an example of the scroll feature in the NEST API.

Can someone provide a NEST example?

Thanks, Sameer

3 answers:

Answer 0 (score: 11)

Here is an example of scrolling with NEST and C#. It works with 5.x and 6.x.

// Requires: using System; using System.Collections.Generic; using System.Linq; using Nest;
public IEnumerable<T> GetAllDocumentsInIndex<T>(string indexName, string scrollTimeout = "2m", int scrollSize = 1000) where T : class
{
    // The initial search opens the scroll context and returns the first page.
    ISearchResponse<T> initialResponse = this.ElasticClient.Search<T>
        (scr => scr.Index(indexName)
             .From(0)
             .Take(scrollSize)
             .MatchAll()
             .Scroll(scrollTimeout));

    List<T> results = new List<T>();

    // ServerError can be null (for example on a connection failure),
    // so fall back to DebugInformation rather than dereferencing it blindly.
    if (!initialResponse.IsValid || string.IsNullOrEmpty(initialResponse.ScrollId))
        throw new Exception(initialResponse.ServerError?.Error?.Reason ?? initialResponse.DebugInformation);

    if (initialResponse.Documents.Any())
        results.AddRange(initialResponse.Documents);

    string scrollid = initialResponse.ScrollId;
    bool isScrollSetHasData = true;
    // Keep requesting pages until one comes back without documents.
    while (isScrollSetHasData)
    {
        ISearchResponse<T> loopingResponse = this.ElasticClient.Scroll<T>(scrollTimeout, scrollid);
        if (loopingResponse.IsValid)
        {
            results.AddRange(loopingResponse.Documents);
            scrollid = loopingResponse.ScrollId;
        }
        isScrollSetHasData = loopingResponse.Documents.Any();
    }

    // Release the scroll context on the server instead of waiting for it to time out.
    this.ElasticClient.ClearScroll(new ClearScrollRequest(scrollid));
    return results;
}

Source: http://telegraphrepaircompany.com/elasticsearch-nest-scroll-api-c/

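Answer 1 (score: 5)

The internal implementation of NEST's Reindex uses scroll to move documents from one index to another.

It should be a good starting point.

Below is the relevant code from GitHub. Note that it uses SearchType.Scan, which Elasticsearch deprecated in 2.1 and removed in 5.0, so on newer versions use a plain scrolled search as in the other answers.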

var page = 0;
var searchResult = this.CurrentClient.Search<T>(
    s => s
        .Index(fromIndex)
        .AllTypes()
        .From(0)
        .Size(size)
        .Query(this._reindexDescriptor._QuerySelector ?? (q=>q.MatchAll()))
        .SearchType(SearchType.Scan)
        .Scroll(scroll)
    );
if (searchResult.Total <= 0)
    throw new ReindexException(searchResult.ConnectionStatus, "index " + fromIndex + " has no documents!");
IBulkResponse indexResult = null;
do
{
    var result = searchResult;
    searchResult = this.CurrentClient.Scroll<T>(s => s
        .Scroll(scroll)
        .ScrollId(result.ScrollId)
    );
    if (searchResult.Documents.HasAny())
        indexResult = this.IndexSearchResults(searchResult, observer, toIndex, page);
    page++;
} while (searchResult.IsValid && indexResult != null && indexResult.IsValid && searchResult.Documents.HasAny());

You can also take a look at the Scroll integration test:

[Test]
public void SearchTypeScan()
{
    var scanResults = this.Client.Search<ElasticsearchProject>(s => s
        .From(0)
        .Size(1)
        .MatchAll()
        .Fields(f => f.Name)
        .SearchType(SearchType.Scan)
        .Scroll("2s")
    );
    Assert.True(scanResults.IsValid);
    Assert.False(scanResults.FieldSelections.Any());
    Assert.IsNotNullOrEmpty(scanResults.ScrollId);

    var results = this.Client.Scroll<ElasticsearchProject>(s=>s
        .Scroll("4s") 
        .ScrollId(scanResults.ScrollId)
    );
    var hitCount = results.Hits.Count();
    while (results.FieldSelections.Any())
    {
        Assert.True(results.IsValid);
        Assert.True(results.FieldSelections.Any());
        Assert.IsNotNullOrEmpty(results.ScrollId);
        var localResults = results;
        results = this.Client.Scroll<ElasticsearchProject>(s=>s
            .Scroll("4s")
            .ScrollId(localResults.ScrollId));
        hitCount += results.Hits.Count();
    }
    Assert.AreEqual(scanResults.Total, hitCount);
}

Answer 2 (score: 0)

I took the liberty of rewriting Michael's fine answer as async and making it a bit less verbose (NEST v6.x):

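// Requires: using System; using System.Collections.Generic; using System.Linq; using System.Threading.Tasks; using Nest;
public async Task<IEnumerable<T>> RockAndScroll<T>(
    string indexName,
    string scrollTimeout = "2m", // a time value such as "2m", not a millisecond count
    int scrollPageSize = 1000
) where T : class
{
    // The initial search opens the scroll context and returns the first page.
    var searchResponse = await this.ElasticClient.SearchAsync<T>(sd => sd
        .Index(indexName)
        .From(0)
        .Take(scrollPageSize)
        .MatchAll()
        .Scroll(scrollTimeout));

    var results = new List<T>();

    while (true)
    {
        // ServerError can be null, so fall back to DebugInformation for the message.
        if (!searchResponse.IsValid || string.IsNullOrEmpty(searchResponse.ScrollId))
            throw new Exception($"Search error: {searchResponse.ServerError?.Error?.Reason ?? searchResponse.DebugInformation}");

        // An empty page means the scroll is exhausted.
        if (!searchResponse.Documents.Any())
            break;

        results.AddRange(searchResponse.Documents);
        searchResponse = await this.ElasticClient.ScrollAsync<T>(scrollTimeout, searchResponse.ScrollId);
    }

    // Release the scroll context on the server instead of waiting for it to time out.
    await this.ElasticClient.ClearScrollAsync(new ClearScrollRequest(searchResponse.ScrollId));

    return results;
}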
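
The list-returning versions above hold every document in memory before handing anything back, which can be a problem for the large result sets that scroll is meant for. The following is only a sketch, not part of the original answers, of a streaming alternative: it yields documents page by page and clears the scroll in a finally block, so cleanup also runs if the caller stops early. The names ScrollDrain, Page<T>, openScroll, nextPage and clearScroll are hypothetical and introduced for illustration; the helper deliberately takes delegates instead of calling NEST directly, so it makes no assumptions about a particular client version.

```csharp
using System;
using System.Collections.Generic;

public static class ScrollDrain
{
    // Hypothetical container for one page of a scroll:
    // the id to continue with and the documents in that page.
    public sealed class Page<T>
    {
        public Page(string scrollId, IReadOnlyCollection<T> documents)
        {
            ScrollId = scrollId;
            Documents = documents;
        }

        public string ScrollId { get; }
        public IReadOnlyCollection<T> Documents { get; }
    }

    // Lazily yields every document; the next page is fetched only
    // when the caller has consumed the current one.
    public static IEnumerable<T> Stream<T>(
        Func<Page<T>> openScroll,
        Func<string, Page<T>> nextPage,
        Action<string> clearScroll)
    {
        string lastScrollId = null;
        try
        {
            Page<T> page = openScroll();
            while (page != null)
            {
                // Remember the most recent scroll id so it can be cleared later.
                lastScrollId = page.ScrollId ?? lastScrollId;

                // An empty page means the scroll is exhausted.
                if (page.Documents.Count == 0)
                    break;

                foreach (T document in page.Documents)
                    yield return document;

                page = nextPage(lastScrollId);
            }
        }
        finally
        {
            // Runs on normal completion and when the enumerator is disposed early.
            if (lastScrollId != null)
                clearScroll(lastScrollId);
        }
    }
}
```

With NEST, the three delegates would presumably wrap the same calls used in the answers above: the initial Search<T> with .Scroll(timeout), Scroll<T>(timeout, scrollId), and ClearScroll(new ClearScrollRequest(scrollId)), each mapped into a Page<T>. Error handling for invalid responses is left to those delegates.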