ElasticSearch constraint when fetching more than 50k documents using the Java API

Asked: 2018-07-12 15:51:50

Tags: java elasticsearch elastic-stack

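I am using the Java API's SearchSourceBuilder to query an Elasticsearch index. My index holds more than 100k documents. To fetch 120,000 documents in one request, I raised index.max_result_window to 120k and then increased the size in my Java code to match. The code throws a NullPointerException at the following line.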

SearchHit[] searchHits = searchResponse.getHits().getHits();

If I reduce the size on the SearchSourceBuilder to 50k, it works fine, but then I can only fetch 50k documents.

Please find my code below:

RestHighLevelClient restHighLevelClient = null;
    Document doc=new Document();

    logger.info("Started Indexing the Document.....");

    try {
        restHighLevelClient = new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http"),
                new HttpHost("localhost", 9201, "http")));
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }


    //Fetching Id, FilePath & FileName from Document Index. 
    SearchRequest searchRequest = new SearchRequest(INDEX); 
    searchRequest.types(TYPE);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    QueryBuilder qb = QueryBuilders.matchAllQuery();
    searchSourceBuilder.query(qb);
    searchSourceBuilder.size(120000); 
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse = null;
    try {
         searchResponse = restHighLevelClient.search(searchRequest);
    } catch (IOException e) {
        e.getLocalizedMessage();
    }

    SearchHit[] searchHits = searchResponse.getHits().getHits(); // Throws a NullPointerException after processing some documents; the count varies between runs.
    long totalHits=searchResponse.getHits().totalHits;
    logger.info("Total Hits --->"+totalHits);

Please find my index settings below:

{
  "document_attachment": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "document_attachment",
        "max_result_window": "150000",
        "creation_date": "1531402811016",
        "analysis": {
          "analyzer": {
            "custom_analyzer": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "char_filter": [
                "html_strip"
              ],
              "type": "custom",
              "tokenizer": "whitespace"
            },
            "product_catalog_keywords_analyzer": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "char_filter": [
                "html_strip"
              ],
              "type": "custom",
              "tokenizer": "whitespace"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "UBRQAkg-Su-FfeAtBTGFIw",
        "version": {
          "created": "6020399"
        }
      }
    }
  }
}

1 Answer:

Answer 0: (score: 0)

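You need to use a scroll search instead of trying to fetch everything in one request. Scrolling lets you page through the results batch by batch.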

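With scrolling you can retrieve as many results as you need; there is no upper limit on the total. You will not get ranked results, but ranking would mean little for a result set this large anyway.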
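To make the paging idea concrete, here is a minimal sketch of the loop a scroll search follows: open the search, keep asking for the next batch until one comes back empty, then release the cursor. It deliberately uses no Elasticsearch classes so that it runs on its own; `ScrollSketch`, `Page`, `ScrollSource` and `inMemory` are hypothetical names invented for this illustration. With the 6.x RestHighLevelClient the three steps should correspond to a `SearchRequest` with a scroll keep-alive, a `SearchScrollRequest` carrying the scroll id, and a `ClearScrollRequest`, but check the documentation for your client version before relying on that mapping.

```java
import java.util.ArrayList;
import java.util.List;

public class ScrollSketch {

    /** One batch of hits plus the cursor needed to request the next batch. */
    public static final class Page {
        public final String scrollId;
        public final List<String> hits;

        public Page(String scrollId, List<String> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    /** Hypothetical stand-in for the search client; NOT an Elasticsearch type. */
    public interface ScrollSource {
        Page open(int batchSize);    // initial search that starts the scroll
        Page next(String scrollId);  // follow-up request for the next batch
        void close(String scrollId); // release the server-side scroll context
    }

    /** Drains the source batch by batch until an empty batch is returned. */
    public static List<String> fetchAll(ScrollSource source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> all = new ArrayList<>();
        Page page = source.open(batchSize);
        String scrollId = page.scrollId;
        try {
            while (!page.hits.isEmpty()) {
                all.addAll(page.hits);
                page = source.next(scrollId);
                scrollId = page.scrollId; // always continue with the latest id
            }
        } finally {
            source.close(scrollId); // clean up even if a batch request fails
        }
        return all;
    }

    /** In-memory fake holding totalDocs documents, used only to run the sketch. */
    public static ScrollSource inMemory(final int totalDocs) {
        return new ScrollSource() {
            private int position = 0;
            private int size = 0;

            @Override
            public Page open(int batchSize) {
                size = batchSize;
                position = 0;
                return nextBatch();
            }

            @Override
            public Page next(String scrollId) {
                return nextBatch();
            }

            @Override
            public void close(String scrollId) {
                // nothing to release in the fake
            }

            private Page nextBatch() {
                int end = Math.min(position + size, totalDocs);
                List<String> hits = new ArrayList<>();
                for (int i = position; i < end; i++) {
                    hits.add("doc-" + i);
                }
                position = end;
                return new Page("scroll-" + position, hits);
            }
        };
    }

    public static void main(String[] args) {
        // 120,000 documents fetched in batches of 10,000 instead of one huge request.
        List<String> docs = fetchAll(inMemory(120_000), 10_000);
        System.out.println("Fetched " + docs.size() + " documents");
    }
}
```

The batch size stays small (10,000 here, an arbitrary choice), so no single response has to carry the whole result set. In real code it is also worth checking that each response is non-null, and logging or rethrowing any IOException, before calling getHits() on it.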

For details on how to do this, see the documentation.