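I am using the Java API's SearchSourceBuilder to query an Elasticsearch index. My index holds more than 100k documents. To fetch them in a single request, I increased index.max_result_window to 120k and then raised the size in my Java code to 120000. It throws a NullPointerException on the following line: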
SearchHit[] searchHits = searchResponse.getHits().getHits();
If I reduce the SearchSourceBuilder size to 50k, it works fine, but then I can only retrieve 50k documents.
Please find my code below:
RestHighLevelClient restHighLevelClient = null;
Document doc = new Document();
logger.info("Started Indexing the Document.....");
try {
    restHighLevelClient = new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http"),
            new HttpHost("localhost", 9201, "http")));
} catch (Exception e) {
    System.out.println(e.getMessage());
}
// Fetching Id, FilePath & FileName from Document Index.
SearchRequest searchRequest = new SearchRequest(INDEX);
searchRequest.types(TYPE);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
QueryBuilder qb = QueryBuilders.matchAllQuery();
searchSourceBuilder.query(qb);
searchSourceBuilder.size(120000);
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = null;
try {
    searchResponse = restHighLevelClient.search(searchRequest);
} catch (IOException e) {
    // The return value is discarded, so a failed search is silently ignored
    // and searchResponse remains null.
    e.getLocalizedMessage();
}
// NullPointerException is thrown here after some documents have been processed; the count varies between runs.
SearchHit[] searchHits = searchResponse.getHits().getHits();
long totalHits = searchResponse.getHits().totalHits;
logger.info("Total Hits --->" + totalHits);
Please find my index settings below:
{
  "document_attachment": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "document_attachment",
        "max_result_window": "150000",
        "creation_date": "1531402811016",
        "analysis": {
          "analyzer": {
            "custom_analyzer": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "char_filter": [
                "html_strip"
              ],
              "type": "custom",
              "tokenizer": "whitespace"
            },
            "product_catalog_keywords_analyzer": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "char_filter": [
                "html_strip"
              ],
              "type": "custom",
              "tokenizer": "whitespace"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "UBRQAkg-Su-FfeAtBTGFIw",
        "version": {
          "created": "6020399"
        }
      }
    }
  }
}
Answer 0 (score: 0)
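You need to use a scroll search rather than trying to fetch everything in one request. Scrolling lets you page through the results.
With scroll you can retrieve as many results as you need; there is no upper limit. You will not get usefully ranked results this way, but ranking makes little sense for a result set this large anyway.
See the documentation for how to do this.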
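To make the paging idea concrete, here is a minimal sketch of the loop that scrolling implies: open a scroll with a modest page size, keep requesting the next page with the returned scroll id until an empty page comes back, then release the scroll. Everything in it (ScrollSketch, Page, ScrollSource, InMemorySource, fetchAll) is a hypothetical stand-in so that it runs without a cluster; it is not the Elasticsearch client API. With the 6.x high-level REST client, the three steps should correspond to setting searchRequest.scroll(...) before the first search, then calling searchScroll(...) and clearScroll(...) on the client, but check the documentation for your client version since the method names have changed between releases.

```java
import java.util.ArrayList;
import java.util.List;

final class ScrollSketch {

    /** Hypothetical stand-in for one page of a scroll response. */
    static final class Page {
        final String scrollId;
        final List<String> hits;

        Page(String scrollId, List<String> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    /** Hypothetical stand-in for the cluster: open a scroll, fetch the next page, release it. */
    interface ScrollSource {
        Page open(int pageSize);
        Page next(String scrollId);
        void clear(String scrollId);
    }

    /** In-memory implementation so the sketch runs without Elasticsearch. */
    static final class InMemorySource implements ScrollSource {
        private final List<String> docs;
        private int cursor;
        private int pageSize;
        boolean cleared;

        InMemorySource(int docCount) {
            docs = new ArrayList<>(docCount);
            for (int i = 0; i < docCount; i++) {
                docs.add("doc-" + i);
            }
        }

        @Override
        public Page open(int pageSize) {
            this.pageSize = pageSize;
            this.cursor = 0;
            return slice();
        }

        @Override
        public Page next(String scrollId) {
            return slice();
        }

        @Override
        public void clear(String scrollId) {
            cleared = true;
        }

        // Returns the next pageSize documents, or an empty list once exhausted.
        private Page slice() {
            int end = Math.min(cursor + pageSize, docs.size());
            List<String> hits = new ArrayList<>(docs.subList(cursor, end));
            cursor = end;
            return new Page("scroll-1", hits);
        }
    }

    /** Collects every hit by paging until an empty page comes back. */
    static List<String> fetchAll(ScrollSource source, int pageSize) {
        List<String> all = new ArrayList<>();
        Page page = source.open(pageSize);
        String scrollId = page.scrollId;
        try {
            while (!page.hits.isEmpty()) {
                all.addAll(page.hits);
                page = source.next(scrollId);
                scrollId = page.scrollId; // the id may change between pages
            }
        } finally {
            source.clear(scrollId); // always release the scroll context
        }
        return all;
    }

    public static void main(String[] args) {
        InMemorySource source = new InMemorySource(120_000);
        List<String> all = fetchAll(source, 5_000);
        System.out.println("Fetched " + all.size() + " documents");
    }
}
```

The point of the design is that no single request ever asks for more than one page, so index.max_result_window does not need to be raised and the response size stays bounded. In the code from the question, each page's hits could be processed inside the loop instead of being accumulated in one list.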