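I want to fetch all data from Elasticsearch without using a Pageable filter. What is the best way to do that? I have set the default limit to 2000. I've read that I should use scan, but I don't know how to use it. How do I use scan and scroll to get all the data?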
public Map<String, Object> searchByIndexParams(AuctionIndexSearchParams searchParams, Pageable pageable) {
    final Map<String, Object> response = new HashMap<>();
    final List<FilterBuilder> filters = Lists.newArrayList();
    final NativeSearchQueryBuilder searchQuery = new NativeSearchQueryBuilder().withQuery(matchAllQuery());
    Optional.ofNullable(searchParams.getCategoryId()).ifPresent(v -> filters.add(boolFilter().must(termFilter("cat", v))));
    Optional.ofNullable(searchParams.getCurrency()).ifPresent(v -> filters.add(boolFilter().must(termFilter("curr", v))));
    Optional.ofNullable(searchParams.getTreeCategoryId()).ifPresent(v -> filters.add(boolFilter().must(termFilter("tcat", v))));
    Optional.ofNullable(searchParams.getUid()).ifPresent(v -> filters.add(boolFilter().must(termFilter("uid", v))));
    // access for many uids (comma-separated)
    if (searchParams.getUids() != null) {
        filters.add(boolFilter().must(termsFilter("uid", searchParams.getUids().split(","))));
    }
    // access for many categories (comma-separated)
    if (searchParams.getCategories() != null) {
        filters.add(boolFilter().must(termsFilter("cat", searchParams.getCategories().split(","))));
    }
    final BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
    if (searchParams.getTitle() != null) {
        boolQueryBuilder.should(queryStringQuery(searchParams.getTitle()).analyzeWildcard(true).field("title"));
    }
    if (searchParams.getStartDateFrom() != null || searchParams.getStartDateTo() != null) {
        filters.add(rangeFilter("start_date").from(searchParams.getStartDateFrom()).to(searchParams.getStartDateTo()));
    }
    if (searchParams.getEndDateFrom() != null || searchParams.getEndDateTo() != null) {
        filters.add(rangeFilter("end_date").from(searchParams.getEndDateFrom()).to(searchParams.getEndDateTo()));
    }
    if (searchParams.getPriceFrom() != null || searchParams.getPriceTo() != null) {
        filters.add(rangeFilter("price").from(searchParams.getPriceFrom()).to(searchParams.getPriceTo()));
    }
    searchQuery.withQuery(boolQueryBuilder);
    searchQuery.withFilter(andFilter(filters.toArray(new FilterBuilder[0])));
    final FacetedPage<AuctionIndex> search = auctionIndexRepository.search(searchQuery.build());
    // load the full Auction entity for every index hit on the returned page
    response.put("content", search.map(index -> auctionRepository
            .findAuctionById(Long.valueOf(index.getId())))
            .getContent());
    return response;
}
Edit:
I now have:
String scrollId = searchTemplate.scan(searchQuery.build(), 1000, false);
Page<AuctionIndex> page = searchTemplate.scroll(scrollId, 15000L, AuctionIndex.class);
Integer i = 0;
if (page != null && page.hasContent()) {
    while (page.hasContent()) {
        page = searchTemplate.scroll(scrollId, 15000L, AuctionIndex.class);
        if (page.hasContent()) {
            System.out.println(i);
            i++;
        }
    }
}
But it iterates up to 166 and then stops with an error. Why?
Answer:
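The Scroll API is the most efficient way to go through all documents. The scroll_id identifies the search context that the server keeps alive for one particular scroll request, so each follow-up call continues where the previous one stopped.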
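A server-side scroll context can be pictured as a cursor over a frozen result set, looked up by the scroll id and kept alive only while the client asks for the next batch within the keep-alive window. The sketch below is a hypothetical in-memory model of that idea, not Elasticsearch code: ScrollContextDemo, open, next and clear are invented names, and it assumes every call renews the timeout and an empty batch signals the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory model of server-side scroll contexts; NOT the Elasticsearch API.
class ScrollContextDemo {

    // One "search context": a snapshot of the hits, a cursor, a batch size and an expiry time.
    private static final class Context {
        final List<String> hits;
        final int size;
        int cursor = 0;
        long expiresAt;

        Context(List<String> hits, int size, long expiresAt) {
            this.hits = hits;
            this.size = size;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Context> contexts = new HashMap<>();

    // Opens a context over a copy of the results and returns its id.
    String open(List<String> results, int size, long keepAliveMs, long nowMs) {
        String id = UUID.randomUUID().toString();
        contexts.put(id, new Context(new ArrayList<>(results), size, nowMs + keepAliveMs));
        return id;
    }

    // Returns the next batch and renews the keep-alive; an empty batch means the scroll is exhausted.
    List<String> next(String id, long keepAliveMs, long nowMs) {
        Context ctx = contexts.get(id);
        if (ctx == null || nowMs > ctx.expiresAt) {
            contexts.remove(id); // an expired context is discarded
            throw new IllegalStateException("No search context found for id " + id);
        }
        ctx.expiresAt = nowMs + keepAliveMs;
        int end = Math.min(ctx.cursor + ctx.size, ctx.hits.size());
        List<String> batch = new ArrayList<>(ctx.hits.subList(ctx.cursor, end));
        ctx.cursor = end;
        return batch;
    }

    // Frees a context before it times out.
    void clear(String id) {
        contexts.remove(id);
    }

    public static void main(String[] args) {
        ScrollContextDemo server = new ScrollContextDemo();
        String id = server.open(Arrays.asList("a", "b", "c", "d", "e"), 2, 1_000, 0);
        System.out.println(server.next(id, 1_000, 100));   // [a, b]
        System.out.println(server.next(id, 1_000, 500));   // [c, d]
        System.out.println(server.next(id, 1_000, 900));   // [e]
        System.out.println(server.next(id, 1_000, 1_200)); // []
        server.clear(id);
    }
}
```

In this model, once the gap between two next calls exceeds the keep-alive, the context is gone and the call fails. That mirrors why the scroll timeout sent with each request only has to be long enough to process the previous batch, not the whole result set.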
The following example shows how to use the Elasticsearch Java scroll API to fetch all results that match a query.
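// With SCAN, the initial response carries only a scroll id and no hits
SearchResponse searchResponse = client.prepareSearch(<INDEX>)
        .setQuery(<QUERY>)
        .setSearchType(SearchType.SCAN)
        .setScroll(SCROLL_TIMEOUT)   // keep-alive of the search context, e.g. new TimeValue(60000)
        .setSize(SCROLL_SIZE)        // with SCAN, this size applies per shard
        .execute()
        .actionGet();
while (true) {
    // always continue from the scroll id returned by the most recent response
    searchResponse = client
            .prepareSearchScroll(searchResponse.getScrollId())
            .setScroll(SCROLL_TIMEOUT)
            .execute().actionGet();
    if (searchResponse.getHits().getHits().length == 0) {
        break; // Break condition: no hits are returned
    }
    for (SearchHit hit : searchResponse.getHits()) {
        // process response
    }
}
// release the server-side search context instead of waiting for it to time out
client.prepareClearScroll().addScrollId(searchResponse.getScrollId()).execute().actionGet();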
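The control flow of that loop can be separated from the client API. In this hypothetical sketch, ScrollPage and drain are invented names, and the Function stands in for prepareSearchScroll(...).execute().actionGet(). It keeps any hits from the initial response, then scrolls with the scroll id from the most recent response until an empty batch comes back, in line with the Elasticsearch documentation's advice to always use the latest _scroll_id. Keeping the initial hits matters on newer versions: SearchType.SCAN was deprecated in Elasticsearch 2.1 and removed in 5.0, and the replacement (a regular scroll sorted by _doc) already returns the first batch in the initial response.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the scroll loop's control flow; not tied to any Elasticsearch class.
class ScrollLoop {

    // Stand-in for one scroll response: the scroll id to continue with and a batch of hits.
    static final class ScrollPage {
        final String scrollId;
        final List<String> hits;

        ScrollPage(String scrollId, List<String> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    // Drains a scroll; 'scroll' plays the role of prepareSearchScroll(id)...actionGet().
    static List<String> drain(ScrollPage initial, Function<String, ScrollPage> scroll) {
        List<String> all = new ArrayList<>(initial.hits); // empty for SCAN, first batch for a plain scroll
        String scrollId = initial.scrollId;
        while (true) {
            ScrollPage page = scroll.apply(scrollId);
            if (page.hits.isEmpty()) {
                break; // no hits returned: the scroll is exhausted
            }
            all.addAll(page.hits);
            scrollId = page.scrollId; // continue from the latest id, not the first one
        }
        return all;
    }

    public static void main(String[] args) {
        // Fake backend: id "sN" returns the N-th pair of items and points to "s(N+1)".
        List<String> data = Arrays.asList("0", "1", "2", "3", "4");
        Function<String, ScrollPage> fake = id -> {
            int step = Integer.parseInt(id.substring(1));
            int from = Math.min(step * 2, data.size());
            int to = Math.min(from + 2, data.size());
            return new ScrollPage("s" + (step + 1), data.subList(from, to));
        };
        ScrollPage initial = new ScrollPage("s0", new ArrayList<>()); // SCAN-style: no hits yet
        System.out.println(drain(initial, fake)); // [0, 1, 2, 3, 4]
    }
}
```

With this fake backend, reusing "s0" on every call would return the same pair forever, which is the failure mode the "latest id" rule guards against.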
Example using Spring Data Elasticsearch:
@Autowired
private ElasticsearchTemplate searchTemplate;

// second argument: scroll keep-alive in milliseconds
String scrollId = searchTemplate.scan(<SEARCH_QUERY>, 1000, false);
Page<ExampleItem> page = searchTemplate.scroll(scrollId, 5000L, ExampleItem.class);
if (page != null && page.hasContent()) {
    // process first batch
    while (page != null && page.hasContent()) {
        page = searchTemplate.scroll(scrollId, 5000L, ExampleItem.class);
        if (page != null && page.hasContent()) {
            // process remaining batches
        }
    }
}
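In the loop from the question, the page fetched before the while loop is never counted, so the counter trails the real number of batches by one. A flatter loop that sends every batch, including the first, through the same code path could look like the sketch below. ScrollSource is a hypothetical interface that only mimics the shape of scan/scroll and returns a List instead of a Page so the sketch runs without Spring. Like the template code above, it reuses the id returned by scan, because the Page returned by this scroll overload carries no scroll id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; ScrollSource only mimics the shape of scan/scroll and is not a Spring type.
class FetchAll {

    interface ScrollSource<T> {
        String scan();                    // opens the scroll, returns its id
        List<T> scroll(String scrollId);  // next batch, empty when exhausted
    }

    // Collects every batch, including the first one, into a single list.
    static <T> List<T> fetchAll(ScrollSource<T> source) {
        List<T> all = new ArrayList<>();
        String scrollId = source.scan();
        List<T> batch = source.scroll(scrollId);
        while (!batch.isEmpty()) {
            all.addAll(batch); // the first batch goes through the same code path
            batch = source.scroll(scrollId);
        }
        return all;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        // Fake source handing out batches of three.
        ScrollSource<Integer> source = new ScrollSource<Integer>() {
            private int cursor = 0;

            public String scan() {
                cursor = 0;
                return "scroll-1";
            }

            public List<Integer> scroll(String scrollId) {
                int end = Math.min(cursor + 3, data.size());
                List<Integer> batch = new ArrayList<>(data.subList(cursor, end));
                cursor = end;
                return batch;
            }
        };
        System.out.println(fetchAll(source).size()); // 7
    }
}
```

Collecting everything into one list only suits result sets that fit in memory; for larger indexes, processing each batch inside the loop and discarding it is the safer choice.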
Here, ExampleItem is the entity class of the documents to fetch.