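Return all records in one query in Elasticsearch

Time: 2013-02-27 14:25:23

Tags: java api elasticsearch search

I have an Elasticsearch database and want to fetch all of its records for a page on my website. I wrote a bean that connects to the Elasticsearch node, searches for records, and returns a response. The simple Java code that performs the search is: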

SearchResponse response = getClient().prepareSearch(indexName)
    .setTypes(typeName)
    .setQuery(queryString("*:*"))
    .setExplain(true)
    .execute().actionGet();

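However, Elasticsearch sets the default size to 10, so the response contains only 10 hits. My database has more than 10 records. If I set the size to Integer.MAX_VALUE, the search becomes very slow, which is not what I want.

How can I get all records in one operation, in an acceptable amount of time, without setting the response size?

9 Answers:

Answer 0: (score: 18)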

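import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

// Assumes client, indexName and typeName are fields of the enclosing class.
public List<Map<String, Object>> getAllDocs() {
    int scrollSize = 1000; // page size
    List<Map<String, Object>> esData = new ArrayList<Map<String, Object>>();
    SearchResponse response = null;
    int i = 0; // zero-based page number
    // Request page after page until an empty page comes back.
    while (response == null || response.getHits().getHits().length != 0) {
        response = client.prepareSearch(indexName)
                .setTypes(typeName)
                .setQuery(QueryBuilders.matchAllQuery())
                .setSize(scrollSize)
                .setFrom(i * scrollSize)
                .execute()
                .actionGet();
        for (SearchHit hit : response.getHits()) {
            esData.add(hit.getSource());
        }
        i++;
    }
    return esData;
}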

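Answer 1: (score: 7)

The currently top-ranked answer works, but it loads the entire result list into memory, which can cause memory problems for large result sets and is in any case unnecessary.

I wrote a Java class that implements an Iterator over SearchHit, which lets you iterate over all results. Internally it handles pagination by issuing queries that include the from: field, and it keeps only one page of results in memory.

Usage: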

// build your query here -- no need for setFrom(int)
SearchRequestBuilder requestBuilder = client.prepareSearch(indexName)
                                            .setTypes(typeName)
                                            .setQuery(QueryBuilders.matchAllQuery());

SearchHitIterator hitIterator = new SearchHitIterator(requestBuilder);
while (hitIterator.hasNext()) {
    SearchHit hit = hitIterator.next();

    // process your hit
}

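Note that when creating the SearchRequestBuilder you do not need to call setFrom(int), because SearchHitIterator does that internally. If you want to specify the page size (i.e. the number of search hits per page), you can call setSize(int); otherwise Elasticsearch's default is used.

SearchHitIterator: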

import java.util.Iterator;
import java.util.NoSuchElementException;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.search.SearchHit;

// Iterates over all hits of a search, fetching one page at a time via setFrom(int).
public class SearchHitIterator implements Iterator<SearchHit> {

    private final SearchRequestBuilder initialRequest;

    private int searchHitCounter;            // hits returned so far; also the next "from" offset
    private SearchHit[] currentPageResults;  // the only page kept in memory
    private int currentResultIndex;          // index of the last returned hit within the page

    public SearchHitIterator(SearchRequestBuilder initialRequest) {
        this.initialRequest = initialRequest;
        this.searchHitCounter = 0;
        this.currentResultIndex = -1;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next page when nothing is loaded yet or the current page is used up.
        if (currentPageResults == null || currentResultIndex + 1 >= currentPageResults.length) {
            SearchRequestBuilder paginatedRequestBuilder = initialRequest.setFrom(searchHitCounter);
            SearchResponse response = paginatedRequestBuilder.execute().actionGet();
            currentPageResults = response.getHits().getHits();

            if (currentPageResults.length < 1) return false;

            currentResultIndex = -1;
        }

        return true;
    }

    @Override
    public SearchHit next() {
        // The Iterator contract calls for an exception, not null, when no elements remain.
        if (!hasNext()) throw new NoSuchElementException();

        currentResultIndex++;
        searchHitCounter++;
        return currentPageResults[currentResultIndex];
    }

}

In fact, having realized how convenient such a class is, I wonder why Elasticsearch's Java client does not offer something similar.
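The paging logic in this answer does not depend on Elasticsearch itself, so it can be sketched and exercised without a cluster. The following is a minimal sketch under stated assumptions, not part of the original answer: `PagedIterator` and its `fetchPage` function are hypothetical names, and `fetchPage` stands in for a request that applies `setFrom(from)` and `setSize(size)` and returns the hits of that page. It also remembers when the results are exhausted, so repeated `hasNext()` calls at the end do not issue further requests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Hypothetical helper: iterates over paged results while holding one page in memory.
final class PagedIterator<T> implements Iterator<T> {

    // (from, size) -> hits of that page; an empty list signals the end.
    private final BiFunction<Integer, Integer, List<T>> fetchPage;
    private final int pageSize;

    private List<T> page;        // current page, null before the first fetch
    private int indexInPage;     // position of the next element within the page
    private int consumed;        // elements returned so far; also the next "from" offset
    private boolean exhausted;   // set once an empty page has been seen

    PagedIterator(BiFunction<Integer, Integer, List<T>> fetchPage, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        this.fetchPage = fetchPage;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        if (exhausted) return false;
        if (page == null || indexInPage >= page.size()) {
            page = fetchPage.apply(consumed, pageSize);
            indexInPage = 0;
            if (page.isEmpty()) {
                exhausted = true;
                return false;
            }
        }
        return true;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        consumed++;
        return page.get(indexInPage++);
    }

    // Demo with an in-memory list standing in for the index.
    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a", "b", "c", "d", "e");
        PagedIterator<String> it = new PagedIterator<>(
                (from, size) -> from >= docs.size()
                        ? Collections.<String>emptyList()
                        : docs.subList(from, Math.min(from + size, docs.size())),
                2);
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) seen.add(it.next());
        System.out.println(seen);
    }
}
```

Keeping the fetch function separate from the iterator is a design choice made for this sketch: it lets the paging behaviour be tested against a plain list before it is wired to a real search request.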

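Answer 2: (score: 3)

You can use the scroll API. The other suggestion, using a SearchHit iterator, also works well, but only if you do not want to update those hits.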

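import static org.elasticsearch.index.query.QueryBuilders.*;

// Package names below are those of the transport-client era; they may differ in other versions.
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.sort.FieldSortBuilder;
import org.elasticsearch.search.sort.SortOrder;

QueryBuilder qb = termQuery("multi", "test");

SearchResponse scrollResp = client.prepareSearch(test)
        .addSort(FieldSortBuilder.DOC_FIELD_NAME, SortOrder.ASC)
        .setScroll(new TimeValue(60000)) // keep the scroll context alive for 60 s
        .setQuery(qb)
        .setSize(100).execute().actionGet(); // max of 100 hits will be returned for each scroll
// Scroll until no hits are returned
do {
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        // Handle the hit...
    }

    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
            .setScroll(new TimeValue(60000))
            .execute().actionGet();
} while (scrollResp.getHits().getHits().length != 0); // Zero hits mark the end of the scroll and the while loop.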
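The caveat about updating hits can be illustrated without a cluster. The sketch below is a toy simulation, not Elasticsearch code: `PagingUnderUpdates`, its in-memory "store", and the "unprocessed documents" query are all assumptions made for illustration. It compares re-running the query with a growing `from` offset against paging through a result set frozen at the start, which is roughly the view a scroll provides. When handling a hit changes whether it still matches the query, the offset-based loop skips documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy simulation (hypothetical names): offset paging versus a frozen result set.
final class PagingUnderUpdates {
    private PagingUnderUpdates() {}

    /** Toy "index": document id -> processed flag, ids 1..n, all unprocessed. */
    static Map<Integer, Boolean> newStore(int n) {
        Map<Integer, Boolean> store = new TreeMap<>();
        for (int id = 1; id <= n; id++) store.put(id, false);
        return store;
    }

    /** Toy "query": ids of all unprocessed documents, in ascending id order. */
    static List<Integer> matching(Map<Integer, Boolean> store) {
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, Boolean> e : store.entrySet()) {
            if (!e.getValue()) ids.add(e.getKey());
        }
        return ids;
    }

    /** Returns the slice [from, from + size) of hits, or an empty list past the end. */
    static List<Integer> page(List<Integer> hits, int from, int size) {
        if (from >= hits.size()) return new ArrayList<>();
        return new ArrayList<>(hits.subList(from, Math.min(from + size, hits.size())));
    }

    /** Re-runs the query for every page, like from/size paging against a live index. */
    static List<Integer> processWithFromSize(Map<Integer, Boolean> store, int size) {
        List<Integer> processed = new ArrayList<>();
        for (int from = 0; ; from += size) {
            List<Integer> hits = page(matching(store), from, size);
            if (hits.isEmpty()) return processed;
            for (int id : hits) {
                store.put(id, true); // the update removes the document from the match set
                processed.add(id);
            }
        }
    }

    /** Pages through a result set captured once, roughly like a scroll. */
    static List<Integer> processWithSnapshot(Map<Integer, Boolean> store, int size) {
        List<Integer> snapshot = matching(store);
        List<Integer> processed = new ArrayList<>();
        for (int from = 0; ; from += size) {
            List<Integer> hits = page(snapshot, from, size);
            if (hits.isEmpty()) return processed;
            for (int id : hits) {
                store.put(id, true);
                processed.add(id);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("from/size: " + processWithFromSize(newStore(6), 2)); // [1, 2, 5, 6]
        System.out.println("snapshot:  " + processWithSnapshot(newStore(6), 2)); // [1, 2, 3, 4, 5, 6]
    }
}
```

With six documents and a page size of 2, the offset-based loop never sees documents 3 and 4: after the first page is updated, they move to the front of the result set, while the offset has already advanced past them.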

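Answer 3: (score: 1)

It has been a long time since this question was asked, but I want to post my answer for future readers.

As mentioned above, when an index holds thousands of documents it is better to load them with a size and a starting offset. In my project, a search loads 50 results as the default size, starting at index zero; if the user wants more data, the next 50 results are loaded. This is what I did in the code: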

// startDocument is the zero-based offset of the first document to return
// (0 for the first page, 50 for the second, and so on).
public List<CourseDto> searchAllCourses(int startDocument) {

    final int searchSize = 50;
    final SearchRequest searchRequest = new SearchRequest("course_index");
    final SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchAllQuery());

    // Pass the offset through unchanged; adding searchSize to it here would skip a page.
    searchSourceBuilder.from(startDocument);
    searchSourceBuilder.size(searchSize);

    // sort the document
    searchSourceBuilder.sort(new FieldSortBuilder("publishedDate").order(SortOrder.ASC));
    searchRequest.source(searchSourceBuilder);

    List<CourseDto> courseList = new ArrayList<>();

    try {
        final SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        final SearchHits hits = searchResponse.getHits();

        // Total number of documents matching the query, not only those on this page
        // (can be a lower bound; see totalHits.relation):
        TotalHits totalHits = hits.getTotalHits();
        long numHits = totalHits.value;

        final SearchHit[] searchHits = hits.getHits();

        final ObjectMapper mapper = new ObjectMapper();

        for (SearchHit hit : searchHits) {
            // convert json object to CourseDto
            courseList.add(mapper.readValue(hit.getSourceAsString(), CourseDto.class));
        }
    } catch (IOException e) {
        logger.error("Cannot search with match_all query.", e);
    }
    return courseList;
}
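The offset arithmetic behind this kind of "load 50 more" paging can be sketched on its own. This is a minimal sketch, not the author's code: `PageWindow`, `offsetFor` and `hasMore` are hypothetical names, and the total of 120 hits in the demo is an assumed figure. In the method above, the total would come from `hits.getTotalHits().value`.

```java
// Hypothetical helper for from/size paging arithmetic.
final class PageWindow {
    private PageWindow() {}

    /** Offset ("from") of the first hit on the given zero-based page. */
    static int offsetFor(int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(pageIndex, pageSize);
    }

    /** True if more hits remain after the page that starts at offset. */
    static boolean hasMore(int offset, int pageSize, long totalHits) {
        return (long) offset + pageSize < totalHits;
    }

    public static void main(String[] args) {
        int pageSize = 50;
        long totalHits = 120; // assumed total, for illustration only
        for (int page = 0; ; page++) {
            int from = offsetFor(page, pageSize);
            System.out.println("page " + page + " -> from=" + from + ", size=" + pageSize);
            if (!hasMore(from, pageSize, totalHits)) break;
        }
    }
}
```

A caller would pass `offsetFor(page, 50)` as the starting offset and use `hasMore` to decide whether to offer a "load more" action.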

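Info:
- Elasticsearch version 7.5.0
- The Java High Level REST Client is used as the client.

I hope this is useful to someone.

Answer 4: (score: 0)

You have to weigh the number of results returned against how long you want the user to wait and how much server memory is available. If you have indexed 1,000,000 documents, it is not realistic to retrieve all of them in one request. I assume your results are for a single user. You must also consider how the system performs under load.

Answer 5: (score: 0)

To query everything, build a CountRequestBuilder to get the total number of records (via CountResponse), then set that number as the size of your search request.

Answer 6: (score: 0)

If your main concern is exporting all records, you may want to look for a solution that does not require any sorting, because sorting is an expensive operation. You can use the scan-and-scroll approach of ElasticsearchCRUD, as described here.

Answer 7: (score: 0)

For version 6.3.2, the following worked: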

// Assumes client is a field of the enclosing class and writer is a PrintWriter opened elsewhere.
public List<Map<String, Object>> getAllDocs(String indexName, String searchType) throws FileNotFoundException, UnsupportedEncodingException {

    int scrollSize = 1000;
    List<Map<String, Object>> esData = new ArrayList<>();
    SearchResponse response = null;
    int i = 0;

    response = client.prepareSearch(indexName)
        .setScroll(new TimeValue(60000))
        .setTypes(searchType)  // The document types to execute the search against. Defaults to be executed against all types.
        .setQuery(QueryBuilders.matchAllQuery())
        .setSize(scrollSize).get(); // at most scrollSize hits are returned for each scroll
    // Scroll until no hits are returned
    do {
        for (SearchHit hit : response.getHits().getHits()) {
            ++i;
            System.out.println(i + " " + hit.getId());
            writer.println(i + " " + hit.getId());
            esData.add(hit.getSourceAsMap()); // collect the document so the returned list is not empty
        }
        System.out.println(i);

        response = client.prepareSearchScroll(response.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
    } while (response.getHits().getHits().length != 0); // Zero hits mark the end of the scroll and the while loop.
    return esData;
}

Answer 8: (score: -1)

1. Set a maximum size, for example MAX_INT_VALUE:

private static final int MAXSIZE = 1000000;

@Override
public List<EsSaleCity> getAllSaleCityByCity(int cityId) throws Exception {

    List<EsSaleCity> list=new ArrayList<EsSaleCity>();

    Client client=EsFactory.getClient();
    SearchResponse response= client.prepareSearch(getIndex(EsSaleCity.class)).setTypes(getType(EsSaleCity.class)).setSize(MAXSIZE)
            .setQuery(QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), FilterBuilders.boolFilter()
                    .must(FilterBuilders.termFilter("cityId", cityId)))).execute().actionGet();

    SearchHits searchHits=response.getHits();

    SearchHit[] hits=searchHits.getHits();
    for(SearchHit hit:hits){
        Map<String, Object> resultMap=hit.getSource();
        EsSaleCity saleCity=setEntity(resultMap, EsSaleCity.class);
        list.add(saleCity);
    }

    return list;

}

2. Count the documents in ES before searching:

CountResponse countResponse = client.prepareCount(getIndex(EsSaleCity.class)).setTypes(getType(EsSaleCity.class)).setQuery(queryBuilder).execute().actionGet();

int size = (int) countResponse.getCount(); // this is the size you want

Then you can run the search with the same query and that size:

SearchResponse response = client.prepareSearch(getIndex(EsSaleCity.class)).setTypes(getType(EsSaleCity.class)).setQuery(queryBuilder).setSize(size).execute().actionGet();