Elasticsearch DateHistogram and TopHits aggregations

Time: 2018-11-04 12:07:11

Tags: java elasticsearch elasticsearch-aggregation

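I have retrieved a large number of articles and want to find the most relevant article for each day. In other words: if I search for "Apple", for example, I want to get back 5 relevant articles, one per day.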

Java code

public List<ArticleEntity> findByKeyword(String keyword, String dateFrom, String dateTo) {
        TransportClient client = elasticSearchProvider.getClient();
        SearchRequestBuilder requestBuilder = client.prepareSearch("summarizer").setTypes("article").setSearchType(SearchType.QUERY_THEN_FETCH);

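        // Prefix-style match on the keyword, applied as a non-scoring filter.
        // Note: dateFrom and dateTo are not applied anywhere in this method,
        // so the query matches documents from any date.
        BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
        boolQueryBuilder.filter(
                QueryBuilders.queryStringQuery(keyword + "*").analyzeWildcard(true)
        );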

        // One bucket per day on publish_date, each holding its single top hit.
        DateHistogramAggregationBuilder dateHistogramAggregationBuilder =
                AggregationBuilders.dateHistogram("date")
                        .field("publish_date")
                        .format("yyyy-MM-dd")
                        .dateHistogramInterval(DateHistogramInterval.DAY)
                        .subAggregation(AggregationBuilders.topHits("top").size(1).explain(true))
                        .keyed(true);

        requestBuilder.setQuery(boolQueryBuilder)
                .addAggregation(dateHistogramAggregationBuilder);

        SearchResponse response = requestBuilder.execute().actionGet();

        List<ArticleEntity> articleEntityList = new ArrayList<>();

        ObjectMapper oMapper = new ObjectMapper();
        oMapper.setPropertyNamingStrategy(
                PropertyNamingStrategy.CAMEL_CASE_TO_LOWER_CASE_WITH_UNDERSCORES);

        Histogram dateHits = response.getAggregations().get("date");
        for (Histogram.Bucket entry : dateHits.getBuckets()) {
            TopHits topHits = entry.getAggregations().get("top");
            for (SearchHit hit : topHits.getHits().getHits()) {
                Map<String, Object> source = hit.getSource();
                if (source != null) {
                    articleEntityList.add(oMapper.convertValue(source, ArticleEntity.class));
                }
            }

        }

        return articleEntityList;

    }

The problem is that the response contains more than 2000 buckets, whereas I expect only 6-7 (I have only been crawling articles for the last 6 days).

Part of an ES document:

article_authors: "Dorothy Pitti",
facebook_score: 0,
keywords: "users,apps,nsfw,app,privacy,exchanges,best,screenshot,dont,sexting,messages,youre,good,features",
publish_date: "2018-09-14",

So why is my bucket count wrong? I have already tried different date formats (including ones with a time component), but nothing worked. Any ideas?
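One possible explanation, offered as an assumption rather than a confirmed diagnosis: a date histogram with the default min_doc_count of 0 also returns an empty bucket for every day between the earliest and the latest matching document, and the method above never uses dateFrom or dateTo. A single matching document with an old or wrongly parsed publish_date would then stretch the histogram over years. The plain-Java sketch below only illustrates that arithmetic; the class name DailyBucketCount, the method countDailyBuckets, and the 2013-01-01 date are hypothetical and do not come from the index described here.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;

public class DailyBucketCount {

    // Number of daily buckets a gap-filling histogram would produce:
    // one per calendar day from the earliest to the latest date, inclusive.
    static long countDailyBuckets(List<LocalDate> publishDates) {
        if (publishDates.isEmpty()) {
            return 0;
        }
        LocalDate min = publishDates.get(0);
        LocalDate max = publishDates.get(0);
        for (LocalDate date : publishDates) {
            if (date.isBefore(min)) {
                min = date;
            }
            if (date.isAfter(max)) {
                max = date;
            }
        }
        return ChronoUnit.DAYS.between(min, max) + 1;
    }

    public static void main(String[] args) {
        // Six days of crawled articles, as described in the question.
        List<LocalDate> recent = Arrays.asList(
                LocalDate.parse("2018-09-09"),
                LocalDate.parse("2018-09-11"),
                LocalDate.parse("2018-09-14"));
        System.out.println(countDailyBuckets(recent)); // 6

        // Same data plus one hypothetical stray document dated years earlier.
        List<LocalDate> withStray = Arrays.asList(
                LocalDate.parse("2013-01-01"),
                LocalDate.parse("2018-09-09"),
                LocalDate.parse("2018-09-14"));
        System.out.println(countDailyBuckets(withStray)); // 2083
    }
}
```

If that is what is happening here, two things worth trying are adding QueryBuilders.rangeQuery("publish_date").gte(dateFrom).lte(dateTo) as a second filter on the bool query, and calling minDocCount(1) on the date histogram builder so that empty days are dropped. Checking the mapping of publish_date, to confirm it is indexed as a date, would also help narrow this down.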

0 answers:

No answers yet