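I crawl a lot of articles and would like to find the most relevant article per day. That is, I search for e.g. "Apple" and then want to get, say, 5 relevant articles (one per day).
Java code: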
public List<ArticleEntity> findByKeyword(String keyword, String dateFrom, String dateTo) {
    // Note: dateFrom and dateTo are accepted but not applied to the query below.
    TransportClient client = elasticSearchProvider.getClient();
    SearchRequestBuilder requestBuilder = client.prepareSearch("summarizer")
            .setTypes("article")
            .setSearchType(SearchType.QUERY_THEN_FETCH);

    // Prefix-style match on the keyword, applied as a filter.
    BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
    boolQueryBuilder.filter(
            QueryBuilders.queryStringQuery(keyword + "*").analyzeWildcard(true)
    );

    // One bucket per day on publish_date, keeping the single top hit of each day.
    DateHistogramAggregationBuilder dateHistogramAggregationBuilder =
            AggregationBuilders.dateHistogram("date")
                    .field("publish_date")
                    .format("yyyy-MM-dd")
                    .dateHistogramInterval(DateHistogramInterval.DAY)
                    .subAggregation(
                            AggregationBuilders.topHits("top").size(1).explain(true)
                    )
                    .keyed(true);

    requestBuilder.setQuery(boolQueryBuilder)
            .addAggregation(dateHistogramAggregationBuilder);
    SearchResponse response = requestBuilder.execute().actionGet();

    List<ArticleEntity> articleEntityList = new ArrayList<>();
    ObjectMapper oMapper = new ObjectMapper();
    oMapper.setPropertyNamingStrategy(
            PropertyNamingStrategy.CAMEL_CASE_TO_LOWER_CASE_WITH_UNDERSCORES);

    // Walk the daily buckets and map each top hit's source to an ArticleEntity.
    Histogram dateHits = response.getAggregations().get("date");
    for (Histogram.Bucket entry : dateHits.getBuckets()) {
        TopHits topHits = entry.getAggregations().get("top");
        for (SearchHit hit : topHits.getHits().getHits()) {
            Map<String, Object> source = hit.getSource();
            if (source != null) {
                articleEntityList.add(oMapper.convertValue(source, ArticleEntity.class));
            }
        }
    }
    return articleEntityList;
}
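The problem is that the number of buckets is greater than 2000, whereas I would expect 6-7 (I have only been crawling articles for the last 6 days).
Part of an ES document: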
article_authors: "Dorothy Pitti",
facebook_score: 0,
keywords: "users,apps,nsfw,app,privacy,exchanges,best,screenshot,dont,sexting,messages,youre,good,features",
publish_date: "2018-09-14",
So why is my bucket count wrong? I have already tried different date formats (including ones with a time component), but nothing works. Any ideas?
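
One possible explanation, offered as a guess rather than a confirmed diagnosis: a date_histogram fills gaps with empty buckets by default (min_doc_count defaults to 0), so a single document with a stray publish_date far in the past could stretch the histogram to thousands of daily buckets. The sketch below does not call Elasticsearch at all; it only uses java.time to count how many daily buckets such a span would produce. The class name, method names, and the 2013-01-01 outlier date are hypothetical and chosen purely for illustration.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the Elasticsearch API.
class BucketCountSketch {

    // Buckets a gap-filling daily histogram would return:
    // one per calendar day from the earliest to the latest date, inclusive.
    static long countDailyBuckets(List<LocalDate> publishDates) {
        if (publishDates.isEmpty()) {
            return 0;
        }
        LocalDate min = Collections.min(publishDates);
        LocalDate max = Collections.max(publishDates);
        return ChronoUnit.DAYS.between(min, max) + 1;
    }

    // Buckets left when empty days are dropped (analogous to min_doc_count = 1).
    static long countNonEmptyBuckets(List<LocalDate> publishDates) {
        return publishDates.stream().distinct().count();
    }

    public static void main(String[] args) {
        // Six consecutive crawl days, as described in the question.
        List<LocalDate> dates = new ArrayList<>(Arrays.asList(
                LocalDate.parse("2018-09-09"),
                LocalDate.parse("2018-09-10"),
                LocalDate.parse("2018-09-11"),
                LocalDate.parse("2018-09-12"),
                LocalDate.parse("2018-09-13"),
                LocalDate.parse("2018-09-14")));
        System.out.println("clean data: " + countDailyBuckets(dates));

        // Assumption: one article carries a wrong or very old publish_date.
        dates.add(LocalDate.parse("2013-01-01"));
        System.out.println("with outlier, gaps filled: " + countDailyBuckets(dates));
        System.out.println("with outlier, empty days dropped: " + countNonEmptyBuckets(dates));
    }
}
```

If this does turn out to be the cause, checking the index for unexpected publish_date values, applying the dateFrom/dateTo range to the query, or raising min_doc_count to 1 on the aggregation would be the things to try.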