ElasticSearch DateHistogram aggregation: filling in missing data

Time: 2015-10-16 09:22:14

Tags: java spring elasticsearch spring-data spring-data-elasticsearch

I am trying to use Spring Data Elasticsearch for some aggregations.

This is my query:

final FilteredQueryBuilder filteredQuery = QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(),
        FilterBuilders.andFilter(
                FilterBuilders.termFilter("gender", "F"),
                FilterBuilders.termFilter("place", "Arizona"),
                FilterBuilders.rangeFilter("dob").from(from).to(to)));

final MetricsAggregationBuilder<?> aggregateArtifactcount = AggregationBuilders.sum("delivery")
        .field("birth");

final AggregationBuilder<?> dailyDateHistogarm =
        AggregationBuilders.dateHistogram(AggregationConstants.DAILY)
                .field("dob")
                .interval(DateHistogram.Interval.DAY)
                .subAggregation(aggregateArtifactcount);

final SearchQuery query = new NativeSearchQueryBuilder().withIndices(index).withTypes(type)
        .withQuery(filteredQuery).addAggregation(dailyDateHistogarm).build();

return elasticsearchTemplate.query(query, new DailyDeliveryAggregation());

And this is my aggregation (the results extractor):

public class DailyDeliveryAggregation implements ResultsExtractor<List<DailyDeliverySum>> {

    @SuppressWarnings("unchecked")
    @Override
    public List<DailyDeliverySum> extract(final SearchResponse response) {
        final List<DailyDeliverySum> dailyDeliverySum = new ArrayList<DailyDeliverySum>();
        final Aggregations aggregations = response.getAggregations();
        final DateHistogram daily = aggregations.get(AggregationConstants.DAILY);
        final List<DateHistogram.Bucket> buckets = (List<DateHistogram.Bucket>) daily.getBuckets();
        for (final DateHistogram.Bucket bucket : buckets) {
            // Sum of the "birth" field for this day, plus the number of matching documents
            final Sum sum = (Sum) bucket.getAggregations().getAsMap().get("delivery");
            final int deliverySum = (int) sum.getValue();
            final int delivery = (int) bucket.getDocCount();
            final String dateString = bucket.getKeyAsText().string();
            dailyDeliverySum.add(new DailyDeliverySum(deliverySum, delivery, dateString));
        }
        return dailyDeliverySum;
    }
}

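It gives me correct data, but it doesn't cover everything I need. Suppose I query a 10-day time range: if there is no data for some date in that range, that date is simply missing from the date histogram buckets. What I want instead is a bucket for every day, with 0 as the default value for the aggregation and for the doc count when there is no data.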
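To make the desired result concrete, here is a minimal client-side sketch of the behaviour being asked for: one entry per day in the queried range, with 0 for both the sum and the doc count on days the histogram did not return. It runs on plain `java.time` and does not touch Elasticsearch. The nested `DailyDeliverySum` class and the `fillMissingDays` helper are hypothetical stand-ins (the real `DailyDeliverySum` is not shown), and it assumes the bucket keys have already been converted from strings to `LocalDate`.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FillMissingDays {

    // Hypothetical stand-in for the question's DailyDeliverySum (its real
    // definition is not shown); the date is assumed to be a LocalDate here.
    static final class DailyDeliverySum {
        final int deliverySum;
        final int delivery;
        final LocalDate date;

        DailyDeliverySum(int deliverySum, int delivery, LocalDate date) {
            this.deliverySum = deliverySum;
            this.delivery = delivery;
            this.date = date;
        }

        @Override
        public String toString() {
            return date + " sum=" + deliverySum + " docs=" + delivery;
        }
    }

    // Returns exactly one entry per day in [from, to], inclusive. Days present in
    // the histogram keep their values; absent days get 0 for the sum and doc count.
    static List<DailyDeliverySum> fillMissingDays(List<DailyDeliverySum> buckets,
                                                  LocalDate from, LocalDate to) {
        Map<LocalDate, DailyDeliverySum> byDate = new HashMap<>();
        for (DailyDeliverySum bucket : buckets) {
            byDate.put(bucket.date, bucket);
        }
        List<DailyDeliverySum> result = new ArrayList<>();
        for (LocalDate day = from; !day.isAfter(to); day = day.plusDays(1)) {
            DailyDeliverySum existing = byDate.get(day);
            result.add(existing != null ? existing : new DailyDeliverySum(0, 0, day));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two days with data inside a five-day query range.
        List<DailyDeliverySum> buckets = Arrays.asList(
                new DailyDeliverySum(5, 2, LocalDate.of(2015, 10, 2)),
                new DailyDeliverySum(3, 1, LocalDate.of(2015, 10, 4)));
        for (DailyDeliverySum day : fillMissingDays(buckets,
                LocalDate.of(2015, 10, 1), LocalDate.of(2015, 10, 5))) {
            System.out.println(day);
        }
    }
}
```

Filling on the client like this works regardless of the Elasticsearch version, at the cost of having to know `from` and `to` on the Java side, which the query above already does.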

Is there a way to do this?

2 answers:

Answer 0 (score: 1)

Yes, you can use the "minimum document count" feature of the date_histogram aggregation and set it to 0. That way you also get buckets that don't contain any data:

final AggregationBuilder<?> dailyDateHistogarm =
        AggregationBuilders.dateHistogram(AggregationConstants.DAILY)
                .field("dob")
                .minDocCount(0)                          // <--- add this line
                .interval(DateHistogram.Interval.DAY)
                .subAggregation(aggregateArtifactcount);
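One caveat worth knowing: with `min_doc_count` set to 0, empty buckets are only created between the earliest and the latest day that actually has matching documents. Days at the start or end of the 10-day range with no data are still absent unless the histogram's `extended_bounds` are also set to the query range (in the 1.x Java API this is, as far as I know, `extendedBounds(...)` on the date histogram builder; check the exact overloads for your version). The sketch below is a simplified plain-Java model of that rule, not Elasticsearch code. `bucketKeys` is a hypothetical helper, and it deliberately ignores the case where no documents match at all.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class HistogramBucketRange {

    // Simplified model of which daily bucket keys a date_histogram returns with
    // min_doc_count = 0: every day from the first to the last day that has
    // documents, optionally widened by bounds in the spirit of extended_bounds.
    static List<LocalDate> bucketKeys(List<LocalDate> daysWithDocs,
                                      LocalDate boundsMin, LocalDate boundsMax) {
        List<LocalDate> keys = new ArrayList<>();
        if (daysWithDocs.isEmpty()) {
            return keys; // the no-data case is intentionally left out of this sketch
        }
        LocalDate first = Collections.min(daysWithDocs);
        LocalDate last = Collections.max(daysWithDocs);
        // Bounds can only widen the range; they never drop days that have data.
        if (boundsMin != null && boundsMin.isBefore(first)) {
            first = boundsMin;
        }
        if (boundsMax != null && boundsMax.isAfter(last)) {
            last = boundsMax;
        }
        for (LocalDate day = first; !day.isAfter(last); day = day.plusDays(1)) {
            keys.add(day);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<LocalDate> days = Arrays.asList(LocalDate.of(2015, 10, 3), LocalDate.of(2015, 10, 6));
        // min_doc_count = 0 only: buckets for 2015-10-03 through 2015-10-06
        System.out.println(bucketKeys(days, null, null));
        // With bounds covering the whole 10-day query range
        System.out.println(bucketKeys(days, LocalDate.of(2015, 10, 1), LocalDate.of(2015, 10, 10)));
    }
}
```

In other words, `minDocCount(0)` alone fills holes in the middle of the data, while covering the full requested range also needs the bounds.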

Answer 1 (score: 0)

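The example from @Val did not work for me on its own (I'm using the high-level API of Elasticsearch 6.2.x). What did work was additionally telling the aggregation to treat missing values as 0: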

final DateHistogramAggregationBuilder dailyDateHistogarm =
        AggregationBuilders.dateHistogram(AggregationConstants.DAILY)
                .field("dob")
                .minDocCount(0)
                .missing(0)
                .dateHistogramInterval(DateHistogramInterval.DAY)
                .subAggregation(aggregateArtifactcount);