Searchkick / Elasticsearch aggregation results are inconsistent

Time: 2020-05-06 22:23:21

Tags: elasticsearch searchkick

First off, I have cleaned up the IRB output a little (this is from production).

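I'm using some aggregations with Searchkick in my Rails app. Everything tested fine in development, but in production I'm getting inconsistent results: data that is clearly in my fields, and that should show up in the aggregation, is missing.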

Here is my original ActiveRecord data - my BudgetItem model has a total column:

irb(main):008:0> BudgetItem.where(budget_id: 3).order(:cbs_item_id).select(:id, :cbs_item_id, :total)
  BudgetItem Load (1.4ms)  SELECT  "budget_items"."id", "budget_items"."cbs_item_id", "budget_items"."total" FROM "budget_items" WHERE "budget_items"."company_id" = 26 AND "budget_items"."budget_id" = $1 ORDER BY "budget_items"."cbs_item_id" ASC LIMIT $2  [["budget_id", 3], ["LIMIT", 11]]
=> #<ActiveRecord::Relation [#<BudgetItem id: 28, company_id: 26, cbs_item_id: 3, total: 0.1e5>, #<BudgetItem id: 29, company_id: 26, cbs_item_id: 12, total: 0.8e5>, #<BudgetItem id: 34, company_id: 26, cbs_item_id: 15, total: 0.1e5>, #<BudgetItem id: 41, company_id: 26, cbs_item_id: 16, total: 0.141e6>, #<BudgetItem id: 35, company_id: 26, cbs_item_id: 18, total: 0.1e5>, #<BudgetItem id: 33, company_id: 26, cbs_item_id: 18, total: 0.12e5>, #<BudgetItem id: 27, company_id: 26, cbs_item_id: 20, total: 0.2e5>, #<BudgetItem id: 6, company_id: 26, cbs_item_id: 23, total: 0.184e6>, #<BudgetItem id: 5, company_id: 26, cbs_item_id: 23, total: 0.2288e6>, #<BudgetItem id: 30, company_id: 26, cbs_item_id: 24, total: 0.45e5>, ...]>

The count for this query is 41.

The same search through Searchkick:

BudgetItem.search("*", where: {budget_id: 3}).count # => 41

And even this:

irb(main):025:0> BudgetItem.search("*", where: {budget_id: 3, cbs_item_id: 16}).first.total
= 26 AND "budget_items"."id" = $1  [["id", 41]]
=> 0.141e6

Note cbs_item_id: 16, total: 0.141e6 (141000) - the value is clearly there in the model.
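As an aside on the notation: 0.141e6 is how Ruby displays a BigDecimal, and it is the same number as 141000. A minimal check in plain Ruby, assuming the total column is a decimal type (the inspect output above suggests it is, but the schema is not shown):

```ruby
require "bigdecimal"

# The value as printed by IRB for BudgetItem id 41.
total = BigDecimal("0.141e6")

# Convert to an integer and to plain (non-scientific) notation.
puts total.to_i
puts total.to_s("F")
```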

Now I try to run an aggregation on this:

irb(main):019:0> BudgetItem.search("*", body_options: { aggs: { cbs: { terms: { field: "cbs_item_id" }, aggs: { "total": { "sum": { "field": "total" } } } } } },  where: {budget_id: 3}).aggs
  BudgetItem Search (5.4ms)  pacific-canbriam-20191213_budget_items_production/_search {"query":{"bool":{"must":{"match_all":{}},"filter":[{"term":{"budget_id":{"value":3}}}]}},"timeout":"11s","_source":false,"size":10000,"aggs":{"cbs":{"terms":{"field":"cbs_item_id"},"aggs":{"total":{"sum":{"field":"total"}}}}}}
=> {"cbs"=>{"doc_count_error_upper_bound"=>0, "sum_other_doc_count"=>13, "buckets"=>[{"key"=>24, "doc_count"=>4, "total"=>{"value"=>90000.0}}, {"key"=>25, "doc_count"=>4, "total"=>{"value"=>114000.0}}, {"key"=>39, "doc_count"=>4, "total"=>{"value"=>107325.0}}, {"key"=>43, "doc_count"=>4, "total"=>{"value"=>209820.0}}, {"key"=>18, "doc_count"=>2, "total"=>{"value"=>22000.0}}, {"key"=>23, "doc_count"=>2, "total"=>{"value"=>412800.0}}, {"key"=>38, "doc_count"=>2, "total"=>{"value"=>13500.0}}, {"key"=>49, "doc_count"=>2, "total"=>{"value"=>161000.0}}, {"key"=>57, "doc_count"=>2, "total"=>{"value"=>20300.0}}, {"key"=>58, "doc_count"=>2, "total"=>{"value"=>32200.0}}]}}

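The data is completely inconsistent, and aggregation buckets are missing. Note that key 16 is missing. The crazy thing is that I have another aggregation on a different column, and that one works absolutely fine. Am I missing something here? I have already tried settings: {number_of_shards: 1}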

What's even more frustrating is that

  aggs: {
    "grand_total": { "sum": { "field": "total"  } },
   }

a regular agg on total works and gives the same result.

1 Answer:

Answer 0: (score: 0)

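Try setting size in the aggregation. A terms aggregation returns only the top 10 buckets by default, and according to sum_other_doc_count in the response, 13 documents fell into buckets that were not returned.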
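The numbers in the question are consistent with this. Buckets are ordered by doc_count by default, so a key with a single document, such as 16, can fall outside the top 10. The sketch below checks that accounting and builds the same aggregation body with an explicit size. The helper name cbs_totals_body and the default of 100 are assumptions made for illustration; pick a size at least as large as the number of distinct cbs_item_id values you expect.

```ruby
# Bucket doc_counts copied from the aggregation response in the question.
returned_doc_counts = [4, 4, 4, 4, 2, 2, 2, 2, 2, 2]
sum_other_doc_count = 13

# Documents in the returned buckets plus the documents in the omitted
# buckets should equal the total number of matching documents.
accounted_for = returned_doc_counts.sum + sum_other_doc_count

# Hypothetical helper: the same aggregation body as in the question, with an
# explicit bucket limit. The default of 100 is an arbitrary assumption.
def cbs_totals_body(size: 100)
  {
    aggs: {
      cbs: {
        terms: { field: "cbs_item_id", size: size },
        aggs: { total: { sum: { field: "total" } } }
      }
    }
  }
end

puts accounted_for
puts cbs_totals_body.inspect

# In the Rails app (needs Searchkick and a running cluster, so not run here):
# BudgetItem.search("*", body_options: cbs_totals_body, where: {budget_id: 3}).aggs
```

With a large enough size, sum_other_doc_count in the response should drop to 0 and key 16 should appear among the buckets.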