Question

My problem: the Elasticsearch count differs from the count in my database.

I index a "users" table, and each user can have one or more apps_events:

curl localhost:9200/users/_count
{"count":190291,"_shards":{"total":5,"successful":5,"failed":0}}

SELECT COUNT(*) FROM users
count : 190291

=> Same count, everything is fine!

But when I run a search with 2 filters, a terms filter and a term filter, both on fields of the nested resource:

curl -X GET 'http://localhost:9200/users/user/_search?load=&size=10&pretty' -d '
{
"query": {
  "match_all": {
  }
},
"filter": {
  "and": [
    {
      "terms": {
        "apps_events.type": [
          "sale"
        ]
      }
    },
    {
      "term": {
        "apps_events.status": "active"
      }
    }
  ]
},
"size": 10
}'

total : 63756

In my database:

SELECT
  COUNT(DISTINCT(users_id))
FROM
  apps_event
WHERE
  apps_event_state_id = 1 AND apps_event_project_id = 2;

count : 63340

That is because the SQL equivalent of what Elasticsearch actually runs is:

SELECT
  COUNT(DISTINCT(users_id))
FROM apps_event
WHERE apps_event_state_id = 1
AND users_id IN
  (SELECT DISTINCT(users_id) FROM apps_event WHERE apps_event_project_id = 2)

count : 63756
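
The gap between the two counts can be reproduced on a tiny data set. The following is only a sketch with made-up sample data (the three users and their events are hypothetical, not taken from the real tables). It assumes apps_events is indexed as a plain array of objects, in which case the values of each field are pooled per user and the link between a type and its status within one event is lost:

```python
# Hypothetical sample data: each user id maps to its list of apps_events.
users = {
    1: [{"type": "sale", "status": "active"}],      # one event meets both conditions
    2: [{"type": "sale", "status": "inactive"},
        {"type": "signup", "status": "active"}],    # conditions met by different events
    3: [{"type": "signup", "status": "inactive"}],  # neither condition is met
}


def flattened_match(events):
    """Object-style matching: field values of all events are pooled per user."""
    types = {e["type"] for e in events}
    statuses = {e["status"] for e in events}
    return "sale" in types and "active" in statuses


def nested_match(events):
    """Nested-style matching: a single event must meet both conditions."""
    return any(e["type"] == "sale" and e["status"] == "active" for e in events)


flattened_ids = sorted(uid for uid, ev in users.items() if flattened_match(ev))
nested_ids = sorted(uid for uid, ev in users.items() if nested_match(ev))

print(flattened_ids)  # [1, 2]
print(nested_ids)     # [1]
```

User 2 is counted by the pooled match but not by the per-event match, which is the same kind of difference as 63756 versus 63340.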

===> How can I do a simple "AND" that applies to each nested resource individually?

Thanks

Answer 1

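You have probably checked this already, but is apps_event_project_id really the counterpart of apps_events.type? They don't look the same on the surface, but you would know for sure. Also, does users_id map directly to the ES _id? It could be that duplicates in your index are inflating the count.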

Answer 2

The best resource on "nested resources": http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/
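
As a rough sketch of what a per-event "AND" could look like, the request body below wraps both conditions in a nested filter so that they must hold on the same apps_events entry. It assumes that apps_events is declared as "type": "nested" in the mapping (which means reindexing if it is currently a plain object) and that the Elasticsearch version still accepts the top-level "filter" and the "and" filter used in the question. The helper name nested_and_filter is made up for illustration.

```python
import json


def nested_and_filter(path, event_type, status):
    """Build a search body whose conditions must match within one nested object.

    Assumes `path` is mapped as a nested type; the helper name is hypothetical.
    """
    return {
        "query": {"match_all": {}},
        "filter": {
            "nested": {
                "path": path,
                "filter": {
                    "and": [
                        {"terms": {path + ".type": [event_type]}},
                        {"term": {path + ".status": status}},
                    ]
                },
            }
        },
        "size": 10,
    }


body = nested_and_filter("apps_events", "sale", "active")
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to the same curl command as in the question via -d. On later Elasticsearch versions, where the standalone filter syntax was dropped, the equivalent would likely be a nested query placed inside a bool query's filter clause.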

AND filter on nested resources
