Question

I have data in this format in Elasticsearch, each item in a separate document:

{'pid': 1, 'nm': 'tom'}, {'pid': 1, 'nm': 'dick'}, {'pid': 1, 'nm': 'harry'}, {'pid': 2, 'nm': 'tom'}, {'pid': 2, 'nm': 'harry'}, {'pid': 3, 'nm': 'dick'}, {'pid': 3, 'nm': 'harry'}, {'pid': 4, 'nm': 'harry'}

    {
       "took": 137,
       "timed_out": false,
       "_shards": {
          "total": 5,
          "successful": 5,
          "failed": 0
       },
       "hits": {
          "total": 8,
          "max_score": null,
          "hits": [
             {
                "_index": "query_test",
                "_type": "user",
                "_id": "AVj9KS86AaDUbQTYUmwY",
                "_score": null,
                "_source": {
                   "pid": 1,
                   "nm": "Harry"
                }
             },
             {
                "_index": "query_test",
                "_type": "user",
                "_id": "AVj9KJ9BAaDUbQTYUmwW",
                "_score": null,
                "_source": {
                   "pid": 1,
                   "nm": "Tom"
                }
             },
             {
                "_index": "query_test",
                "_type": "user",
                "_id": "AVj9KRlbAaDUbQTYUmwX",
                "_score": null,
                "_source": {
                   "pid": 1,
                   "nm": "Dick"
                }
             },
             {
                "_index": "query_test",
                "_type": "user",
                "_id": "AVj9KYnKAaDUbQTYUmwa",
                "_score": null,
                "_source": {
                   "pid": 2,
                   "nm": "Harry"
                }
             },
             {
                "_index": "query_test",
                "_type": "user",
                "_id": "AVj9KXL5AaDUbQTYUmwZ",
                "_score": null,
                "_source": {
                   "pid": 2,
                   "nm": "Tom"
                }
             },
             {
                "_index": "query_test",
                "_type": "user",
                "_id": "AVj9KbcpAaDUbQTYUmwb",
                "_score": null,
                "_source": {
                   "pid": 3,
                   "nm": "Dick"
                }
             },
             {
                "_index": "query_test",
                "_type": "user",
                "_id": "AVj9Kdy5AaDUbQTYUmwc",
                "_score": null,
                "_source": {
                   "pid": 3,
                   "nm": "Harry"
                }
             },
             {
                "_index": "query_test",
                "_type": "user",
                "_id": "AVj9KetLAaDUbQTYUmwd",
                "_score": null,
                "_source": {
                   "pid": 4,
                   "nm": "Harry"
                }
             }
          ]
       }
    }

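I need to find the pids that have 'harry' and do not have 'tom'; in the example above those are 3 and 4. In other words, I am looking for groups of documents sharing the same pid where none of the documents has the nm value 'tom' but at least one has the nm value 'harry'.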
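To pin down the expected result, here is a minimal plain-Python sketch of the grouping logic, run over the sample documents above. It is not an Elasticsearch query; the function name `pids_with_but_without` is made up for illustration, and names are compared in lowercase on the assumption that case does not matter.

```python
from collections import defaultdict

# Sample documents copied from the question (names lowercased for comparison).
docs = [
    {"pid": 1, "nm": "tom"}, {"pid": 1, "nm": "dick"}, {"pid": 1, "nm": "harry"},
    {"pid": 2, "nm": "tom"}, {"pid": 2, "nm": "harry"},
    {"pid": 3, "nm": "dick"}, {"pid": 3, "nm": "harry"},
    {"pid": 4, "nm": "harry"},
]


def pids_with_but_without(docs, wanted, unwanted):
    """Return the sorted pids whose group contains `wanted` but not `unwanted`."""
    names_by_pid = defaultdict(set)
    for doc in docs:
        names_by_pid[doc["pid"]].add(doc["nm"].lower())
    return sorted(
        pid for pid, names in names_by_pid.items()
        if wanted in names and unwanted not in names
    )


print(pids_with_but_without(docs, "harry", "tom"))  # [3, 4]
```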

How can I query for this?

Edit: I am using Elasticsearch version 5.

Answer 1

You can use a bool query, with a POST request body like the following:

POST _search
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "nm" : "harry" }
      },
      "must_not" : {
        "term" : { "nm" : "tom" }
      }
    }
  }
}
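One thing worth checking is that a bool query is evaluated against each document on its own, not against a group of documents with the same pid. The sketch below imitates that per-document evaluation in plain Python over the question's sample data. It is only an approximation under the assumption that `nm` is matched case-insensitively (as with an analyzed text field); `matches_bool` is a hypothetical helper, not an Elasticsearch API.

```python
# Sample documents copied from the question (names lowercased for comparison).
docs = [
    {"pid": 1, "nm": "tom"}, {"pid": 1, "nm": "dick"}, {"pid": 1, "nm": "harry"},
    {"pid": 2, "nm": "tom"}, {"pid": 2, "nm": "harry"},
    {"pid": 3, "nm": "dick"}, {"pid": 3, "nm": "harry"},
    {"pid": 4, "nm": "harry"},
]


def matches_bool(doc):
    """Imitate must: nm == harry, must_not: nm == tom, for a single document."""
    nm = doc["nm"].lower()
    return nm == "harry" and nm != "tom"


# Every 'harry' document passes, because no single document is also 'tom'.
matched_pids = sorted({doc["pid"] for doc in docs if matches_bool(doc)})
print(matched_pids)  # [1, 2, 3, 4]
```

Under these assumptions the query returns the 'harry' documents for all four pids, so pids 1 and 2 are not excluded by the must_not clause alone.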

Answer 2

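I'm fairly new to Elasticsearch, so I may be wrong, but I have never seen a query like this. Simple filters can't be used here because they are applied to individual documents (not to aggregations), which is not what you want. What I see is that you want a "group by" query with a "having" clause (in SQL terms). But group-by queries involve some aggregation (such as the avg, max, or min of a field), and that aggregation is what the "having" clause operates on. Basically, you post-process the aggregation results with a reducer. For queries like that, the Bucket Selector Aggregation can be used. Read this.
But your case is different. You don't want to apply a having clause to a metric aggregation; you want to check whether certain values are present in a field (or column) of the "group by" data. In SQL terms, you want a "where" query on the "group by" result. That is something I have never seen. You can also read this.
At the application level, however, you can do this easily by splitting the query. First, use a terms aggregation to find the distinct pids where nm = harry. Then fetch the documents for those pids with the additional condition nm != tom.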
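The two-step idea can be sketched in Python as below. This is a slight variation rather than a literal transcription: the second request collects the candidate pids that do contain 'tom', and the application subtracts them. All function names and the aggregation name `pids` are made up for illustration, the `size` of 1000 is an arbitrary assumption, and `search` is a hypothetical callable that takes a request body and returns the parsed response (for example a thin wrapper around your client's search call).

```python
def pids_with_name_body(name):
    """Step 1 body: distinct pids among documents whose nm equals `name`."""
    return {
        "size": 0,
        "query": {"term": {"nm": name}},
        "aggs": {"pids": {"terms": {"field": "pid", "size": 1000}}},
    }


def pids_also_having_body(pids, name):
    """Step 2 body: which of the candidate pids also have a `name` document."""
    return {
        "size": 0,
        "query": {"bool": {"filter": [
            {"terms": {"pid": pids}},
            {"term": {"nm": name}},
        ]}},
        "aggs": {"pids": {"terms": {"field": "pid", "size": 1000}}},
    }


def bucket_keys(response):
    """Extract the pid keys from the terms aggregation named 'pids'."""
    return [bucket["key"] for bucket in response["aggregations"]["pids"]["buckets"]]


def pids_with_but_without(search, wanted, unwanted):
    """Run both steps through `search` and subtract the excluded pids."""
    candidates = bucket_keys(search(pids_with_name_body(wanted)))
    if not candidates:
        return []
    excluded = set(bucket_keys(search(pids_also_having_body(candidates, unwanted))))
    return sorted(pid for pid in candidates if pid not in excluded)
```

Calling `pids_with_but_without(search, "harry", "tom")` issues two requests. If more than 1000 distinct pids can match, the terms aggregation size would need raising or paging.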

I'm new to ES. If anyone who disagrees with me can show a way to do this in a single query, I'd be very happy; I would learn from it too.
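One candidate for a single-request version, offered as an untested sketch rather than a confirmed solution: group by pid with a terms aggregation, count 'harry' and 'tom' documents per bucket with two filter sub-aggregations, and keep only the matching buckets with a bucket_selector. The aggregation names (`by_pid`, `harry`, `tom`, `harry_without_tom`), the `size` of 1000, and the `params.` script style (assumed to match Elasticsearch 5's default scripting language) are all assumptions. The body is written as a Python dict so it can be serialized and sent with any client.

```python
import json

# Request body for a single search; the matching pids would be the keys of the
# buckets that survive the bucket_selector under aggregations.by_pid.buckets.
single_request_body = {
    "size": 0,
    "aggs": {
        "by_pid": {
            "terms": {"field": "pid", "size": 1000},
            "aggs": {
                # Per-pid count of documents named harry.
                "harry": {"filter": {"term": {"nm": "harry"}}},
                # Per-pid count of documents named tom.
                "tom": {"filter": {"term": {"nm": "tom"}}},
                # Keep the bucket only if it has harry and no tom.
                "harry_without_tom": {
                    "bucket_selector": {
                        "buckets_path": {
                            "harryCount": "harry>_count",
                            "tomCount": "tom>_count",
                        },
                        "script": "params.harryCount > 0 && params.tomCount == 0",
                    }
                },
            },
        }
    },
}

print(json.dumps(single_request_body, indent=2))
```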

Elasticsearch query with a condition across multiple documents
