Elasticsearch: match as many fields as possible

Time: 2016-10-07 18:54:08

Tags: elasticsearch

Suppose we have the following attributes:

gender=male
location=US
network=Facebook

I have the following data stored in Elasticsearch:

{ some_data: {}, attributes: ["US", "Facebook"] }
{ some_data: {}, attributes: ["Facebook"] }
{ some_data: {}, attributes: ["male", "AR", "LinkedIn"] }
{ some_data: {}, attributes: ["female", "US", "Facebook"] }
{ some_data: {}, attributes: ["male", "US", "LinkedIn"] }
{ some_data: {}, attributes: ["male", "US", "Facebook"] }

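I want Elasticsearch to return the documents whose attributes exactly match some combination of the attributes above. The possible combinations are: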

1) attributes: ["male", "US", "Facebook"] # All attributes match
2) attributes: ["male", "US"] # Two attributes combined match
3) attributes: ["male", "Facebook"] # Two attributes combined match
4) attributes: ["US", "Facebook"] # Two attributes combined match
5) attributes: ["male"] # Only one matches
6) attributes: ["US"] # Only one matches
7) attributes: ["Facebook"] # Only one matches

In this example, we would get:

1) { some_data: {}, attributes: ["male", "US", "Facebook"] } # All match
2) { some_data: {}, attributes: ["US", "Facebook"] } # Two matches
3) { some_data: {}, attributes: ["Facebook"] } # One match

Two things must be taken into account:

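1) I don't want every document that matches gender='male'. I only want the results that exactly match one of the combinations of the attributes given at the beginning.
2) The algorithm must work for n elements. In this example I used 3 to keep it simple, but we may have 30 attributes to query.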

Ideally, all of this would be done with a single query to the database.
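To pin down the matching rule being asked for, here is a minimal in-memory sketch in Python. It is not Elasticsearch code, and `matching_docs` is a hypothetical helper. It assumes that "exactly matches a combination" means every attribute stored in a document appears among the query attributes. Under that reading it reproduces the three expected results above:

```python
# Hypothetical in-memory model of the desired rule (not Elasticsearch code):
# a document qualifies when its attribute set is a non-empty subset of the
# query attributes, i.e. it has no attribute outside the query.

def matching_docs(docs, query_attrs):
    """Return docs whose attributes all appear in query_attrs, best matches first."""
    wanted = set(query_attrs)
    hits = [d for d in docs if d["attributes"] and set(d["attributes"]) <= wanted]
    # More matched attributes first, mirroring the order in the question.
    return sorted(hits, key=lambda d: len(set(d["attributes"])), reverse=True)


docs = [
    {"some_data": {}, "attributes": ["US", "Facebook"]},
    {"some_data": {}, "attributes": ["Facebook"]},
    {"some_data": {}, "attributes": ["male", "AR", "LinkedIn"]},
    {"some_data": {}, "attributes": ["female", "US", "Facebook"]},
    {"some_data": {}, "attributes": ["male", "US", "LinkedIn"]},
    {"some_data": {}, "attributes": ["male", "US", "Facebook"]},
]

for d in matching_docs(docs, ["male", "US", "Facebook"]):
    print(d["attributes"])
# ['male', 'US', 'Facebook']
# ['US', 'Facebook']
# ['Facebook']
```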

1 Answer:

Answer 0 (score: 3)

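As described in the documentation I linked in my earlier comment, the simplest approach is to add a tag_count field (the number of tags in the document) and include it in the query. To get the behaviour you want, you need to specify (male AND tag_count=1) OR (male AND facebook AND tag_count=2). In the Elasticsearch DSL this translates to SHOULD [(MUST male and tag_count=1) (MUST male and facebook and tag_count=2)], where SHOULD acts as OR and MUST acts as AND. This works because of how the two conditions combine. A document that contains every tag of a combination, and whose tag_count equals the size of that combination, cannot have any other tags. It therefore matches that combination exactly, assuming a document never repeats a tag.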
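The same pattern can be generated for any number of attributes. The Python sketch below uses hypothetical, illustrative names (`build_query`, `tags_field`, `count_field`) and assumes the attribute list contains no duplicates. For each non-empty combination of the attributes it emits one MUST group, pinned to a `tag_count` equal to the combination size. The example in the previous paragraph covers only `male` and `male`+`facebook`. This sketch enumerates every combination, including `facebook` on its own:

```python
import json
from itertools import combinations


def build_query(attrs, tags_field="tags", count_field="tag_count"):
    """Hypothetical helper: one SHOULD entry per non-empty combination of attrs."""
    should = []
    for k in range(1, len(attrs) + 1):
        for combo in combinations(attrs, k):
            # MUST (AND): the document has every tag in this combination...
            must = [{"term": {tags_field: a}} for a in combo]
            # ...and has exactly k tags in total, so it has no other tags.
            must.append({"term": {count_field: k}})
            should.append({"bool": {"must": must}})
    # SHOULD (OR) across combinations, wrapped in a filter as in the answer.
    return {"query": {"constant_score": {"filter": {"bool": {"should": should}}}}}


print(json.dumps(build_query(["male", "facebook"]), indent=2))
```

The resulting dict can be sent as the body of a `_search` request.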

For obvious reasons, this does not scale well to 30 tags, but it may put you on the right track.
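To make the scaling problem concrete, the query needs one SHOULD clause per non-empty combination of the n attributes. The clause count is therefore:

```latex
\sum_{k=1}^{n} \binom{n}{k} = 2^{n} - 1,
\qquad n = 3:\ 2^{3} - 1 = 7,
\qquad n = 30:\ 2^{30} - 1 = 1\,073\,741\,823
```

Lucene, which Elasticsearch is built on, limits a Boolean query to 1024 clauses by default. Enumerating combinations therefore stops being practical long before 30 attributes.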

Insert the following data into Elasticsearch:

{ "tags":["male"], "tag_count":1 }
{ "tags":["male","facebook"], "tag_count":2 }
{ "tags":["male","linkedin"], "tag_count":2 }
{ "tags":["male","US", "facebook"], "tag_count":3 }
{ "tags":["male","Germany", "facebook"], "tag_count":3 }
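Because the client has to compute tag_count at indexing time, here is a minimal Python sketch of one way to do it. The `bulk_body` helper is hypothetical. The index and type names are taken from the response shown further down, and document types are used as in the Elasticsearch versions of that time. The helper deduplicates each tag list, sets `tag_count` to its length, and builds the newline-delimited body that the `_bulk` API expects. It also assumes `tags` is mapped as a non-analyzed string (`keyword` in later versions). With the default analyzer, a value like "US" would be indexed as "us", and an exact `term` query for "US" would not find it.

```python
import json


def bulk_body(tag_lists, index="test_index", doc_type="mult_query"):
    """Hypothetical helper: build an NDJSON _bulk body with tag_count filled in."""
    lines = []
    for tags in tag_lists:
        unique = list(dict.fromkeys(tags))  # drop duplicate tags, keep order
        # Action line, then the document source line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps({"tags": unique, "tag_count": len(unique)}))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ["male"],
    ["male", "facebook"],
    ["male", "linkedin"],
    ["male", "US", "facebook"],
    ["male", "Germany", "facebook"],
])
print(body, end="")
```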

With this query:

{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "should": [
            {
              "bool": {
                "must": [
                  {
                    "term": {
                      "tags": "male"
                    }
                  },
                  {
                    "term": {
                      "tag_count": 1
                    }
                  }
                ]
              }
            },
            {
              "bool": {
                "must": [
                  {
                    "term": {
                      "tags": "male"
                    }
                  },
                  {
                    "term": {
                      "tags": "facebook"
                    }
                  },
                  {
                    "term": {
                      "tag_count": 2
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  }
}
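To see why only two of the five documents come back, here is a hypothetical pure-Python stand-in for the filter above. It is not how Elasticsearch evaluates queries internally. It understands only the clause types used in this query, compares terms exactly, and treats a SHOULD list as "at least one must match":

```python
# Hypothetical stand-in for Elasticsearch: supports only constant_score,
# bool (must/should) and term, with exact (non-analyzed) comparison.

DOCS = [
    {"tags": ["male"], "tag_count": 1},
    {"tags": ["male", "facebook"], "tag_count": 2},
    {"tags": ["male", "linkedin"], "tag_count": 2},
    {"tags": ["male", "US", "facebook"], "tag_count": 3},
    {"tags": ["male", "Germany", "facebook"], "tag_count": 3},
]


def term(field, value):
    return {"term": {field: value}}


# The query from the answer, written as a Python dict.
QUERY = {
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "should": [
                        {"bool": {"must": [term("tags", "male"), term("tag_count", 1)]}},
                        {"bool": {"must": [term("tags", "male"), term("tags", "facebook"),
                                           term("tag_count", 2)]}},
                    ]
                }
            }
        }
    }
}


def matches(doc, clause):
    """Evaluate one (simplified) query clause against a document."""
    if "term" in clause:
        [(field, value)] = clause["term"].items()
        stored = doc.get(field)
        # A multi-valued field matches if any of its values equals the term.
        return value in stored if isinstance(stored, list) else stored == value
    if "constant_score" in clause:
        return matches(doc, clause["constant_score"]["filter"])
    if "bool" in clause:
        b = clause["bool"]
        if not all(matches(doc, c) for c in b.get("must", [])):  # MUST = AND
            return False
        should = b.get("should", [])
        # Simplification: if SHOULD clauses exist, at least one must match (OR).
        return any(matches(doc, c) for c in should) if should else True
    raise ValueError(f"unsupported clause: {clause}")


hits = [d for d in DOCS if matches(d, QUERY["query"])]
for d in hits:
    print(d)
```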

I get the following results:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test_index",
      "_type" : "mult_query",
      "_id" : "AVegvUyzNutW6yNguPqZ",
      "_score" : 1.0,
      "_source" : {
        "tags" : [ "male" ],
        "tag_count" : 1
      }
    }, {
      "_index" : "test_index",
      "_type" : "mult_query",
      "_id" : "AVegvPSFNutW6yNguPqX",
      "_score" : 1.0,
      "_source" : {
        "tags" : [ "male", "facebook" ],
        "tag_count" : 2
      }
    } ]
  }
}