Elasticsearch - find records that share at least x elements with a given array

Time: 2015-04-07 19:10:42

Tags: elasticsearch

I have a mapping like this:

"properties": {
    "id": {"type": "long", "index": "not_analyzed"},
    "name": {"type": "string", "index": "not_analyzed"},
    "skills": {"type": "string", "index": "not_analyzed"}
}

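I want to store students' profiles in Elasticsearch using the given mapping. skills is the list of computer skills (python, javascript, ...) that they specified in their profile.

Given a skill set such as ['html', 'css', 'sass', 'javascript', 'django', 'bootstrap', 'angularjs', 'backbone'], I want to find all profiles that have at least 3 of the skills in this set. I don't need to know which skills they have in common with our wanted list; I'm only interested in the count. Is there a way to do this in Elasticsearch?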

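2 answers:

Answer 0: (score: 3)

There may be a better way that I haven't thought of, but you can do it with a script filter.

I set up a simplified version of the index with a few documents: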

PUT /test_index
{
   "settings": {
      "number_of_shards": 1
   },
   "mappings": {
      "doc": {
         "properties": {
            "skills": {
               "type": "string",
               "index": "not_analyzed"
            }
         }
      }
   }
}

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"skills":["html","css","javascript"]}
{"index":{"_id":2}}
{"skills":["bootstrap", "angularjs", "backbone"]}
{"index":{"_id":3}}
{"skills":["python", "javascript", "ruby","java"]}
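
The bulk request above alternates one action line with one source line. As a minimal sketch of that layout (the helper name bulk_body is hypothetical and not part of any Elasticsearch client), the same body could be generated in Python; it only builds the text and does not send it anywhere:

```python
import json

def bulk_body(docs):
    """Build a newline-delimited bulk body from (doc_id, source) pairs.

    Each document contributes two lines: an "index" action line and
    the document source. The body ends with a trailing newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

# The three sample documents used in this answer.
sample_docs = [
    (1, {"skills": ["html", "css", "javascript"]}),
    (2, {"skills": ["bootstrap", "angularjs", "backbone"]}),
    (3, {"skills": ["python", "javascript", "ruby", "java"]}),
]

if __name__ == "__main__":
    print(bulk_body(sample_docs), end="")
```

The resulting string would be the request body for POST /test_index/doc/_bulk.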

Then I ran this query:

POST /test_index/_search
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "script": {
               "script": "count=0; for(s: doc['skills'].values){ for(x: skills){ if(s == x){ count +=1 } } } count >= 3",
               "params": {
                  "skills": ["html", "css", "sass", "javascript", "django", "bootstrap", "angularjs", "backbone"]
               }
            }
         }
      }
   }
}

And got back what I expected:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 1,
            "_source": {
               "skills": [
                  "html",
                  "css",
                  "javascript"
               ]
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 1,
            "_source": {
               "skills": [
                  "bootstrap",
                  "angularjs",
                  "backbone"
               ]
            }
         }
      ]
   }
}
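
To make the result easier to check by hand, here is a rough Python sketch of what the script filter computes for each document. It is only an illustration of the counting logic, not code that runs inside Elasticsearch; the function names are made up for this example, and it uses a set lookup instead of the nested loops, which gives the same count as long as the wanted list has no duplicates.

```python
# Sample documents copied from the bulk request in this answer.
docs = {
    1: ["html", "css", "javascript"],
    2: ["bootstrap", "angularjs", "backbone"],
    3: ["python", "javascript", "ruby", "java"],
}

wanted = ["html", "css", "sass", "javascript", "django",
          "bootstrap", "angularjs", "backbone"]

def count_common(doc_skills, wanted_skills):
    """Count how many of a document's skills appear in the wanted list."""
    wanted_set = set(wanted_skills)
    return sum(1 for skill in doc_skills if skill in wanted_set)

def matching_ids(all_docs, wanted_skills, minimum=3):
    """Return the sorted ids of documents sharing at least `minimum` skills."""
    return sorted(
        doc_id
        for doc_id, skills in all_docs.items()
        if count_common(skills, wanted_skills) >= minimum
    )

if __name__ == "__main__":
    print(matching_ids(docs, wanted))
```

Documents 1 and 2 each share three skills with the wanted list, while document 3 shares only javascript, which agrees with the two hits returned above.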

Here is all the code:

http://sense.qbox.io/gist/1018a01f1df29cb793ea15661f22bc8b25ed3476

Answer 1: (score: 2)

You can use a query string query with the minimum_should_match option.

Example:

POST <index>/_search
{
        "query": {
            "filtered": {
               "filter": {
                   "query": { 
                        "query_string": {
                            "default_field": "skills",
                            "query": "html css sass javascript django bootstrap angularjs backbone \"ruby on rails\" ",
                            "minimum_should_match" : "3"
                        }
                   }
               }
            }
        }  
}
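
If the skill list comes from application code, the request body above could be assembled along these lines. This is a sketch under assumptions: build_skill_query is a hypothetical helper, it wraps multi-word skills in double quotes the way the example does for "ruby on rails", and it does not escape other characters that are reserved in query string syntax.

```python
import json

def build_skill_query(skills, minimum):
    """Build a search body that requires at least `minimum` matching skills.

    Multi-word skills are wrapped in double quotes so they are treated
    as one phrase. Reserved query_string characters are NOT escaped.
    """
    terms = ['"%s"' % skill if " " in skill else skill for skill in skills]
    return {
        "query": {
            "filtered": {
                "filter": {
                    "query": {
                        "query_string": {
                            "default_field": "skills",
                            "query": " ".join(terms),
                            "minimum_should_match": str(minimum),
                        }
                    }
                }
            }
        }
    }

if __name__ == "__main__":
    body = build_skill_query(
        ["html", "css", "sass", "javascript", "django",
         "bootstrap", "angularjs", "backbone", "ruby on rails"],
        3,
    )
    print(json.dumps(body, indent=2))
```

The printed JSON has the same shape as the request in the example and can be sent to POST <index>/_search.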