Elasticsearch - 全场值的基数

时间:2016-05-20 03:23:55

标签: elasticsearch

我有一个看起来像这样的文件:

{
   "_id":"some_id_value",
   "_source":{
      "client":{
         "name":"x"
      },
      "project":{
         "name":"x November 2016"
      }
   }
}

我正在尝试执行一个查询,该查询将获取每个客户端的唯一项目名称的计数。为此,我在cardinality上使用project.name的查询。我确信这个特定客户端只有4个唯一的项目名称。但是,当我运行查询时,我得到5的计数,我知道这是错误的。

项目名称都包含客户端的名称。例如,如果客户是" X",项目名称将是" X测试2016年11月"或" X Jan 2016"等等。我不会' ;不知道这是否是一个考虑因素。

这是文档类型

的映射
{
   "mappings":{
      "vma_docs":{
         "properties":{
            "client":{
               "properties":{
                  "contact":{
                     "type":"string"
                  },
                  "name":{
                     "type":"string"
                  }
               }
            },
            "project":{
               "properties":{
                  "end_date":{
                     "format":"yyyy-MM-dd",
                     "type":"date"
                  },
                  "project_type":{
                     "type":"string"
                  },
                  "name":{
                     "type":"string"
                  },
                  "project_manager":{
                     "index":"not_analyzed",
                     "type":"string"
                  },
                  "start_date":{
                     "format":"yyyy-MM-dd",
                     "type":"date"
                  }
               }
            }
         }
      }
   }
}

这是我的搜索查询

{
   "fields":[
      "client.name",
      "project.name"
   ],
   "query":{
      "bool":{
         "must":{
            "match":{
               "client.name":{
                  "operator":"and",
                  "query":"ABC systems"
               }
            }
         }
      }
   },
   "aggs":{
      "num_projects":{
         "cardinality":{
            "field":"project.name"
         }
      }
   },
   "size":5
}

这些是我得到的结果(为了简洁起见,我仅发布了2个结果)。请找到num_projects聚合返回5,但必须只返回4,这是项目的总数。

{
   "hits":{
      "hits":[
         {
            "_score":5.8553367,
            "_type":"vma_docs",
            "_id":"AVTMIM9IBwwoAW3mzgKz",
            "fields":{
               "project.name":[
                  "ABC"
               ],
               "client.name":[
                  "ABC systems Pvt Ltd"
               ]
            },
            "_index":"vma"
         },
         {
            "_score":5.8553367,
            "_type":"vma_docs",
            "_id":"AVTMIM9YBwwoAW3mzgK2",
            "fields":{
               "project.name":[
                  "ABC"
               ],
               "client.name":[
                  "ABC systems Pvt Ltd"
               ]
            },
            "_index":"vma"
         }
      ],
      "total":18,
      "max_score":5.8553367
   },
   "_shards":{
      "successful":5,
      "failed":0,
      "total":5
   },
   "took":4,
   "aggregations":{
      "num_projects":{
         "value":5
      }
   },
   "timed_out":false
}

仅供参考:项目名称为ABCABC Nov 2016ABC retest NovemberABC Mobile App

1 个答案:

答案 0 :(得分:2)

您的project.name字段需要以下映射:

{
  "mappings": {
    "vma_docs": {
      "properties": {
        "client": {
          "properties": {
            "contact": {
              "type": "string"
            },
            "name": {
              "type": "string"
            }
          }
        },
        "project": {
          "properties": {
            "end_date": {
              "format": "yyyy-MM-dd",
              "type": "date"
            },
            "project_type": {
              "type": "string"
            },
            "name": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            },
            "project_manager": {
              "index": "not_analyzed",
              "type": "string"
            },
            "start_date": {
              "format": "yyyy-MM-dd",
              "type": "date"
            }
          }
        }
      }
    }
  }
}

它基本上是一个名为raw的子字段,project.name中的相同值放在project.name.raw中,但不触及它(标记或分析)。然后您需要使用的查询是:

{
  "fields": [
    "client.name",
    "project.name"
  ],
  "query": {
    "bool": {
      "must": {
        "match": {
          "client.name": {
            "operator": "and",
            "query": "ABC systems"
          }
        }
      }
    }
  },
  "aggs": {
    "num_projects": {
      "cardinality": {
        "field": "project.name.raw"
      }
    }
  },
  "size": 5
}