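Elasticsearch aggregation: a string containing a space is treated as two strings

Time: 2016-10-03 11:15:30

Tags: elasticsearch

I use this query to get the distinct values of a single field and how often each occurs (in SQL terms, SELECT field, count(field) GROUP BY field).

To do this, I send the following request to ES: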

{
  "query" : {
    "bool" : {
      "must" : {
        "exists" : {
          "field" : "metainfos.ceeaacceaeaaccebeaacceceaaccedeaac"
        }
      }
    }
  },
  "aggregations" : {
    "followUpActivity.metainfo.metainfos.ceeaacceaeaaccebeaacceceaaccedeaac" : {
      "terms" : {
        "field" : "metainfos.ceeaacceaeaaccebeaacceceaaccedeaac",
        "missing" : "null"
      }
    }
  }
}

There is only one document in this collection:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "living_v1",
      "_type" : "fuas",
      "_id" : "a2cb0ba1-8955-11e6-8a00-0242ac110007",
      "_score" : 1.0,
      "_routing" : "user2",
      "_source" : {
        "user" : "user2",
        "timestamp" : "2016-10-03T11:08:30.074Z",
        "startTimestamp" : "2016-10-03T11:08:30.074Z",
        "dueTimestamp" : null,
        "closingTimestamp" : null,
        "matter" : "Fua 1",
        "comment" : null,
        "status" : 0,
        "backlogStatus" : 20,
        "metainfos" : {
          "ceeaacceaeaaccebeaacceceaaccedeaac" : [ "Living Digital" ]
        },
        "resources" : [ ],
        "notes" : null
      }
    } ]
  }
}

As you can see, doc.metainfos.ceeaacc... = ["Living Digital"]. The aggregation request above returns:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "living_v1",
      "_type" : "fuas",
      "_id" : "a2cb0ba1-8955-11e6-8a00-0242ac110007",
      "_score" : 1.0,
      "_routing" : "user2",
      "_source":{"user":"user2","timestamp":"2016-10-03T11:08:30.074Z","startTimestamp":"2016-10-03T11:08:30.074Z","dueTimestamp":null,"closingTimestamp":null,"matter":"Fua 1","comment":null,"status":0,"backlogStatus":20,"metainfos":{"ceeaacceaeaaccebeaacceceaaccedeaac":["Living Digital"]},"resources":[],"notes":null}
    } ]
  },
  "aggregations" : {
    "followUpActivity.metainfo.metainfos.ceeaacceaeaaccebeaacceceaaccedeaac" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "digital",
        "doc_count" : 1
      }, {
        "key" : "living",
        "doc_count" : 1
      } ]
    }
  }
}

ES gives me two buckets: one for "living" and another for "digital". I want the aggregation to use the whole value "Living Digital".

The mapping is:

{
  "living_v1" : {
    "mappings" : {
      "fuas" : {
        "properties" : {
          "backlogStatus" : {
            "type" : "long"
          },
          "comment" : {
            "type" : "string"
          },
          "matter" : {
            "type" : "string"
          },
          "metainfos" : {
            "properties" : {
              "ceeaacceaeaaccebeaacceceaaccedeaac" : {
                "type" : "string"
              }
            }
          },
          "startTimestamp" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          },
          "status" : {
            "type" : "long"
          },
          "timestamp" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          },
          "user" : {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
      }
    }
  }
}

As you can see:

"metainfos" : {
  "properties" : {
    "ceeaacceaeaaccebeaacceceaaccedeaac" : {
      "type" : "string"
    }
  }
}

The problem for me is that "ceeaacceaeaaccebeaacceceaaccedeaac" is a property created on demand by the user, and I don't know how to set not_analyzed on every metainfos.* field.

EDIT

I have tested:

curl -XPUT 'http://localhost:9200/living_v1/' -d '
{
  "mappings": {
    "fuas": {
      "dynamic_templates": [
        {
          "metainfos": {
            "path_match":   "metainfos.*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}
'

It tells me that the living_v1 index already exists. From what I have been able to work out so far, I need to send the PUT against the index:

{
  "error" : {
    "root_cause" : [ {
      "type" : "index_already_exists_exception",
      "reason" : "already exists",
      "index" : "living_v1"
    } ],
    "type" : "index_already_exists_exception",
    "reason" : "already exists",
    "index" : "living_v1"
  },
  "status" : 400
}

2 answers:

Answer 0 (score: 1)

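As you have already noticed, this behavior is caused by the mapping that is applied by default. Under that mapping, every string field is analyzed unless its mapping says otherwise. The analyzer breaks "Living Digital" into the lowercase tokens "living" and "digital", and the terms aggregation buckets on those indexed tokens rather than on the original value.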
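
To make the effect concrete, here is a minimal Python sketch (not Elasticsearch code) that mimics what happens to the value at index time. The helper names analyze and terms_buckets are made up for illustration, and lowercasing plus splitting on non-word characters is only a rough approximation of the default analyzer.

```python
import re
from collections import Counter


def analyze(value):
    """Rough stand-in for the default analyzer: lowercase, split on non-word characters."""
    return [token for token in re.split(r"\W+", value.lower()) if token]


def terms_buckets(docs, field, analyzed=True):
    """Count documents per indexed term, similar in spirit to a terms aggregation."""
    counts = Counter()
    for doc in docs:
        terms = set()
        for value in doc.get(field, []):
            # An analyzed field indexes one term per token;
            # a not_analyzed field indexes the whole value as a single term.
            terms.update(analyze(value) if analyzed else [value])
        counts.update(terms)  # each document counts once per distinct term
    return dict(counts)


field = "ceeaacceaeaaccebeaacceceaaccedeaac"
docs = [{field: ["Living Digital"]}]

analyzed_buckets = terms_buckets(docs, field, analyzed=True)
exact_buckets = terms_buckets(docs, field, analyzed=False)
print(analyzed_buckets)
print(exact_buckets)
```

With analyzed=True the sketch produces one bucket each for living and digital, matching the response in the question. With analyzed=False it produces a single bucket for Living Digital, which is the result the question asks for.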

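So if you do not know in advance which properties (= keys) the metainfos object will contain, you can use the dynamic templates feature to define which mapping should be applied to those fields, overriding the default behavior of analyzing string fields.

You could apply a mapping that looks something like this (untested):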

{
  "mappings": {
    "fuas": {
      "dynamic_templates": [
        {
          "metainfos": {
            "path_match":   "metainfos.*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}
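
As a rough illustration of which fields such a template would cover, the Python sketch below approximates path_match with shell-style wildcard matching from the standard library. That is an assumption made for illustration: Elasticsearch uses its own wildcard matcher, and the helper name template_applies is hypothetical. For a plain * pattern like this one, the two should agree.

```python
from fnmatch import fnmatchcase

PATH_MATCH = "metainfos.*"


def template_applies(full_path, sample_value):
    """Approximate the template's two conditions: path_match and match_mapping_type."""
    # An array is mapped according to its elements, so inspect the first one.
    if isinstance(sample_value, list) and sample_value:
        sample_value = sample_value[0]
    return fnmatchcase(full_path, PATH_MATCH) and isinstance(sample_value, str)


# Field paths and values taken from the document in the question.
checks = {
    "metainfos.ceeaacceaeaaccebeaacceceaaccedeaac": ["Living Digital"],
    "matter": "Fua 1",
    "status": 0,
}
for path, value in checks.items():
    print(path, template_applies(path, value))
```

Only the field under metainfos satisfies both conditions. matter is a string but sits outside the metainfos object, and status is not a string at all.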

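Answer 1 (score: 1)

As others have pointed out, dynamic templates are the way to go. The only catch is that a template has no effect on fields that were already mapped when documents were indexed: the existing metainfos field stays analyzed. You need to recreate the index (delete the index, create it again with the mapping, then index the documents again).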
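
The three steps can be sketched in Python as a plan of HTTP requests. Nothing here talks to a cluster: the function only builds (method, path, body) tuples that could then be sent with curl or any HTTP client. The function name recreate_index_plan, the choice to pass documents as (id, routing, source) tuples, and the use of the routing query parameter (the document in the question carries "_routing": "user2") are assumptions made for illustration.

```python
import json


def recreate_index_plan(index, doc_type, mappings, docs):
    """Build the HTTP requests needed to rebuild an index with a new mapping.

    docs is an iterable of (doc_id, routing, source) tuples saved from the old index.
    """
    plan = [
        # 1. Delete the old index (this destroys its data, so save the documents first).
        ("DELETE", "/%s" % index, None),
        # 2. Create the index again, this time with the dynamic template in place.
        ("PUT", "/%s" % index, json.dumps({"mappings": mappings})),
    ]
    # 3. Index every saved document again so the new mapping is applied to it.
    for doc_id, routing, source in docs:
        path = "/%s/%s/%s" % (index, doc_type, doc_id)
        if routing is not None:
            path += "?routing=%s" % routing
        plan.append(("PUT", path, json.dumps(source)))
    return plan


mappings = {
    "fuas": {
        "dynamic_templates": [
            {
                "metainfos": {
                    "path_match": "metainfos.*",
                    "match_mapping_type": "string",
                    "mapping": {"type": "string", "index": "not_analyzed"},
                }
            }
        ]
    }
}
docs = [
    (
        "a2cb0ba1-8955-11e6-8a00-0242ac110007",
        "user2",
        {"user": "user2", "matter": "Fua 1",
         "metainfos": {"ceeaacceaeaaccebeaacceceaaccedeaac": ["Living Digital"]}},
    )
]

plan = recreate_index_plan("living_v1", "fuas", mappings, docs)
for method, path, body in plan:
    print(method, path)
```

Because the first request removes the index, the documents have to be exported or otherwise available before the plan is executed.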