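Elasticsearch mapping: using the keyword tokenizer to avoid splitting tokens and enable wildcards

Time: 2014-10-21 11:53:24

Tags: elasticsearch

I am trying to build an autocomplete feature with AngularJS and Elasticsearch on a given field, for example countryname. The field can contain simple names such as "France" or "Spain", or compound names such as "Sierra Leone".

In the mapping, this field is not_analyzed to prevent Elasticsearch from tokenizing compound names: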

"COUNTRYNAME" : {"type" : "string", "store" : "yes","index": "not_analyzed" }

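I need to query Elasticsearch:

  • to filter documents on something like "countryname:value", where value can contain a wildcard
  • and to run an aggregation on the country names returned by the filter (I only aggregate to get distinct values; the count is of no use to me, so there may be a better solution)

I can't use a wildcard with the "not_analyzed" field.

Here is my query, but it does not work with a wildcard in the "value" part, and it is case sensitive.

The wildcard on its own works: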

curl -XGET 'local_host:9200/botanic/specimens/_search?size=0' -d '{
  "fields": [
    "COUNTRYNAME"
  ],
  "query": {
    "query_string": {
      "query": "COUNTRYNAME:*"
    }
  },
  "aggs": {
    "general": {
      "terms": {
        "field": "COUNTRYNAME",
        "size": 0
      }
    }
  }
}'

But this one does not work (Franc*):

curl -XGET 'local_host:9200/botanic/specimens/_search?size=0' -d '{
  "fields": [
    "COUNTRYNAME"
  ],
  "query": {
    "query_string": {
      "query": "COUNTRYNAME:Franc*"
    }
  },
  "aggs": {
    "general": {
      "terms": {
        "field": "COUNTRYNAME",
        "size": 0
      }
    }
  }
}'

I also tried a bool must query, but it does not work with this not_analyzed field and a wildcard either:

curl -XGET 'local_host:9200/botanic/specimens/_search?size=0' -d '{
  "fields": [
    "COUNTRYNAME"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "COUNTRYNAME": "Franc*"
          }
        }
      ]
    }
  },
  "aggs": {
    "general": {
      "terms": {
        "field": "COUNTRYNAME",
        "size": 0
      }
    }
  }
}'

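What am I missing or doing wrong? Should I leave the field analyzed in the mapping and use another analyzer that does not split compound names into tokens?

1 Answer:

Answer 0: (score: 22)

I found a working solution: the "keyword" tokenizer. Create a custom analyzer and use it in the mapping for the fields that should be kept whole instead of being split on whitespace: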

curl -XPUT 'localhost:9200/botanic/' -d '{
 "settings":{
     "index":{
        "analysis":{
           "analyzer":{
              "keylower":{
                 "tokenizer":"keyword",
                 "filter":"lowercase"
              }
           }
        }
     }
  },
  "mappings":{
        "specimens" : {
            "_all" : {"enabled" : true},
            "_index" : {"enabled" : true},
            "_id" : {"index": "not_analyzed", "store" : false},
            "properties" : {
                "_id" : {"type" : "string", "store" : "no","index": "not_analyzed"  } ,
            ...
                "LOCATIONID" : {"type" : "string",  "store" : "yes","index": "not_analyzed" } ,
                "AVERAGEALTITUDEROUNDED" : {"type" : "string",  "store" : "yes","index": "analyzed" } ,
                "CONTINENT" : {"type" : "string","analyzer":"keylower" } ,
                "COUNTRYNAME" : {"type" : "string","analyzer":"keylower" } ,                
                "COUNTRYCODE" : {"type" : "string", "store" : "yes","index": "analyzed" } ,
                "COUNTY" : {"type" : "string","analyzer":"keylower" } ,
                "LOCALITY" : {"type" : "string","analyzer":"keylower" }                 
            }
        }
    }
}'
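
To see why this analyzer helps, here is a minimal Python sketch that models the behaviour instead of calling Elasticsearch. It assumes that query_string lowercases wildcard terms before matching them (the lowercase_expanded_terms option, which defaulted to true in the Elasticsearch 1.x releases current at the time) and that a wildcard term is compared against whole indexed tokens. The helper names (not_analyzed, standard_like, keylower, wildcard_hits) are hypothetical, and standard_like is only a rough stand-in for the default analyzer.

```python
import re
from fnmatch import fnmatchcase


def not_analyzed(value):
    # "index": "not_analyzed" -> the value is indexed verbatim as one token
    return [value]


def standard_like(value):
    # Rough stand-in for the default analyzer: split into words, lowercase
    return [tok.lower() for tok in re.findall(r"\w+", value)]


def keylower(value):
    # keyword tokenizer (whole value is one token) + lowercase filter
    return [value.lower()]


def wildcard_hits(pattern, values, analyzer, lowercase_expanded_terms=True):
    # Assumption: the wildcard term is lowercased by the query parser,
    # then matched case-sensitively against each indexed token.
    # fnmatchcase is used only for its * and ? handling.
    if lowercase_expanded_terms:
        pattern = pattern.lower()
    return [v for v in values
            if any(fnmatchcase(tok, pattern) for tok in analyzer(v))]


countries = ["France", "Spain", "Sierra Leone"]

# not_analyzed keeps "France", but the pattern becomes "franc*": no match
print(wildcard_hits("Franc*", countries, not_analyzed))   # []
# keylower indexes "france", so the lowercased pattern matches
print(wildcard_hits("Franc*", countries, keylower))       # ['France']
# the compound name stays a single token with keylower
print(standard_like("Sierra Leone"))                      # ['sierra', 'leone']
print(keylower("Sierra Leone"))                           # ['sierra leone']
```

Under these assumptions, the original not_analyzed mapping fails on Franc* only because of the case mismatch, while an analyzer that splits on whitespace would match but would aggregate "sierra" and "leone" as separate buckets.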

So I can now use wildcards on the COUNTRYNAME field, which is no longer split:

curl -XGET 'localhost:9200/botanic/specimens/_search?size=10' -d '{
"fields"  : ["COUNTRYNAME"],     
"query": {"query_string" : {
                    "query": "COUNTRYNAME:bol*"
}},
"aggs" : {
    "general" : {
        "terms" : {
            "field" : "COUNTRYNAME", "size":0
        }
    }
}}'

Result:

{
    "took" : 14,
    "timed_out" : false,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
    "hits" : {
        "total" : 45,
        "max_score" : 1.0,
        "hits" : [{
                "_index" : "botanic",
                "_type" : "specimens",
                "_id" : "91E7B53B61DF4E76BF70C780315A5DFD",
                "_score" : 1.0,
                "fields" : {
                    "COUNTRYNAME" : ["Bolivia, Plurinational State of"]
                }
            }, {
                "_index" : "botanic",
                "_type" : "specimens",
                "_id" : "7D811B5D08FF4F17BA174A3D294B5986",
                "_score" : 1.0,
                "fields" : {
                    "COUNTRYNAME" : ["Bolivia, Plurinational State of"]
                }
            } ...
        ]
    },
    "aggregations" : {
        "general" : {
            "buckets" : [{
                    "key" : "bolivia, plurinational state of",
                    "doc_count" : 45
                }
            ]
        }
    }
}
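
Note that the bucket key in the aggregation is lowercased ("bolivia, plurinational state of"), because the terms aggregation works on the indexed tokens. If the original casing is needed for display, one possible option, which is not part of the original answer, is a multi-field that keeps a not_analyzed copy next to the keylower one and aggregates on that copy. The sketch below only builds the JSON bodies in Python; the sub-field name raw and both helper names are hypothetical, and the syntax assumes the Elasticsearch 1.x string mappings used elsewhere on this page.

```python
import json


def countryname_mapping():
    # keylower for wildcard search, plus a verbatim copy for aggregations.
    # "raw" is a hypothetical sub-field name.
    return {
        "type": "string",
        "analyzer": "keylower",
        "fields": {
            "raw": {"type": "string", "index": "not_analyzed"}
        },
    }


def autocomplete_body(prefix):
    # Search the lowercased field, aggregate on the untouched copy.
    # The prefix is inserted as-is; query_string special characters
    # are not escaped in this sketch.
    return {
        "query": {"query_string": {"query": "COUNTRYNAME:%s*" % prefix}},
        "aggs": {
            "general": {"terms": {"field": "COUNTRYNAME.raw", "size": 0}}
        },
    }


print(json.dumps(countryname_mapping(), indent=2))
print(json.dumps(autocomplete_body("bol"), indent=2))
```

With such a mapping the search would still run against the lowercased token, while the buckets would be expected to carry the stored spelling, for example "Bolivia, Plurinational State of".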