ElasticSearch - Custom analyzer with filters - filters not applied

Time: 2020-01-23 15:38:37

Tags: elasticsearch elasticsearch-analyzers

I have the following query:

GET /nameofmyindex/_analyze
{
  "text" : "Limousinetesting",
  "explain": true,
  "analyzer": "default"
}

The result is:

{
  "detail" : {
    "custom_analyzer" : true,
    "charfilters" : [ ],
    "tokenizer" : {
      "name" : "standard",
      "tokens" : [
        {
          "token" : "Limousinetesting",
          "start_offset" : 0,
          "end_offset" : 16,
          "type" : "<ALPHANUM>",
          "position" : 0,
          "bytes" : "[4c 69 6d 6f 75 73 69 6e 65 74 65 73 74 69 6e 67]",
          "positionLength" : 1,
          "termFrequency" : 1
        }
      ]
    },
    "tokenfilters" : [ ]
  }
}

My index configuration is as follows:

{
   "nameofmyindex":{
      "aliases":{

      },
      "mappings":{
         "properties":{
            "author":{
               "type":"integer"
            },
            "body:value":{
               "type":"text",
               "fields":{
                  "keyword":{
                     "type":"keyword",
                     "ignore_above":256
                  }
               }
            },
            "changed":{
               "type":"date",
               "format":"epoch_second"
            },
            "created":{
               "type":"date",
               "format":"epoch_second"
            },
            "id":{
               "type":"keyword"
            },
            "promote":{
               "type":"boolean"
            },
            "search_api_language":{
               "type":"keyword"
            },
            "sticky":{
               "type":"boolean"
            },
            "title":{
               "type":"text",
               "boost":5.0,
               "fields":{
                  "keyword":{
                     "type":"keyword",
                     "ignore_above":256
                  }
               }
            },
            "type":{
               "type":"keyword"
            }
         }
      },
      "settings":{
         "index":{
            "number_of_shards":"1",
            "provided_name":"nameofmyindex",
            "creation_date":"1579792687839",
            "analysis":{
               "filter":{
                  "stop":{
                     "type":"stop",
                     "stopwords":[
                        "i",
                        "me",
                        "my",
                        "myself"
                     ]
                  },
                  "synonym":{
                     "type":"synonym",
                     "lenient":"true",
                     "synonyms":[
                        "P-Card, P Card => P-Card",
                        "limousinetesting => limousine"
                     ]
                  }
               },
               "analyzer":{
                  "default":{
                     "type":"custom",
                     "filters":[
                        "lowercase",
                        "stop",
                        "synonym"
                     ],
                     "tokenizer":"standard"
                  }
               }
            },
            "number_of_replicas":"1",
            "uuid":"QTlVnyWVRLayEfPWTrcgdg",
            "version":{
               "created":"7050199"
            }
         }
      }
   }
}

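As you can see, the default analyzer with its filters has no effect: the word "Limousinetesting" does not receive its synonym "limousine".

How should the analyzer be defined so that the filters take effect? Even the simplest filter, lowercase, is not applied in this case.

1 answer:

Answer 0: (score: 1)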

The problem is in the syntax you used to create the index settings. I was able to reproduce your issue and fix it. In the analyzer definition you use the key filters for the JSON array that lists the token filters, but the key must be filter (singular), even though the array can list several filters. That is also why your _analyze output shows an empty tokenfilters list.

Please find below the correct format for creating the index. Note the change from filters to filter in the default analyzer:

{
  "mappings": {
    "properties": {
      "author": {
        "type": "integer"
      },
      "body:value": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "changed": {
        "type": "date",
        "format": "epoch_second"
      },
      "created": {
        "type": "date",
        "format": "epoch_second"
      },
      "id": {
        "type": "keyword"
      },
      "promote": {
        "type": "boolean"
      },
      "search_api_language": {
        "type": "keyword"
      },
      "sticky": {
        "type": "boolean"
      },
      "title": {
        "type": "text",
        "boost": 5,
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "type": {
        "type": "keyword"
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": "1",
      "analysis": {
        "filter": {
          "stop": {
            "type": "stop",
            "stopwords": [
              "i",
              "me",
              "my",
              "myself"
            ]
          },
          "synonym": {
            "type": "synonym",
            "lenient": "true",
            "synonyms": [
              "P-Card, P Card => P-Card",
              "limousinetesting => limousine"
            ]
          }
        },
        "analyzer": {
          "default": {
            "type": "custom",
            "filter": [
              "lowercase",
              "stop",
              "synonym"
            ],
            "tokenizer": "standard"
          }
        }
      },
      "number_of_replicas": "1"
    }
  }
}

Now, when I create the index with the above mapping and call the _analyze API with your text, I get its synonym token, limousine.
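Because the misspelled filters key was accepted without an error here, a small pre-flight check on the settings body can help catch this kind of typo before the index is created. The following is only a sketch: the function name find_unknown_analyzer_keys and the KNOWN_CUSTOM_ANALYZER_KEYS set are hypothetical helpers, not part of Elasticsearch or its clients, and the set of allowed keys is an assumption based on the parameters documented for custom analyzers, so adjust it for your version.

```python
# Keys assumed to be valid for analyzers of "type": "custom".
# This set is an assumption; check it against the docs for your version.
KNOWN_CUSTOM_ANALYZER_KEYS = {
    "type",
    "tokenizer",
    "char_filter",
    "filter",
    "position_increment_gap",
}


def find_unknown_analyzer_keys(settings):
    """Return {analyzer_name: [unknown keys]} for custom analyzers.

    `settings` is the "settings" object of an index-creation body,
    with or without the intermediate "index" level.
    """
    index_settings = settings.get("index", settings)
    analyzers = index_settings.get("analysis", {}).get("analyzer", {})
    problems = {}
    for name, definition in analyzers.items():
        # Only custom analyzers are checked; built-in types take other keys.
        if definition.get("type", "custom") != "custom":
            continue
        unknown = sorted(set(definition) - KNOWN_CUSTOM_ANALYZER_KEYS)
        if unknown:
            problems[name] = unknown
    return problems


# The analyzer from the question (misspelled key) and from the answer.
broken = {
    "index": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "custom",
                    "filters": ["lowercase", "stop", "synonym"],
                    "tokenizer": "standard",
                }
            }
        }
    }
}
fixed = {
    "index": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "custom",
                    "filter": ["lowercase", "stop", "synonym"],
                    "tokenizer": "standard",
                }
            }
        }
    }
}

print(find_unknown_analyzer_keys(broken))  # {'default': ['filters']}
print(find_unknown_analyzer_keys(fixed))   # {}
```

The check only inspects the JSON structure, so it can run without a cluster. It does not verify that the referenced filters or tokenizer exist.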