Question

I'm trying to search text indexed by Elasticsearch with the icu_tokenizer, but I can't get it to work.

My test case is to tokenize the sentence "Hello. I am from Bangkok", which in Thai is สวัสดี ผมมาจากกรุงเทพฯ and should be tokenized into the five words สวัสดี, ผม, มา, จาก, กรุงเทพฯ. (Sample from Elasticsearch: The Definitive Guide)

Searching with any of the last four words fails. Searching with either of the space-separated words, สวัสดี or ผมมาจากกรุงเทพฯ, works fine.

If I specify the icu_tokenizer on the command line, e.g.

curl -XGET 'http://localhost:9200/icu/_analyze?tokenizer=icu_tokenizer' -d "สวัสดี ผมมาจากกรุงเทพฯ"

it tokenizes the text into five words.

My settings are:

curl http://localhost:9200/icu/_settings?pretty
{
  "icu" : {
    "settings" : {
      "index" : {
        "creation_date" : "1474010824865",
        "analysis" : {
          "analyzer" : {
            "nfkc_cf_normalized" : [ "icu_normalizer" ],
            "tokenizer" : "icu_tokenizer"
          }
        }
      },
      "number_of_shards" : "5",
      "number_of_replicas" : "1",
      "uuid" : "tALRehqIRA6FGPu8iptzww",
      "version" : {
        "created" : "2040099"
      }
    }
  }
}

The index is populated with

curl -XPOST 'http://localhost:9200/icu/employee/' -d '
{
  "first_name" : "John",
  "last_name" : "Doe",
  "about" :  "สวัสดี ผมมาจากกรุงเทพฯ"
}'

Searching with

curl -XGET 'http://localhost:9200/_search' -d'
{
  "query" : {
    "match" : {
      "about" : "กรุงเทพฯ"
    }
  }
}'

returns nothing ("hits" : []). Performing the same search with either สวัสดี or ผมมาจากกรุงเทพฯ works fine.

I guess I've misconfigured the index. How should it be done?

Answer 1

The missing part is:

  "mappings": {
    "employee" : {
      "properties": {
        "about":{
          "type": "text", 
          "analyzer": "icu_analyzer"
        }
      }
    }
  }

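In the mapping, the document field has to specify which analyzer to use; otherwise the field is indexed and searched with the default analyzer. (The "text" field type requires Elasticsearch 5.0 or later. On 2.x, which the "created" version in the question's settings indicates, the equivalent type is "string".)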

[Index]: icu

[Type]: employee

[Field]: about

PUT /icu
{
  "settings": {
    "analysis": {
      "analyzer": {
        "icu_analyzer" : {
          "char_filter": [
              "icu_normalizer"
          ],
          "tokenizer" : "icu_tokenizer"
        }
      }
    }
  },
  "mappings": {
    "employee" : {
      "properties": {
        "about":{
          "type": "text", 
          "analyzer": "icu_analyzer"
        }
      }
    }
  }
}

Test the custom analyzer with the following DSL JSON:

POST /icu/_analyze
{
  "text": "สวัสดี ผมมาจากกรุงเทพฯ",
  "analyzer": "icu_analyzer"
}

The result should be [สวัสดี, ผม, มา, จาก, กรุงเทพฯ]
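
An _analyze call that names a tokenizer or analyzer explicitly bypasses the field mapping, which would explain why the question's test reported five tokens while the search still failed. Below is a rough sketch of how the fix might be verified end to end. It assumes the analysis-icu plugin is installed, that the test index can safely be deleted, and that the index is recreated with the PUT /icu request above, because the analyzer of an existing field cannot be changed in place.

```
# Assumption: this is a throwaway test index, so deleting it is acceptable
DELETE /icu

# ... recreate the index with the PUT /icu request shown above, then reindex the document
POST /icu/employee?refresh=true
{
  "first_name" : "John",
  "last_name" : "Doe",
  "about" : "สวัสดี ผมมาจากกรุงเทพฯ"
}

# Analyze through the field, so the analyzer comes from the mapping
POST /icu/_analyze
{
  "field": "about",
  "text": "สวัสดี ผมมาจากกรุงเทพฯ"
}

# The search from the question, restricted to the icu index
GET /icu/_search
{
  "query" : {
    "match" : {
      "about" : "กรุงเทพฯ"
    }
  }
}
```

If the mapping was picked up, the field-based _analyze call should list the same five tokens, and the search should now return the document.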

My suggestion would be:

Kibana's Dev Tools can help you craft queries effectively.
