Question

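I built an ElasticSearch index with a custom analyzer that uses the letter tokenizer together with the lowercase and word_delimiter token filters. I then searched for documents containing underscore-separated sub-words, e.g. abc_xyz, using only one of the sub-words, e.g. abc, but got no results. When I searched for the whole word, i.e. abc_xyz, it did find the document.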

I then changed the document to use dash-separated sub-words, e.g. abc-xyz, and searched by sub-word again, and this time it worked.

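To try to understand what was going on, I used the _termvector service to check the terms generated for my documents. The terms were identical for the underscore-separated and the dash-separated sub-words, so I would expect the search results to be the same in both cases.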
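For reference, here is a minimal Python sketch of why identical terms would be expected. It only approximates the letter tokenizer followed by the lowercase filter and is not Elasticsearch's own code. The function name `letter_lowercase_tokens` is made up for illustration. The word_delimiter step is left out on the assumption that tokens made only of lowercase letters give it nothing further to split.

```python
import re

# Runs of Unicode letters: word characters minus digits and the underscore.
# This approximates a tokenizer that breaks on every non-letter character.
_LETTER_RUN = re.compile(r"[^\W\d_]+")


def letter_lowercase_tokens(text):
    """Approximate the letter tokenizer followed by the lowercase filter.

    Hypothetical helper for illustration only.
    """
    return [match.group(0).lower() for match in _LETTER_RUN.finditer(text)]


for sample in ("abc_xyz", "abc-xyz"):
    print(sample, "->", letter_lowercase_tokens(sample))
```

Under this approximation both inputs yield the same two tokens, abc and xyz, which matches what _termvector reported. That suggests the difference lies on the search side rather than in what was indexed.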

Any idea what I might be doing wrong?

If it helps, these are the settings I used for the index:

{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "cmt_value_analyzer": {
            "tokenizer": "letter",
            "filter": [
              "lowercase",
              "my_filter"
            ],
            "type": "custom"
          }
        },
        "filter": {
          "my_filter": {
            "type": "word_delimiter"
          }
        }
      }
    }
  },
  "mappings": {
    "alertmodel": {
      "properties": {
        "name": {
          "analyzer": "cmt_value_analyzer",
          "term_vector": "with_positions_offsets_payloads",
          "type": "string"
        },
        "productId": {
          "type": "double"
        },
        "productName": {
          "analyzer": "cmt_value_analyzer",
          "term_vector": "with_positions_offsets_payloads",
          "type": "string"
        },
        "link": {
          "analyzer": "cmt_value_analyzer",
          "term_vector": "with_positions_offsets_payloads",
          "type": "string"
        },
        "updatedOn": {
          "type": "date"
        }
      }
    }
  }
}
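One way to cross-check what the analyzer emits for each spelling is the _analyze endpoint. This sketch builds such a request using only the Python standard library. The host `http://localhost:9200`, the index name `alerts`, and both function names are assumptions made for illustration. It also assumes a version whose _analyze endpoint accepts a JSON body; older releases may expect the analyzer and text as query parameters instead.

```python
import json
import urllib.request


def build_analyze_request(base_url, index, analyzer, text):
    """Build (but do not send) a POST to <base_url>/<index>/_analyze.

    Hypothetical helper: base_url and index are placeholders to adapt.
    """
    payload = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_tokens(request):
    """Send the request and return the token strings from the response."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return [entry["token"] for entry in body.get("tokens", [])]


# Assumed host and index name; nothing is sent until fetch_tokens is called.
req = build_analyze_request(
    "http://localhost:9200", "alerts", "cmt_value_analyzer", "abc_xyz"
)
print(req.get_method(), req.full_url)
```

Calling `fetch_tokens(req)` against a running cluster for both abc_xyz and abc-xyz would show whether the analyzer output differs between the two inputs.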

"letter" tokenizer and "word_delimiter" filter not working with underscores

0 answers: