Question

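I am trying to get Elasticsearch to ignore hyphens. I don't want it to split the text on either side of a hyphen into separate words. It seems simple, but I am banging my head against the wall.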

I want the string "Roland JD-Xi" to produce the following terms: [roland jd-xi, roland, jd-xi, jdxi, roland jdxi]
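To make the target concrete, here is a minimal Python sketch that expands a string into the term list above. It is only an illustration of the desired output, not an Elasticsearch analyzer; the function name `desired_terms` is hypothetical.

```python
def desired_terms(text):
    """Expand text into: the full phrase, each word, and hyphen-stripped variants."""
    words = text.lower().split()
    phrase = " ".join(words)

    candidates = [phrase]
    for word in words:
        candidates.append(word)                    # word as typed, e.g. "jd-xi"
        candidates.append(word.replace("-", ""))   # hyphen removed, e.g. "jdxi"
    candidates.append(phrase.replace("-", ""))     # whole phrase without hyphens

    # Drop duplicates while keeping first-seen order.
    seen = set()
    terms = []
    for term in candidates:
        if term not in seen:
            seen.add(term)
            terms.append(term)
    return terms


print(desired_terms("Roland JD-Xi"))
# ['roland jd-xi', 'roland', 'jd-xi', 'jdxi', 'roland jdxi']
```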

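I have not been able to achieve this easily. Most people will just type 'jdxi', so my initial thought was simply to remove the hyphen. To do that, I am using the following field definition: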

    "name": {
        "type": "string",
        "analyzer": "language",
        "include_in_all": true,
        "boost": 5,
        "fields": {
            "my_standard": {
                "type": "string",
                "analyzer": "my_standard"
            },
            "my_prefix": {
                "type": "string",
                "analyzer": "my_text_prefix",
                "search_analyzer": "my_standard"
            },
            "my_suffix": {
                "type": "string",
                "analyzer": "my_text_suffix",
                "search_analyzer": "my_standard"
            }
        }
    }

The relevant analyzers and filters are defined as follows:

{
"number_of_replicas": 0,
"number_of_shards": 1,
"analysis": {
    "analyzer": {
        "std": {
            "tokenizer": "standard",
            "char_filter": "html_strip",
            "filter": ["standard", "elision", "asciifolding", "lowercase", "stop", "length", "strip_hyphens"]
        ...
        "my_text_prefix": {
            "tokenizer": "whitespace",
            "char_filter": "my_filter",
            "filter": ["standard", "elision", "asciifolding", "lowercase", "stop", "edge_ngram_front"]
        },
        "my_text_suffix": {
            "tokenizer": "whitespace",
            "char_filter": "my_filter",
            "filter": ["standard", "elision", "asciifolding", "lowercase", "stop", "edge_ngram_back"]
        },
        "my_standard": {
            "type": "custom",
            "tokenizer": "whitespace",
            "char_filter": "my_filter",
            "filter": ["standard", "elision", "asciifolding", "lowercase"]
        }
    },
    "char_filter": {
        "my_filter": {
            "type": "mapping",
            "mappings": ["- => ", ". => "]
        }
    },
    "filter": {
        "edge_ngram_front": {
            "type": "edgeNGram",
            "min_gram": 1,
            "max_gram": 20,
            "side": "front"
        },
        "edge_ngram_back": {
            "type": "edgeNGram",
            "min_gram": 1,
            "max_gram": 20,
            "side": "back"
        },
        "strip_spaces": {
            "type": "pattern_replace",
            "pattern": "\\s",
            "replacement": ""
        },
        "strip_dots": {
            "type": "pattern_replace",
            "pattern": "\\.",
            "replacement": ""
        },
        "strip_hyphens": {
            "type": "pattern_replace",
            "pattern": "-",
            "replacement": ""
        },
        "stop": {
            "type": "stop",
            "stopwords": "_none_"
        },
        "length": {
            "type": "length",
            "min": 1
        }
    }
}
}

I have been able to test this (i.e. with _analyze), and the string "Roland JD-Xi" is tokenized as [roland, jdxi].

That is not exactly what I want, but it is close enough that it should match 'jdxi'.
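The [roland, jdxi] result can be reproduced outside Elasticsearch with a rough Python approximation of the my_standard chain. This sketch models only the mapping char filter (which deletes "-" and "."), the whitespace tokenizer, and the lowercase filter; the elision and asciifolding filters are left out because they do not change this input. The function name is hypothetical.

```python
# Mirrors the "my_filter" char filter: "- => " and ". => " (both removed).
CHAR_MAPPINGS = {"-": "", ".": ""}


def my_standard_sketch(text):
    """Approximate my_standard: char filter, whitespace tokenizer, lowercase."""
    # 1. Char filter runs on the raw text, before tokenization.
    for source, target in CHAR_MAPPINGS.items():
        text = text.replace(source, target)
    # 2. Whitespace tokenizer: split on runs of whitespace only.
    tokens = text.split()
    # 3. Lowercase token filter.
    return [token.lower() for token in tokens]


print(my_standard_sketch("Roland JD-Xi"))  # ['roland', 'jdxi']
```

Because the hyphen is removed before tokenization, "JD-Xi" never has a chance to be split into two tokens.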

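But here is my problem. If I run a simple "index/_search?q=jdxi", it does not bring back the document. However, if I run "index/_search?q=roland+jdxi", it does bring back the document.

So at least I know the hyphen is being removed. But if the tokens "roland" and "jdxi" are being created, how come "index/_search?q=jdxi" does not match the document?

Is there something wrong with my indexing process or my query process?
How do I fix it?
Can anyone explain how to achieve the desired tokens [roland jd-xi, roland, jd-xi, jdxi, roland jdxi]?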

Answer 1

I reproduced your case on ES 6, and searching index/_search?q=jdxi does return the document.

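The problem is probably that when you search index/_search?q=jdxi without specifying a field, the query essentially runs against _all, which contains the content of the name field (so it is basically the same as index/_search?q=name:jdxi). Since that field is not analyzed with the my_standard analyzer, the term jdxi was never indexed there and you get no results.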
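The mismatch can be illustrated with a small Python sketch. It assumes the field being searched is analyzed by something that behaves like the standard analyzer, which splits on hyphens; the actual "language" analyzer is not shown in the question, so this is an assumption. Both analyzer functions are rough approximations with hypothetical names, and the matching rule models the query string default of OR between terms.

```python
import re


def standard_like(text):
    """Rough stand-in for the standard analyzer: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def hyphen_stripping(text):
    """Rough stand-in for my_standard: drop '-' and '.', split on whitespace, lowercase."""
    return [token.lower() for token in text.replace("-", "").replace(".", "").split()]


def matches(query, indexed_terms, query_analyzer):
    """A document matches if ANY analyzed query term was indexed (OR semantics)."""
    return any(term in indexed_terms for term in query_analyzer(query))


doc = "Roland JD-Xi"
default_terms = set(standard_like(doc))     # {'roland', 'jd', 'xi'}
custom_terms = set(hyphen_stripping(doc))   # {'roland', 'jdxi'}

print(matches("jdxi", default_terms, standard_like))         # False
print(matches("roland jdxi", default_terms, standard_like))  # True
print(matches("jdxi", custom_terms, hyphen_stripping))       # True
```

Under these assumptions, "roland jdxi" finds the document only because "roland" matches, which is consistent with the behavior described in the question.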

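What you should do instead is search the my_standard sub-field, i.e. index/_search?q=name.my_standard:jdxi, and I am pretty sure you will get the document.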
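As a small sketch of building that field-scoped request from Python, the snippet below assembles the URL with the standard library only. The host http://localhost:9200 and the helper name are assumptions; the index name "index" comes from the question, and the sub-field is assumed to be addressed as name.my_standard.

```python
from urllib.parse import urlencode


def field_search_url(base_url, index, field, term):
    """Build a URI search that targets one field via the q parameter."""
    # urlencode percent-encodes the ':' separating field and term.
    query = urlencode({"q": f"{field}:{term}"})
    return f"{base_url}/{index}/_search?{query}"


url = field_search_url("http://localhost:9200", "index", "name.my_standard", "jdxi")
print(url)  # http://localhost:9200/index/_search?q=name.my_standard%3Ajdxi
```

The resulting URL can be passed to any HTTP client; the encoded %3A is decoded back to ":" on the server side.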
