Synonyms don't seem to be applied to wildcard queries

Asked: 2015-03-05 15:53:29

Tags: elasticsearch

I can't get synonyms to work on my Elasticsearch instance. I have tried several things but nothing works, so here is my setup:

First, my synonyms.txt file:

hello => world
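
For reference, as far as I understand the Solr-style format the synonym filter reads, a rule can be written in two ways; the lines below are only illustrative, not part of my actual file:

# "a, b" declares equivalent terms (both are kept, subject to the expand setting)
hello, world

# "a => b" is an explicit mapping: every occurrence of the left side is replaced by the right side
hello => world

So with my one-line file above, hello should be rewritten to world during analysis.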

Second, my index analysis settings:

"analysis": {
    "filter": {
        "ipSynonym": {
            "type": "synonym",
            "synonyms_path": "synonyms.txt"
        },
        "ipAsciiFolding": {
            "type": "asciifolding",
            "preserve_original": "true"
        },
        "NoTokenPattern": {
            "type": "pattern_capture",
            "preserve_original": "true",
            "patterns": [".*"]
        }
    },
    "char_filter": {
        "ipCharFilter": {
            "type": "mapping",
            "mappings": ["'=>-",
            "_=>-"]
        }
    },
    "analyzer": {
        "ipStrictAnalyzer": {
            "filter": ["lowercase",
            "trim",
            "ipSynonym"],
            "type": "custom",
            "tokenizer": "ipStrictTokenizer"
        },
        "varIdAnalyser": {
            "type": "custom",
            "filter": ["lowercase",
            "trim"],
            "tokenizer": "varIdTokenizer"
        },
        "pathAnalyzer": {
            "type": "custom",
            "filter": ["lowercase"],
            "tokenizer": "pathTokenizer"
        },
        "ipAnalyzer": {
            "filter": ["icu_normalizer",
            "icu_folding",
            "ipSynonym"],
            "char_filter": ["ipCharFilter"],
            "type": "custom",
            "tokenizer": "ipTokenizer"
        }
    },
    "tokenizer": {
        "varIdTokenizer": {
            "pattern": "([\W_]+|[a-zA-Z0-9]+|[\w]+)",
            "type": "pattern",
            "group": "0"
        },
        "ipTokenizer": {
            "type": "icu_tokenizer"
        },
        "pathTokenizer": {
            "type": "pattern",
            "pattern": "/"
        },
        "ipStrictTokenizer": {
            "type": "keyword"
        }
    }
}

As you can see, I created a filter called ipSynonym of type synonym, with synonyms_path pointing to the synonyms.txt file I just created in the Elasticsearch config folder.

You can see that I use this filter in both ipStrictAnalyzer and ipAnalyzer.
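
For context, name.analyzed is supposed to go through ipAnalyzer; a minimal nested mapping wiring that up would look roughly like this (an illustrative sketch, not a copy of my actual mapping):

"mappings": {
    "clipdocument": {
        "properties": {
            "name": {
                "type": "nested",
                "properties": {
                    "analyzed": {
                        "type": "string",
                        "analyzer": "ipAnalyzer"
                    },
                    "notAnalyzed": {
                        "type": "string",
                        "index": "not_analyzed"
                    }
                }
            }
        }
    }
}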

Now, here is what I get when I call the Elasticsearch analyze API. First request:

http://localhost:9200/media/_analyze?analyzer=ipAnalyzer&text=hello/

The response is:

{
    "tokens": [{
        "token": "world",
        "start_offset": 0,
        "end_offset": 5,
        "type": "SYNONYM",
        "position": 1
    }]
}

This makes me think the synonym filter is working correctly, right? :)
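
A quick sanity check is to analyze the target term itself:

http://localhost:9200/media/_analyze?analyzer=ipAnalyzer&text=world

If I read the hello => world rule correctly, this should simply return a single world token, since world has no rule of its own; so world is the only token that can ever end up in the index for either word.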

So I now run this query against Elasticsearch:

"query": {
    "nested": {
        "query": {
            "wildcard": {
                "name.analyzed": {
                    "value": "*world*"
                }
            }
        },
        "path": "name"
    }
}

And the output is exactly the document I want, this one:

{
    "_index": "media",
    "_type": "clipdocument",
    "_id": "2c215600-b21d-4355-a379-e44db5c9b354",
    "_score": 1,
    "_source": {
        "name": {
            "analyzed": "world",
            "notAnalyzed": "world"
        },
        "creationDate": "2015-02-27T23:27:58",
    }
}

Now I search with:

"query": {
    "nested": {
        "query": {
            "wildcard": {
                "name.analyzed": {
                    "value": "*hello*"
                }
            }
        },
        "path": "name"
    }
}

and I no longer find the document I found before. Why? :(
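
My current guess is that wildcard, like other term-level queries, does not run its value through the field's analyzer, so *hello* is compared directly against the indexed tokens and never sees the synonym filter; and since hello => world only ever puts world into the index, there is nothing for *hello* to match. An analyzed query such as a plain match should get the synonym applied; something like this sketch (which is not what I ultimately want, since I need the wildcard behaviour):

"query": {
    "nested": {
        "path": "name",
        "query": {
            "match": {
                "name.analyzed": "hello"
            }
        }
    }
}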

1 Answer:

Answer 0 (score: 0)

So, I find this synonym system a bit strange, but that is probably because I am not familiar with it.

I retried with a simpler mapping and it worked. The first time (as in the example above) I had simply written the synonyms.txt file the wrong way round: I wrote hello => world, but what I actually wanted was world => hello. So it more or less works now.
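
Concretely, the corrected synonyms.txt rule is just the original one reversed:

world => hello

With that rule, and if I understand the direction correctly, world is rewritten to hello at analysis time, so the document named world is indexed under the token hello and can then be matched by a search for hello.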