I can't get synonyms to work in my ElasticSearch. I have tried several things but nothing works, so here is my setup:
First, my synonyms.txt file:
hello => world
Second, my index metadata:
"analysis": {
"filter": {
"ipSynonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
},
"ipAsciiFolding": {
"type": "asciifolding",
"preserve_original": "true"
},
"NoTokenPattern": {
"type": "pattern_capture",
"preserve_original": "true",
"patterns": [".*"]
}
},
"char_filter": {
"ipCharFilter": {
"type": "mapping",
"mappings": ["'=>-",
"_=>-"]
}
},
"analyzer": {
"ipStrictAnalyzer": {
"filter": ["lowercase",
"trim",
"ipSynonym"],
"type": "custom",
"tokenizer": "ipStrictTokenizer"
},
"varIdAnalyser": {
"type": "custom",
"filter": ["lowercase",
"trim"],
"tokenizer": "varIdTokenizer"
},
"pathAnalyzer": {
"type": "custom",
"filter": ["lowercase"],
"tokenizer": "pathTokenizer"
},
"ipAnalyzer": {
"filter": ["icu_normalizer",
"icu_folding",
"ipSynonym"],
"char_filter": ["ipCharFilter"],
"type": "custom",
"tokenizer": "ipTokenizer"
}
},
"tokenizer": {
"varIdTokenizer": {
"pattern": "([\\W_]+|[a-zA-Z0-9]+|[\\w]+)",
"type": "pattern",
"group": "0"
},
"ipTokenizer": {
"type": "icu_tokenizer"
},
"pathTokenizer": {
"type": "pattern",
"pattern": "/"
},
"ipStrictTokenizer": {
"type": "keyword"
}
}
}
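As you can see, I created a filter named ipSynonym of type 'synonym', with synonyms_path pointing to the synonyms.txt file I just created in ElasticSearch's config folder.
You can also see that I use this filter in both ipStrictAnalyzer and ipAnalyzer.
Now, here is what I get when I query the ElasticSearch API. First, the request: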
http://localhost:9200/media/_analyze?analyzer=ipAnalyzer&text=hello/
The response is:
{
"tokens": [{
"token": "world",
"start_offset": 0,
"end_offset": 5,
"type": "SYNONYM",
"position": 1
}]
}
This makes me think the synonym filter is working correctly, right? :)
So now I run this query in ElasticSearch:
"query": {
"nested": {
"query": {
"wildcard": {
"name.analyzed": {
"value": "*world*"
}
}
},
"path": "name"
}
}
The output is the item I want, this one:
{
"_index": "media",
"_type": "clipdocument",
"_id": "2c215600-b21d-4355-a379-e44db5c9b354",
"_score": 1,
"_source": {
"name": {
"analyzed": "world",
"notAnalyzed": "world"
},
"creationDate": "2015-02-27T23:27:58"
}
}
Now I search with:
"query": {
"nested": {
"query": {
"wildcard": {
"name.analyzed": {
"value": "*hello*"
}
}
},
"path": "name"
}
}
I can't find the document I found before. Why? :(
Answer 0 (score: 0)
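So, this synonym system seems strange to me, but that may be because I'm not familiar with it.
I retried with a simpler mapping and it worked. The first time (as in the example) I got the synonyms.txt file wrong: I wrote hello => world, but what I wanted was world => hello. So it more or less works now.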
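A likely explanation for why the direction of the rule matters: a wildcard query is a term-level query, so its pattern is compared against the indexed terms as-is and is not run through the analyzer (and therefore not through the synonym filter). The sketch below is only a plain-Python simulation of that behaviour, not ElasticSearch code. The helper names (apply_synonyms, index_terms, wildcard_matches) are hypothetical, and the "analyzer" is reduced to lowercasing, whitespace splitting, and an explicit "a => b" replacement.

```python
from fnmatch import fnmatchcase


def apply_synonyms(tokens, rules):
    """Replace each token by its mapped value, like an explicit 'a => b' rule."""
    return [rules.get(token, token) for token in tokens]


def index_terms(text, rules):
    """Simplified stand-in for the analyzer: lowercase, split, apply synonyms."""
    return apply_synonyms(text.lower().split(), rules)


def wildcard_matches(pattern, terms):
    """Compare the raw pattern with the indexed terms; no synonym step runs on it."""
    return any(fnmatchcase(term, pattern) for term in terms)


original_rules = {"hello": "world"}   # synonyms.txt: hello => world
reversed_rules = {"world": "hello"}   # synonyms.txt: world => hello

# The document from the question has the value "world".
terms_original = index_terms("world", original_rules)   # ['world']
terms_reversed = index_terms("world", reversed_rules)   # ['hello']

print(wildcard_matches("*hello*", terms_original))  # False
print(wildcard_matches("*hello*", terms_reversed))  # True
```

With hello => world the stored term stays "world", so the pattern *hello* has nothing to match. With world => hello the stored term becomes "hello", and the same pattern matches.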