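I like the results I get from Elasticsearch when I index data with Edge-NGrams and use a different analyzer at search time. However, I would like shorter matching terms to rank higher than longer ones.
For example, take the two terms ABC100 and ABC100xxx. If I run a query with the term ABC, both documents come back as hits with the same score. I would like ABC100 to score higher than ABC100xxx, because ABC is a closer match to ABC100 by something like the Levenshtein distance algorithm.
Set up the index:
PUT stackoverflow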
{
  "settings": {
    "index": {
      "number_of_replicas": 0,
      "number_of_shards": 1
    },
    "analysis": {
      "filter": {
        "edge_ngram": {
          "type": "edgeNGram",
          "min_gram": "1",
          "max_gram": "20"
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "edge_ngram"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "product": {
          "type": "text",
          "analyzer": "my_analyzer",
          "search_analyzer": "whitespace"
        }
      }
    }
  }
}
Insert documents:
PUT stackoverflow/doc/1
{
  "product": "ABC100"
}
PUT stackoverflow/doc/2
{
  "product": "ABC100xxx"
}
Search query:
GET stackoverflow/_search?pretty
{
  "query": {
    "match": {
      "product": "ABC"
    }
  }
}
Results:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.28247002,
    "hits": [
      {
        "_index": "stackoverflow",
        "_type": "doc",
        "_id": "2",
        "_score": 0.28247002,
        "_source": {
          "product": "ABC100xxx"
        }
      },
      {
        "_index": "stackoverflow",
        "_type": "doc",
        "_id": "1",
        "_score": 0.28247002,
        "_source": {
          "product": "ABC100"
        }
      }
    ]
  }
}
Does anyone know how I can get a shorter term like ABC100 ranked higher than ABC100xxx?
Answer 0 (score: 0)
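After finding plenty of less than optimal solutions that involve storing the field length as a separate field or using a script query, I found the root of my problem. It was simply that I was using the edge_ngram token filter instead of the edge_ngram tokenizer.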
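A minimal sketch of what that change might look like, reusing the index, type, and field names from the question. The tokenizer name edge_ngram_tokenizer and the token_chars setting are my own assumptions for illustration and are not taken from the original setup. As far as I understand it, the token filter emits every gram of a word at the same position, and overlapping tokens are not counted toward the field length by default, so both documents end up with the same length norm. The tokenizer gives each gram its own position, so ABC100 produces a shorter field than ABC100xxx and benefits from length normalization. The existing index would have to be deleted and recreated, and the documents reindexed, for this to take effect.

PUT stackoverflow
{
  "settings": {
    "index": {
      "number_of_replicas": 0,
      "number_of_shards": 1
    },
    "analysis": {
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "edge_ngram_tokenizer"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "product": {
          "type": "text",
          "analyzer": "my_analyzer",
          "search_analyzer": "whitespace"
        }
      }
    }
  }
}

To see how a given analyzer tokenizes a value, the _analyze API can be run against the index and the position reported for each gram compared between the two setups:

GET stackoverflow/_analyze
{
  "analyzer": "my_analyzer",
  "text": "ABC100"
}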