Indexed values: Java, JavaScript, ClojureScript
_input_ | _output_
Java | JavaScript, Java
JavaScript | JavaScript
script | JavaScript, ClojureScript
The analyzer that comes closest to the desired result so far is the following.
"analysis": {
  "filter": {
    "trigrams_filter": {
      "type": "edge_ngram",
      "min_gram": "3",
      "max_gram": "3"
    }
  },
  "analyzer": {
    "trigrams": {
      "filter": [
        "lowercase",
        "trigrams_filter"
      ],
      "type": "custom",
      "tokenizer": "standard"
    }
  }
}
But it is not accurate enough, because "JavaScript" returns both "JavaScript" and "Java", and "script" returns nothing.
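The behaviour described above can be reproduced with a small Python simulation. This is only a sketch, not Elasticsearch itself: the helper names `edge_ngrams` and `search` are made up for illustration, tokenization is reduced to lowercasing a single word, and it assumes the same analyzer runs at both index and search time (which is what happens when no `search_analyzer` is set).

```python
def edge_ngrams(word, min_gram=3, max_gram=3):
    """Lowercased prefixes of `word` with lengths min_gram..max_gram."""
    word = word.lower()
    return {word[:n] for n in range(min_gram, max_gram + 1) if n <= len(word)}


docs = ["Java", "JavaScript", "ClojureScript"]

# Index time: every document is reduced to its 3-character prefix.
index = {doc: edge_ngrams(doc) for doc in docs}


def search(query):
    """Return documents sharing at least one token with the analyzed query."""
    query_tokens = edge_ngrams(query)  # the query is analyzed the same way
    return [doc for doc in docs if index[doc] & query_tokens]


for query in ("Java", "JavaScript", "script"):
    print(query, "->", search(query))
```

Under these assumptions, "Java" and "JavaScript" both collapse to the single token `jav`, so they cannot be told apart, while "script" becomes `scr`, which no indexed document contains.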
Answer 0 (score: 1)
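Your mapping has one main problem: you are using the edge_ngram filter to search for part of a word. The edge_ngram filter is meant for finding words that start with the query value. In your case you should use the nGram filter instead: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenfilter.html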
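The difference between the two filters is easiest to see on a single word. The following Python sketch imitates what each filter emits for one lowercased token; the function names `edge_ngrams` and `ngrams` are illustrative helpers, not part of any Elasticsearch client.

```python
def edge_ngrams(word, min_gram, max_gram):
    """Substrings anchored at the start of the word (prefixes only)."""
    word = word.lower()
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]


def ngrams(word, min_gram, max_gram):
    """Substrings starting at every position of the word."""
    word = word.lower()
    grams = []
    for start in range(len(word)):
        for n in range(min_gram, max_gram + 1):
            if start + n <= len(word):
                grams.append(word[start:start + n])
    return grams


word = "JavaScript"
print(edge_ngrams(word, 2, 20))            # prefixes: 'ja', 'jav', ..., 'javascript'
print("script" in edge_ngrams(word, 2, 20))  # False: 'script' is not a prefix
print("script" in ngrams(word, 2, 20))       # True: inner substrings are included
```

Only the second variant produces a token for the inner fragment "script", which is what a search for part of a word needs.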
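In addition, you should apply the trigrams analyzer only when the data is indexed. For searching it is better to use the standard analyzer: there is no point in passing the query string through the nGram filter, because you would get back more data than you need.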
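As a rough illustration of that point, the Python sketch below indexes the three example values with n-grams and then searches once with a standard-like analyzer and once with the n-gram analyzer. It is a simplified model under stated assumptions: `ngrams`, `standard` and `search` are hypothetical helpers, the standard analyzer is reduced to lowercasing one word, and a document counts as a hit when it shares any token with the query (as a `match` query with the default OR operator would).

```python
def ngrams(word, min_gram=2, max_gram=20):
    """All lowercased substrings with lengths min_gram..max_gram."""
    word = word.lower()
    return {
        word[start:start + n]
        for start in range(len(word))
        for n in range(min_gram, max_gram + 1)
        if start + n <= len(word)
    }


def standard(word):
    """Stand-in for the standard analyzer on a one-word query."""
    return {word.lower()}


docs = ["Java", "JavaScript", "ClojureScript"]
index = {doc: ngrams(doc) for doc in docs}  # n-grams at index time only


def search(query, search_analyzer):
    tokens = search_analyzer(query)
    return [doc for doc in docs if index[doc] & tokens]


print(search("JavaScript", standard))  # ['JavaScript']
print(search("JavaScript", ngrams))    # ['Java', 'JavaScript', 'ClojureScript']
```

With the n-gram analyzer on the query side, short fragments such as `ja` or `sc` are enough to match unrelated documents, which is the extra data the answer warns about.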
The correct mapping:
POST /so
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "trigrams_filter": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "20"
        }
      },
      "analyzer": {
        "trigrams": {
          "filter": [
            "lowercase",
            "trigrams_filter"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "so": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "trigrams",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
The values:
POST /so/so/1
{
  "text": "Java"
}
POST /so/so/2
{
  "text": "JavaScript"
}
POST /so/so/3
{
  "text": "ClojureScript"
}
When your query string is "Java", the response contains Java and JavaScript:
POST /so/so/_search
{
  "query": {
    "match": {
      "text": "Java"
    }
  }
}
Response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "so",
        "_type": "so",
        "_id": "1",
        "_score": 1,
        "_source": {
          "text": "Java"
        }
      },
      {
        "_index": "so",
        "_type": "so",
        "_id": "2",
        "_score": 1,
        "_source": {
          "text": "JavaScript"
        }
      }
    ]
  }
}
When your query string is "JavaScript", the response contains only JavaScript:
POST /so/so/_search
{
  "query": {
    "match": {
      "text": "JavaScript"
    }
  }
}
Response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.4054651,
    "hits": [
      {
        "_index": "so",
        "_type": "so",
        "_id": "2",
        "_score": 1.4054651,
        "_source": {
          "text": "JavaScript"
        }
      }
    ]
  }
}
When your query string is "script", the response contains JavaScript and ClojureScript:
POST /so/so/_search
{
  "query": {
    "match": {
      "text": "script"
    }
  }
}
Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "so",
        "_type": "so",
        "_id": "2",
        "_score": 1,
        "_source": {
          "text": "JavaScript"
        }
      },
      {
        "_index": "so",
        "_type": "so",
        "_id": "3",
        "_score": 1,
        "_source": {
          "text": "ClojureScript"
        }
      }
    ]
  }
}