Elasticsearch autocomplete or autosuggest by token

Time: 2014-06-24 16:17:32

Tags: autocomplete elasticsearch autosuggest search-suggestion

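I want to build suggestions that complete a term token by token, similar to Google's autocomplete, but limited to a single token or word.

I want to search filenames, which are tokenized. For example, "BRAND_Connect_A1233.jpg" is tokenized into "brand", "connect", "a1233" and "jpg".

Now I would like to ask for suggestions for, e.g., "con". The suggestion should return the complete matching tokens, not the full filenames:

  • Connect
  • Contour
  • Concept
  • ...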

Suggestions for "A12" should be "A1234", "A1233", "A1222", ...
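
To make the desired behaviour concrete, here is a minimal shell sketch that does not use Elasticsearch at all. It only shows the expected input and output: given a prefix and some filenames, print the distinct tokens that start with the prefix. The helper names `tokens` and `suggest` are made up for illustration, and splitting on `_`, `.`, `;` and `/` is an assumption about how the filenames should be tokenized.

```shell
#!/bin/sh
# Hypothetical helper: split a filename into lowercase tokens, one per line.
# Assumption: tokens are separated by "_", ".", ";" or "/".
tokens() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr '_.;/' '\n\n\n\n' \
    | sed '/^$/d'
}

# Hypothetical helper: print the distinct tokens that start with a prefix.
# Usage: suggest PREFIX FILENAME...
# Note: the prefix is used as a regular expression, so keep it alphanumeric.
suggest() {
  prefix=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  shift
  for name in "$@"; do
    tokens "$name"
  done | grep "^$prefix" | sort -u
}

suggest con BRAND_Connect_A1233.jpg DEALER_Contour21_B1233.jpg COMPANY_Concept_A1233.jpg
```

With these three filenames the call prints `concept`, `connect` and `contour21`, one per line: whole tokens, never the full filename.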

Example

Working with queries, facets and filters works fine.

First I created a mapping including a tokenizer and filters:

curl -XPUT 'localhost:9200/files/?pretty=1'  -d '
{
   "settings" : {
      "analysis" : {
         "analyzer" : {
            "filename_search" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase"]
            },
            "filename_index" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase","edge_ngram"]
            }
         },
         "tokenizer" : {
            "filename" : {
               "pattern" : "[^[;_\\.\\/]\\d]+",
               "type" : "pattern"
            }
         },
         "filter" : {
            "edge_ngram" : {
               "side" : "front",
               "max_gram" : 20,
               "min_gram" : 2,
               "type" : "edgeNGram"
            }
         }
      }
   },
   "mappings" : {
      "file" : {
         "properties" : {
            "filename" : {
               "type" : "string",
               "search_analyzer" : "filename_search",
               "index_analyzer" : "filename_index"
            }
         }
      }
   }
}'

Both analyzers work fine:

curl -XGET 'localhost:9200/files/_analyze?pretty=1&text=BRAND_ConnectBlue_A1234.jpg&analyzer=filename_search'
curl -XGET 'localhost:9200/files/_analyze?pretty=1&text=BRAND_ConnectBlue_A1234.jpg&analyzer=filename_index'
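
For orientation, the edge_ngram filter configured above expands each token into its leading prefixes. The following is a minimal shell sketch of that expansion for a single, already lowercased token, using the same min_gram of 2 and max_gram of 20. The `edge_ngrams` helper is hypothetical and only imitates the filter; it is not part of Elasticsearch.

```shell
#!/bin/sh
# Hypothetical helper: print the front edge n-grams of one token.
# Defaults mirror min_gram=2 and max_gram=20 from the filter settings.
# Usage: edge_ngrams TOKEN [MIN] [MAX]
edge_ngrams() {
  token=$1
  min=${2:-2}
  max=${3:-20}
  len=${#token}
  # A gram cannot be longer than the token itself.
  if [ "$len" -lt "$max" ]; then
    max=$len
  fi
  n=$min
  while [ "$n" -le "$max" ]; do
    printf '%s\n' "$token" | cut -c "1-$n"
    n=$((n + 1))
  done
}

edge_ngrams connect
```

For `connect` this prints `co`, `con`, `conn`, `conne`, `connec` and `connect`. Because the index holds prefixes such as `con`, a query analyzed with filename_search can match on a partial token, but the indexed terms themselves are fragments rather than whole tokens.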

Now I added some sample data:

curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConnectBlue_A1234.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_Connect_A1233.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConceptSpace_A1244.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Connect_A1222.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Concept_A1233.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Connect_B1234_.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Contour21_B1233.jpg"}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_ConceptCube_B2233.jpg"}'
curl -X POST "localhost:9200/files/_refresh"

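The various approaches I tried to get the desired suggestions did not deliver the expected results. I tried naming the analyzer explicitly, and tried various combinations of analyzers and wildcards.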

curl -XGET 'localhost:9200/files/_suggest?pretty=true'  -d '{
    "text" : "con",
    "simple_phrase" : {
      "phrase" : {
        "field" : "filename",
        "size" : 15,
        "real_word_error_likelihood" : 0.75,
        "max_errors" : 0.1,
        "gram_size" : 3
      }
    }
}'
curl -XGET 'localhost:9200/files/_suggest?pretty=true'  -d '{
    "my-suggestion" : {
    "text" : "con",
    "term" : {
        "field" : "filename",
        "analyzer": "filename_index"
        }
    }
}'

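1 Answer:

Answer 0: (score: 0)

You need to add a special mapping to use the completion suggester, as described in the official ElasticSearch docs. I have modified your example to show how it works.

First create the index. Note the filename_suggest mapping.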

curl -XPUT 'localhost:9200/files/?pretty=1'  -d '
{
   "settings" : {
      "analysis" : {
         "analyzer" : {
            "filename_search" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase"]
            },
            "filename_index" : {
               "tokenizer" : "filename",
               "filter" : ["lowercase","edge_ngram"]
            }
         },
         "tokenizer" : {
            "filename" : {
               "pattern" : "[^[;_\\.\\/]\\d]+",
               "type" : "pattern"
            }
         },
         "filter" : {
            "edge_ngram" : {
               "side" : "front",
               "max_gram" : 20,
               "min_gram" : 2,
               "type" : "edgeNGram"
            }
         }
      }
   },
   "mappings" : {
      "file" : {
         "properties" : {
            "filename" : {
               "type" : "string",
               "analyzer": "filename_index",
               "search_analyzer" : "filename_search"
            },
            "filename_suggest": {
              "type": "completion",
              "analyzer": "simple",
              "search_analyzer": "simple",
              "payloads": true
            }
         }
      }
   }
}'

Add some data. Note how filename_suggest contains an input field holding the keywords to match.

curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConnectBlue_A1234.jpg", "filename_suggest": { "input": ["BRAND", "ConnectBlue", "A1234", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_Connect_A1233.jpg", "filename_suggest": { "input": ["BRAND", "Connect", "A1233", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "BRAND_ConceptSpace_A1244.jpg", "filename_suggest": { "input": ["BRAND", "ConceptSpace", "A1244", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Connect_A1222.jpg", "filename_suggest": { "input": ["COMPANY", "Connect", "A1222", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "COMPANY_Concept_A1233.jpg", "filename_suggest": { "input": ["COMPANY", "Concept", "A1233", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Connect_B1234_.jpg", "filename_suggest": { "input": ["DEALER", "Connect", "B1234", "jpg"], "payload": {} } }'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_Contour21_B1233.jpg", "filename_suggest": { "input": ["DEALER", "Contour21", "B1233", "jpg"], "payload": {} }}'
curl -X POST "localhost:9200/files/file" -d '{ "filename" : "DEALER_ConceptCube_B2233.jpg", "filename_suggest": { "input": ["DEALER", "ConceptCube", "B2233", "jpg"], "payload": {} }}'
curl -X POST "localhost:9200/files/_refresh"
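
Writing the input array by hand for every file gets tedious. As a rough sketch, the document body could be derived from the filename in the shell. The `doc_body` helper below is hypothetical, and it assumes that tokens are separated by `_` and `.` and that filenames contain no characters that need JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON body for one document, deriving the
# suggester inputs by splitting the filename on "_" and ".".
# Assumption: the filename contains no quotes or backslashes.
doc_body() {
  inputs=$(printf '%s\n' "$1" \
    | tr '_.' '\n\n' \
    | sed '/^$/d' \
    | sed 's/.*/"&"/' \
    | paste -s -d, -)
  printf '{ "filename" : "%s", "filename_suggest": { "input": [%s], "payload": {} } }\n' "$1" "$inputs"
}

doc_body BRAND_Connect_A1233.jpg
```

The printed body could then be passed to the indexing call shown above, for example with `-d "$(doc_body BRAND_Connect_A1233.jpg)"`.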

Now run the query:

curl -XPOST 'localhost:9200/files/_suggest?pretty=true'  -d '{
    "filename_suggest" : {
        "text" : "con",
        "completion": {
            "field": "filename_suggest", "size": 10
        }
    }
}'

Result:

{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "filename_suggest" : [ {
    "text" : "con",
    "offset" : 0,
    "length" : 3,
    "options" : [ {
      "text" : "Connect",
      "score" : 2.0,
      "payload":{}
    }, {
      "text" : "Concept",
      "score" : 1.0,
      "payload":{}
    }, {
      "text" : "ConceptSpace",
      "score" : 1.0,
      "payload":{}
    }, {
      "text" : "ConnectBlue",
      "score" : 1.0,
      "payload":{}
    }, {
      "text" : "Contour21",
      "score" : 1.0,
      "payload":{}
    } ]
  } ]
}