如何在弹性搜索中优先考虑整个单词?

时间:2016-03-04 14:47:35

标签: ruby elasticsearch

目前Elasticsearch对我来说运行良好,但我希望全文优先于Ngrams。

我尝试了以下内容:

client.indices.create index: index,
                      body: {
                          mappings: {
                              search_variable: {
                                  properties: {
                                      "name" => {
                                          "type" => "string",
                                          "index" => "not_analyzed"
                                      },
                                      "label" => {
                                          "type" => "string",
                                          "index" => "not_analyzed"
                                      },
                                      "value_labels" => {
                                          "type" => "string",
                                          "index" => "not_analyzed"
                                      },
                                      "value_label_search_string" => {
                                          "type" => "string",
                                          "index" => "not_analyzed"
                                      },
                                      "search_text" => {
                                          "type" => "multi_field",
                                          "fields" => {
                                            "whole_words" => {"type" => "string", "analyzer" => "simple"},
                                            "ngram" => {"type" => "string", "analyzer" => "ngram", "search_analyzer" => "ngram_search"}
                                          }
                                      }
                                  }
                              },
                              settings: {
                                  analysis: {
                                      filter: {
                                          ngram: {
                                              type: 'nGram',
                                              min_gram: 3,
                                              max_gram: 25
                                          }
                                      },
                                      analyzer: {
                                          ngram: {
                                              tokenizer: 'whitespace',
                                              filter: ['lowercase', 'stop', 'ngram'],
                                              type: 'custom'
                                          },
                                          ngram_search: {
                                              tokenizer: 'whitespace',
                                              filter: ['lowercase', 'stop'],
                                              type: 'custom'
                                          }
                                      }
                                  }
                              }
                          }   
                      } 

这是与我的全文搜索字段相关的部分:search_text

"search_text" => {
    "type" => "multi_field",
    "fields" => {
        "whole_words" => {"type" => "string", "analyzer" => "simple"},
        "ngram" => {"type" => "string", "analyzer" => "ngram", "search_analyzer" => "ngram_search"}
    }
}

我想给与搜索文本中的整个单词匹配的项目提供更高的分数。

[400] {"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"analyzer [ngram_search] not found for field [ngram]"}],"type":"mapper_parsing_exception","reason":"Failed to parse mapping [search_variable]: analyzer [ngram_search] not found for field [ngram]","caused_by":{"type":"mapper_parsing_exception","reason":"analyzer [ngram_search] not found for field [ngram]"}},"status":400}

这是错误: "reason":"analyzer [ngram_search] not found for field [ngram]"

我做错了什么?

编辑:

这是我的查询,我现在只尝试匹配整个单词,每次只得到0个结果。

    search_query = {
        index: index,
        body: {
            _source: {
                exclude: ["search_text", "survey_id"]
            },
            query: {
                :bool => {
                    :filter => {
                        :term => {"survey_id" => 12}
                    },
                    :must => {
                        :match => {
                          "search_text.whole_words" => {"query" => "BMW", "operator" => "and"}
                        }
                    }
                }
            }
        }
    }

    result = client.search(search_query)

以下是输出: curl -XGET localhost:9200/yg_search_variables

{"yg_search_variables":{"aliases":{},"mappings":{"search_variable":{"properties":{"label":{"type":"string","index":"not_analyzed"},"name":{"type":"string","index":"not_analyzed"},"search_text":{"type":"string","index":"no","fields":{"ngram":{"type":"string","analyzer":"ngram","search_analyzer":"ngram_search"},"whole_words":{"type":"string","analyzer":"simple"}}},"value_label_search_string":{"type":"string","index":"not_analyzed"},"value_labels":{"type":"string","index":"not_analyzed"}}},"variables":{"properties":{"category":{"type":"string"},"label":{"type":"string","index":"not_analyzed"},"name":{"type":"string","index":"not_analyzed"},"search_text":{"type":"string","index":"no"},"survey_id":{"type":"long"},"value_label_search_text":{"type":"string"},"value_labels":{"properties":{"0":{"type":"string"},"1":{"type":"string"},"10":{"type":"string"},"100":{"type":"string"},"101":{"type":"string"},"102":{"type":"string"},"103":{"type":"string"},"104":{"type":"string"},"105":{"type":"string"},"106":{"type":"string"},"107":{"type":"string"},"108":{"type":"string"},"109":{"type":"string"},"11":{"type":"string"},"110":{"type":"string"},"1100":{"type":"string"},"1101":{"type":"string"},"1102":{"type":"string"},"1103":{"type":"string"},"1104":{"type":"string"},"1105":{"type":"string"},"1106":{"type":"string"},"1107":{"type":"string"},"1108":{"type":"string"},"1109":{"type":"string"},"111":{"type":"string"},"1110":{"type":"string"},"1111":{"type":"string"},"1112":{"type":"string"},"1113":{"type":"string"},"1114":{"type":"string"},"112":{"type":"string"},"113":{"type":"string"},"114":{"type":"string"},"115":{"type":"string"},"116":{"type":"string"},"117":{"type":"string"},"118":{"type":"string"},"119":{"type":"string"},"12":{"type":"string"},"120":{"type":"string"},"121":{"type":"string"},"122":{"type":"string"},"123":{"type":"string"},"124":{"type":"string"},"125":{"type":"string"},"126":{"type":"string"},"127":{"type":"string"},"128":{"type":"string"},"129":{"type":"string"},"13":{"type":"string"},"130":{"type":"string"},"131":{"type":"string"},"132":{"type":"string"},"133":{"type":"string"},"134":{"type":"string"},"135":{"type":"string"},"136":{"type":"string"},"137":{"type":"string"},"138":{"type":"string"},"139":{"type":"string"},"14":{"type":"string"},"140":{"type":"string"},"141":{"type":"string"},"142":{"type":"string"},"143":{"type":"string"},"144":{"type":"string"},"145":{"type":"string"},"146":{"type":"string"},"147":{"type":"string"},"148":{"type":"string"},"149":{"type":"string"},"15":{"type":"string"},"150":{"type":"string"},"151":{"type":"string"},"152":{"type":"string"},"153":{"type":"string"},"154":{"type":"string"},"155":{"type":"string"},"156":{"type":"string"},"157":{"type":"string"},"158":{"type":"string"},"159":{"type":"string"},"16":{"type":"string"},"160":{"type":"string"},"161":{"type":"string"},"162":{"type":"string"},"163":{"type":"string"},"164":{"type":"string"},"165":{"type":"string"},"166":{"type":"string"},"167":{"type":"string"},"168":{"type":"string"},"169":{"type":"string"},"17":{"type":"string"},"170":{"type":"string"},"171":{"type":"string"},"172":{"type":"string"},"173":{"type":"string"},"174":{"type":"string"},"175":{"type":"string"},"176":{"type":"string"},"177":{"type":"string"},"178":{"type":"string"},"179":{"type":"string"},"18":{"type":"string"},"180":{"type":"string"},"181":{"type":"string"},"182":{"type":"string"},"183":{"type":"string"},"184":{"type":"string"},"185":{"type":"string"},"186":{"type":"string"},"187":{"type":"string"},"188":{"type":"string"},"189":{"type":"string"},"19":{"type":"string"},"190":{"type":"string"},"191":{"type":"string"},"192":{"type":"string"},"193":{"type":"string"},"194":{"type":"string"},"195":{"type":"string"},"196":{"type":"string"},"197":{"type":"string"},"198":{"type":"string"},"199":{"type":"string"},"2":{"type":"string"},"20":{"type":"string"},"200":{"type":"string"},"201":{"type":"string"},"202":{"type":"string"},"203":{"type":"string"},"204":{"type":"string"},"205":{"type":"string"},"206":{"type":"string"},"207":{"type":"string"},"208":{"type":"string"},"209":{"type":"string"},"21":{"type":"string"},"210":{"type":"string"},"211":{"type":"string"},"22":{"type":"string"},"23":{"type":"string"},"24":{"type":"string"},"25":{"type":"string"},"26":{"type":"string"},"27":{"type":"string"},"28":{"type":"string"},"29":{"type":"string"},"3":{"type":"string"},"30":{"type":"string"},"301":{"type":"string"},"302":{"type":"string"},"303":{"type":"string"},"304":{"type":"string"},"305":{"type":"string"},"306":{"type":"string"},"307":{"type":"string"},"308":{"type":"string"},"309":{"type":"string"},"31":{"type":"string"},"310":{"type":"string"},"311":{"type":"string"},"312":{"type":"string"},"313":{"type":"string"},"314":{"type":"string"},"315":{"type":"string"},"316":{"type":"string"},"317":{"type":"string"},"32":{"type":"string"},"33":{"type":"string"},"34":{"type":"string"},"35":{"type":"string"},"36":{"type":"string"},"37":{"type":"string"},"38":{"type":"string"},"39":{"type":"string"},"4":{"type":"string"},"40":{"type":"string"},"41":{"type":"string"},"42":{"type":"string"},"43":{"type":"string"},"44":{"type":"string"},"45":{"type":"string"},"46":{"type":"string"},"47":{"type":"string"},"48":{"type":"string"},"49":{"type":"string"},"5":{"type":"string"},"50":{"type":"string"},"51":{"type":"string"},"52":{"type":"string"},"53":{"type":"string"},"54":{"type":"string"},"55":{"type":"string"},"554":{"type":"string"},"555":{"type":"string"},"556":{"type":"string"},"56":{"type":"string"},"57":{"type":"string"},"58":{"type":"string"},"59":{"type":"string"},"6":{"type":"string"},"60":{"type":"string"},"601":{"type":"string"},"602":{"type":"string"},"603":{"type":"string"},"604":{"type":"string"},"61":{"type":"string"},"62":{"type":"string"},"63":{"type":"string"},"64":{"type":"string"},"65":{"type":"string"},"66":{"type":"string"},"666":{"type":"string"},"667":{"type":"string"},"67":{"type":"string"},"68":{"type":"string"},"69":{"type":"string"},"7":{"type":"string"},"70":{"type":"string"},"71":{"type":"string"},"72":{"type":"string"},"73":{"type":"string"},"74":{"type":"string"},"75":{"type":"string"},"76":{"type":"string"},"77":{"type":"string"},"777":{"type":"string"},"78":{"type":"string"},"79":{"type":"string"},"8":{"type":"string"},"80":{"type":"string"},"801":{"type":"string"},"802":{"type":"string"},"803":{"type":"string"},"804":{"type":"string"},"805":{"type":"string"},"806":{"type":"string"},"807":{"type":"string"},"808":{"type":"string"},"809":{"type":"string"},"81":{"type":"string"},"810":{"type":"string"},"811":{"type":"string"},"812":{"type":"string"},"813":{"type":"string"},"814":{"type":"string"},"815":{"type":"string"},"816":{"type":"string"},"817":{"type":"string"},"818":{"type":"string"},"819":{"type":"string"},"82":{"type":"string"},"820":{"type":"string"},"821":{"type":"string"},"822":{"type":"string"},"83":{"type":"string"},"84":{"type":"string"},"85":{"type":"string"},"86":{"type":"string"},"87":{"type":"string"},"88":{"type":"string"},"888":{"type":"string"},"89":{"type":"string"},"9":{"type":"string"},"90":{"type":"string"},"901":{"type":"string"},"902":{"type":"string"},"903":{"type":"string"},"904":{"type":"string"},"905":{"type":"string"},"906":{"type":"string"},"907":{"type":"string"},"908":{"type":"string"},"909":{"type":"string"},"91":{"type":"string"},"910":{"type":"string"},"911":{"type":"string"},"912":{"type":"string"},"913":{"type":"string"},"914":{"type":"string"},"915":{"type":"string"},"916":{"type":"string"},"917":{"type":"string"},"918":{"type":"string"},"919":{"type":"string"},"92":{"type":"string"},"920":{"type":"string"},"921":{"type":"string"},"922":{"type":"string"},"923":{"type":"string"},"924":{"type":"string"},"925":{"type":"string"},"926":{"type":"string"},"927":{"type":"string"},"928":{"type":"string"},"93":{"type":"string"},"94":{"type":"string"},"95":{"type":"string"},"96":{"type":"string"},"97":{"type":"string"},"98":{"type":"string"},"99":{"type":"string"},"997":{"type":"string"},"998":{"type":"string"},"999":{"type":"string"}}}}}},"settings":{"index":{"creation_date":"1457103857764","analysis":{"filter":{"ngram":{"type":"nGram","min_gram":"3","max_gram":"25"}},"analyzer":{"ngram":{"filter":["lowercase","stop","ngram"],"type":"custom","tokenizer":"whitespace"},"ngram_search":{"filter":["lowercase","stop"],"type":"custom","tokenizer":"whitespace"}}},"number_of_shards":"5","number_of_replicas":"1","uuid":"zPN2LDfCTFqPleW7d5nkwA","version":{"created":"2020099"}}},"warmers":{}}}%

索引似乎很奇怪no

     "search_text": {
        "type": "string",
        "index": "no",
        "fields": {
          "ngram": {
            "type": "string",
            "analyzer": "ngram",
            "search_analyzer": "ngram_search"
          },
          "whole_words": {
            "type": "string",
            "analyzer": "simple"
          }
        }
      }

编辑:以下是“福特”一词的示例匹配文档:

{
  "name"=>"car_ownership", 
  "label"=>"Customer: Ford", 
  "category"=>["Vehicles", "Passenger Vehicles"], "value"=>nil, 
  "value_labels"=>{"1"=>"Yes", "2"=>"No"}, 
  "node_id"=>14813, 
  "survey_id" => 12,
  "search_text" => "Customer Ford Vehicles Passenger Vehicles Yes No"
}

编辑:我添加了一个较小的开始结束测试用例,可以在这里找到,它会复制错误。

https://www.dropbox.com/s/wwxm3qe0oxc2z5y/Slimmed%20ElasticSearch%20Text%20%281%29.html?dl=0

1 个答案:

答案 0 :(得分:1)

第一个问题是在创建索引时settings没有正确嵌套。 settingsmappings应处于同一级别。

然后,查看您的Dropbox文件,我认为问题是映射类型称为search_variable,而在批量中您使用映射类型test_type。因此,永远不会应用映射。