Question

Our Account model has first_name, last_name, and ssn (Social Security number).

I want to do partial matching on first_name and last_name, but exact matching on ssn. So far I have this:

settings analysis: {
    filter: {
      substring: {
        type: "nGram",
        min_gram: 3,
        max_gram: 50
      },
      ssn_string: {
        type: "nGram",
        min_gram: 9,
        max_gram: 9
      },
    },
    analyzer: {
      index_ngram_analyzer: {
        type: "custom",
        tokenizer: "standard",
        filter: ["lowercase", "substring"]
      },
      search_ngram_analyzer: {
        type: "custom",
        tokenizer: "standard",
        filter:  ["lowercase", "substring"]
      },
      ssn_ngram_analyzer: {
        type: "custom",
        tokenizer: "standard",
        filter: ["ssn_string"]
      },
    }
}

mapping do
  [:first_name, :last_name].each do |attribute|
    indexes attribute, type: 'string',
                       index_analyzer: 'index_ngram_analyzer',
                       search_analyzer: 'search_ngram_analyzer'
  end

  indexes :ssn, type: 'string', index: 'not_analyzed'
end

My search looks like this:

query: {
  multi_match: {
     fields: ["first_name", "last_name", "ssn"],
     query: query,
     type: "cross_fields",
     operator: "and"
  }

}

This works:

 Account.search("erik").records.to_a

And even this (for Erik Smith):

 Account.search("erik smi").records.to_a

And the ssn:

 Account.search("111112222").records.to_a

But not this:

 Account.search("erik 111112222").records.to_a

Am I indexing or querying incorrectly?

Thanks for your help!

Answer 1

Does it have to be a single query string? If not, I would do something like this:

PUT /test_index
{
   "settings": {
      "number_of_shards": 1,
      "analysis": {
         "filter": {
            "ngram_filter": {
               "type": "ngram",
               "min_gram": 2,
               "max_gram": 20
            }
         },
         "analyzer": {
            "ngram_analyzer": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "lowercase",
                  "ngram_filter"
               ]
            }
         }
      }
   },
   "mappings": {
      "doc": {
         "_all": {
            "enabled": true,
            "index_analyzer": "ngram_analyzer",
            "search_analyzer": "standard"
         },
         "properties": {
            "first_name": {
               "type": "string",
               "include_in_all": true
            },
            "last_name": {
               "type": "string",
               "include_in_all": true
            },
            "ssn": {
               "type": "string",
               "index": "not_analyzed",
               "include_in_all": false
            }
         }
      }
   }
}

Notice the use of the _all field. I included first_name and last_name in _all, but not ssn. And ssn is not analyzed at all, since I want to do exact matches against it.
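To make the effect of that mapping more concrete, here is a rough Ruby sketch of what the analysis chain does to the _all field. It is only an approximation: splitting on non-alphanumeric characters stands in for the standard tokenizer, and the helper names (`ngrams`, `index_tokens`, `search_tokens`, `matches_all?`) are made up for illustration, not part of Elasticsearch or any gem.

```ruby
# All substrings of `token` between min_gram and max_gram characters long,
# roughly what the ngram token filter emits for a single token.
def ngrams(token, min_gram, max_gram)
  grams = []
  (0...token.length).each do |start|
    (min_gram..max_gram).each do |len|
      break if start + len > token.length
      grams << token[start, len]
    end
  end
  grams
end

# Index-time analysis for _all: tokenize, lowercase, then ngram (min 2, max 20).
# The regex split is a simplification of the standard tokenizer.
def index_tokens(text, min_gram: 2, max_gram: 20)
  text.downcase.scan(/[a-z0-9]+/).flat_map { |t| ngrams(t, min_gram, max_gram) }.uniq
end

# Search-time analysis for _all: tokenize and lowercase only, no ngrams.
def search_tokens(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# operator "and": every search token must be among the indexed grams.
def matches_all?(indexed, query)
  search_tokens(query).all? { |t| indexed.include?(t) }
end

# _all for document 1 holds first_name and last_name only (ssn is excluded).
all_field = index_tokens("Erik Smith")

puts ngrams("erik", 2, 20).inspect          # ["er", "eri", "erik", "ri", "rik", "ik"]
puts matches_all?(all_field, "eri smi")     # true
puts matches_all?(all_field, "eri jon")     # false
```

Because the search side is not ngrammed, a query term such as "eri" is looked up as a single token and only has to equal one of the grams stored at index time.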

I indexed a couple of documents for illustration:

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"first_name":"Erik","last_name":"Smith","ssn":"111112222"}
{"index":{"_id":2}}
{"first_name":"Bob","last_name":"Jones","ssn":"123456789"}

Then I can query on a partial name and filter by the exact ssn:

POST /test_index/doc/_search
{
   "query": {
      "filtered": {
         "query": {
            "match": {
               "_all": {
                   "query": "eri smi",
                   "operator": "and"
               }
            }
         },
         "filter": {
            "term": {
               "ssn": "111112222"
            }
         }
      }
   }
}

And I get back what I'm expecting:

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.8838835,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 0.8838835,
            "_source": {
               "first_name": "Erik",
               "last_name": "Smith",
               "ssn": "111112222"
            }
         }
      ]
   }
}
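Since the question uses Ruby, the same request body could be assembled there as a plain Hash. This is only a sketch: `account_query` and `split_query` are hypothetical helpers, and `split_query` assumes the input contains exactly one 9-digit token, which it treats as the ssn. The body mirrors the `filtered` query shown above, which was deprecated in Elasticsearch 2.0 and removed in 5.0, so it only applies to clusters of the same era as the `string` mappings used here.

```ruby
require "json"

# Hypothetical helper: builds the same filtered query as the JSON request above
# from a partial name and an exact ssn.
def account_query(name_part, ssn)
  {
    query: {
      filtered: {
        query: {
          match: { _all: { query: name_part, operator: "and" } }
        },
        filter: {
          term: { ssn: ssn }
        }
      }
    }
  }
end

# Hypothetical helper: treats the 9-digit token as the ssn and everything else
# as name terms. Assumes exactly one such token is present.
def split_query(raw)
  tokens = raw.split
  ssn = tokens.find { |t| t.match?(/\A\d{9}\z/) }
  name_part = (tokens - [ssn]).join(" ")
  [name_part, ssn]
end

name_part, ssn = split_query("erik 111112222")
puts JSON.pretty_generate(account_query(name_part, ssn))
```

With elasticsearch-model, a Hash like this can normally be passed straight to the model, for example `Account.search(account_query(name_part, ssn))`, provided the index uses a mapping with _all set up as shown above.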

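If you need to search with a single query string (no filter), you could include the ssn field in _all as well, but with this setup it will also match on partial strings (like 111112), which may not be what you want.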

If you only want to match prefixes (i.e., search terms that start at the beginning of a word), you should use edge ngrams.
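The difference between the two filters can be sketched in a few lines of Ruby. The `ngrams` and `edge_ngrams` helpers below are illustrative stand-ins for the ngram and edge ngram token filters, not real library calls, and the min/max values of 2 and 20 are simply the ones used in the settings above.

```ruby
# Every substring between min_gram and max_gram characters long (ngram filter).
def ngrams(token, min_gram, max_gram)
  (0...token.length).flat_map do |start|
    (min_gram..max_gram).map { |len| token[start, len] if start + len <= token.length }.compact
  end
end

# Only the substrings anchored at the start of the token (edge ngram filter).
def edge_ngrams(token, min_gram, max_gram)
  (min_gram..[max_gram, token.length].min).map { |len| token[0, len] }
end

# With plain ngrams, an ssn placed in _all would match on a partial string.
puts ngrams("111112222", 2, 20).include?("111112")   # true

# Edge ngrams keep prefixes only, so "rik" no longer matches "erik".
puts edge_ngrams("erik", 2, 20).inspect              # ["er", "eri", "erik"]
puts ngrams("erik", 2, 20).include?("rik")           # true
puts edge_ngrams("erik", 2, 20).include?("rik")      # false
```

Note that edge ngrams would still match a prefix of the ssn such as "111112", so an exact match still needs a not_analyzed field.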

I wrote a blog post about using ngrams that might help you out: http://blog.qbox.io/an-introduction-to-ngrams-in-elasticsearch

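Here is the code I used for this answer. I tried a few different things, including the setup I posted here, and another that includes ssn in _all but uses edge ngrams. Hope this helps: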

http://sense.qbox.io/gist/b6a31c929945ef96779c72c468303ea3bc87320f

Elasticsearch: multiple fields with partial and exact matching