Question

I have the following code in a custom ES 'where' wrapper method:

filter: { term: params }

We also have a sample ES document that contains:

"emails" => { "email" => "johndoe@email.com" }

I get search results for:

query.where("emails.email" => "johndoe")

But I get no results for:

query.where("emails.email" => "johndoe@email.com")

Do I need to escape the @ somehow when using the ES gem?

Answer 1

This is probably because your field is analyzed with the default standard analyzer, which splits the value into tokens at the @ symbol.

You can see what ES has indexed by running the following command:

curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty' -d 'johndoe@email.com'

The result is:

{
  "tokens" : [ {
    "token" : "johndoe",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "email.com",
    "start_offset" : 8,
    "end_offset" : 17,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}

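As you can see, the email value has been split into two separate tokens, johndoe and email.com. A term filter does not analyze the search value; it looks for an exact match against the indexed tokens. That is probably why searching for johndoe works while searching for the full email address does not.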
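To make the mismatch concrete, here is a minimal Ruby sketch that imitates the exact-match behaviour of a term filter against the two tokens from the _analyze output above. It does not talk to Elasticsearch, and `term_matches?` is a hypothetical helper written only for this illustration, not part of any gem.

```ruby
# Tokens the standard analyzer produced for "johndoe@email.com"
# (copied from the _analyze output above).
INDEXED_TOKENS = ["johndoe", "email.com"].freeze

# Hypothetical helper: a term filter compares the search value, unanalyzed,
# against the indexed tokens and requires an exact match.
def term_matches?(indexed_tokens, value)
  indexed_tokens.include?(value)
end

puts term_matches?(INDEXED_TOKENS, "johndoe")            # matches an indexed token
puts term_matches?(INDEXED_TOKENS, "johndoe@email.com")  # no such token was indexed
```

The full address never appears as a single token, so no amount of escaping on the Ruby side would make the term filter match it.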

There are a few ways to go from here, but one is to create your own analyzer based on the pattern_capture token filter and use it as the index_analyzer for the emails.email field.

{
   "settings" : {
      "analysis" : {
         "filter" : {
            "email" : {
               "type" : "pattern_capture",
               "preserve_original" : 1,
               "patterns" : [ "([^@]+)", "(\\p{L}+)", "(\\d+)", "@(.+)" ]
            }
         },
         "analyzer" : {
            "email" : {
               "tokenizer" : "uax_url_email",
               "filter" : [ "email", "lowercase", "unique" ]
            }
         }
      }
   },
   "mappings": {
       "emails": {
           "properties": {
               "email": {
                   "type": "string",
                   "analyzer": "email"      <-- use the analyzer here
               }
           }
       }
   }
}

At indexing time, this analyzer will produce all of the following tokens, so you can search for any part of the email address:

johndoe@email.com
johndoe
email.com
email
com
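The token list above can be approximated in plain Ruby. This is only a sketch of what the filter chain does, assuming the uax_url_email tokenizer hands the whole address to the filters as one token; `email_tokens` is a hypothetical helper, and the order in which Elasticsearch itself emits the tokens may differ.

```ruby
# Patterns copied from the "email" filter definition above
# (the JSON string "\\p{L}" is written \p{L} in a Ruby regex literal).
PATTERNS = [/([^@]+)/, /(\p{L}+)/, /(\d+)/, /@(.+)/].freeze

# Hypothetical helper approximating pattern_capture with preserve_original,
# followed by the lowercase and unique filters.
def email_tokens(input)
  tokens = [input] # preserve_original keeps the full address
  PATTERNS.each do |pattern|
    # With a capture group, scan yields the captured groups of every match.
    input.scan(pattern) { |captures| tokens.concat(captures) }
  end
  tokens.map(&:downcase).uniq
end

puts email_tokens("johndoe@email.com")
```

Because the full address is now one of the indexed tokens, a term filter on "johndoe@email.com" has something to match exactly.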
