Using the Email tokenizer in elasticsearch

Date: 2016-08-02 02:12:34

Tags: elasticsearch sense

I tried a few examples from the elasticsearch documentation and from Google, but none of them helped me figure this out.

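My sample data is just a handful of blog posts. I want to look up all posts by email address. When I search with "email":"someone", I see every post matching someone, but when I change it to someone@gmail.com, nothing shows up!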

    "hits": [
             {
                "_index": "blog",
                "_type": "post",
                "_id": "2",
                "_score": 1,
                "_source": {
                   "user": "sreenath",
                   "email": "someone@gmail.com",
                   "postDate": "2011-12-12",
                   "body": "Trying to figure out this",
                   "title": "Elastic search testing"
                }
             }
           ]

When I use the GET query shown below, I see all posts matching someone@anything.com. But I want to change { "term" : { "email" : "someone" }} to { "term" : { "email" : "someone@gmail.com" }}

GET blog/post/_search
{ 
 "query" : { 
   "filtered" : { 
     "filter" : { 
       "and" : [ 
         { "term" :
            { "email" : "someone" }
         }
       ] 
     } 
   } 
 } 
}

I ran the following curl -XPUT, but it didn't help:

curl -XPUT localhost:9200/test/  -d '
{
   "settings" : {
      "analysis" : {
         "filter" : {
            "email" : {
               "type" : "pattern_capture",
               "preserve_original" : 1,
               "patterns" : [
                  "([^@]+)",
                  "(\\p{L}+)",
                  "(\\d+)",
                  "@(.+)"
               ]
            }
         },
         "analyzer" : {
            "email" : {
               "tokenizer" : "uax_url_email",
               "filter" : [ "email", "lowercase",  "unique" ]
            }
         }
      }
   }
}
'

1 Answer:

Answer 0: (score: 1)

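You have created a custom analyzer for email addresses, but you are not using it. You need to declare the email field in your mapping type so that it actually uses that analyzer, as shown below. Also make sure you create the right index with that analyzer, i.e. blog rather than test: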

                       change this
                            |
                            v
curl -XPUT localhost:9200/blog/  -d '{
   "settings" : {
      "analysis" : {
         "filter" : {
            "email" : {
               "type" : "pattern_capture",
               "preserve_original" : 1,
               "patterns" : [
                  "([^@]+)",
                  "(\\p{L}+)",
                  "(\\d+)",
                  "@(.+)"
               ]
            }
         },
         "analyzer" : {
            "email" : {
               "tokenizer" : "uax_url_email",
               "filter" : [ "email", "lowercase",  "unique" ]
            }
         }
      }
   },
   "mappings": {              <--- add this
      "post": {
         "properties": {
            "email": {
               "type": "string",
               "analyzer": "email"
            }
         }
      }
   }
}
'
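
To see why the mapping matters, it can help to look at which tokens end up in the index. A term filter is not analyzed, so it only matches a token that was indexed exactly as typed. With the default analyzer the address is split at the @, which is why "someone" matched and "someone@gmail.com" did not. The Python sketch below is only a rough offline simulation of the email analyzer defined above, not Elasticsearch itself: the function name email_tokens is made up for illustration, it assumes uax_url_email keeps the whole address as one token, and it uses [^\W\d_] as a stand-in for \p{L}, which Python's re module does not support. Token order in a real index may differ.

```python
import re

# Approximations of the four pattern_capture patterns from the settings above.
# [^\W\d_]+ stands in for \p{L}+ (assumption: Python's re has no \p{L}).
PATTERNS = [r"([^@]+)", r"([^\W\d_]+)", r"(\d+)", r"@(.+)"]


def email_tokens(value):
    """Simulate the 'email' analyzer on one address (hypothetical helper)."""
    # Assumed: uax_url_email emits the whole address as a single token,
    # and preserve_original keeps that token alongside the captures.
    tokens = [value]
    for pattern in PATTERNS:
        for match in re.finditer(pattern, value):
            tokens.extend(group for group in match.groups() if group)
    # "lowercase" filter followed by "unique" filter.
    unique = []
    for token in (t.lower() for t in tokens):
        if token not in unique:
            unique.append(token)
    return unique


if __name__ == "__main__":
    print(email_tokens("someone@gmail.com"))
```

Since the full address is among the emitted tokens, a term filter on "someone@gmail.com" should match once the field is mapped to this analyzer. Analysis happens at index time, so posts indexed before the mapping was in place would most likely need to be reindexed.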