Elasticsearch: indexing and searching the currency symbols $ and £

Date: 2016-05-06 15:19:12

Tags: elasticsearch

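Some of my documents contain a $ or £ symbol. I want to search for £ and retrieve the documents that contain it. I've gone through the documentation, but I'm getting some cognitive dissonance.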

# Delete the `my_index` index
DELETE /my_index    

# Create a custom analyzer
PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": [
            "&=> and ",
            "$=> dollar "
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": [
            "html_strip",
            "&_to_and"
          ],
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  }
}    

This returns "the", "quick", "and", "brown", "fox", just as the documentation says:

# Test out the new analyzer
GET /my_index/_analyze?analyzer=my_analyzer&text=The%20quick%20%26%20brown%20fox    

And this returns "the", "quick", "dollar", "brown", "fox":

GET /my_index/_analyze?analyzer=my_analyzer&text=The%20quick%20%24%20brown%20fox    
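The custom analyzer runs in a fixed order: character filters rewrite the raw text, the tokenizer splits it, and token filters such as `lowercase` process each token. The Python sketch below is a rough model of that order, not Elasticsearch code. `mapping_char_filter`, `standard_tokenize` and `my_analyzer` are hypothetical helpers, the regex only approximates the standard tokenizer on plain ASCII text, `html_strip` is left out because the samples contain no HTML, and it assumes the trailing space in `"dollar "` is dropped (the term vectors in the answer below show `dollar100`).

```python
import re

# Hypothetical rules mirroring the "&_to_and" char filter.
# Assumption: the trailing spaces in "and " / "dollar " are dropped,
# which matches the "dollar100" term seen in the answer's term vectors.
CHAR_MAPPINGS = {"&": "and", "$": "dollar"}


def mapping_char_filter(text: str) -> str:
    """Replace each mapped character before tokenization, like a mapping char_filter."""
    for src, dst in CHAR_MAPPINGS.items():
        text = text.replace(src, dst)
    return text


def standard_tokenize(text: str) -> list[str]:
    """Rough stand-in for the standard tokenizer on ASCII text:
    keeps runs of letters/digits and drops symbols such as $, £ and &."""
    return re.findall(r"\w+", text)


def my_analyzer(text: str) -> list[str]:
    """char filter, then tokenizer, then lowercase (html_strip omitted)."""
    return [tok.lower() for tok in standard_tokenize(mapping_char_filter(text))]


def standard_analyzer(text: str) -> list[str]:
    """Approximation of the built-in standard analyzer (no char filters, no stop words)."""
    return [tok.lower() for tok in standard_tokenize(text)]


if __name__ == "__main__":
    print(my_analyzer("The quick & brown fox"))
    print(my_analyzer("The quick $ brown fox"))
    print(my_analyzer("The daft fox owes me $100"))
    print(standard_analyzer("The daft fox owes me $100"))
```

In this model a `$` surrounded by spaces becomes its own `dollar` token, but `$100` becomes the single token `dollar100`, so an exact search for `dollar` has nothing to match.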

Add some records:

PUT /my_index/test/1
{
  "title": "The quick & fast fox"
}    

# Use a different ID; reusing 1 would overwrite the first document
PUT /my_index/test/2
{
  "title": "The daft fox owes me $100"
}    

I thought that if I searched for "dollar" I would get a result? Instead, I get nothing:

GET /my_index/test/_search
{ "query": {
    "simple_query_string": {
      "query": "dollar"
    }
  }
}

Not even when passing the analyzer that handles '$' to the query:

GET /my_index/test/_search
{ "query": {
  "query_string": {
    "query": "dollar10",
    "analyzer": "my_analyzer"
  }
 }
}
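Why both searches come back empty can be sketched with a rough Python model (hypothetical helper names; the regex only approximates the standard tokenizer). It assumes that, because the question never maps `title` to `my_analyzer`, the `$100` document is indexed with the standard analyzer, and it treats a query as matching when any analyzed query term was indexed.

```python
import re


def standard_analyzer(text):
    """Rough approximation of the standard analyzer on ASCII text:
    split on non-word characters (dropping $ and &) and lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def my_analyzer(text):
    """Approximation of my_analyzer: mapping char filter, then the same tokenizer.
    Assumes the trailing space in "dollar " is dropped, as the answer's term vectors show."""
    for src, dst in {"&": "and", "$": "dollar"}.items():
        text = text.replace(src, dst)
    return standard_analyzer(text)


# No mapping in the question, so "title" (and the _all field built from it)
# is assumed to be indexed with the standard analyzer, not my_analyzer.
INDEXED_TERMS = set(standard_analyzer("The daft fox owes me $100"))


def matches(query, analyzer):
    """Simplified OR semantics: a hit if any analyzed query term was indexed."""
    return any(term in INDEXED_TERMS for term in analyzer(query))


if __name__ == "__main__":
    print(sorted(INDEXED_TERMS))                 # the "$" is gone, no dollar term
    print(matches("dollar", standard_analyzer))  # first search in the question
    print(matches("dollar10", my_analyzer))      # second search in the question
```

Under these assumptions the index only holds `100` for the amount, so neither `dollar` nor `dollar10` can match, while a query for `$100` would.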

1 Answer:

Answer 0 (score: 1)

Your problem is that you defined a custom analyzer but never used it. You can verify this with term vectors. Follow these steps:

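Set the custom analyzer on the `title` field when creating the index (delete the index from the question first, because `PUT /my_index` fails if the index already exists):

DELETE /my_index

PUT /my_index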
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": [
            "&=> and ",
            "$=> dollar "
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": [
            "html_strip",
            "&_to_and"
          ],
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  }, "mappings" :{
    "test" : {
      "properties" : {
        "title" : {
          "type":"string",
          "analyzer":"my_analyzer"
        }
      }
    }
  }
}

Insert the data:

PUT /my_index/test/1
{
  "title": "The daft fox owes me $100"
}

Check the term vectors:

GET /my_index/test/1/_termvectors?fields=title

Response:

{
   "_index":"my_index",
   "_type":"test",
   "_id":"1",
   "_version":1,
   "found":true,
   "took":3,
   "term_vectors":{
      "title":{
         "field_statistics":{
            "sum_doc_freq":6,
            "doc_count":1,
            "sum_ttf":6
         },
         "terms":{
            "daft":{
               "term_freq":1,
               "tokens":[
                  {
                     "position":1,
                     "start_offset":4,
                     "end_offset":8
                  }
               ]
            },
            "dollar100":{       <-- You can see it here
               "term_freq":1,
               "tokens":[
                  {
                     "position":5,
                     "start_offset":21,
                     "end_offset":25
                  }
               ]
            },
            "fox":{
               "term_freq":1,
               "tokens":[
                  {
                     "position":2,
                     "start_offset":9,
                     "end_offset":12
                  }
               ]
            },
            "me":{
               "term_freq":1,
               "tokens":[
                  {
                     "position":4,
                     "start_offset":18,
                     "end_offset":20
                  }
               ]
            },
            "owes":{
               "term_freq":1,
               "tokens":[
                  {
                     "position":3,
                     "start_offset":13,
                     "end_offset":17
                  }
               ]
            },
            "the":{
               "term_freq":1,
               "tokens":[
                  {
                     "position":0,
                     "start_offset":0,
                     "end_offset":3
                  }
               ]
            }
         }
      }
   }
}
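The term vectors response is verbose. A small Python helper (hypothetical, written against a trimmed copy of the response above with the `<-- You can see it here` note removed) can list the tokens in position order and use the offsets to show which characters of the original title each token came from.

```python
import json

SOURCE = "The daft fox owes me $100"

# Trimmed copy of the _termvectors response above: only "terms" is kept.
RESPONSE_JSON = """
{"term_vectors": {"title": {"terms": {
  "daft":      {"tokens": [{"position": 1, "start_offset": 4,  "end_offset": 8}]},
  "dollar100": {"tokens": [{"position": 5, "start_offset": 21, "end_offset": 25}]},
  "fox":       {"tokens": [{"position": 2, "start_offset": 9,  "end_offset": 12}]},
  "me":        {"tokens": [{"position": 4, "start_offset": 18, "end_offset": 20}]},
  "owes":      {"tokens": [{"position": 3, "start_offset": 13, "end_offset": 17}]},
  "the":       {"tokens": [{"position": 0, "start_offset": 0,  "end_offset": 3}]}
}}}}
"""


def token_stream(response, field):
    """Return (term, original text slice) pairs ordered by token position."""
    terms = response["term_vectors"][field]["terms"]
    tokens = [
        (tok["position"], term, SOURCE[tok["start_offset"]:tok["end_offset"]])
        for term, info in terms.items()
        for tok in info["tokens"]
    ]
    return [(term, original) for _, term, original in sorted(tokens)]


if __name__ == "__main__":
    for term, original in token_stream(json.loads(RESPONSE_JSON), "title"):
        print(f"{term!r:12} <- {original!r}")
```

The offsets for `dollar100` point back at `$100` in the source text, which confirms that the mapping char filter ran on this field and glued `dollar` to the number.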

Now search:

GET /my_index/test/_search
{
  "query": {
    "match": {
      "title": "dollar100"
    }
  }
}

That finds the match. But searching with a query string:

GET /my_index/test/_search
{ "query": {
    "simple_query_string": {
      "query": "dollar100"
    }
  }
}

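This finds nothing, because without an explicit field it searches the special `_all` field. As far as I can see, `_all` aggregates the raw field values rather than the tokens produced by `my_analyzer`: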

GET /my_index/test/_search
{
  "query": {
    "match": {
      "_all": "dollar100"
    }
  }
}

This finds no results. But:

GET /my_index/test/_search
{
  "query": {
    "match": {
      "_all": "$100"
    }
  }
}

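This does find it. I'm not sure, but the reason may be that the default analyzer is not the custom analyzer. To set the custom analyzer as the default, check: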

Changing the default analyzer in ElasticSearch or LogStash

http://elasticsearch-users.115913.n3.nabble.com/How-we-can-change-Elasticsearch-default-analyzer-td4040411.html

http://grokbase.com/t/gg/elasticsearch/148kwsxzee/overriding-built-in-analyzer-and-set-it-as-default

http://elasticsearch-users.115913.n3.nabble.com/How-to-set-the-default-analyzer-td3935275.html
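The default-analyzer explanation can be checked with a rough Python model (hypothetical helpers; the regex only approximates the standard tokenizer). It assumes, as in Elasticsearch 2.x, that `_all` is analyzed with the index default analyzer at both index and search time, while `title` uses `my_analyzer` from its mapping. Registering an analyzer named `default` in the index's `analysis` settings is one documented way to replace the default in 2.x; the last call models that case.

```python
import re


def standard_analyzer(text):
    """Rough approximation of the standard analyzer on ASCII text."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def my_analyzer(text):
    """Approximation of my_analyzer: mapping char filter, then tokenize and lowercase.
    Assumes "dollar " loses its trailing space, matching the term vectors above."""
    for src, dst in {"&": "and", "$": "dollar"}.items():
        text = text.replace(src, dst)
    return standard_analyzer(text)


TITLE = "The daft fox owes me $100"


def search(field, query, default_analyzer):
    """Simulate a match query against the single document.

    Assumption: "title" uses my_analyzer (explicit mapping) and "_all" uses the
    index default analyzer; the same analyzer is applied to document and query.
    Any shared term counts as a hit."""
    analyzer = my_analyzer if field == "title" else default_analyzer
    indexed = set(analyzer(TITLE))
    return any(term in indexed for term in analyzer(query))


if __name__ == "__main__":
    # Built-in standard analyzer as the default (the setup in this answer).
    print(search("title", "dollar100", standard_analyzer))
    print(search("_all", "dollar100", standard_analyzer))
    print(search("_all", "$100", standard_analyzer))
    # my_analyzer registered as the index default analyzer.
    print(search("_all", "dollar100", my_analyzer))
```

In this model the three searches above behave as described (`title` matches `dollar100`, `_all` misses it but matches `$100`), and once `my_analyzer` is the default, `_all` also contains `dollar100`.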