Elasticsearch: exact search when the query contains special characters such as '#'

Time: 2015-11-24 19:31:12

Tags: elasticsearch search-engine django-haystack

In Elasticsearch, I want to get results only for documents that contain '#test', and to ignore documents that contain only 'test'.

1 answer:

Answer 0 (score: 2)

People may complain about this question, so I'll note that this is a response to my comment on this post.

You will probably want to read up on analysis in Elasticsearch, as well as on match queries and term queries.

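In any case, the convention here is to add a .raw subfield to string fields. That way, if you want a search that involves analysis you can use the base field, and if you want to search for the exact (unanalyzed) value you can use the subfield.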

So here is a simple mapping that accomplishes this:

PUT /test_index
{
   "mappings": {
      "doc": {
         "properties": {
            "post_text": {
               "type": "string",
               "fields": {
                  "raw": {
                     "type": "string",
                     "index": "not_analyzed"
                  }
               }
            }
         }
      }
   }
}

Now, if I add these two documents:

PUT /test_index/doc/1
{
    "post_text": "#test"
}

PUT /test_index/doc/2
{
    "post_text": "test"
}

a "match" query against the base field will return both:

POST /test_index/_search
{
    "query": {
        "match": {
           "post_text": "#test"
        }
    }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.5945348,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 0.5945348,
            "_source": {
               "post_text": "#test"
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.5945348,
            "_source": {
               "post_text": "test"
            }
         }
      ]
   }
}
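
Why the analyzed base field cannot tell the two documents apart can be illustrated with a toy model. The Python sketch below is only a rough stand-in for Elasticsearch analysis: the function toy_analyze and its regular expression are assumptions made for illustration (the real standard analyzer uses Unicode text segmentation), but they show the relevant effect, namely that '#' is dropped at analysis time so both values end up as the same token.

```python
import re


def toy_analyze(text):
    """Hypothetical, simplified analyzer: lowercase, keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


# The two documents indexed above, keyed by _id.
docs = {"1": "#test", "2": "test"}

# A match query analyzes the query string with the same analyzer as the field.
query_terms = set(toy_analyze("#test"))

# A document matches if it shares at least one token with the query.
matches = sorted(
    doc_id for doc_id, text in docs.items()
    if query_terms & set(toy_analyze(text))
)

print(toy_analyze("#test"))  # ['test']
print(matches)               # ['1', '2']
```

Because the '#' never reaches the index of the base field, no query against that field can distinguish the two documents.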

But the "term" query below, which targets the unanalyzed subfield, returns only one:

POST /test_index/_search
{
    "query": {
        "term": {
           "post_text.raw": "#test"
        }
    }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 1,
            "_source": {
               "post_text": "#test"
            }
         }
      ]
   }
}
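
If the queries are built from Python (the question is tagged django-haystack), the two request bodies above could be generated along these lines. This is a minimal sketch using only the standard library; the helper names match_query and exact_term_query, and the default subfield name "raw", are assumptions for illustration and follow the mapping shown earlier, not any client library's API.

```python
import json


def match_query(field, text):
    """Body for an analyzed full-text search on the base field."""
    return {"query": {"match": {field: text}}}


def exact_term_query(field, value, subfield="raw"):
    """Body for an exact search on the unanalyzed subfield (assumed to be '<field>.raw')."""
    return {"query": {"term": {"{}.{}".format(field, subfield): value}}}


body = exact_term_query("post_text", "#test")

# Serialize the body as it would be sent to POST /test_index/_search.
print(json.dumps(body, indent=4))
```

The value passed to exact_term_query is not analyzed, so it has to match the stored string exactly, including case and the leading '#'.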

Here is the code I used to test it:

http://sense.qbox.io/gist/2f0fbb38e2b7608019b5b21ebe05557982212ac7