How to filter a range by comparing against a phrase using Elasticsearch

Asked: 2015-03-17 15:04:37

Tags: elasticsearch

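I need to search the index for matches that are greater than or equal to some phrase. To make it clearer, I need to build a query like the following SQL: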

SELECT * FROM Table WHERE MyNVarCharField >= 'some_string'

Mapping:

{
    "tock": {
        "mappings": {
            "post": {
                "properties": {
                    "content": {
                        "type": "string",
                        "index_analyzer": "english"
                    },
                    "id": {
                        "type": "double"
                    },
                    "title": {
                        "type": "string",
                        "index_analyzer": "english"
                    }
                }
            }
        }
    }
}

The index contains two objects:

[
    {
        "id": 1,
        "title": "Post1",
        "content": "Ash to ash item"
    },
    {
        "id": 2,
        "title": "Post2",
        "content": "Dust to dust item"
    }
]

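Now I want to search for objects whose content is greater than or equal to "Dust item". I tried many different approaches and ended up with something like this: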

{
    "sort": [
        {
            "content": {
                "order": "asc"
            }
        }
    ],
    "filtered": {
        "query": {
            "match": {
                "content": {
                    "query": "item"
                }
            }
        },
        "filter": {
            "range": {
                "content": {
                    "from": "Dust to dust",
                    "include_lower": true,
                    "include_upper": true
                }
            }
        }
    }
}

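But it does not work the way I expect: both objects are returned. So I need help :))

Is it actually possible to query Elasticsearch this way? What do I need to do to split the index into two parts by a phrase?

By the way, I should mention that the phrase is guaranteed to already exist in the index.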

1 Answer:

Answer 0 (score: 0)

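Your range filter matches both documents because the bound is compared against each term generated for the "content" field, not against the original source text. Because the english analyzer uses the standard tokenizer, one of the terms for each document is "item". Since "item" is greater than "dust", both documents match.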
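To make the term-by-term comparison concrete, here is a rough Python sketch. The analyze() and matching_terms() helpers are hypothetical, and the tokenization is only an approximation of the english analyzer (lowercasing, whitespace splitting, and a tiny assumed stopword list, with no stemming). It is meant to illustrate the behaviour, not to reproduce what Elasticsearch does internally.

```python
# Hypothetical, simplified stand-in for the "english" analyzer:
# lowercase, split on whitespace, drop a few stopwords (no stemming).
STOPWORDS = {"to", "the", "a", "an", "of"}


def analyze(text):
    """Return the list of terms that would be indexed for `text`."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def matching_terms(text, lower_bound):
    """Return every indexed term that is >= lower_bound.

    On an analyzed field a document matches the range as soon as
    ANY single term falls inside it.
    """
    return [t for t in analyze(text) if t >= lower_bound]


docs = ["Ash to ash item", "Dust to dust item"]
for doc in docs:
    terms = matching_terms(doc, "dust")
    print(doc, "->", terms, "match" if terms else "no match")
```

Both documents end up with a non-empty list because each one contains the term "item", which is why the filter cannot separate them.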

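If your index contains a lot of documents, the approach you are using may not be workable, because a lot of terms will be generated.

One thing you could do is use the "index": "not_analyzed" setting on the "content" field. Or, if you need "content" to be analyzed for other reasons, define a non-analyzed sub-field and run the range comparison against that field. Here is an example.

So I defined an index as follows: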

PUT /test_index
{
   "mappings": {
      "post": {
         "properties": {
            "content": {
               "type": "string",
               "index_analyzer": "english",
               "fields": {
                  "raw": {
                     "type": "string",
                     "index": "not_analyzed"
                  }
               }
            },
            "id": {
               "type": "double"
            },
            "title": {
               "type": "string",
               "index_analyzer": "english"
            }
         }
      }
   }
}

Then I added three documents (your two plus another one for comparison):

POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"post","_id":1}}
{"id": 1,"title": "Post1", "content": "Ash to ash item"}
{"index":{"_index":"test_index","_type":"post","_id":2}}
{"id": 2,"title": "Post2", "content": "Dust to dust item"}
{"index":{"_index":"test_index","_type":"post","_id":3}}
{"id": 3,"title": "Post3", "content": "Earth to earth item"}

Then I can use a range query against "content.raw":

POST /test_index/_search
{
    "query": {
        "range": {
           "content.raw": {
              "gte": "Dust to dust"
           }
        }
    }
}

It returns what I expect:

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "post",
            "_id": "2",
            "_score": 1,
            "_source": {
               "id": 2,
               "title": "Post2",
               "content": "Dust to dust item"
            }
         },
         {
            "_index": "test_index",
            "_type": "post",
            "_id": "3",
            "_score": 1,
            "_source": {
               "id": 3,
               "title": "Post3",
               "content": "Earth to earth item"
            }
         }
      ]
   }
}
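As a sanity check on why documents 2 and 3 come back, the same comparison can be sketched in plain Python. This assumes the not_analyzed sub-field stores each value as one whole, case-sensitive string; the range_gte() helper is hypothetical and only mimics the "gte" bound.

```python
# Each document keeps its full, unmodified "content" value, which is
# what a not_analyzed sub-field such as "content.raw" indexes.
docs = [
    {"id": 1, "content": "Ash to ash item"},
    {"id": 2, "content": "Dust to dust item"},
    {"id": 3, "content": "Earth to earth item"},
]


def range_gte(documents, field, lower_bound):
    """Keep documents whose whole field value is >= lower_bound."""
    return [d for d in documents if d[field] >= lower_bound]


hits = range_gte(docs, "content", "Dust to dust")
print([d["id"] for d in hits])
```

Note that a whole-string comparison like this is case-sensitive, so a value starting with a lowercase letter would sort after every value starting with an uppercase one.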

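Edit: You can adapt the query you posted by changing "content" to "content.raw" in the range filter (your syntax was also slightly off and gave me an error, so I wrapped the query and filter in a "query" block):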

POST /test_index/_search
{
   "sort": [
      {
         "content": {
            "order": "asc"
         }
      }
   ],
   "query": {
      "filtered": {
         "query": {
            "match": {
               "content": {
                  "query": "item"
               }
            }
         },
         "filter": {
            "range": {
               "content.raw": {
                  "from": "Dust to dust",
                  "include_lower": true,
                  "include_upper": true
               }
            }
         }
      }
   }
}
...
{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": null,
      "hits": [
         {
            "_index": "test_index",
            "_type": "post",
            "_id": "2",
            "_score": null,
            "_source": {
               "id": 2,
               "title": "Post2",
               "content": "Dust to dust item"
            },
            "sort": [
               "dust"
            ]
         },
         {
            "_index": "test_index",
            "_type": "post",
            "_id": "3",
            "_score": null,
            "_source": {
               "id": 3,
               "title": "Post3",
               "content": "Earth to earth item"
            },
            "sort": [
               "earth"
            ]
         }
      ]
   }
}

Here is the code I used for testing:

http://sense.qbox.io/gist/57968fda91b9bcd5b2f1d8236ecb5fc1953800b7
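The requests above use the syntax current at the time of this answer. As a rough sketch only, here is how the same mapping and search might be written for a newer release, assuming Elasticsearch 7.x or later, where the "string" type, "not_analyzed", "index_analyzer", the "filtered" query, and mapping types are no longer available. The Python dictionaries just build the JSON bodies; nothing here was run against a cluster. Note that "analyzer" applies at both index and search time, unlike the original "index_analyzer", and that the sort is switched to "content.raw" as an assumption, since sorting on an analyzed text field is generally not enabled by default.

```python
import json

# Assumed target: Elasticsearch 7.x+ (typeless mappings, text/keyword types).
mapping = {
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "english",
                # Non-analyzed sub-field used for whole-string comparison.
                "fields": {"raw": {"type": "keyword"}},
            },
            "id": {"type": "double"},
            "title": {"type": "text", "analyzer": "english"},
        }
    }
}

# "bool" with a "filter" clause takes the place of the "filtered" query.
search = {
    "sort": [{"content.raw": {"order": "asc"}}],
    "query": {
        "bool": {
            "must": {"match": {"content": {"query": "item"}}},
            "filter": {"range": {"content.raw": {"gte": "Dust to dust"}}},
        }
    },
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search, indent=2))
```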