我尝试对弹性搜索_type进行简单查询,并使用通配符匹配多个字段,我的第一次尝试是这样的:
POST my_index/my_type/_search
{
"sort" : { "date_field" : {"order" : "desc"}},
"query" : {
"filtered" : {
"filter" : {
"or" : [
{
"term" : { "field1" : "4848" }
},
{
"term" : { "field2" : "6867" }
}
]
}
}
}
}
当field1 OR field2分别完全等于4848和6867时,此示例将成功匹配每条记录。
我想要做的是在field1上匹配任何包含4848的字段和包含6867的field2,但我不确定该怎么做。
我感谢我能得到的任何帮助:)
答案 0 :(得分:2)
听起来你的问题主要与analysis有关。适当的解决方案取决于数据的结构以及您要匹配的内容。我将提供几个例子。
首先,我们假设您的数据是这样的,我们可以使用standard analyzer获得我们想要的内容。该分析器将对空格,标点符号和符号上的文本字段进行标记。因此,文字"1234-5678-90"
将分为"1234"
,"5678"
和"90"
这两个字词,因此对这些字词中的任何一个"term"
查询或过滤器都会匹配那份文件。更具体地说:
DELETE /test_index
PUT /test_index
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"doc": {
"properties": {
"field1":{
"type": "string",
"analyzer": "standard"
},
"field2":{
"type": "string",
"analyzer": "standard"
}
}
}
}
}
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"field1": "1212-2323-4848","field2": "1234-5678-90"}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"field1": "0000-0000-0000","field2": "0987-6543-21"}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"field1": "1111-2222-3333","field2": "6867-4545-90"}
POST test_index/_search
{
"query": {
"filtered": {
"filter": {
"or": [
{
"term": { "field1": "4848" }
},
{
"term": { "field2": "6867" }
}
]
}
}
}
}
...
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"field1": "1212-2323-4848",
"field2": "1234-5678-90"
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1,
"_source": {
"field1": "1111-2222-3333",
"field2": "6867-4545-90"
}
}
]
}
}
(明确地写"analyzer": "standard"
是多余的,因为如果你没有指定一个,那么这是默认的分析器;我只想让它变得明显。)
另一方面,如果文本的嵌入方式使得标准分析无法提供您想要的内容,请说"121223234848"
之类的内容,并希望在"4848"
上进行匹配,那么必须使用ngrams做一些更复杂的事情。这是一个例子(注意数据的差异):
DELETE /test_index
PUT /test_index
{
"settings": {
"analysis": {
"filter": {
"nGram_filter": {
"type": "nGram",
"min_gram": 2,
"max_gram": 20,
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
]
}
},
"analyzer": {
"nGram_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding",
"nGram_filter"
]
},
"whitespace_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"field1":{
"type": "string",
"index_analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"field2":{
"type": "string",
"index_analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
}
}
}
}
}
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"field1": "121223234848","field2": "1234567890"}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"field1": "000000000000","field2": "0987654321"}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"field1": "111122223333","field2": "6867454590"}
POST test_index/_search
{
"query": {
"filtered": {
"filter": {
"or": [
{
"term": { "field1": "4848" }
},
{
"term": { "field2": "6867" }
}
]
}
}
}
}
...
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"field1": "121223234848",
"field2": "1234567890"
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1,
"_source": {
"field1": "111122223333",
"field2": "6867454590"
}
}
]
}
}
这里有很多内容,所以我不会在这篇文章中解释它。如果您想要更多解释,我建议您阅读此博文:http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams。希望你能原谅这个无耻的插件。 ;)
希望有所帮助。