Index:
{
  "settings": {
    "index.percolator.map_unmapped_fields_as_text": true
  },
  "mappings": {
    "properties": {
      "query": {
        "type": "percolator"
      }
    }
  }
}
This test percolator query works:
{
  "query": {
    "match": {
      "message": "blah"
    }
  }
}
This query does not work:
{
  "query": {
    "simple_query_string": {
      "query": "bl*"
    }
  }
}
Result:
{"took":15,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":0.13076457,"hits":[{"_index":"my-index","_type":"_doc","_id":"1","_score":0.13076457,"_source":{"query":{"match":{"message":"blah"}}},"fields":{"_percolator_document_slot":[0]}}]}}
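Why doesn't this simple_query_string query match the document?
Answer 0 (score: 3)
I'm not sure I understand what you're asking either. Maybe you don't fully understand the percolator? Here is an example I just tried.
Let's assume you have an index, called test, in which you want to index some documents. This index has the following mapping (just a random test index I have in my test setup):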
{
  "settings": {
    "analysis": {
      "filter": {
        "email": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "([^@]+)",
            "(\\p{L}+)",
            "(\\d+)",
            "@(.+)",
            "([^-@]+)"
          ]
        }
      },
      "analyzer": {
        "email": {
          "tokenizer": "uax_url_email",
          "filter": [
            "email",
            "lowercase",
            "unique"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "code": {
        "type": "long"
      },
      "date": {
        "type": "date"
      },
      "part": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "val": {
        "type": "long"
      },
      "email": {
        "type": "text",
        "analyzer": "email"
      }
    }
  }
}
You'll notice it has a custom email analyzer that splits something like foo@bar.com into these tokens: foo@bar.com, foo, bar.com, bar, and com.
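To see where those tokens come from, here is a minimal Python sketch that approximates the email analyzer. It is not how Elasticsearch implements it: it assumes the uax_url_email tokenizer hands over the whole address as a single token, it uses [^\W\d_]+ as a stand-in for \p{L}+ (Python's re module has no \p{L}), and the function name email_analyzer is made up for this illustration. The real filter may also emit the tokens in a different order.

```python
import re

# Approximations of the five pattern_capture patterns from the mapping.
# Assumption: [^\W\d_]+ stands in for \p{L}+ ("a run of letters").
PATTERNS = [
    r"([^@]+)",
    r"([^\W\d_]+)",
    r"(\d+)",
    r"@(.+)",
    r"([^-@]+)",
]


def email_analyzer(text):
    """Rough emulation of the custom `email` analyzer for one email-like token."""
    tokens = [text]  # preserve_original: true keeps the whole input
    for pattern in PATTERNS:
        for match in re.finditer(pattern, text):
            tokens.append(match.group(1))  # every capture becomes a token
    lowered = [token.lower() for token in tokens]  # lowercase filter
    return list(dict.fromkeys(lowered))  # unique filter, first-seen order kept


print(email_analyzer("foo@bar.com"))
# ['foo@bar.com', 'foo', 'bar.com', 'bar', 'com']
```

For an exact token list, the _analyze API on the real index is the authoritative check.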
As the documentation says, you can create a separate percolator index that holds only your percolator queries, not the documents themselves. Even though the percolator index doesn't contain the documents, it must contain the mapping of the index that is supposed to hold them (test, in our case).
This is the mapping of the percolator index (which I called percolator_index); it also has the special analyzer used to split the email field:
{
  "settings": {
    "analysis": {
      "filter": {
        "email": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "([^@]+)",
            "(\\p{L}+)",
            "(\\d+)",
            "@(.+)",
            "([^-@]+)"
          ]
        }
      },
      "analyzer": {
        "email": {
          "tokenizer": "uax_url_email",
          "filter": [
            "email",
            "lowercase",
            "unique"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "query": {
        "type": "percolator"
      },
      "code": {
        "type": "long"
      },
      "date": {
        "type": "date"
      },
      "part": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "val": {
        "type": "long"
      },
      "email": {
        "type": "text",
        "analyzer": "email"
      }
    }
  }
}
Its mapping and settings are almost identical to those of my original index; the only difference is the additional query field of type percolator added to the mapping.
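The query you are interested in, the simple_query_string, should go into a document inside percolator_index. Like this:
PUT /percolator_index/_doc/1?refresh
{
  "query": {
    "simple_query_string": {
      "query": "month foo@bar.com",
      "fields": ["part", "email"]
    }
  }
}
To make it more interesting, I added the email field to the list of fields the query searches specifically (by default, all fields are searched).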
Now, our goal is to test a document that should eventually go into the test index against the simple_query_string query stored in the percolator index. For example:
GET /percolator_index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "date": "2004-07-31T11:57:52.000Z",
        "part": "month",
        "code": 109,
        "val": 0,
        "email": "foo@bar.com"
      }
    }
  }
}
Obviously, what's under document is your future document (which doesn't exist yet). It is matched against the simple_query_string defined above, and it results in a match:
{
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.39324823,
    "hits": [
      {
        "_index": "percolator_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.39324823,
        "_source": {
          "query": {
            "simple_query_string": {
              "query": "month foo@bar.com",
              "fields": [
                "part",
                "email"
              ]
            }
          }
        },
        "fields": {
          "_percolator_document_slot": [
            0
          ]
        }
      }
    ]
  }
}
What if I percolate this document instead:
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "date": "2004-07-31T11:57:52.000Z",
        "part": "month",
        "code": 109,
        "val": 0,
        "email": "foo"
      }
    }
  }
}
(Note that the email is only foo.)
The result is:
{
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.26152915,
    "hits": [
      {
        "_index": "percolator_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.26152915,
        "_source": {
          "query": {
            "simple_query_string": {
              "query": "month foo@bar.com",
              "fields": [
                "part",
                "email"
              ]
            }
          }
        },
        "fields": {
          "_percolator_document_slot": [
            0
          ]
        }
      }
    ]
  }
}
Notice that the score is slightly lower than for the first percolated document. That is probably because foo (my email) matched only one of the terms in my analyzed foo@bar.com, whereas foo@bar.com would match all of them (hence the higher score).
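The term-overlap argument can be illustrated with a toy model in Python. This is only a sketch under loud assumptions: it counts overlapping terms instead of computing Elasticsearch's real relevance score, it splits the part field and the query string on whitespace rather than using the standard analyzer, it approximates \p{L}+ with [^\W\d_]+, and the helper names email_terms and matched_terms are hypothetical.

```python
import re

# Assumption: same approximated patterns as the `email` token filter.
PATTERNS = [r"([^@]+)", r"([^\W\d_]+)", r"(\d+)", r"@(.+)", r"([^-@]+)"]


def email_terms(text):
    """Toy stand-in for the `email` analyzer: whole token plus captured parts."""
    terms = {text.lower()}
    for pattern in PATTERNS:
        terms.update(m.group(1).lower() for m in re.finditer(pattern, text))
    return terms


def matched_terms(query_text, document):
    """Count query terms found in the document's `part` and `email` fields."""
    part_terms = set(document["part"].lower().split())
    doc_email_terms = email_terms(document["email"])
    hits = 0
    for word in query_text.split():  # terms are OR-ed, so any hit is a match
        hits += len({word.lower()} & part_terms)          # `part` field
        hits += len(email_terms(word) & doc_email_terms)  # `email` field
    return hits


query = "month foo@bar.com"
full = {"part": "month", "email": "foo@bar.com"}
short = {"part": "month", "email": "foo"}
print(matched_terms(query, full))   # 6: "month" plus all five email terms
print(matched_terms(query, short))  # 2: "month" plus the single term "foo"
```

Both documents match because at least one term overlaps, but the second one overlaps on fewer terms, which is consistent with its lower score.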
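I'm not sure which analyzer you're talking about. I think the example above covers the only "analyzer" issue or unknown that I thought might be a bit confusing.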