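I want to apply both a synonym filter and a stop-word filter to my queries. To do that I created two analyzers, and each of them works on its own. But I want to use both at the same time. How can I do that?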
GET my_index/_search/
{
"query": {
"match": {
"_all": {
"query": "Good and Bad",
"analyzer": [
"stop_analyzer",
"synonym"
]
}
}
}
}
The query above throws an error:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[match] unknown token [START_ARRAY] after [analyzer]",
"line": 6,
"col": 26
}
],
"type": "parsing_exception",
"reason": "[match] unknown token [START_ARRAY] after [analyzer]",
"line": 6,
"col": 26
},
"status": 400
}
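I assume I can't use an array or an object there, because the query works fine when I specify a single analyzer, such as "analyzer": "stop_analyzer" or "analyzer": "synonym". So my question is: how can I use both of them together?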
Answer 0 (score: 1)
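The analyzer parameter accepts a single analyzer name only, which is why the array is rejected. Instead, you can define a custom analyzer that combines the two simple analyzers into one: a single tokenizer followed by both token filters.
Suppose you created the index like this: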
PUT my_index
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"stopwordsSynonym": {
"filter": [
"lowercase",
"my_synonym",
"english_stop"
],
"tokenizer": "standard"
}
},
"filter": {
"english_stop": {
"type": "stop",
"stopwords": "_english_"
},
"my_synonym": {
"type": "synonym",
"synonyms": [
"nice => good",
"poor => bad"
]
}
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"my_text": {
"type": "text",
"analyzer": "stopwordsSynonym"
}
}
}
}
}
and added one document:
POST my_index/my_type
{
"my_text": "People aren’t born good or bad. Maybe they’re born with tendencies either way, but it’s the way you live your life that matters."
}
Now a search on my_text uses the stopwordsSynonym analyzer by default. The following query matches the document, because the synonym filter maps nice to good:
GET my_index/_search
{
"query": {
"match": {
"my_text": "nice and ugly"
}
}
}
You can also test your analyzer like this:
GET my_index/_analyze
{
"analyzer": "stopwordsSynonym",
"text": "nice or ugly"
}
This returns:
{
"tokens": [
{
"token": "good",
"start_offset": 0,
"end_offset": 4,
"type": "SYNONYM",
"position": 0
},
{
"token": "ugly",
"start_offset": 8,
"end_offset": 12,
"type": "<ALPHANUM>",
"position": 2
}
]
}
Compare this with the output of the standard analyzer:
GET my_index/_analyze
{
"analyzer": "standard",
"text": "nice or ugly"
}
This returns:
{
"tokens": [
{
"token": "nice",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "or",
"start_offset": 5,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "ugly",
"start_offset": 8,
"end_offset": 12,
"type": "<ALPHANUM>",
"position": 2
}
]
}
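As you can see, stopwordsSynonym replaces the nice token with good (its type is SYNONYM) and removes the or token from the list, because it is a common English stop word. The position of ugly stays 2, so the gap left by the removed stop word is preserved.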
To use a different analyzer for a particular query, you can use a query_string query:
GET my_index/_search
{
"query": {
"query_string": {
"query": "my_text:(nice and poor)",
"analyzer": "stopwordsSynonym"
}
}
}
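or a match_phrase query (my_standard_text here stands for a field indexed with a different analyzer; it is not part of the mapping shown above):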
GET my_index/_search
{
"query": {
"match_phrase" : {
"my_standard_text" : {
"query" : "nice and poor",
"analyzer": "stopwordsSynonym"
}
}
}
}
In every case, the analyzer must be defined in the index settings when the index is created (see the beginning of this answer).
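The analyzer parameter also works on a plain match query, which is closest to the query in the question. The following is a minimal sketch, assuming the my_index index and the stopwordsSynonym analyzer defined at the beginning of this answer. Since my_text already uses that analyzer by default, the explicit parameter only matters for a field indexed with a different analyzer.

```
# Sketch only: assumes my_index and stopwordsSynonym exist as defined above.
# "analyzer" takes one analyzer name (a string), not an array.
GET my_index/_search
{
  "query": {
    "match": {
      "my_text": {
        "query": "Good and Bad",
        "analyzer": "stopwordsSynonym"
      }
    }
  }
}
```

With this analyzer the query text is lowercased and the stop word and is dropped, so the search runs on the terms good and bad.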
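Also take a look at the search analyzer (the search_analyzer mapping parameter), which lets a field use one analyzer at index time and a different one at search time.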
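As a rough illustration of search_analyzer, the request below adds a field that is indexed with the standard analyzer but searched with stopwordsSynonym. The field name my_other_text is hypothetical and not part of the original mapping. The request uses the typed put-mapping endpoint to match the my_type mapping above; on Elasticsearch versions without mapping types, the type segment would presumably have to be dropped from the path.

```
# Sketch only: my_other_text is a hypothetical field name.
# Assumes my_index already defines the stopwordsSynonym analyzer.
PUT my_index/_mapping/my_type
{
  "properties": {
    "my_other_text": {
      "type": "text",
      "analyzer": "standard",
      "search_analyzer": "stopwordsSynonym"
    }
  }
}
```

With a mapping like this, queries against my_other_text get synonym and stop-word handling without an analyzer parameter in each request, while the indexed tokens stay unchanged.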