I have an analyzer that ignores whitespace. When I search for a string without spaces, it returns the correct results. Here is the analyzer:
{
  "index": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "word_joiner": {
          "type": "word_delimiter",
          "catenate_all": true
        }
      },
      "analyzer": {
        "word_join_analyzer": {
          "type": "custom",
          "filter": [
            "word_joiner"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  }
}
Here is how it behaves:
curl -XGET "http://localhost:9200/cake/_analyze?analyzer=word_join_analyzer&pretty" -d 'ONE"\ "TWO'
Result:
{
  "tokens" : [ {
    "token" : "ONE",
    "start_offset" : 1,
    "end_offset" : 5,
    "type" : "word",
    "position" : 0
  }, {
    "token" : "ONETWO",
    "start_offset" : 1,
    "end_offset" : 13,
    "type" : "word",
    "position" : 0
  }, {
    "token" : "TWO",
    "start_offset" : 7,
    "end_offset" : 13,
    "type" : "word",
    "position" : 1
  } ]
}
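The query-string form of `_analyze` used above belongs to older Elasticsearch releases; later versions expect the analyzer name and the text in a JSON request body. Below is a minimal Python sketch of building such a request, assuming a local node at `localhost:9200` and the `cake` index from the command above. The helper name `build_analyze_request` is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_analyze_request(index, analyzer, text, host="http://localhost:9200"):
    """Build (but do not send) an _analyze request with a JSON body.

    Assumption: the target Elasticsearch version accepts
    {"analyzer": ..., "text": ...} as the request body.
    """
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage: analyze the text "ONE TWO" with the custom analyzer.
request = build_analyze_request("cake", "word_join_analyzer", "ONE TWO")
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would send it to a running node; that step is left out so the sketch runs without a cluster.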
What I want is for this analyzer to also produce "token" : "ONE TWO". How can I do that?
Thanks!
Answer 0 (score: 2)
You need to enable the preserve_original setting on the word_joiner filter; it is false by default:
{
  "index": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "word_joiner": {
          "type": "word_delimiter",
          "catenate_all": true,
          "preserve_original": true
        }
      },
      "analyzer": {
        "word_join_analyzer": {
          "type": "custom",
          "filter": [
            "word_joiner"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  }
}
This will produce:
{
  "tokens": [
    {
      "token": "ONE TWO",
      "start_offset": 0,
      "end_offset": 7,
      "type": "word",
      "position": 0
    },
    {
      "token": "ONE",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "ONETWO",
      "start_offset": 0,
      "end_offset": 7,
      "type": "word",
      "position": 0
    },
    {
      "token": "TWO",
      "start_offset": 4,
      "end_offset": 7,
      "type": "word",
      "position": 1
    }
  ]
}
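To make the effect of the two settings easier to see, here is a rough Python model of what happens to the single token emitted by the keyword tokenizer. It is a simplified sketch, not the Lucene implementation: the function name `word_delimiter_sketch` is hypothetical, it splits only on non-alphanumeric characters (the real filter also splits on case changes and letter-to-digit transitions), and it does not model offsets, positions, or the exact emission order.

```python
import re


def word_delimiter_sketch(token, catenate_all=False, preserve_original=False):
    """Simplified model of the word_delimiter filter for one input token.

    Assumption: sub-words are separated only by non-alphanumeric characters.
    """
    # Split the token into its alphanumeric parts, dropping empty pieces.
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]

    candidates = []
    if preserve_original:
        # Keep the untouched input, e.g. "ONE TWO".
        candidates.append(token)
    # Word parts are emitted by default, e.g. "ONE" and "TWO".
    candidates.extend(parts)
    if catenate_all and len(parts) > 1:
        # Join all parts into one token, e.g. "ONETWO".
        candidates.append("".join(parts))

    # De-duplicate while keeping first-seen order.
    seen = set()
    result = []
    for candidate in candidates:
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result


without_original = word_delimiter_sketch("ONE TWO", catenate_all=True)
with_original = word_delimiter_sketch(
    "ONE TWO", catenate_all=True, preserve_original=True
)
print(without_original)
print(with_original)
```

In this model, only the second call includes the unsplit "ONE TWO" alongside the parts and the joined form, which mirrors the difference between the two analyzer outputs shown in the question and the answer.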