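I'm trying to retrieve an aggregation of tags (with counts) from Elasticsearch, but when a tag is hyphenated, it gets split and its parts are returned as separate tags.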
For example, with documents like:
{
  "tags": ["foo", "foo-bar", "cheese"]
}
I get back (abridged):
{
  "foo": 8,
  "bar": 3,
  "cheese": 2
}
whereas I expect to get:
{
  "foo": 5,
  "foo-bar": 3,
  "cheese": 2
}
My mapping is:
{
  "asset" : {
    "properties" : {
      "name" : {"type" : "string"},
      "path" : {"type" : "string", "index" : "not_analyzed"},
      "url": {"type" : "string"},
      "tags" : {"type" : "string", "index_name" : "tag"},
      "created": {"type" : "date"},
      "updated": {"type" : "date"},
      "usages": {"type" : "string", "index_name" : "usage"},
      "meta": {"type": "object"}
    }
  }
}
Can anyone point me in the right direction?
Answer (score: 1)
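Try using a different analyzer for the tags field instead of the standard analyzer, which splits words when it encounters certain characters such as hyphens: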
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_keyword_lowercase": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "trim"
          ]
        }
      }
    }
  },
  "mappings": {
    "asset" : {
      "properties" : {
        "name" : {"type" : "string"},
        "path" : {"type" : "string", "index" : "not_analyzed"},
        "url": {"type" : "string"},
        "tags" : {"type" : "string", "index_name" : "tag", "analyzer":"my_keyword_lowercase"},
        "created": {"type" : "date"},
        "updated": {"type" : "date"},
        "usages": {"type" : "string", "index_name" : "usage"},
        "meta": {"type": "object"}
      }
    }
  }
}
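To see why the counts come out the way they do, here is a rough Python simulation (not Elasticsearch code) of how each analyzer turns tag values into terms, and how a terms aggregation then counts documents per term. The regex-based `standard_like` function is only an approximation of the standard analyzer, the three documents are hypothetical, and `keyword_lowercase` mirrors the `my_keyword_lowercase` analyzer from the settings in the answer.

```python
import re
from collections import Counter


# Rough stand-in for the standard analyzer: split on runs of non-word
# characters and lowercase. The real standard tokenizer follows Unicode
# text segmentation rules; this regex only approximates it for simple tags.
def standard_like(value):
    return [t.lower() for t in re.findall(r"\w+", value)]


# Mirrors the "my_keyword_lowercase" analyzer: the keyword tokenizer emits
# the whole value as a single token, then lowercase and trim are applied.
def keyword_lowercase(value):
    return [value.lower().strip()]


def tag_counts(docs, analyze):
    """Count documents per term, the way a terms aggregation does."""
    counts = Counter()
    for doc in docs:
        terms = set()
        for tag in doc["tags"]:
            terms.update(analyze(tag))
        counts.update(terms)  # each term counted at most once per document
    return dict(counts)


# Hypothetical documents, one tag each, for illustration only.
docs = [
    {"tags": ["foo"]},
    {"tags": ["foo-bar"]},
    {"tags": ["cheese"]},
]

print(tag_counts(docs, standard_like))
print(tag_counts(docs, keyword_lowercase))
```

With the standard-style split, the document tagged `foo-bar` also counts toward `foo` and adds a separate `bar`, which is consistent with the question's `foo: 8` (5 + 3) and `bar: 3`. Keep in mind that an existing field's analyzer cannot be changed in place, so the index has to be recreated with the new settings and the data reindexed. On Elasticsearch 5.x and later the `string` type no longer exists; the usual equivalent there is a `keyword` field, optionally with a custom normalizer that applies the `lowercase` filter.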