I have an ES index that contains parameter data from some scientific experiments.
I have the following terms aggregation:
{
  "aggs": {
    "variables": {
      "terms": {
        "field": "value",
        "size": 100
      }
    }
  },
  "size": 0
}
which returns the following results:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 9928,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "variables" : {
      "buckets" : [ {
        "key" : "00",
        "doc_count" : 158
      }, {
        "key" : "1",
        "doc_count" : 158
      }, {
        "key" : "2",
        "doc_count" : 158
      }, {
        "key" : "pressure",
        "doc_count" : 158
      }, {
        "key" : "seconds",
        "doc_count" : 158
      }, {
        "key" : "since",
        "doc_count" : 158
      }, {
        "key" : "s",
        "doc_count" : 156
      }, {
        "key" : "speed",
        "doc_count" : 127
      }, {
        "key" : "sample",
        "doc_count" : 121
      }, {
        "key" : "a",
        "doc_count" : 104
      } ]
    }
  }
}
What I would like to do is tell Elasticsearch to ignore all keys shorter than 5 characters, e.g. filter out "key": "a", "key": "s", and so on.
Is this possible?
Answer 0 (score: 1)
I think you should use a Regexp Filter to get the result you want. To keep only values that are at least 5 characters long:
"filter": {
  "regexp": {
    "value": ".{5,}"
  }
}
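A regexp filter selects documents, so short terms from a matching document can still appear as buckets when the field is analyzed. The terms aggregation also has an `include` parameter that restricts which terms become buckets. The sketch below is a minimal illustration, assuming your Elasticsearch version accepts a regular expression string for `include`. The helper names `build_terms_request` and `emulate_include` are hypothetical, and the local emulation uses Python's `re` module, which is only an approximation of the Lucene regular expression syntax.

```python
import json
import re

# Bucket keys copied from the aggregation result in the question.
SAMPLE_KEYS = ["00", "1", "2", "pressure", "seconds", "since", "s", "speed", "sample", "a"]

# Assumption: "shorter than 5" means keep terms with 5 or more characters.
MIN_LENGTH_PATTERN = ".{5,}"


def build_terms_request(field, pattern, size=100):
    """Build a terms aggregation body whose buckets are limited by `include`."""
    return {
        "aggs": {
            "variables": {
                "terms": {"field": field, "size": size, "include": pattern}
            }
        },
        "size": 0,
    }


def emulate_include(keys, pattern):
    """Local stand-in for `include`: the pattern must match the whole term."""
    return [key for key in keys if re.fullmatch(pattern, key)]


if __name__ == "__main__":
    # Request body that could be sent to the _search endpoint.
    print(json.dumps(build_terms_request("value", MIN_LENGTH_PATTERN), indent=2))
    # Which of the sample keys would survive the pattern.
    print(emulate_include(SAMPLE_KEYS, MIN_LENGTH_PATTERN))
```

Because `include` acts on the aggregation rather than on the index, it needs no reindexing, at the cost of evaluating the pattern on every request.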
Answer 1 (score: 1)
PUT $host/$index:
{
  "settings": {
    "analysis": {
      "filter": {
        "min_length_5_filter": {
          "type": "length",
          "min": 5,
          "max": 256
        }
      },
      "analyzer": {
        "variable_name_analyzer": {
          "type": "custom",
          "tokenizer": "lowercase",
          "filter": ["min_length_5_filter"]
        }
      }
    }
  }
}
Then, in the index mapping:
PUT $host/$index/_mapping/$mapping_name:
...
"parameters": {
  "properties": {
    "name": {
      "type": "string",
      "analyzer": "variable_name_analyzer"
    },
    "value": {
      "type": "string",
      "analyzer": "variable_name_analyzer"
    }
  }
},
...
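With the approach above, applying a minimum-length filter to the tokenized strings let me remove a lot of junk values, and the terms aggregation now works well. Hope this helps someone!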
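To see why this analyzer removes the short buckets, here is a rough Python emulation of the two stages: a tokenizer that splits at non-letters and lowercases, followed by a length filter with the same `min` and `max` as above. It is only a sketch and not how Elasticsearch is implemented. The function names and the sample input strings are made up for illustration.

```python
import re

# Same bounds as min_length_5_filter in the settings above.
MIN_LEN = 5
MAX_LEN = 256


def lowercase_tokenize(text):
    """Approximate the lowercase tokenizer: split at non-letters, then lowercase."""
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


def length_filter(tokens, min_len=MIN_LEN, max_len=MAX_LEN):
    """Keep only tokens whose length lies within [min_len, max_len]."""
    return [token for token in tokens if min_len <= len(token) <= max_len]


def analyze(text):
    """Run both stages in the order the custom analyzer declares them."""
    return length_filter(lowercase_tokenize(text))


if __name__ == "__main__":
    # Hypothetical field value; digits and one-letter tokens are dropped.
    print(analyze("Seconds since 00 s"))
```

Note that splitting at non-letters discards digits as well, so purely numeric keys such as "00", "1" and "2" never reach the length filter. Since analysis happens at index time, existing documents have to be reindexed before the aggregation reflects the new analyzer.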