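According to the Elastic documentation, every field type except text (analyzed strings) supports doc_values, so I assumed that such types would skip fielddata entirely in aggregations whenever doc_values are available.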
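However, that is not what I see. Whenever I run a terms aggregation on a keyword or ip field, the field shows up as loaded in fielddata, although this does not happen for other types (for example session_id, which is a long).
Is this the expected behavior? If so, how can I prevent fielddata from being created?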
I am using Elasticsearch 6.5, and this is my mapping:
{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 0,
      "codec": "best_compression"
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "time": {
          "type": "date",
          "format": "epoch_millis"
        },
        "session_token": {
          "type": "keyword"
        },
        "session_ref": {
          "type": "keyword"
        },
        "session_id": {
          "type": "long"
        },
        "src": {
          "type": "ip"
        },
        "version": {
          "type": "byte"
        }
      }
    }
  }
}
Here is a sample aggregation that causes fielddata to be loaded:
GET test_ind/_search?size=0
{
  "aggs": {
    "by_token": {
      "terms": {
        "field": "session_token",
        "size": 100
      }
    }
  }
}
Here are the fielddata stats after the aggregation:
"test_ind" : {
  "uuid" : "DiB6d7EgSXm7jeiSgoo-mQ",
  "primaries" : {
    "fielddata" : {
      "memory_size_in_bytes" : 1564696,
      "evictions" : 0,
      "fields" : {
        "session_ref" : {
          "memory_size_in_bytes" : 0
        },
        "session_token" : {
          "memory_size_in_bytes" : 1564696
        }
      }
    }
  },
  "total" : {
    "fielddata" : {
      "memory_size_in_bytes" : 1564696,
      "evictions" : 0,
      "fields" : {
        "session_ref" : {
          "memory_size_in_bytes" : 0
        },
        "session_token" : {
          "memory_size_in_bytes" : 1564696
        }
      }
    }
  }
}
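The request that produced this output is not shown. For reference, per-field fielddata stats of this shape can be requested through the index stats API. A minimal sketch, assuming the same index name test_ind and a wildcard over all fields (the wildcard is an assumption, not taken from the question):

```
GET test_ind/_stats/fielddata?fields=*
```

Running it before and after the aggregation makes it easy to see which field's memory_size_in_bytes changes.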
Here are the segment stats:
"test_ind" : {
  "uuid" : "DiB6d7EgSXm7jeiSgoo-mQ",
  "primaries" : {
    "segments" : {
      "count" : 8,
      "memory_in_bytes" : 472939,
      "terms_memory_in_bytes" : 423365,
      "stored_fields_memory_in_bytes" : 3504,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 0,
      "points_memory_in_bytes" : 41598,
      "doc_values_memory_in_bytes" : 4472,
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : -1,
      "file_sizes" : { }
    }
  },
  "total" : {
    "segments" : {
      "count" : 8,
      "memory_in_bytes" : 472939,
      "terms_memory_in_bytes" : 423365,
      "stored_fields_memory_in_bytes" : 3504,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 0,
      "points_memory_in_bytes" : 41598,
      "doc_values_memory_in_bytes" : 4472,
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : -1,
      "file_sizes" : { }
    }
  }
}
Answer 0 (score: 0)
Apparently, the memory used by global ordinals is reported under fielddata.
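Global ordinals can be set to load eagerly or lazily in the mapping (the eager_global_ordinals parameter): eager loading builds them at refresh time, while lazy loading, the default, builds them at query time.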
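As an illustration of the eager option, here is a minimal sketch of a mapping update for Elasticsearch 6.x, assuming the index test_ind, the type _doc, and the session_token field from the question. It only moves the cost of building global ordinals from the first query to refresh time; it does not remove their memory footprint.

```
PUT test_ind/_mapping/_doc
{
  "properties": {
    "session_token": {
      "type": "keyword",
      "eager_global_ordinals": true
    }
  }
}
```

Leaving eager_global_ordinals unset (or false) keeps the default lazy behavior described above.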
To keep a terms aggregation from using global ordinals, we can set "execution_hint": "map", which in my case looks like this:
GET test_ind/_search?size=0
{
  "aggs": {
    "by_token": {
      "terms": {
        "field": "session_token",
        "execution_hint": "map",
        "size": 100
      }
    }
  }
}
It comes with its own caveats, though: it uses more memory to execute the query and runs slower.