I have an AWS Elasticsearch cluster with the following settings:
curl -s 'https://***..es.amazonaws.com/_cluster/settings' | jq
{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation": {
          "cluster_concurrent_rebalance": "2",
          "node_concurrent_recoveries": "2",
          "disk": {
            "watermark": {
              "low": "15.0gb",
              "flood_stage": "5.0gb",
              "high": "10.0gb"
            }
          },
          "node_initial_primaries_recoveries": "4"
        }
      },
      "blocks": {
        "create_index": "false"
      }
    },
    "indices": {
      "recovery": {
        "max_bytes_per_sec": "40mb"
      }
    }
  },
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "cluster_concurrent_rebalance": "2",
          "node_concurrent_recoveries": "2",
          "disk": {
            "watermark": {
              "low": "15.0gb",
              "flood_stage": "5.0gb",
              "high": "10.0gb"
            }
          },
          "exclude": {
            "_ip": "10.212.32.62,10.212.31.186"
          },
          "node_initial_primaries_recoveries": "4",
          "awareness": {}
        }
      }
    },
    "indices": {
      "recovery": {
        "max_bytes_per_sec": "40mb"
      }
    }
  }
}
The cluster health endpoint reports:
curl -s 'https://***..es.amazonaws.com/_cluster/health?pretty' | jq
{
  "cluster_name": "***",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 13,
  "number_of_data_nodes": 10,
  "active_primary_shards": 3116,
  "active_shards": 3562,
  "relocating_shards": 0,
  "initializing_shards": 16,
  "unassigned_shards": 9214,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 83,
  "number_of_in_flight_fetch": 1974,
  "task_max_waiting_in_queue_millis": 49831498,
  "active_shards_percent_as_number": 27.84552845528455
}
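As a quick sanity check on those numbers, here is a small sketch of my own arithmetic (not something Elasticsearch printed). It assumes the total shard count is the sum of active, initializing, and unassigned shards, which does reproduce the reported `active_shards_percent_as_number`. The per-node figure assumes shards would be spread evenly over the data nodes, which is only a rough guide.

```python
# Values copied from the _cluster/health output above.
health = {
    "number_of_data_nodes": 10,
    "active_shards": 3562,
    "initializing_shards": 16,
    "unassigned_shards": 9214,
}

# Assumption: total = active + initializing + unassigned
# (relocating_shards is 0 in this output, so it does not matter here).
total_shards = (
    health["active_shards"]
    + health["initializing_shards"]
    + health["unassigned_shards"]
)
active_percent = 100.0 * health["active_shards"] / total_shards

# Assumption: an even spread across data nodes, as a rough estimate only.
shards_per_data_node = total_shards / health["number_of_data_nodes"]

print(f"total shards: {total_shards}")
print(f"active: {active_percent:.2f}%")
print(f"shards per data node if evenly spread: {shards_per_data_node:.0f}")
```

So the cluster is carrying roughly 12.8k shards over 10 data nodes, and only about 28% of them are currently assigned.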
I dug further:
curl -s 'https://***..es.amazonaws.com/_nodes/stats' | jq
{
  "_nodes": {
    "total": 13,
    "successful": 12,
    "failed": 1,
    "failures": [
      {
        "type": "failed_node_exception",
        "reason": "Failed node [o3Fb21UVQx2rwwm2ZiVu7w]",
        "caused_by": {
          "type": "exception",
          "reason": "failed to refresh store stats",
          "caused_by": {
            "type": "file_system_exception",
            "reason": "/hdd1/mnt/env/root/apollo/env/swift-eu-west-1-prod-ES_6_3AMI-ES2-p001/var/es/data/nodes/0/indices/wvMTt2eiSfSDFVQbuGDEeQ/1/index: Too many open files"
          }
        }
      }
    ]
  },
And the open file descriptors per node (fdc = current, fdm = max):
curl -s -XGET 'https://***..es.amazonaws.com/_cat/nodes?v&h=ip,fdc,fdm'
ip        fdc    fdm
x.x.x.x 70014 128000
x.x.x.x   950 128000
x.x.x.x   915 128000
x.x.x.x   949 128000
x.x.x.x   950 128000
x.x.x.x   954 128000
x.x.x.x  9124 128000
x.x.x.x
x.x.x.x 36916 128000
x.x.x.x   951 128000
x.x.x.x   919 128000
x.x.x.x   948 128000
x.x.x.x   950 128000
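To make that table easier to read, this is a minimal sketch of how I turn the `fdc`/`fdm` columns into a usage percentage. The `fd_usage` helper and the `CAT_NODES` sample are my own illustration, with a few rows copied from the output above. A row with no numbers is kept and marked as reporting no stats, since one node returns an empty row.

```python
# A few rows copied from the _cat/nodes output above (IPs are redacted).
CAT_NODES = """\
ip fdc fdm
x.x.x.x 70014 128000
x.x.x.x 950 128000
x.x.x.x
x.x.x.x 36916 128000
"""


def fd_usage(cat_output):
    """Return (ip, percent_used) per row; percent is None if the row has no stats."""
    rows = []
    for line in cat_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) < 3:
            rows.append((fields[0], None))  # node returned no fdc/fdm values
            continue
        ip, fdc, fdm = fields[0], int(fields[1]), int(fields[2])
        rows.append((ip, 100.0 * fdc / fdm))
    return rows


for ip, pct in fd_usage(CAT_NODES):
    print(ip, "no stats" if pct is None else f"{pct:.1f}%")
```

Even the busiest node that reports stats is at about 55% of its limit, and one node reports nothing at all.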
Any suggestions on how to resolve this would be greatly appreciated.