AWS Elasticsearch Service cluster in red status: too many open files on a node

Asked: 2019-05-20 15:57:41

Tags: amazon-web-services elasticsearch aws-elasticsearch

I have an AWS Elasticsearch cluster with the following settings:

 curl -s 'https://***..es.amazonaws.com/_cluster/settings' | jq
{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation": {
          "cluster_concurrent_rebalance": "2",
          "node_concurrent_recoveries": "2",
          "disk": {
            "watermark": {
              "low": "15.0gb",
              "flood_stage": "5.0gb",
              "high": "10.0gb"
            }
          },
          "node_initial_primaries_recoveries": "4"
        }
      },
      "blocks": {
        "create_index": "false"
      }
    },
    "indices": {
      "recovery": {
        "max_bytes_per_sec": "40mb"
      }
    }
  },
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "cluster_concurrent_rebalance": "2",
          "node_concurrent_recoveries": "2",
          "disk": {
            "watermark": {
              "low": "15.0gb",
              "flood_stage": "5.0gb",
              "high": "10.0gb"
            }
          },
          "exclude": {
            "_ip": "10.212.32.62,10.212.31.186"
          },
          "node_initial_primaries_recoveries": "4",
          "awareness": {}
        }
      }
    },
    "indices": {
      "recovery": {
        "max_bytes_per_sec": "40mb"
      }
    }
  }
}

The cluster health reports:

 curl -s 'https://***..es.amazonaws.com/_cluster/health?pretty' | jq
{
  "cluster_name": "***",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 13,
  "number_of_data_nodes": 10,
  "active_primary_shards": 3116,
  "active_shards": 3562,
  "relocating_shards": 0,
  "initializing_shards": 16,
  "unassigned_shards": 9214,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 83,
  "number_of_in_flight_fetch": 1974,
  "task_max_waiting_in_queue_millis": 49831498,
  "active_shards_percent_as_number": 27.84552845528455
}
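
As a sanity check on these numbers, the shard counts can be added up and divided across the data nodes. The sketch below uses only values copied from the health output above; the helper name `shard_summary` is made up for illustration.

```python
import json

# Fields copied from the _cluster/health output above.
health = json.loads("""
{
  "number_of_data_nodes": 10,
  "active_shards": 3562,
  "initializing_shards": 16,
  "unassigned_shards": 9214
}
""")

def shard_summary(h):
    """Return (total shards, shards per data node, percent active)."""
    total = h["active_shards"] + h["initializing_shards"] + h["unassigned_shards"]
    per_node = total / h["number_of_data_nodes"]
    active_pct = 100.0 * h["active_shards"] / total
    return total, per_node, active_pct

total, per_node, active_pct = shard_summary(health)
print(f"{total} shards, {per_node:.1f} per data node, {active_pct:.2f}% active")
```

That gives 12792 shards in total, or about 1279 per data node if every shard were assigned, and the computed percentage agrees with the reported `active_shards_percent_as_number`.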

I dug a bit further:

 curl -s 'https://***..es.amazonaws.com/_nodes/stats' | jq
{
  "_nodes": {
    "total": 13,
    "successful": 12,
    "failed": 1,
    "failures": [
      {
        "type": "failed_node_exception",
        "reason": "Failed node [o3Fb21UVQx2rwwm2ZiVu7w]",
        "caused_by": {
          "type": "exception",
          "reason": "failed to refresh store stats",
          "caused_by": {
            "type": "file_system_exception",
            "reason": "/hdd1/mnt/env/root/apollo/env/swift-eu-west-1-prod-ES_6_3AMI-ES2-p001/var/es/data/nodes/0/indices/wvMTt2eiSfSDFVQbuGDEeQ/1/index: Too many open files"
          }
        }
      }
    ]
  },
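
The interesting part of that response is the innermost `caused_by` entry. A minimal sketch of pulling it out programmatically follows, assuming the `_nodes` object has been saved as JSON; `root_cause` is a hypothetical helper, and only the portion of the response shown above is used because the rest was cut off.

```python
import json

# The "_nodes" object from the _nodes/stats response shown above.
NODES_HEADER = json.loads(r"""
{
  "total": 13,
  "successful": 12,
  "failed": 1,
  "failures": [
    {
      "type": "failed_node_exception",
      "reason": "Failed node [o3Fb21UVQx2rwwm2ZiVu7w]",
      "caused_by": {
        "type": "exception",
        "reason": "failed to refresh store stats",
        "caused_by": {
          "type": "file_system_exception",
          "reason": "/hdd1/mnt/env/root/apollo/env/swift-eu-west-1-prod-ES_6_3AMI-ES2-p001/var/es/data/nodes/0/indices/wvMTt2eiSfSDFVQbuGDEeQ/1/index: Too many open files"
        }
      }
    }
  ]
}
""")

def root_cause(failure):
    """Follow nested caused_by entries down to the innermost one."""
    cause = failure
    while "caused_by" in cause:
        cause = cause["caused_by"]
    return cause

causes = [root_cause(f) for f in NODES_HEADER["failures"]]
for c in causes:
    print(c["type"], "->", c["reason"])
```

For the single failed node here, this prints the `file_system_exception` with the "Too many open files" message.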

And the open file descriptors per node (`fdc` is the current count, `fdm` the maximum):

 curl -s -XGET 'https://***..es.amazonaws.com/_cat/nodes?v&h=ip,fdc,fdm'
ip         fdc    fdm
x.x.x.x  70014 128000
x.x.x.x    950 128000
x.x.x.x    915 128000
x.x.x.x    949 128000
x.x.x.x    950 128000
x.x.x.x    954 128000
x.x.x.x   9124 128000
x.x.x.x
x.x.x.x  36916 128000
x.x.x.x    951 128000
x.x.x.x    919 128000
x.x.x.x    948 128000
x.x.x.x    950 128000
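
To make the table easier to read, the file descriptor usage can be expressed as a percentage of the limit. This is a rough sketch that parses the text output above (IPs are redacted, so every row shows `x.x.x.x`); `fd_usage` is an illustrative helper name, and rows with missing columns are treated as "not reported".

```python
# Output of _cat/nodes?v&h=ip,fdc,fdm, copied from above.
CAT_NODES = """\
ip         fdc    fdm
x.x.x.x  70014 128000
x.x.x.x    950 128000
x.x.x.x    915 128000
x.x.x.x    949 128000
x.x.x.x    950 128000
x.x.x.x    954 128000
x.x.x.x   9124 128000
x.x.x.x
x.x.x.x  36916 128000
x.x.x.x    951 128000
x.x.x.x    919 128000
x.x.x.x    948 128000
x.x.x.x    950 128000
"""

def fd_usage(text):
    """Parse `ip fdc fdm` rows into (ip, fdc, fdm, percent) tuples.

    A row without fdc/fdm values yields None for those fields.
    """
    rows = []
    for line in text.splitlines()[1:]:  # skip the header row
        parts = line.split()
        if not parts:
            continue
        if len(parts) < 3:
            rows.append((parts[0], None, None, None))
            continue
        fdc, fdm = int(parts[1]), int(parts[2])
        rows.append((parts[0], fdc, fdm, 100.0 * fdc / fdm))
    return rows

rows = fd_usage(CAT_NODES)
for ip, fdc, fdm, pct in rows:
    if pct is None:
        print(f"{ip}: no file descriptor stats reported")
    else:
        print(f"{ip}: {fdc}/{fdm} ({pct:.1f}%)")
```

On this output, the busiest reporting node is at roughly 55% of its limit (70014 of 128000), and one node reports no values at all. That silent node may be the one that failed in `_nodes/stats`, but this is a guess, since the redacted IPs cannot be matched to node IDs.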

Any suggestions on how to resolve this would be greatly appreciated.

0 answers:

No answers yet