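I am running a Logstash instance that connects to an ES cluster behind a load balancer. The load balancer has an idle timeout of 5 minutes. Logstash is configured with an ES URL that points to the load balancer's IP.
Normally everything works fine, but after a period with no requests, the next request handled by LS fails with the following errors: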
[2018-10-30T08:15:00,757][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://10.100.24.254:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://10.100.24.254:9200/, :error_message=>"Elasticsearch Unreachable: [http://10.100.24.254:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-10-30T08:15:00,759][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://10.100.24.254:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
[2018-10-30T08:15:02,760][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2018-10-30T08:15:02,760][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2018-10-30T08:15:05,651][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://10.100.24.254:9200/, :path=>"/"}
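LS eventually recovers, but it takes more than 1 minute, which is unacceptable for our SLA.
I suspect this happens because the load balancer closes the connection after 5 minutes of inactivity.
I tried setting: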
timeout => 3
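This improved things: the request is now retried after 3 seconds, but that is still not good enough. What is the best configuration option to make sure a connection is always alive and working before a request is attempted on it?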
Answer 0 (score: 0)
Try setting validate_after_inactivity, as described here.
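Alternatively, you can try enabling keep-alive on the Logstash server, so that Logstash knows the connection has been dropped when the LB reaches its idle timeout and opens a new connection instead of sending the request on the stale one.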
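As a rough sketch of the validate_after_inactivity suggestion, the elasticsearch output could be configured along the following lines. The host is taken from the log lines in the question, and timeout => 3 is the value the asker already tried. The value 1000 is only an illustrative assumption, not a tested recommendation. The idea is to have a pooled connection checked for staleness once it has been idle longer than this many milliseconds, well below the load balancer's 5-minute idle timeout.

```
output {
  elasticsearch {
    # Load balancer address, as seen in the log lines above
    hosts => ["http://10.100.24.254:9200"]

    # Milliseconds a pooled connection may sit idle before it is
    # checked for staleness prior to reuse (plugin default: 10000).
    # 1000 is an assumed, illustrative value.
    validate_after_inactivity => 1000

    # Request timeout in seconds, as already tried in the question
    timeout => 3
  }
}
```

Whether the stale check catches the dropped connection depends on how the load balancer terminates idle connections. If it discards them silently rather than closing them, the check may not notice, so this is worth verifying by leaving the pipeline idle for longer than 5 minutes and then sending an event.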