I'm implementing ECS health checks and I'm trying to work out the best approach.
So far I have found a couple of solutions:
ECSHealthAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmDescription: Alarm for ECS StatusCheckFailed Metric
    ComparisonOperator: GreaterThanOrEqualToThreshold
    EvaluationPeriods: 2
    Statistic: Maximum
    MetricName: StatusCheckFailed
    Namespace: AWS/ECS
    Period: 30
    Threshold: 1.0
    AlarmActions:
      - !Ref AlarmTopic
    InsufficientDataActions:
      - !Ref AlarmTopic
    Dimensions:
      - Name: ClusterName
        Value: !Ref ClusterName
      - Name: ServiceName
        Value: !GetAtt service.Name
EventRule:
  Type: "AWS::Events::Rule"
  Properties:
    Name: CloudWatchRMExtensionECSStoppedRule
    Description: "Notify when ECS container stopped"
    EventPattern:
      source: ["aws.ecs"]
      detail-type: ["ECS Task State Change", "ECS Container Instance State Change"]
      detail:
        clusterArn: [ 'clusterArn' ]
        lastStatus: [ "STOPPED" ]
        stoppedReason: [ "Essential container in task exited" ]
        group: [ 'service-group' ]
    State: "ENABLED"
    Targets:
      - Arn: !Ref ECSAlarmSNSTopic
        Id: "PublishAlarmTopic"
        InputTransformer:
          InputPathsMap:
            stopped-reason: "$.detail.stoppedReason"
          InputTemplate: '"This micro-service has been stopped with the following reason: <stopped-reason>"'
Could you tell me whether these variants are correct, or whether there is a more efficient way to do this? Thanks for your help!
Answer
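I can't comment yet, so here are some thoughts as an answer. It isn't clear to me whether you want alarms on EC2 server-level status checks or on each ECS service's tasks, so I'm listing all the options I know of.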
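For EC2 instance-level health, Auto Scaling group health checks can replace instances that fail their status checks: https://docs.aws.amazon.com/autoscaling/ec2/userguide/healthcheck.html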
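A minimal CloudFormation sketch of that option, assuming the ECS container instances run in an Auto Scaling group. The resource and parameter names (`ECSAutoScalingGroup`, `ECSLaunchTemplate`, `SubnetIds`) are hypothetical:

```yaml
# Hypothetical names: ECSAutoScalingGroup, ECSLaunchTemplate and SubnetIds are placeholders.
ECSAutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: "1"
    MaxSize: "3"
    VPCZoneIdentifier: !Ref SubnetIds            # assumed List<AWS::EC2::Subnet::Id> parameter
    LaunchTemplate:
      LaunchTemplateId: !Ref ECSLaunchTemplate   # assumed AWS::EC2::LaunchTemplate resource
      Version: !GetAtt ECSLaunchTemplate.LatestVersionNumber
    # With EC2 health checks, an instance that is not running or fails its
    # EC2 status checks is marked unhealthy and replaced by the group.
    HealthCheckType: EC2
    HealthCheckGracePeriod: 300                  # seconds to wait after launch before checking
```

This replaces broken hosts rather than alerting on them; an alarm on the same condition would use the EC2 status check metrics, not the `AWS/ECS` namespace.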
We can also ship the logs of the AWS ecs-agent Docker container to CloudWatch Logs and get SNS notifications based on errors or filtered events.
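We can also subscribe to the ECS event stream through CloudWatch Events to be notified whenever a service task starts or stops. References: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch_event_stream.html https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_cwet.html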
Sample event entries are available here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_cwe_events.html
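Building on the rule in the question, a sketch that scopes the event pattern to one service's tasks. Assumptions: `ClusterArn` and `ServiceName` are hypothetical template parameters, and the `group` field of a service's task events has the form `service:<service name>` (as in the sample task events). Only `ECS Task State Change` is kept, because `stoppedReason` is a task field. The topic policy is included because EventBridge (CloudWatch Events) needs permission to publish to the SNS topic:

```yaml
# Hypothetical parameters: ClusterArn, ServiceName. ECSAlarmSNSTopic is the topic from the question.
ECSTaskStoppedRule:
  Type: AWS::Events::Rule
  Properties:
    Description: Notify when an essential container of the service exits
    EventPattern:
      source: ["aws.ecs"]
      detail-type: ["ECS Task State Change"]
      detail:
        clusterArn:
          - !Ref ClusterArn
        group:
          - !Sub "service:${ServiceName}"   # tasks started by a service report this group
        lastStatus: ["STOPPED"]
        stoppedReason:
          - prefix: "Essential container in task exited"   # prefix match instead of exact string
    State: ENABLED
    Targets:
      - Arn: !Ref ECSAlarmSNSTopic
        Id: PublishAlarmTopic
        InputTransformer:
          InputPathsMap:
            stopped-reason: "$.detail.stoppedReason"
          InputTemplate: '"This micro-service has been stopped with the following reason: <stopped-reason>"'

# Allows EventBridge to publish to the topic.
ECSAlarmSNSTopicPolicy:
  Type: AWS::SNS::TopicPolicy
  Properties:
    Topics:
      - !Ref ECSAlarmSNSTopic
    PolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Principal:
            Service: events.amazonaws.com
          Action: sns:Publish
          Resource: !Ref ECSAlarmSNSTopic
```

Keeping a `stoppedReason` filter matters: without it, every task stopped by a normal deployment or scale-in would also trigger a notification.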
… for a reference on setting up alarms based on log events.
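A sketch of a log-based alarm, assuming the ecs-agent logs already land in a CloudWatch Logs group. The log group name, filter pattern, namespace and metric name are hypothetical, and the pattern has to be adapted to the agent's actual log format:

```yaml
# Hypothetical: log group /ecs/ecs-agent, namespace Custom/ECSAgent, metric AgentErrorCount.
ECSAgentErrorMetricFilter:
  Type: AWS::Logs::MetricFilter
  Properties:
    LogGroupName: /ecs/ecs-agent
    FilterPattern: '"level=error"'     # assumed log format; quoted because the term contains "="
    MetricTransformations:
      - MetricNamespace: Custom/ECSAgent
        MetricName: AgentErrorCount
        MetricValue: "1"               # each matching log line adds 1
        DefaultValue: 0                # publish 0 when nothing matches

ECSAgentErrorAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmDescription: ecs-agent logged errors
    Namespace: Custom/ECSAgent
    MetricName: AgentErrorCount
    Statistic: Sum
    Period: 60
    EvaluationPeriods: 1
    Threshold: 1
    ComparisonOperator: GreaterThanOrEqualToThreshold
    TreatMissingData: notBreaching
    AlarmActions:
      - !Ref AlarmTopic                # the SNS topic from the question
```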
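At the service level, add a container health check to each ECS service; when an essential container becomes unhealthy, the service scheduler replaces the task. https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_healthcheck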
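A minimal task definition fragment with a container health check. The family, container name, `ImageUri` parameter, `/health` endpoint and port 8080 are hypothetical, and the command assumes `curl` is available inside the image:

```yaml
# Hypothetical: family my-service, container app, parameter ImageUri, endpoint :8080/health.
TaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: my-service
    ContainerDefinitions:
      - Name: app
        Image: !Ref ImageUri
        Essential: true                # an unhealthy essential container makes the task unhealthy
        Memory: 512
        HealthCheck:
          # Runs inside the container; a non-zero exit code counts as a failed check.
          Command: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
          Interval: 30                 # seconds between checks
          Timeout: 5                   # seconds before a check counts as failed
          Retries: 3                   # consecutive failures before UNHEALTHY
          StartPeriod: 60              # grace period after the container starts
```

Because `app` is essential, a task whose container is reported UNHEALTHY is stopped and replaced by the service scheduler, and that stop appears in the ECS event stream as an `ECS Task State Change` event.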
Let me know what you think :)