Question

I am implementing health checks for ECS and am trying to work out the best approach.

So far I have found a few options:

Use the AWS ECS metrics and dimensions and check whether a given metric reports an insufficient value
Use a CloudWatch alarm:

ECSHealthAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmDescription: Alarm for ECS StatusCheckFailed Metric
    ComparisonOperator: GreaterThanOrEqualToThreshold
    EvaluationPeriods: 2
    Statistic: Maximum
    MetricName: StatusCheckFailed
    Namespace: AWS/ECS
    Period: 30
    Threshold: 1.0
    AlarmActions:
      - !Ref AlarmTopic
    InsufficientDataActions:
      - !Ref AlarmTopic
    Dimensions:
      - Name: ClusterName
        Value: !Ref ClusterName
      - Name: ServiceName
        Value: !GetAtt service.Name

Use CloudWatch Events:

EventRule:
  Type: "AWS::Events::Rule"
  Properties:
    Name: CloudWatchRMExtensionECSStoppedRule
    Description: "Notify when ECS container stopped"
    EventPattern:
      source: ["aws.ecs"]
      detail-type: ["ECS Task State Change", "ECS Container Instance State Change"]
      detail:
        clusterArn: [ 'clusterArn' ]
        lastStatus: [ "STOPPED" ]
        stoppedReason: [ "Essential container in task exited" ]
        group: [ 'service-group' ]
    State: "ENABLED"
    Targets:
      - Arn: !Ref ECSAlarmSNSTopic
        Id: "PublishAlarmTopic"
        InputTransformer:
          InputPathsMap:
            stopped-reason: "$.detail.stoppedReason"
          InputTemplate: '"This micro-service has been stopped with the following reason: <stopped-reason>"'

Could you tell me whether these approaches are sound, or whether there is a more efficient way to do this? Thanks for your help!

Answer 1

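I can't comment, so here are some thoughts. It isn't clear to me whether you want alarms at the level of EC2 instance status checks or at the level of each ECS service's tasks, so I'm listing all the possible options here.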

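I would run the ECS cluster's EC2 instances in an Auto Scaling group and, based on the ASG CloudWatch metrics, set up SNS notifications for when instances are added or removed.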

https://docs.aws.amazon.com/autoscaling/ec2/userguide/healthcheck.html
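
As a rough sketch of that setup, assuming the container instances are launched by an Auto Scaling group defined in the same CloudFormation template, the group itself can publish launch and terminate events to an SNS topic. `AlarmTopic` is the topic name from the question; `ECSAutoScalingGroup`, `ECSLaunchTemplate`, and the `SubnetIds` parameter are placeholders that would need to match your own stack.

```yaml
ECSAutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: "1"
    MaxSize: "3"
    # Hypothetical parameter of type List<AWS::EC2::Subnet::Id>
    VPCZoneIdentifier: !Ref SubnetIds
    LaunchTemplate:
      # Hypothetical launch template using an ECS-optimized AMI
      LaunchTemplateId: !Ref ECSLaunchTemplate
      Version: !GetAtt ECSLaunchTemplate.LatestVersionNumber
    # Replace instances that fail the EC2 status checks
    HealthCheckType: EC2
    NotificationConfigurations:
      - TopicARN: !Ref AlarmTopic
        NotificationTypes:
          - autoscaling:EC2_INSTANCE_LAUNCH
          - autoscaling:EC2_INSTANCE_LAUNCH_ERROR
          - autoscaling:EC2_INSTANCE_TERMINATE
          - autoscaling:EC2_INSTANCE_TERMINATE_ERROR
```

This covers only the instance level; it says nothing about whether individual tasks on those instances are healthy.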

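We can also ship the AWS ecs-agent Docker container logs to CloudWatch and get SNS notifications based on errors or filtered events.
We can also subscribe to the ECS event stream through CloudWatch Events to be notified whenever a service task starts or stops. References: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch_event_stream.html https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_cwet.html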

Sample event entries are available at the following link: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_cwe_events.html

Reference for setting up alarms based on log events:

https://medium.com/@martatatiana/insufficient-data-cloudwatch-alarm-based-on-custom-metric-filter-4e41c1f82050
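
A minimal sketch of the log-based approach, assuming the agent logs already land in a CloudWatch Logs group: a metric filter turns matching log lines into a custom metric, and an alarm on that metric notifies the SNS topic. The log group `EcsAgentLogGroup`, the filter pattern, and the `Custom/ECS` namespace are assumptions for illustration; adjust the pattern to whatever your agent's error lines actually look like.

```yaml
AgentErrorMetricFilter:
  Type: AWS::Logs::MetricFilter
  Properties:
    # Hypothetical log group that receives the ecs-agent logs
    LogGroupName: !Ref EcsAgentLogGroup
    # Assumed pattern: match any log line containing this exact phrase
    FilterPattern: '"level=error"'
    MetricTransformations:
      - MetricName: EcsAgentErrorCount
        MetricNamespace: Custom/ECS
        # Each matching log line counts as 1
        MetricValue: "1"

AgentErrorAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmDescription: ecs-agent logged at least one error in the last minute
    Namespace: Custom/ECS
    MetricName: EcsAgentErrorCount
    Statistic: Sum
    Period: 60
    EvaluationPeriods: 1
    Threshold: 1
    ComparisonOperator: GreaterThanOrEqualToThreshold
    # No matching log lines means no data points; treat that as healthy
    TreatMissingData: notBreaching
    AlarmActions:
      - !Ref AlarmTopic
```

Setting `TreatMissingData` to `notBreaching` keeps the alarm out of the INSUFFICIENT_DATA state during quiet periods, which is the situation the linked article deals with.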

Add a health check to each ECS service's containers, so that a container is restarted when it becomes unhealthy. https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_healthcheck
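
For illustration, a container-level health check in a CloudFormation task definition might look like the sketch below. The family name, image, port, and `/health` path are made up for the example, and the command assumes `curl` is installed in the image.

```yaml
TaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: my-service                          # hypothetical family name
    ContainerDefinitions:
      - Name: app
        Image: my-registry/my-service:latest    # hypothetical image
        Memory: 512
        Essential: true
        HealthCheck:
          # Runs inside the container; a non-zero exit code marks the check as failed
          Command: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
          Interval: 30      # seconds between checks
          Timeout: 5        # seconds before a single check counts as failed
          Retries: 3        # consecutive failures before the container is UNHEALTHY
          StartPeriod: 60   # grace period after start during which failures are ignored
```

When the task belongs to a service and an essential container is reported as unhealthy, the service scheduler stops the task and starts a replacement.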

Please let me know your thoughts as well :).

Best way to implement AWS ECS health checks
