CloudWatch alarm not treating missing data as notBreaching

Posted: 2021-05-28 10:18:49

Tags: amazon-cloudwatch cloudwatch-alarms

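I have a CloudWatch alarm that monitors the API Gateway error rate, but CloudWatch is not treating missing datapoints as notBreaching.
I want the alarm to fire when the error rate exceeds 25% within a 5-minute interval.
Alarm details:
Period: 1 minute
Datapoints to alarm: 3 out of 5
Missing data treatment: Treat missing data as good (not breaching threshold)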

I noticed the CloudWatch alarm was triggered for the following reason:

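Threshold Crossed: 3 out of the last 5 datapoints [100.0 (27/05/21 21:56:00), 100.0 (27/05/21 21:54:00), 100.0 (27/05/21 21:49:00)] were greater than or equal to the threshold (25.0) and 2 missing datapoints were treated as [NonBreaching] (minimum 3 datapoints for OK -> ALARM transition).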

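I expected datapoints to be evaluated every minute, i.e. 27/05/21 21:50:00, 27/05/21 21:51:00, 27/05/21 21:52:00, 27/05/21 21:53:00 and 27/05/21 21:55:00 should be marked as good. So the 5 most recent datapoints should be:
27/05/21 21:56:00: ALARM
27/05/21 21:55:00: OK (missing data treated as notBreaching)
27/05/21 21:54:00: ALARM
27/05/21 21:53:00: OK (missing data treated as notBreaching)
27/05/21 21:52:00: OK (missing data treated as notBreaching)
Of the 5 most recent datapoints, only 2 should be in ALARM state, so the end result should not trigger the alarm.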
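The evaluation expected above can be written out as a small Python sketch. It models only the assumption stated in this question: a fixed window of the last five one-minute periods, with every missing minute counted as not breaching. It is not CloudWatch's actual evaluation algorithm. The names `observed`, `expected_window`, `breaching_count` and `expected_to_alarm` are hypothetical and exist only for illustration.

```python
# Hypothetical model of the evaluation the question expects: a fixed window of the
# last N one-minute periods, with every missing minute counted as not breaching.
# This is NOT how CloudWatch is documented to evaluate alarms; it only restates
# the expectation above.
THRESHOLD = 25.0          # threshold = 25
EVALUATION_PERIODS = 5    # evaluation_periods = 5
DATAPOINTS_TO_ALARM = 3   # datapoints_to_alarm = 3

# Datapoints reported in the alarm message, keyed by minute past 21:00 on 27/05/21.
# Minutes with no entry had no data.
observed = {49: 100.0, 54: 100.0, 56: 100.0}


def expected_window(last_minute: int) -> list[int]:
    """The last EVALUATION_PERIODS consecutive minutes, newest first."""
    return [last_minute - i for i in range(EVALUATION_PERIODS)]


def breaching_count(last_minute: int) -> int:
    """Count breaching datapoints in the window; a missing minute counts as notBreaching."""
    count = 0
    for minute in expected_window(last_minute):
        value = observed.get(minute)
        if value is not None and value >= THRESHOLD:  # GreaterThanOrEqualToThreshold
            count += 1
    return count


def expected_to_alarm(last_minute: int) -> bool:
    """True if at least DATAPOINTS_TO_ALARM datapoints in the window breach."""
    return breaching_count(last_minute) >= DATAPOINTS_TO_ALARM


if __name__ == "__main__":
    last = 56
    for minute in expected_window(last):
        value = observed.get(minute)
        if value is None:
            state = "OK (missing, notBreaching)"
        else:
            state = "ALARM" if value >= THRESHOLD else "OK"
        print(f"21:{minute:02d} {state}")
    print("breaching:", breaching_count(last), "alarm:", expected_to_alarm(last))
```

For the window ending at 21:56, this model finds 2 breaching datapoints and therefore no alarm. The alarm message quoted above instead counted the 21:49 datapoint, which falls outside that five-minute window.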
What am I missing?

Terraform snippet:

resource "aws_cloudwatch_metric_alarm" "api_error_spike" {
  alarm_name          = "API error rate exceeding threshold"
  alarm_description   = "API error rate has exceeded allowed 25% threshold over 5 minutes"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 5
  datapoints_to_alarm = 3 // 3 out of 5 datapoints must be in ALARM state to trigger the alarm
  treat_missing_data  = "notBreaching"

  threshold = 25

  // Combined 4XX + 5XX error rate, expressed as a percentage
  metric_query {
    id          = "e1"
    expression  = "(m1+m2)*100"
    label       = "API Error Rate"
    return_data = true
  }

  metric_query {
    id = "m1"
    metric {
      metric_name = "5XXError"
      period      = 60        // 60 seconds is the finest granularity for standard-resolution (AWS/ namespace) metrics
      stat        = "Average" // Average gives the error rate; Sum gives the total error count
      unit        = "Count"
      namespace   = "AWS/ApiGateway"
      dimensions = {
        ApiName = "foo"
      }
    }
  }

  metric_query {
    id = "m2"
    metric {
      metric_name = "4XXError"
      period      = 60
      stat        = "Average" // Average gives the error rate; Sum gives the total error count
      unit        = "Count"
      namespace   = "AWS/ApiGateway"
      dimensions = {
        ApiName = "foo"
      }
    }
  }
}