Question

我有一个EC2自动扩展组，其扩展由分步策略触发。似乎可以正确检测到策略阈值，但是操作延迟了4-5分钟。它似乎反映在警报日志中：

"newState": {
    "stateValue": "ALARM",
    "stateReason": "Threshold Crossed: 1 out of the last 1 datapoints [548.0 (03/01/19 20:27:00)] was greater than the threshold (160.0) (minimum 1 datapoint for OK -> ALARM transition).",
    "stateReasonData": {
      "version": "1.0",
      "queryDate": "2019-01-03T20:31:31.936+0000",
      "startDate": "2019-01-03T20:27:00.000+0000",
      "statistic": "Sum",
      "period": 60,
      "recentDatapoints": [
        548
      ],
      "threshold": 160
    }
  }

ASG活动日志：

At 2019-01-03T20:31:31Z a monitor alarm scale-out-alarm in state ALARM triggered policy scale-out-policy changing the desired capacity from 1 to 4. At 2019-01-03T20:32:01Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 1 to 4.

请注意，在20:27:00超过了阈值，而在20:31:31采取了行动。尽管我还没有找到有关这些属性的任何文档，但这些似乎与日志中的“ startDate”和“ queryDate”相关。

这仅仅是CloudWatch中的随机延迟问题，还是还有其他原因导致这种延迟？

在此之前的很长一段时间内，ASG都没有得到扩展，因此它似乎与预热/冷却无关。

EvaluationPeriods和DatapointsToAlarm均为1。

经过一些进一步的调查，似乎该警报基于ALB的RequestCount指标时，延迟要比对EC2 CPUUtilization的延迟大得多。有道理吗？

CloudWatch警报触发的AutoScaling操作延迟

0 个答案: