我有一个EC2自动扩展组,其扩展由分步策略触发。似乎可以正确检测到策略阈值,但是操作延迟了4-5分钟。它似乎反映在警报日志中:
"newState": {
"stateValue": "ALARM",
"stateReason": "Threshold Crossed: 1 out of the last 1 datapoints [548.0 (03/01/19 20:27:00)] was greater than the threshold (160.0) (minimum 1 datapoint for OK -> ALARM transition).",
"stateReasonData": {
"version": "1.0",
"queryDate": "2019-01-03T20:31:31.936+0000",
"startDate": "2019-01-03T20:27:00.000+0000",
"statistic": "Sum",
"period": 60,
"recentDatapoints": [
548
],
"threshold": 160
}
}
ASG活动日志:
At 2019-01-03T20:31:31Z a monitor alarm scale-out-alarm in state ALARM triggered policy scale-out-policy changing the desired capacity from 1 to 4. At 2019-01-03T20:32:01Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 1 to 4.
请注意,在20:27:00超过了阈值,而在20:31:31采取了行动。尽管我还没有找到有关这些属性的任何文档,但这些似乎与日志中的“ startDate”和“ queryDate”相关。
这仅仅是CloudWatch中的随机延迟问题,还是还有其他原因导致这种延迟?
在此之前的很长一段时间内,ASG都没有得到扩展,因此它似乎与预热/冷却无关。
EvaluationPeriods和DatapointsToAlarm均为1。
经过一些进一步的调查,似乎该警报基于ALB的RequestCount指标时,延迟要比对EC2 CPUUtilization的延迟大得多。有道理吗?