AWS CloudWatch alarm, help resolving error - Unchecked: Initial alarm creation

Date: 2018-05-08 20:50:00

Tags: amazon-web-services amazon-cloudwatch autoscaling

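My scale-down CloudWatch alarm is constantly in the INSUFFICIENT_DATA state. The alarm is attached to my Auto Scaling group. It has been in this state for the past 3 days, so it has certainly finished initializing.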

In my alarm, the reason it gives is:

Unchecked: Initial alarm creation

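This is from the AWS documentation: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html

INSUFFICIENT_DATA - The alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state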

Below is a snippet from my CloudFormation template that creates the CloudWatch alarm. It is written in troposphere syntax, but it should be easy to read:

template.add_resource(Alarm(
    "CPUUtilizationLowAlarm",
    ActionsEnabled=True,
    AlarmDescription="Scale down for average CPUUtilization <= 30%",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Statistic="Average",
    Period="300",
    EvaluationPeriods="3",
    Threshold="30",
    Unit="Percent",
    ComparisonOperator="LessThanOrEqualToThreshold",
    AlarmActions=[Ref("ScaleDownPolicy")],
    Dimensions=[
        MetricDimension(
            Name="AutoscalingGroupName",
            Value=Ref("AutoScalingGroup")
        )
    ]
))
template.add_resource(ScalingPolicy(
    "ScaleDownPolicy",                                                      #Simple reference value, nothing special
    AdjustmentType="ChangeInCapacity",                                      #Modify the asg capacity
    AutoScalingGroupName=Ref("AutoScalingGroup"),                           #What asg to modify capacity
    PolicyType="SimpleScaling",                                             #Read about it here: https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scale-based-on-demand.html
    Cooldown="1700",                                                        #How long to wait before checking status' again
    ScalingAdjustment=Ref("DownscalingCount")                               #(Must be a negative number!!) How much to scale down
))

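As you can see, I am scaling down based on CPUUtilization <= 30%. As far as I can tell, this is a valid metric. I have read this Stack Overflow answer, but it does not seem to apply to this case: Amazon EC2 AutoScaling CPUUtilization Alarm- INSUFFICIENT DATA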

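I am doing almost exactly the same thing elsewhere, but with "Step Scaling" instead of the "Simple Scaling" shown above, and that one actually works for me. Below is a snippet from my CloudFormation template for my step scaling alarm (scaling up):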

template.add_resource(Alarm(
    "CPUUtilizationHighAlarm",
    ActionsEnabled=True,
    AlarmDescription="Scale up for average CPUUtilization >= 50%",
    MetricName="CPUUtilization",
    Namespace="AWS/EC2",
    Statistic="Average",
    Period="300",
    EvaluationPeriods="1",
    Threshold="50",
    Unit="Percent",
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=[Ref("ScaleUpPolicy")],
    Dimensions=[
        MetricDimension(
            Name="AutoScalingGroupName",
            Value=Ref("AutoScalingGroup")
        )
    ]
))
template.add_resource(ScalingPolicy(
    "ScaleUpPolicy",
    AdjustmentType="ChangeInCapacity",
    AutoScalingGroupName=Ref("AutoScalingGroup"),                           #What group to attach this to
    EstimatedInstanceWarmup="1700",                                         #How long it will take before instance is ready for traffic
    MetricAggregationType="Average",                                        #Breach if average is above threshold
    PolicyType="StepScaling",                                               #Read above step scaling here: https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scale-based-on-demand.html
    StepAdjustments=[                                                       
        StepAdjustments(
            MetricIntervalLowerBound="0",                                   #From 50 (Defined in alarm as 50% CPU)
            MetricIntervalUpperBound="20",                                  #To 70%
            ScalingAdjustment="1"                                           #Scale up 1 instance
        ),
        StepAdjustments(
            MetricIntervalLowerBound="20",                                  #from 70%
            MetricIntervalUpperBound="40",                                  #to 90%
            ScalingAdjustment="2"                                           #Scale up 2 instances
        ),
        StepAdjustments(
            MetricIntervalLowerBound="40",                                  #From 90% or above (Defined in alarm)
            ScalingAdjustment="3"                                           #Scale up 3 instances
        )
    ]
))

I am at a loss as to what I have misconfigured in the scale-down alarm. Any suggestions or help would be great.

1 Answer:

Answer 0: (Score: 1)

I found the problem...

The error is here:

MetricDimension(
        Name="AutoscalingGroupName",
        Value=Ref("AutoScalingGroup")
    )

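Name should be AutoScalingGroupName, NOT AutoscalingGroupName. With the misspelled name, the alarm refers to a new dimension instead of correctly pulling data from the Auto Scaling group. Nothing throws an error and everything spins up fine; there is simply no data to pull. So the alarm would stay in the INSUFFICIENT_DATA state until the end of time.

A capital "S"...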
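
Since nothing flags a misspelled dimension name at deploy time, a small check before building the template can catch it. The following is only a minimal sketch: the helper name `check_dimension_name` is hypothetical (it is not part of troposphere or boto3), and the set of names is assumed from the dimensions documented for the AWS/EC2 namespace.

```python
# Dimension names documented for the AWS/EC2 namespace (assumed list; extend
# it for other namespaces). The comparison is case-sensitive on purpose.
EC2_DIMENSION_NAMES = {"AutoScalingGroupName", "ImageId", "InstanceId", "InstanceType"}


def check_dimension_name(name, known=EC2_DIMENSION_NAMES):
    """Return None if `name` is a known dimension, otherwise a warning string."""
    if name in known:
        return None
    # Look for a known name that differs from `name` only by letter case.
    by_lowercase = {k.lower(): k for k in known}
    suggestion = by_lowercase.get(name.lower())
    if suggestion is not None:
        return f"Unknown dimension {name!r}; did you mean {suggestion!r}?"
    return f"Unknown dimension {name!r}"


if __name__ == "__main__":
    for candidate in ("AutoscalingGroupName", "AutoScalingGroupName"):
        print(candidate, "->", check_dimension_name(candidate) or "ok")
```

Running the check on each `Name` passed to `MetricDimension` would have flagged the lowercase "s" before the stack was ever created.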