Question

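I have configured target tracking policies on an ECS cluster, one for the service and one for the capacity provider, to manage ASG autoscaling.

In my cluster, the minimum and maximum number of tasks in the service are the same as the minimum and maximum capacity of the ASG.

When a scale-in action runs, the number of tasks drops to the minimum. But the ASG still has one or more unused EC2 instances (no tasks are placed on them).

How can I configure the cluster with a capacity provider so that scale-in brings the ASG down to its minimum capacity?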


# CLUSTER
resource "aws_ecs_cluster" "default" {
  name               = local.name
  capacity_providers = [aws_ecs_capacity_provider.asg.name]
  tags               = local.tags

  default_capacity_provider_strategy {
    base              = 0
    capacity_provider = aws_ecs_capacity_provider.asg.name
    weight            = 1
  }
}

# SERVICE
resource "aws_ecs_service" "ecs_service" {
  name                              = "${local.name}-service"
  cluster                           = aws_ecs_cluster.default.id
  task_definition                   = aws_ecs_task_definition.ecs_task.arn
  health_check_grace_period_seconds = 60

  deployment_maximum_percent         = 50
  deployment_minimum_healthy_percent = 100

  load_balancer {
    target_group_arn = element(module.aws-alb-common-module.target_group_arns, 1)
    container_name   = local.name
    container_port   = 8080
  }

  lifecycle {
    ignore_changes = [desired_count, task_definition]
  }
}

# CAPACITY PROVIDER
resource "aws_ecs_capacity_provider" "asg" {
  name = aws_autoscaling_group.ecs_nodes.name

  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.ecs_nodes.arn
    managed_termination_protection = "DISABLED"

    managed_scaling {
      maximum_scaling_step_size = 10
      minimum_scaling_step_size = 1
      status                    = "ENABLED"
      target_capacity           = 100
    }
  }
}

# SERVICE AUTOSCALING POLICY

resource "aws_appautoscaling_target" "ecs_target" {
  max_capacity       = 20
  min_capacity       = 2
  resource_id        = "service/${local.name}/${aws_ecs_service.ecs_service.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "ecs_policy" {
  name               = "${local.name}-scale-policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }

    target_value = 2
  }
}

# ASG
resource "aws_autoscaling_group" "ecs_nodes" {
  name_prefix           = "${local.name}-node"
  max_size              = 20
  min_size              = 2
  vpc_zone_identifier   = local.subnets_ids
  protect_from_scale_in = false

  mixed_instances_policy {
    instances_distribution {
      on_demand_percentage_above_base_capacity = local.spot
    }
    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.node.id
        version            = "$Latest"
      }

      dynamic "override" {
        for_each = local.instance_types
        content {
          instance_type     = override.key
          weighted_capacity = override.value
        }
      }
    }
  }

  lifecycle {
    create_before_destroy = true
  }

  tag {
    key                 = "AmazonECSManaged"
    propagate_at_launch = true
    value               = ""
  }
}

Answer 1

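The cause may be that the target_value = 2 set alongside the predefined_metric_specification block is a CPU utilization trigger level (a percentage), not a minimum capacity. The instances may be kept alive by background processes that use a small amount of CPU.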
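
To illustrate the point, here is a minimal sketch of the same policy with a higher CPU target. The value 60 is an assumed placeholder chosen only for illustration, not a figure from the question or from the AWS documentation; the resource and local names are the ones used in the question.

```hcl
resource "aws_appautoscaling_policy" "ecs_policy" {
  name               = "${local.name}-scale-policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }

    # Average service CPU utilization in percent, not a task or instance count.
    # 60 is an assumed placeholder; tune it to the workload.
    target_value = 60
  }
}
```

The minimum and maximum task counts are still controlled separately by min_capacity and max_capacity on aws_appautoscaling_target.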

By the way, the managed_termination_protection setting may be worth re-enabling.
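
A sketch of what re-enabling it might look like, reusing the capacity provider from the question. One assumption to note: managed termination protection only works when managed scaling is enabled and the Auto Scaling group itself has instance scale-in protection turned on, so the ASG in the question would also need protect_from_scale_in = true.

```hcl
resource "aws_ecs_capacity_provider" "asg" {
  name = aws_autoscaling_group.ecs_nodes.name

  auto_scaling_group_provider {
    auto_scaling_group_arn = aws_autoscaling_group.ecs_nodes.arn

    # Lets ECS keep instances that still run tasks from being terminated
    # on scale-in. Requires protect_from_scale_in = true on the
    # aws_autoscaling_group.ecs_nodes resource.
    managed_termination_protection = "ENABLED"

    managed_scaling {
      maximum_scaling_step_size = 10
      minimum_scaling_step_size = 1
      status                    = "ENABLED"
      target_capacity           = 100
    }
  }
}
```

With this in place, ECS should only release scale-in protection on instances that have no non-daemon tasks, so scale-in is less likely to remove an instance that is still doing work.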

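Update following the comments of 25/09:

OK, it is entirely possible that I am wrong here (especially since I haven't used this feature myself yet), and if so I am happy to learn from it.

But this is how I read the documentation as it relates to your configuration: the key phrase is "the target capacity value is used as the target value for the CloudWatch metric used in the Amazon ECS-managed target tracking scaling policy". The CloudWatch metric you chose is ECSServiceAverageCPUUtilization, which is discussed in "How is ECSServiceAverageCPUUtilization metric calculated?". So the target = 2 you configured means an average CPU utilization of 2%.

I admit I wrongly assumed that the CPU metric was an average at the EC2 instance level. But either way, setting the trigger value to 2% CPU is likely to cause or maintain scale-out when it isn't needed.

You may also have found the simple explanation for the behavior you are seeing, namely the statement "but this behavior is not guaranteed at all times". However, I suspect that statement applies more to the extreme example of a 100% target, where one could expect to see anomalies, just as one could at the similarly extreme 2%.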

ECS with capacity provider: scale in to the minimum capacity of the ASG
