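We want to build an ECS cluster with the following characteristics:
We have already read this post on Stack Overflow, which says that we need a private subnet whose route table points to a NAT gateway placed in a public subnet, and that the public subnet in turn must route to an internet gateway. We already have this configuration in place. We have also added an S3 VPC endpoint to the route table.
Below you can see the relevant parts of the cluster configuration in Terraform (for simplicity, I have included only the relevant sections):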
# Launch template: instances get no public IP, so they must sit in a private subnet
resource "aws_launch_template" "train-launch-template" {
  name_prefix   = "${var.project_name}-launch-template-${var.env}"
  image_id      = "ami-01f62a207c1d180d2"
  instance_type = "m5.large"
  key_name      = "XXXXXX"
  iam_instance_profile {
    name = aws_iam_instance_profile.ecs-instance-profile.name
  }
  user_data = base64encode(data.template_file.user_data.rendered)
  network_interfaces {
    associate_public_ip_address = false
    security_groups             = [aws_security_group.ecs_service.id]
  }
}
# Task definition: awsvpc mode gives each task its own network interface
resource "aws_ecs_task_definition" "task" {
  family                   = "${var.project_name}-${var.env}-train-task"
  execution_role_arn       = data.aws_iam_role.ecs_task_execution_role.arn
  task_role_arn            = aws_iam_role.ecs_train_task_role.arn
  requires_compatibilities = ["EC2"]
  cpu                      = var.ecs_cpu
  network_mode             = "awsvpc"
  memory                   = var.ecs_memory
  container_definitions    = data.template_file.app_definition.rendered
  tags = {
    Stage   = var.env_tag
    Project = var.project_name_tag
  }
}
# Cluster: uses the capacity provider as its default strategy
resource "aws_ecs_cluster" "cluster" {
  name               = "${var.project_name}-${var.env}-train-ecs-cluster"
  capacity_providers = [aws_ecs_capacity_provider.train-capacity-provider.name]
  default_capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.train-capacity-provider.name
  }
  tags = {
    Project = var.project_name_tag
    Stage   = var.env_tag
  }
}
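We have also configured all the roles that the instances and the tasks need in order to access the required resources (S3, ECR, ECS).
The AMI is an ECS-optimized image (the latest version currently published in eu-west-1).
Following the linked explanation, we removed the public IP from the instances in the launch template. We have iterated on the configuration trying to make it work, but we keep hitting the same problem: when a task is triggered, the capacity provider launches an instance, but the task is never placed on the container instance and stays in the PROVISIONING state indefinitely.
With the same configuration, but with the instances placed in a public subnet, the task is placed on the container instance. However, as the first link warned, the task then has no Internet access.
Any hints or pointers would be appreciated. Thanks in advance.
Update: as requested, I have added the remaining parts related to autoscaling.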
resource "aws_autoscaling_group" "train-autoscaling" {
  availability_zones    = ["eu-west-1b"]
  desired_capacity      = 0
  max_size              = 10
  min_size              = 0
  protect_from_scale_in = true
  launch_template {
    id      = aws_launch_template.train-launch-template.id
    version = "$Latest"
  }
  tags = [
    {
      key                 = "Project"
      value               = var.project_name_tag
      propagate_at_launch = true
    },
    {
      key                 = "Stage"
      value               = var.env_tag
      propagate_at_launch = true
    }
  ]
}
resource "aws_ecs_capacity_provider" "train-capacity-provider" {
  name = "${var.project_name}-${var.env}-train-capacity-provider"
  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.train-autoscaling.arn
    managed_termination_protection = "ENABLED"
    managed_scaling {
      status                    = "ENABLED"
      target_capacity           = 100
      maximum_scaling_step_size = 1
      minimum_scaling_step_size = 1
    }
  }
}
data "template_file" "user_data" {
  template = file("${path.module}/user_data.sh")
  vars = {
    cluster_name = "${var.project_name}-${var.env}-train-ecs-cluster"
  }
}
Update 2 (AWS console information):
Update 3:
Update 4:
Logs from the container instance.
ecs-agent.log
level=info time=2020-08-28T11:09:21Z msg="Loading configuration" module=agent.go
level=info time=2020-08-28T11:09:21Z msg="Amazon ECS agent Version: 1.44.1, Commit: 1f05fbf0" module=agent.go
level=info time=2020-08-28T11:09:21Z msg="Image excluded from cleanup: amazon/amazon-ecs-pause:0.1.0" module=docker_image_manager.go
level=info time=2020-08-28T11:09:21Z msg="Image excluded from cleanup: amazon/amazon-ecs-pause:0.1.0" module=docker_image_manager.go
level=info time=2020-08-28T11:09:21Z msg="Image excluded from cleanup: amazon/amazon-ecs-agent:latest" module=docker_image_manager.go
level=info time=2020-08-28T11:09:21Z msg="Creating root ecs cgroup: /ecs" module=init_linux.go
level=info time=2020-08-28T11:09:21Z msg="Creating cgroup /ecs" module=cgroup_controller_linux.go
level=info time=2020-08-28T11:09:21Z msg="Event stream ContainerChange start listening..." module=eventstream.go
level=info time=2020-08-28T11:09:21Z msg="Loading state!" module=state_manager.go
level=info time=2020-08-28T11:09:23Z msg="Registering Instance with ECS" module=agent.go
level=info time=2020-08-28T11:09:23Z msg="Remaining mem: 7680" module=client.go
level=info time=2020-08-28T11:09:23Z msg="Registered container instance with cluster!" module=client.go
level=info time=2020-08-28T11:09:23Z msg="Registration completed successfully. I am running as 'arn:aws:ecs:eu-west-1:XXXXXXXXXXXXXXXX:container-instance/foqum-read-dev-train-ecs-cluster/95559f936f8d44de9373595009fcd588' in cluster 'foqum-read-dev-train-ecs-cluster'" module=agent.go
level=info time=2020-08-28T11:09:23Z msg="Beginning Polling for updates" module=agent.go
level=info time=2020-08-28T11:09:23Z msg="Initializing stats engine" module=engine.go
level=info time=2020-08-28T11:09:23Z msg="Event stream DeregisterContainerInstance start listening..." module=eventstream.go
level=info time=2020-08-28T11:09:23Z msg="Establishing a Websocket connection to https://ecs-t-X.eu-west-1.amazonaws.com/ws?agentHash=1f05fbf0&agentVersion=1.44.1&cluster=XXXXXXXXX-cluster&containerInstance=arn%3Aaws%3Aecs%3Aeu-west-1%3AXXXXXXXX%3Acontainer-instance%2FXXXXXXXX-cluster%2F95559fXXXXXXde9373595009fcd588&dockerVersion=19.03.6-ce" module=client.go
level=info time=2020-08-28T11:09:23Z msg="NO_PROXY set:XXX.254.169.XXXX,XXXX.254.XXX.2,/var/run/docker.sock" module=client.go
level=info time=2020-08-28T11:09:23Z msg="Establishing a Websocket connection to https://ecs-a-X.eu-west-1.amazonaws.com/ws?agentHash=1f05fbf0&agentVersion=1.44.1&clusterArn=XXXXX-ecs-cluster&containerInstanceArn=arn%3Aaws%3Aecs%3Aeu-west-1%XXXXXX%3Acontainer-instance%2FXXXXX-ecs-cluster%2F9XXXXX6f8d44de9373595009fcd588&dockerVersion=DockerVersion%3A+19.03.6-ce&sendCredentials=true&seqNum=1" module=client.go
level=info time=2020-08-28T11:09:23Z msg="Connected to TCS endpoint" module=handler.go
level=info time=2020-08-28T11:09:23Z msg="Connected to ACS endpoint" module=acs_handler.go
level=info time=2020-08-28T11:20:04Z msg="TCS Websocket connection closed for a valid reason" module=handler.go
level=info time=2020-08-28T11:20:04Z msg="Establishing a Websocket connection to https://ecs-t-X.eu-west-1.amazonaws.com/ws?agentHash=1f05fbf0&agentVersion=1.44.1&cluster=XXXXXXXecs-cluster&containerInstance=arn%3Aaws%3Aecs%3Aeu-west-1%3AXXXXXX3Acontainer-instance%2FZZZXXXXX-ecs-cluster%2F95XXX936f8d44de9373595009fcd588&dockerVersion=19.03.6-ce" module=client.go
level=info time=2020-08-28T11:20:04Z msg="Connected to TCS endpoint" module=handler.go
ecs-init.log
2020-08-28T11:09:19Z [INFO] pre-start
2020-08-28T11:09:20Z [INFO] start
2020-08-28T11:09:20Z [INFO] No existing agent container to remove.
2020-08-28T11:09:20Z [INFO] Starting Amazon Elastic Container Service Agent
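Answer 0 (score: 2)
Finally! Mystery solved!
The problem was not in the cluster configuration. When you call run_task through the ECS API, you have to specify the subnets in which the task should run.
Our code was setting this field to one of the public subnets. That is why the task was placed once we moved the container instances to the availability zone of that public subnet.
After changing this call in our code to use the right subnets, the tasks are placed correctly and they have Internet access.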
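For illustration, here is a minimal sketch of what the corrected call could look like. The answer does not say which SDK was used, so Python with boto3 is an assumption, and the function names, subnet IDs and security group IDs are hypothetical placeholders. The point is only that `networkConfiguration.awsvpcConfiguration.subnets` should list the private subnets where the container instances live, not a public one.

```python
from typing import Any, Dict, List


def build_run_task_kwargs(
    cluster: str,
    task_definition: str,
    private_subnet_ids: List[str],
    security_group_ids: List[str],
) -> Dict[str, Any]:
    """Build the keyword arguments for an ECS RunTask call in awsvpc mode.

    Neither launchType nor capacityProviderStrategy is set here, so the
    cluster's default capacity provider strategy applies.
    """
    if not private_subnet_ids:
        raise ValueError("at least one private subnet ID is required")
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                # Private subnets that route to the NAT gateway
                # (hypothetical IDs are passed in by the caller).
                "subnets": list(private_subnet_ids),
                "securityGroups": list(security_group_ids),
            }
        },
    }


def run_training_task(
    cluster: str,
    task_definition: str,
    private_subnet_ids: List[str],
    security_group_ids: List[str],
) -> Dict[str, Any]:
    """Submit the task; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    ecs = boto3.client("ecs", region_name="eu-west-1")
    return ecs.run_task(
        **build_run_task_kwargs(
            cluster, task_definition, private_subnet_ids, security_group_ids
        )
    )
```

A call would then look like `run_training_task("my-cluster", "my-task", ["subnet-0123456789abcdef0"], ["sg-0123456789abcdef0"])`, where all four values are placeholders for your own resources.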