我正在尝试创建ECS集群,并且我手动构建了具有3个公共子网和3个私有子网的VPC。所有3个公用子网都以0.0.0.0/0附加了IGW,所有3个私有子网在路由表中均附加有0.0.0.0/0的NAT网关。 3个NAT网关分别在每个公共子网中。
我已经用我现在尝试使用的相同CloudFormation模板创建了另一个ECS集群,并且一切正常。
我已经比较了第一和第二VPC之间的设置(失败),并且所有设置(IGW,NAT网关,路由表,NACL,SG)都是相同的,当然IP已被调整为第二VPC的IP。当我尝试在第二个VPC中创建ECS(失败)时,专用子网中的EC2实例无法连接到Amazon存储库,随后整个堆栈都回滚了,因为从未将EC2实例的信号发送到Auto Scaling Group。
之后,我检查了EC2实例的系统日志,他们无法安装Amazon Agent。以下是日志摘录:
Starting cloud-init: Cloud-init v. 0.7.6 running 'modules:config' at Mon, 20 Aug 2018 06:38:04 +0000. Up 10.06 seconds.
Loaded plugins: priorities, update-motd, upgrade-helper
One of the configured repositories failed (Unknown),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working
upstream. This is most often useful if you are using a newer
distribution release than is supported by the repository (and the
packages for the previous distribution release still work).
3. Disable the repository, so yum won't use it by default. Yum will then
just ignore the repository until you permanently enable it again or use
--enablerepo for temporary usage:
yum-config-manager --disable <repoid>
4. Configure the failing repository to be skipped, if it is unavailable.
Note that yum will try to contact the repo. when it runs most commands,
so will have to try and fail each time (and thus. yum will be be much
slower). If it is a very temporary problem though, this is often a nice
compromise:
yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
Cannot find a valid baseurl for repo: amzn-main/latest
Could not retrieve mirrorlist http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list error was
12: Timeout on http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list: (28, 'Connection timed out after 5001 milliseconds')
Aug 20 06:38:20 cloud-init[2116]: util.py[WARNING]: Package upgrade failed
Aug 20 06:38:20 cloud-init[2116]: cc_package_update_upgrade_install.py[WARNING]: 1 failed with exceptions, re-raising the last one
Aug 20 06:38:20 cloud-init[2116]: util.py[WARNING]: Running module package-update-upgrade-install (<module 'cloudinit.config.cc_package_update_upgrade_install' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_package_update_upgrade_install.pyc'>) failed
Generating SSH2 ED25519 host key: [ OK ]
Starting sshd: [ OK ]
ntpdate: Synchronizing with time server: [ OK ]
Starting ntpd: [ OK ]
Starting sendmail: [ OK ]
Starting sm-client: [ OK ]
Starting crond: [ OK ]
Starting cgconfig service: [ OK ]
Starting docker: .[ OK ]
Starting cloud-init: Cloud-init v. 0.7.6 running 'modules:final' at Mon, 20 Aug 2018 06:38:25 +0000. Up 29.91 seconds.
Loaded plugins: priorities, update-motd, upgrade-helper
Examining /var/tmp/yum-root-i85tqq/amazon-ssm-agent.rpm: amazon-ssm-agent-2.3.13.0-1.x86_64
Marking /var/tmp/yum-root-i85tqq/amazon-ssm-agent.rpm to be installed
Resolving Dependencies
One of the configured repositories failed (Unknown),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working
upstream. This is most often useful if you are using a newer
distribution release than is supported by the repository (and the
packages for the previous distribution release still work).
3. Disable the repository, so yum won't use it by default. Yum will then
just ignore the repository until you permanently enable it again or use
--enablerepo for temporary usage:
yum-config-manager --disable <repoid>
4. Configure the failing repository to be skipped, if it is unavailable.
Note that yum will try to contact the repo. when it runs most commands,
so will have to try and fail each time (and thus. yum will be be much
slower). If it is a very temporary problem though, this is often a nice
compromise:
yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
Cannot find a valid baseurl for repo: amzn-main/latest
Could not retrieve mirrorlist http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list error was
12: Timeout on http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list: (28, 'Connection timed out after 5000 milliseconds')
Loaded plugins: priorities, update-motd, upgrade-helper
[ 53.291581] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 53.297948] Bridge firewalling registered
[ 53.304776] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
[ 53.318481] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 53.510300] Initializing XFRM netlink socket
[ 53.515251] Netfilter messages via NETLINK v0.30.
[ 53.518920] ctnetlink v0.93: registering with nfnetlink.
[ 53.688086] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
One of the configured repositories failed (Unknown),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working
upstream. This is most often useful if you are using a newer
distribution release than is supported by the repository (and the
packages for the previous distribution release still work).
3. Disable the repository, so yum won't use it by default. Yum will then
just ignore the repository until you permanently enable it again or use
--enablerepo for temporary usage:
yum-config-manager --disable <repoid>
4. Configure the failing repository to be skipped, if it is unavailable.
Note that yum will try to contact the repo. when it runs most commands,
so will have to try and fail each time (and thus. yum will be be much
slower). If it is a very temporary problem though, this is often a nice
compromise:
yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
Cannot find a valid baseurl for repo: amzn-main/latest
Could not retrieve mirrorlist http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list error was
12: Timeout on http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list: (28, 'Connection timed out after 5000 milliseconds')
Loaded plugins: priorities, update-motd, upgrade-helper
One of the configured repositories failed (Unknown),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working
upstream. This is most often useful if you are using a newer
distribution release than is supported by the repository (and the
packages for the previous distribution release still work).
3. Disable the repository, so yum won't use it by default. Yum will then
just ignore the repository until you permanently enable it again or use
--enablerepo for temporary usage:
yum-config-manager --disable <repoid>
4. Configure the failing repository to be skipped, if it is unavailable.
Note that yum will try to contact the repo. when it runs most commands,
so will have to try and fail each time (and thus. yum will be be much
slower). If it is a very temporary problem though, this is often a nice
compromise:
yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
Cannot find a valid baseurl for repo: amzn-main/latest
Could not retrieve mirrorlist http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list error was
12: Timeout on http://repo.eu-central-1.amazonaws.com/latest/main/mirror.list: (28, 'Connection timed out after 5001 milliseconds')
/var/lib/cloud/instance/scripts/part-001: line 9: /opt/aws/bin/cfn-init: No such file or directory
/var/lib/cloud/instance/scripts/part-001: line 10: /opt/aws/bin/cfn-signal: No such file or directory
Aug 20 06:39:13 cloud-init[2286]: util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [127]
Aug 20 06:39:13 cloud-init[2286]: cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
Aug 20 06:39:13 cloud-init[2286]: util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_scripts_user.pyc'>) failed
我已经检查了NACL,对于“入站”和“出站”,所有内容均设置为0.0.0.0/0和ALLOW。
对于第一个VPC,我正在使用ECS优化的AMI和 t2.large (没有任何问题),而对于第二个 c5.xlarge (导致问题)。
是什么仍然导致EC2无法访问Amazon存储库?
因此,后来我发现第二台VPC附加了S3端点。经过更多研究后,我在LinkedIn上发现了一篇不错的文章,内容如下:
Amazon Linux存储库托管在S3上,因此 必须在S3端点策略中允许对其进行访问。
因此,当您启动yum时,它将使用本地DNS欺骗的魔术来 路由到内部S3端点
我继续更新CloudFormation模板,并在下面的LaunchConfiguration中添加了其他策略,但这没有帮助:
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"s3:Get*",
"s3:List*"
],
"Resource": [
"arn:aws:s3:::repo.eu-central-1.amazonaws.com",
"arn:aws:s3:::repo.eu-central-1.amazonaws.com/*"
],
"Effect": "Allow"
}
]
}
“端点策略”如下所示:
{
"Statement": [
{
"Action": "*",
"Effect": "Allow",
"Resource": "*",
"Principal": "*"
}
]
}