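I created AWS infrastructure with CloudFormation, consisting of EC2, Redshift, VPC, and so on. Now I want to delete it in a specific reverse order. For example, all resources depend on the VPC, so the VPC should be deleted last. Every stack deletes successfully except the VPC stack: when I delete it through Python boto3, it fails with a subnet or network interface dependency error. When I delete it through the console, however, it succeeds. Has anyone run into this?
I tried deleting everything attached to the load balancer, but the VPC still is not deleted.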
Answer 0: (score: 2)
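AWS CloudFormation builds a dependency graph between resources, based on the DependsOn attributes and the references between resources in the template.
It then tries to deploy resources in parallel while respecting those dependencies.
For example, a subnet could be defined as: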
Subnet1:
  Type: AWS::EC2::Subnet
  Properties:
    CidrBlock: 10.0.0.0/24
    VpcId: !Ref ProdVPC
Here there is an explicit reference to ProdVPC, so CloudFormation creates Subnet1 only after ProdVPC has been created.
When a CloudFormation stack is deleted, the reverse logic applies: Subnet1 is deleted before ProdVPC.
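The ordering described above amounts to a topological sort of the dependency graph, reversed for deletion. Here is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). This is only an illustration of the idea, not how CloudFormation is implemented; `Instance1` and the function names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on.
# ProdVPC and Subnet1 come from the template above; Instance1 is hypothetical.
dependencies = {
    "ProdVPC": set(),
    "Subnet1": {"ProdVPC"},
    "Instance1": {"Subnet1"},
}


def creation_order(deps):
    """Return resources so that every dependency comes before its dependents."""
    return list(TopologicalSorter(deps).static_order())


def deletion_order(deps):
    """Deletion walks the same graph in the opposite direction."""
    return list(reversed(creation_order(deps)))


print("create:", creation_order(dependencies))
print("delete:", deletion_order(dependencies))
```

If you delete several stacks from your own script, the same idea applies at stack level: delete the stacks that depend on the VPC stack first, and the VPC stack last.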
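However, CloudFormation is not aware of resources created outside the stack. If a resource such as an Amazon EC2 instance is created inside the subnet by other means, the stack deletion fails, because a subnet cannot be deleted while an EC2 instance (or, more precisely, an ENI attached to that instance) is using it.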
In that case, you need to manually delete the resources that cause the deletion to fail, and then run the delete command again.
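To see which resources blocked the deletion, you can also read the stack events from boto3. This is a minimal sketch, assuming the stack still exists in the DELETE_FAILED state. The helper names `failed_deletions` and `report_failed_deletions` are made up for illustration, and only the first page of events is read.

```python
def failed_deletions(stack_events):
    """Return (logical id, reason) for every DELETE_FAILED event."""
    return [
        (event["LogicalResourceId"], event.get("ResourceStatusReason", ""))
        for event in stack_events
        if event.get("ResourceStatus") == "DELETE_FAILED"
    ]


def report_failed_deletions(stack_name, cfn=None):
    """Fetch the stack events and return the resources that failed to delete."""
    if cfn is None:
        # Imported lazily so failed_deletions() can be used without AWS access.
        import boto3
        cfn = boto3.client("cloudformation")
    # Only the first page of events is inspected in this sketch.
    events = cfn.describe_stack_events(StackName=stack_name)["StackEvents"]
    return failed_deletions(events)

# Usage (needs AWS credentials; the stack name is a placeholder):
#   for logical_id, reason in report_failed_deletions("my-vpc-stack"):
#       print(logical_id, "-", reason)
```

The reason text usually names the dependency (for example a subnet that still has network interfaces), which tells you what to clean up before retrying.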
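A good way to find such resources is to look in the Network Interfaces section of the EC2 management console. Make sure that no interfaces are attached to the VPC.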
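The same check can be done from boto3 instead of the console. Below is a minimal sketch that lists every network interface left in a VPC. The function name `interfaces_in_vpc` and the VPC ID in the usage comment are placeholders.

```python
def interfaces_in_vpc(ec2, vpc_id):
    """Return (id, status, description) for every network interface in the VPC.

    `ec2` is a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    paginator = ec2.get_paginator("describe_network_interfaces")
    pages = paginator.paginate(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )
    return [
        (eni["NetworkInterfaceId"], eni["Status"], eni.get("Description", ""))
        for page in pages
        for eni in page["NetworkInterfaces"]
    ]

# Usage (needs AWS credentials; the VPC ID is a placeholder):
#   import boto3
#   for eni_id, status, description in interfaces_in_vpc(
#           boto3.client("ec2"), "vpc-0123456789abcdef0"):
#       print(eni_id, status, description)
```

The description usually shows which service created the interface, which helps in finding the resource that has to be removed first.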
Answer 1: (score: 0)
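As you describe, the problem is deleting the VPC in a stack that contains Lambda functions which themselves run inside the VPC. This is most likely caused by the network interfaces that the Lambda functions create to connect to other resources in the VPC.
Technically, these network interfaces should be deleted automatically when the Lambda functions are undeployed from the stack, but in my experience orphaned ENIs remain and prevent the VPC from being undeployed.
For this reason, I created a Lambda-backed custom resource that cleans up the ENIs after all Lambda functions in the VPC have been undeployed.
This is the CloudFormation part, where you set up the custom resource and pass the VPC ID: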
##############################################
#                                            #
#  Custom resource deleting net interfaces   #
#                                            #
##############################################
NetInterfacesCleanupFunction:
  Type: AWS::Serverless::Function
  Properties:
    CodeUri: src
    Handler: cleanup/network_interfaces.handler
    Role: !GetAtt BasicLambdaRole.Arn
    DeploymentPreference:
      Type: AllAtOnce
    Timeout: 900

PermissionForNewInterfacesCleanupLambda:
  Type: AWS::Lambda::Permission
  Properties:
    Action: lambda:invokeFunction
    FunctionName:
      Fn::GetAtt: [ NetInterfacesCleanupFunction, Arn ]
    Principal: lambda.amazonaws.com

InvokeLambdaFunctionToCleanupNetInterfaces:
  DependsOn: [PermissionForNewInterfacesCleanupLambda]
  Type: Custom::CleanupNetInterfacesLambda
  Properties:
    ServiceToken: !GetAtt NetInterfacesCleanupFunction.Arn
    StackName: !Ref AWS::StackName
    VPCID:
      Fn::ImportValue: !Sub '${MasterStack}-Articles-VPC-Ref'
    Tags:
      'owner': !Ref StackOwner
      'task': !Ref Task
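This is the corresponding Lambda function. It tries three times to detach and delete the orphaned network interfaces and reports a failure if it cannot. A failure means that some Lambda function is still generating new network interfaces, and you need to debug that.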
import sys
from time import sleep

import boto3
from botocore.exceptions import ClientError

# Adjust this import to wherever your custom resource handler code lives
from common import cfn_custom_resources as csr

MAX_RETRIES = 3
client = boto3.client('ec2')


def handler(event, context):
    if not csr.__is_valid_event(event, context):
        result = {'result': 'Invalid custom resource event'}
        csr.send(event, context, csr.FAILED, csr.validate_response_data(result))
        return
    elif event['RequestType'] in ('Create', 'Update'):
        # The cleanup only has to run when the custom resource is deleted
        result = {'result': 'Don\'t trigger the rest of the code'}
        csr.send(event, context, csr.SUCCESS, csr.validate_response_data(result))
        return

    try:
        vpc_id = event['ResourceProperties']['VPCID']
        # Get all network interfaces in the given VPC that were created for a Lambda function
        interfaces = client.describe_network_interfaces(
            Filters=[
                {
                    'Name': 'description',
                    'Values': ['AWS Lambda VPC ENI*']
                },
                {
                    'Name': 'vpc-id',
                    'Values': [vpc_id]
                },
            ],
        )
        failed_detach = list()
        failed_delete = list()

        # Detach the network interfaces found above
        for interface in interfaces['NetworkInterfaces']:
            detach_interface(failed_detach, interface)

        # Try to detach a second time and delete each interface
        for interface in interfaces['NetworkInterfaces']:
            detach_and_delete_interface(failed_detach, failed_delete, interface)

        # Success only if nothing failed to detach and nothing failed to delete
        if not failed_detach and not failed_delete:
            result = {'result': 'Network interfaces detached and deleted successfully'}
            csr.send(event, context, csr.SUCCESS, csr.validate_response_data(result))
        else:
            result = {'result': 'Network interfaces couldn\'t be deleted completely'}
            csr.send(event, context, csr.FAILED, csr.validate_response_data(result))
    except Exception:
        print("Unexpected error:", sys.exc_info())
        result = {'result': 'Some error with the process of detaching and deleting the network interfaces'}
        csr.send(event, context, csr.FAILED, csr.validate_response_data(result))


def detach_interface(failed_detach, interface):
    try:
        if interface['Status'] == 'in-use':
            detach_response = client.detach_network_interface(
                AttachmentId=interface['Attachment']['AttachmentId'],
                Force=True
            )
            # Sleep for 1 sec after every detachment
            sleep(1)
            print(f"Detach response for {interface['NetworkInterfaceId']}- {detach_response}")
            if 'HTTPStatusCode' not in detach_response['ResponseMetadata'] or \
                    detach_response['ResponseMetadata']['HTTPStatusCode'] != 200:
                failed_detach.append(detach_response)
    except ClientError:
        # Detach errors are only logged; the delete step below retries the detach
        print(f"Exception details - {sys.exc_info()}")


def detach_and_delete_interface(failed_detach, failed_delete, interface, retries=0):
    detach_interface(failed_detach, interface)
    # Wait a little longer on every retry
    sleep(retries + 1)
    try:
        delete_response = client.delete_network_interface(
            NetworkInterfaceId=interface['NetworkInterfaceId'])
        print(f"Delete response for {interface['NetworkInterfaceId']}- {delete_response}")
        if 'HTTPStatusCode' not in delete_response['ResponseMetadata'] or \
                delete_response['ResponseMetadata']['HTTPStatusCode'] != 200:
            failed_delete.append(delete_response)
    except ClientError as e:
        print(f"Exception while deleting - {str(e)}")
        # Retry at most MAX_RETRIES times
        if retries < MAX_RETRIES:
            if e.response['Error']['Code'] in ('InvalidNetworkInterface.InUse',
                                               'InvalidParameterValue'):
                retries = retries + 1
                print(f"Retry {retries} : Interface in use, deletion failed, retrying to detach and delete")
                detach_and_delete_interface(failed_detach, failed_delete, interface, retries)
            else:
                raise RuntimeError("Code not found in error")
        else:
            raise RuntimeError("Max Number of retries exhausted to remove the interface")
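The `common.cfn_custom_resources` module is not shown in this answer. For readers who do not have such a helper, here is a rough sketch of what a `send` function typically does, based on the response format CloudFormation expects from custom resources. It is an assumption about the general shape, not the actual code of that module, and `build_response` is a name made up for this sketch.

```python
import json
import urllib.request

SUCCESS = "SUCCESS"
FAILED = "FAILED"


def build_response(event, context, status, data, reason=None):
    """Build the JSON body CloudFormation expects from a custom resource."""
    return {
        "Status": status,
        "Reason": reason or f"See CloudWatch log stream: {context.log_stream_name}",
        # Reuse the existing physical id on Update/Delete, otherwise invent one
        "PhysicalResourceId": event.get("PhysicalResourceId", context.log_stream_name),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data,
    }


def send(event, context, status, data):
    """PUT the response to the pre-signed URL that CloudFormation provided."""
    body = json.dumps(build_response(event, context, status, data)).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=body,
        method="PUT",
        # Empty content type, as in AWS's own cfn-response helper
        headers={"Content-Type": ""},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Whatever helper you use, make sure a response is sent on every code path. Otherwise CloudFormation waits for the custom resource until it times out.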
The Lambda code is available at https://gist.github.com/revolutionisme/8ec785f8202f47da5517c295a28c7cb5
More information on configuring Lambda functions in a VPC: https://docs.aws.amazon.com/lambda/latest/dg/vpc.html