CloudFormation stack deletion fails to delete the VPC

Asked: 2019-04-11 04:12:41

Tags: python-3.x amazon-web-services amazon-cloudformation boto3 amazon-vpc

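I used CloudFormation to create AWS infrastructure comprising EC2, Redshift, a VPC, and so on. Now I want to delete it in a specific reverse order. For example, all resources depend on the VPC, so the VPC should be deleted last. Every other stack deletes successfully, but the VPC stack does not: when I delete it through Python boto3, it fails with a subnet or network interface dependency error. When I delete it through the console, however, it succeeds. Has anyone run into this problem?

I tried deleting everything attached to the load balancer, but the VPC is still not deleted.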

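2 Answers:

Answer 0 (score: 2)

AWS CloudFormation builds a dependency graph between resources, based on DependsOn attributes in the template and on references between resources.

It then tries to deploy resources in parallel while respecting those dependencies.

For example, a subnet can be defined as: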

Subnet1:
    Type: AWS::EC2::Subnet
    Properties:
      CidrBlock: 10.0.0.0/24
      VpcId: !Ref ProdVPC

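Here there is an explicit reference to ProdVPC, so CloudFormation creates Subnet1 only after ProdVPC has been created.

When a CloudFormation stack is deleted, the logic is applied in reverse. In this case, Subnet1 is deleted before ProdVPC.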
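The ordering rule above can be illustrated with a topological sort. This is only a sketch of the idea, not how CloudFormation is implemented: the WebServer resource and the dependency map are hypothetical, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical template: each resource maps to the resources it references.
dependencies = {
    "ProdVPC": set(),
    "Subnet1": {"ProdVPC"},
    "WebServer": {"Subnet1"},
}

# Referenced resources come first when creating...
creation_order = list(TopologicalSorter(dependencies).static_order())
# ...and last when deleting.
deletion_order = list(reversed(creation_order))

print("create:", creation_order)
print("delete:", deletion_order)
```

The VPC ends up first in the creation order and last in the deletion order, which is why anything still attached to it blocks the final step.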

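However, CloudFormation is not aware of resources created outside the stack. This means that if a resource (such as an Amazon EC2 instance) is created inside the subnet, stack deletion will fail, because a subnet cannot be deleted while an EC2 instance is using it (or, more precisely, while an ENI is attached to it).

In such situations, you will need to manually delete the resources that are causing the "delete failure" and then run the delete command again.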
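To see which resources blocked the deletion, the stack events can be inspected from boto3 as well. The following is a minimal sketch: the function name failed_deletions and the stack name in the usage comment are hypothetical, and it reads only the first page of events (events are returned newest first).

```python
def failed_deletions(cfn_client, stack_name):
    """Return (logical_id, reason) pairs for resources that failed to delete.

    cfn_client is expected to be a boto3 CloudFormation client.
    Only the first page of stack events is read.
    """
    response = cfn_client.describe_stack_events(StackName=stack_name)
    return [
        (event["LogicalResourceId"], event.get("ResourceStatusReason", ""))
        for event in response["StackEvents"]
        if event["ResourceStatus"] == "DELETE_FAILED"
    ]


# Usage (requires boto3 and AWS credentials; the stack name is an assumption):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   for logical_id, reason in failed_deletions(cfn, "my-vpc-stack"):
#       print(logical_id, reason)
```

The reason text usually names the dependency that has to be removed before delete_stack is called again.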

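A good way to find such resources is to look in the Network Interfaces section of the EC2 management console. Make sure that no interfaces are connected to the VPC.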
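The same check can be done from boto3 instead of the console. This is a sketch under assumptions: the function name interfaces_in_vpc and the VPC ID in the usage comment are made up, and it issues a single describe_network_interfaces call, so accounts with many interfaces may need the corresponding paginator.

```python
def interfaces_in_vpc(ec2_client, vpc_id):
    """List the network interfaces that still exist in the given VPC.

    ec2_client is expected to be a boto3 EC2 client.
    """
    response = ec2_client.describe_network_interfaces(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )
    return [
        {
            "id": eni["NetworkInterfaceId"],
            "status": eni["Status"],
            # The description often reveals which service owns the interface.
            "description": eni.get("Description", ""),
        }
        for eni in response["NetworkInterfaces"]
    ]


# Usage (requires boto3 and AWS credentials; the VPC ID is an assumption):
#   import boto3
#   for eni in interfaces_in_vpc(boto3.client("ec2"), "vpc-0123456789abcdef0"):
#       print(eni)
```

An empty list means no interface is left to block the subnet and VPC deletion.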

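Answer 1 (score: 0)

Since you mention problems deleting a VPC in stacks that contain Lambdas which themselves run inside the VPC, the most likely cause is the network interfaces that the Lambdas generate to connect to other resources in the VPC.

Technically, these network interfaces should be deleted automatically when the Lambdas are undeployed from the stack, but in my experience orphaned ENIs can remain and prevent the VPC from being undeployed.

For this reason, I created a custom-resource-backed Lambda that cleans up the ENIs after all Lambdas in the VPC have been undeployed.

This is the CloudFormation part, where you set up the custom resource and pass in the VPC ID: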

##############################################
#                                            #
#  Custom resource deleting net interfaces   #
#                                            #
##############################################

  NetInterfacesCleanupFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src
      Handler: cleanup/network_interfaces.handler
      Role: !GetAtt BasicLambdaRole.Arn
      DeploymentPreference:
        Type: AllAtOnce
      Timeout: 900

  PermissionForNewInterfacesCleanupLambda:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:invokeFunction
      FunctionName:
        Fn::GetAtt: [ NetInterfacesCleanupFunction, Arn ]
      Principal: lambda.amazonaws.com

  InvokeLambdaFunctionToCleanupNetInterfaces:
    DependsOn: [PermissionForNewInterfacesCleanupLambda]
    Type: Custom::CleanupNetInterfacesLambda
    Properties:
      ServiceToken: !GetAtt NetInterfacesCleanupFunction.Arn
      StackName: !Ref AWS::StackName
      VPCID:
        Fn::ImportValue: !Sub '${MasterStack}-Articles-VPC-Ref'
      Tags:
        'owner': !Ref StackOwner
        'task': !Ref Task

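And this is the corresponding Lambda. It tries three times to detach and delete the orphaned network interfaces and fails if it cannot, which means some Lambda is still generating new network interfaces and you need to debug that.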

import boto3
from botocore.exceptions import ClientError
from time import sleep

# Fix this wherever your custom resource handler code is
from common import cfn_custom_resources as csr
import sys

MAX_RETRIES = 3
client = boto3.client('ec2')


def handler(event, context):

    vpc_id = event['ResourceProperties']['VPCID']

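    if not csr.__is_valid_event(event, context):
        # Report the failure through the helper module; 'result' must be defined first
        result = {'result': 'Invalid custom resource event'}
        csr.send(event, context, csr.FAILED, csr.validate_response_data(result))
        return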
    elif event['RequestType'] == 'Create' or event['RequestType'] == 'Update':
        result = {'result': 'Don\'t trigger the rest of the code'}
        csr.send(event, context, csr.SUCCESS, csr.validate_response_data(result))
        return
    try:
        # Get all network interfaces in the given VPC that were created by Lambda functions
        interfaces = client.describe_network_interfaces(
            Filters=[
                {
                    'Name': 'description',
                    'Values': ['AWS Lambda VPC ENI*']
                },
                {
                    'Name': 'vpc-id',
                    'Values': [vpc_id]
                },
            ],
        )

        failed_detach = list()
        failed_delete = list()

        # Detach the above found network interfaces
        for interface in interfaces['NetworkInterfaces']:
            detach_interface(failed_detach, interface)

        # Try detach a second time and delete each simultaneously
        for interface in interfaces['NetworkInterfaces']:
            detach_and_delete_interface(failed_detach, failed_delete, interface)

        # Succeed only if nothing failed to detach and nothing failed to delete
        if not failed_detach and not failed_delete:
            result = {'result': 'Network interfaces detached and deleted successfully'}
            csr.send(event, context, csr.SUCCESS, csr.validate_response_data(result))
        else:
            result = {'result': 'Network interfaces couldn\'t be deleted completely'}
            csr.send(event, context, csr.FAILED, csr.validate_response_data(result))
            # print(response)
    except Exception:
        print("Unexpected error:", sys.exc_info())
        result = {'result': 'Some error with the process of detaching and deleting the network interfaces'}
        csr.send(event, context, csr.FAILED, csr.validate_response_data(result))


def detach_interface(failed_detach, interface):
    try:

        if interface['Status'] == 'in-use':
            detach_response = client.detach_network_interface(
                AttachmentId=interface['Attachment']['AttachmentId'],
                Force=True
            )

            # Sleep for 1 sec after every detachment
            sleep(1)

            print(f"Detach response for {interface['NetworkInterfaceId']}- {detach_response}")

            if 'HTTPStatusCode' not in detach_response['ResponseMetadata'] or \
                    detach_response['ResponseMetadata']['HTTPStatusCode'] != 200:
                failed_detach.append(detach_response)
    except ClientError as e:
        # Log and continue; the delete step retries the detach later
        print(f"Exception while detaching - {e}")


def detach_and_delete_interface(failed_detach, failed_delete, interface, retries=0):

    detach_interface(failed_detach, interface)

    sleep(retries + 1)

    try:
        delete_response = client.delete_network_interface(
            NetworkInterfaceId=interface['NetworkInterfaceId'])

        print(f"Delete response for {interface['NetworkInterfaceId']}- {delete_response}")
        if 'HTTPStatusCode' not in delete_response['ResponseMetadata'] or \
                delete_response['ResponseMetadata']['HTTPStatusCode'] != 200:
            failed_delete.append(delete_response)
    except ClientError as e:
        print(f"Exception while deleting - {str(e)}")
        print()
        # Allow at most MAX_RETRIES retries after the initial attempt
        if retries < MAX_RETRIES:
            if e.response['Error']['Code'] == 'InvalidNetworkInterface.InUse' or \
                    e.response['Error']['Code'] == 'InvalidParameterValue':
                retries = retries + 1
                print(f"Retry {retries} : Interface in use, deletion failed, retrying to detach and delete")
                detach_and_delete_interface(failed_detach, failed_delete, interface, retries)
            else:
                raise RuntimeError("Code not found in error")
        else:
            raise RuntimeError("Max Number of retries exhausted to remove the interface")

The full Lambda code is available at: https://gist.github.com/revolutionisme/8ec785f8202f47da5517c295a28c7cb5

More information about configuring Lambdas in a VPC: https://docs.aws.amazon.com/lambda/latest/dg/vpc.html
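The handler above relies on a csr helper module that is not shown in the answer. As a rough idea of what such a helper has to do, here is a minimal sketch of the custom resource response protocol: the handler must PUT a JSON document to the pre-signed URL in event['ResponseURL']. The function names build_response_body and send_response are hypothetical and are not the author's csr API.

```python
import json
import urllib.request


def build_response_body(event, context, status, data, reason=None):
    """Build the JSON body CloudFormation expects from a custom resource."""
    return json.dumps({
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or f"See CloudWatch log stream: {context.log_stream_name}",
        # Reuse the existing physical ID on Update/Delete, otherwise pick one
        "PhysicalResourceId": event.get("PhysicalResourceId", context.log_stream_name),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data,
    })


def send_response(event, body):
    """PUT the response body to the pre-signed URL supplied in the event."""
    request = urllib.request.Request(
        event["ResponseURL"],
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": ""},  # empty, as in AWS's cfn-response module
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

If no response is sent, CloudFormation waits until the custom resource times out, so every code path in the handler should end in one of these calls.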