使用ARM模板偶尔部署AKS群集失败,并显示PutNetworkSecurityGroupOperation错误

时间:2018-12-10 20:43:09

标签: azure azure-resource-manager azure-kubernetes

我正在使用Azure模板部署AKS群集。大多数情况下,部署AKS群集都会成功。但是有时,使用相同的输入,部署将失败,并显示Operation PutNetworkSecurityGroupOperation (XXXXXXXX) was canceled and superseded by operation PutNetworkSecurityGroupOperation。天蓝色的模板和部署错误包括在下面。什么原因可能导致此问题?

模板

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "resourceGroupName": {
      "type": "string",
      "metadata": {
        "description": "The resource group name."
      }
    },
    "subscriptionId": {
      "type": "string",
      "metadata": {
        "description": "The subscription id."
      }
    },
    "region": {
      "type": "string",
      "metadata": {
        "description": "The region of AKS resource."
      }
    },
    "gbPerNode": {
      "type": "int",
      "defaultValue": 20,
      "metadata": {
        "description": "Disk size (in GB) to provision for each of the agent pool nodes. This value ranges from 0 to 1023. Specifying 0 will apply the default disk size for that agentVMSize."
      },
      "minValue": 1,
      "maxValue": 1023
    },
    "numNodes": {
      "type": "int",
      "defaultValue": 3,
      "metadata": {
        "description": "The number of agent nodes for the cluster."
      },
      "minValue": 1,
      "maxValue": 50
    },
    "machineType": {
      "type": "string",
      "defaultValue": "Standard_D2_v2",
      "metadata": {
        "description": "The size of the Virtual Machine."
      }
    },
    "servicePrincipalClientId": {
      "metadata": {
        "description": "Client ID (used by cloudprovider)"
      },
      "type": "securestring"
    },
    "servicePrincipalClientSecret": {
      "metadata": {
        "description": "The Service Principal Client Secret."
      },
      "type": "securestring"
    },
    "osType": {
      "type": "string",
      "defaultValue": "Linux",
      "allowedValues": [
        "Linux"
      ],
      "metadata": {
        "description": "The type of operating system."
      }
    },
    "kubernetesVersion": {
      "type": "string",
      "defaultValue": "1.11.5",
      "metadata": {
        "description": "The version of Kubernetes."
      }
    },
    "maxPods": {
      "type": "int",
      "defaultValue": 30,
      "metadata": {
        "description": "Maximum number of pods that can run on a node."
      }
    }
  },
  "variables": {
    "deploymentEventTopic": "deploymenteventtopic",
    "resourceGroupName": "[parameters('resourceGroupName')]",
    "omswsName": "[concat('omsws-', parameters('resourceGroupName'))]",
    "clustername": "cluster"
  },
  "resources": [
    {
      "apiVersion": "2018-03-31",
      "type": "Microsoft.ContainerService/managedClusters",
      "location": "[parameters('region')]",
      "name": "[variables('clustername')]",
      "properties": {
        "kubernetesVersion": "[parameters('kubernetesVersion')]",
        "enableRBAC": true,
        "dnsPrefix": "clust",
        "addonProfiles": {
          "httpApplicationRouting": {
            "enabled": true
          },
          "omsagent": {
            "enabled": false
          }
        },
        "agentPoolProfiles": [
          {
            "name": "agentpool",
            "osDiskSizeGB": "[parameters('gbPerNode')]",
            "count": "[parameters('numNodes')]",
            "vmSize": "[parameters('machineType')]",
            "osType": "[parameters('osType')]",
            "storageProfile": "ManagedDisks"
          }
        ],
        "servicePrincipalProfile": {
          "ClientId": "[parameters('servicePrincipalClientId')]",
          "Secret": "[parameters('servicePrincipalClientSecret')]"
        },
        "networkProfile": {
          "networkPlugin": "kubenet"
        }
      }
    }
  ]
}

错误

{
   "code":"DeploymentFailed",
   "message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.",
   "details":[
      {
         "code":"Conflict",
         "message":"{\r\n \"status\": \"Failed\",\r\n \"error\": {\r\n \"code\": \"ResourceDeploymentFailure\",\r\n \"message\": \"The resource operation completed with terminal provisioning state 'Failed'.\",\r\n \"details\": [\r\n {\r\n \"code\": \"Canceled\",\r\n \"message\": \"Operation was canceled.\",\r\n \"details\": [\r\n {\r\n \"code\": \"Canceled\",\r\n \"message\": \"Operation was canceled.\",\r\n \"details\": [\r\n {\r\n \"code\": \"CanceledAndSupersededDueToAnotherOperation\",\r\n \"message\": \"Operation PutNetworkSecurityGroupOperation (XXXXXXX) was canceled and superseded by operation PutNetworkSecurityGroupOperation (XXXXX).\"\r\n }\r\n ]\r\n }\r\n ]\r\n }\r\n ]\r\n }\r\n}"
      }
   ]
}

1 个答案:

答案 0 :(得分:0)

原因似乎是在启用 httpApplicationRouting 路由的情况下部署AKS群集。要解决此问题,我在模板中部署了没有 httpApplicationRouting 的群集,然后在使用Azure Java SDK部署群集后以编程方式启用了群集。

onEditClick() {
  this.editing = true;
  this.changeDetectorRef.detectChanges();
  this.input.nativeElement.focus();
}

我与Azure一起开了一张支持票,支持工程师确认这是AKS团队正在修复的错误。因此,如果您不想实施此替代方法,应该尽快解决。