Question

I'm trying to create a Stackdriver alert policy with a Deployment Manager configuration. The same configuration first creates a resource group and a notification channel, and then a policy based on them:

resources:
- name: test-group
  type: gcp-types/monitoring-v3:projects.groups
  properties:
    displayName: A test group
    filter: >-
        resource.metadata.cloud_account="aproject-id" AND
        resource.type="gce_instance" AND
        resource.metadata.tag."managed"="yes"

- name: test-email-notification
  type: gcp-types/monitoring-v3:projects.notificationChannels
  properties:
    displayName: A test email channel
    type: email
    labels:
      email_address: incidents@example.com

- name: test-alert-policy
  type: gcp-types/monitoring-v3:projects.alertPolicies
  properties:
    enabled: true
    displayName: A test alert policy
    documentation:
      mimeType: text/markdown
      content: "Test incident"
    notificationChannels:
      - $(ref.test-email-notification.name)
    combiner: OR
    conditions:
    - conditionAbsent:
        aggregations:
        - alignmentPeriod: 60s
          perSeriesAligner: ALIGN_RATE
        duration: 300s
        filter: metric.type="compute.googleapis.com/instance/uptime" group.id="$(ref.test-group.id)"
        trigger:
          count: 1
      displayName: The instance is down

The policy's only condition has a filter based on the resource group, i.e. only members of the group can trigger this alert.

I'm trying to use a reference to the group's ID, but it doesn't work: "The reference 'id' is invalid, reason: The field 'id' does not exists on the reference schema."

When I try to use $(ref.test-group.selfLink) instead, I also get "The reference 'selfLink' is invalid, reason: The field 'selfLink' does not exists on the reference schema."

I can get the group's name (e.g. "projects/aproject-id/groups/3691870619975147604"), but filters only accept group IDs (e.g. only the "3691870619975147604" part):

'{"ResourceType":"gcp-types/monitoring-v3:projects.alertPolicies","ResourceErrorCode":"400","ResourceErrorMessage":{"code":400,"message":"Field alert_policy.conditions[0].condition_absent.filter had an invalid value of \"metric.type=\"compute.googleapis.com/instance/uptime\" group.id=\"projects/aproject-id/groups/3691870619975147604\"\": must specify a restriction on \"resource.type\" in the filter; see \"https://cloud.google.com/monitoring/api/resources\" for a list of available resource types.","status":"INVALID_ARGUMENT","statusMessage":"Bad Request","requestPath":"https://monitoring.googleapis.com/v3/projects/aproject-id/alertPolicies","httpMethod":"POST"}}'
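To make the difference between a group's name and its ID concrete, here is a minimal Python sketch that takes a name of the form shown above and returns only the trailing ID segment. The helper `group_id_from_name` is a hypothetical name used for illustration. It is a standalone snippet, not something the configuration above does, and whether such a step can be applied inside a deployment is not addressed here.

```python
def group_id_from_name(name: str) -> str:
    """Return the ID segment of a 'projects/<project>/groups/<id>' name.

    Hypothetical helper for illustration only; the expected name layout is
    taken from the example in the question.
    """
    parts = name.split("/")
    # Expect exactly: ["projects", <project>, "groups", <id>]
    if (
        len(parts) != 4
        or parts[0] != "projects"
        or parts[2] != "groups"
        or not parts[3]
    ):
        raise ValueError(f"unexpected group name: {name!r}")
    return parts[3]


if __name__ == "__main__":
    name = "projects/aproject-id/groups/3691870619975147604"
    print(group_id_from_name(name))
```

The validation step is there so that a string in some other shape fails loudly instead of silently yielding the wrong segment.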

Answer 1

Try replacing the alert policy with the following:

- name: test-alert-policy
  type: gcp-types/monitoring-v3:projects.alertPolicies
  properties:
    enabled: true
    displayName: A test alert policy
    documentation:
      mimeType: text/markdown
      content: "Test incident"
    notificationChannels:
      - $(ref.test-email-notification.name)
    combiner: OR
    conditions:
    - conditionAbsent:
        aggregations:
        - alignmentPeriod: 60s
          perSeriesAligner: ALIGN_RATE
        duration: 300s
        filter: metric.type="compute.googleapis.com/instance/uptime" $(ref.test-group.filter)
        trigger:
          count: 1
      displayName: The instance is down
  metadata:
    dependsOn:
    - test-group

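This 1) adds an explicit dependency on test-group using a dependsOn clause, and 2) adds $(ref.test-group.filter) to the metric filter so that, while it is not strictly linked to test-group, it ends up containing all the same resources as test-group.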

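Since Deployment Manager resources are created in parallel, it is necessary to use dependsOn to ensure test-group is instantiated before attempting to create test-alert-policy; apparently Deployment Manager isn't smart enough to infer this from the references alone.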
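As a rough illustration of what the condition filter looks like once the group's filter is substituted in, the following Python sketch joins the two strings the same way the configuration above does (metric clause, a space, then the group filter). The function `build_condition_filter` is a hypothetical name, and the snippet only models the string substitution; it does not call Deployment Manager or the Monitoring API.

```python
def build_condition_filter(metric_type: str, group_filter: str) -> str:
    """Join a metric.type clause with a group's filter expression.

    Hypothetical helper: it mimics the textual substitution of
    $(ref.test-group.filter) into the condition filter, nothing more.
    """
    metric_clause = f'metric.type="{metric_type}"'
    # Collapse line breaks and repeated spaces, as a folded YAML scalar would.
    group_clause = " ".join(group_filter.split())
    return f"{metric_clause} {group_clause}"


# Group filter copied from the test-group resource in the question.
GROUP_FILTER = """
resource.metadata.cloud_account="aproject-id" AND
resource.type="gce_instance" AND
resource.metadata.tag."managed"="yes"
"""

if __name__ == "__main__":
    print(
        build_condition_filter(
            "compute.googleapis.com/instance/uptime", GROUP_FILTER
        )
    )
```

Note that the combined filter contains a `resource.type` restriction, which is what the error message in the question asked for.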

Using a Stackdriver resource group's ID in a GCP Deployment Manager configuration
