Google Cloud Monitoring and the number of notifications per incident

Time: 2021-03-18 20:55:42

Tags: google-cloud-platform terraform monitoring policy google-cloud-monitoring

I'm trying to set up a Google Cloud Composer monitor through Terraform. Here is my "hello world" code (it works, but it does not meet my acceptance criteria):

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "3.5.0"
    }
  }
}

provider "google" {

  credentials = "some_credentials"

  project = "some_project"
  region  = "some_region"
  zone    = "some_zone"
}

resource "google_monitoring_notification_channel" "basic" {
  display_name = "Test name"
  type         = "email"
  labels = {
    email_address = "some@email.com"
  }
}

resource "google_monitoring_alert_policy" "cloud_composer_job_fail_monitor" {
  combiner              = "OR"
  display_name          = "Fails testing on cloud composer tasks"
  notification_channels = [google_monitoring_notification_channel.basic.id]
  conditions {
    display_name = "Failures count"
    condition_threshold {
      filter          = "resource.type=\"cloud_composer_workflow\" AND metric.type=\"composer.googleapis.com/workflow/task/run_count\" AND resource.label.\"project_id\"=\"some_project\" AND metric.label.\"state\"=\"failed\" AND resource.label.\"location\"=\"some_region\""
      duration        = "60s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0
      aggregations {
        alignment_period   = "3600s"
        per_series_aligner = "ALIGN_COUNT"
        
      }
    }
    
  }
  documentation {
    content = "Please checkout current incident"
  }
}

Problem: by default, notifications are sent only when an alerting policy is triggered or resolved (Google doc).

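My question: I want to receive an alert notification every 30 minutes (for example) while a Cloud Composer job is failing, until I or someone else resolves the incident. Alternatively, I need to understand why the incident is not resolved automatically once the job stops failing.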

Can anyone help with this?

Thanks for your help!

1 Answer:

Answer 0: (score: 0)

The fix is to change these fields:

  • per_series_aligner
  • duration
  • alignment_period

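With these changes you get an alert notification when a Cloud Composer task ends in the failed state, and the trigger fires sooner once the condition is met: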

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "3.5.0"
    }
  }
}

provider "google" {

  credentials = "some_credentials"

  project = "some_project"
  region  = "some_region"
  zone    = "some_zone"
}

resource "google_monitoring_notification_channel" "basic" {
  display_name = "Test name"
  type         = "email"
  labels = {
    email_address = "some@email.com"
  }
}

resource "google_monitoring_alert_policy" "cloud_composer_job_fail_monitor" {
  combiner              = "OR"
  display_name          = "Fails testing on cloud composer tasks"
  notification_channels = [google_monitoring_notification_channel.basic.id]
  conditions {
    display_name = "Failures count"
    condition_threshold {
      filter          = "resource.type=\"cloud_composer_workflow\" AND metric.type=\"composer.googleapis.com/workflow/task/run_count\" AND resource.label.\"project_id\"=\"some_project\" AND metric.label.\"state\"=\"failed\" AND resource.label.\"location\"=\"some_region\""
      duration        = "0s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0
      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_DELTA"
        
      }
    }
    
  }
  documentation {
    content = "Please checkout current incident"
  }
}
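
For reference, here are the three changed fields on their own, with a comment on what each one does. This is only an annotated fragment of the condition_threshold block above, not a complete resource. The note on why the original ALIGN_COUNT setup may have kept the incident open is my reading of the aligner semantics and is an assumption, not something checked against this project's data.

```hcl
# Fragment only: these arguments belong inside condition_threshold { ... }
# of the google_monitoring_alert_policy resource shown above.

# How long the condition must hold before an incident is opened.
# "0s" opens the incident on the first aligned point above the threshold
# (the question's version waited "60s").
duration = "0s"

aggregations {
  # Length of each alignment window. With the question's "3600s" window,
  # a single failure can influence the aligned value for up to an hour.
  alignment_period = "60s"

  # ALIGN_DELTA yields the change in run_count within each window, so the
  # aligned value should fall back to 0 (or to no data) once failures stop.
  # ALIGN_COUNT (the question's version) counts data points in the window
  # instead of using their values, so (assumption) it can stay above 0
  # even when no new failures are reported.
  per_series_aligner = "ALIGN_DELTA"
}
```

The shorter window and the zero duration trade some noise tolerance for speed: a single failed task run is enough to open an incident.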

I could not find any information about repeated notifications (for example, every 30 minutes) for this kind of setup.

You are notified only when your condition is met.
