Terraform: Error when creating CloudWatch alarms for multiple instances

Time: 2018-11-02 21:26:02

Tags: terraform amazon-cloudwatch

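I'm creating multiple EC2 instances in two regions. I want to attach CloudWatch alarms for the status check and for CPU utilization to each of them.

Below are the directory structure, the code in cloudwatch.tf, and the main.tf that calls the module.

I'm having two issues, including with the logic that creates the CloudWatch alarms.

Directory structure: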

├── main.tf
├── modules
│   ├── alb
│   │   ├── aws_alb.tf
│   │   ├── aws_instance.tf
│   │   ├── bootstrap.sh
│   │   ├── cloudwatch.tf
│   │   ├── main.tf
│   │   ├── output.tf
│   │   ├── security-group.tf
│   │   ├── sns.tf
│   │   └── variables.tf
│   └── route53
│       ├── main.tf
│       └── variables.tf
└── variables.tf

main.tf

module "north-virginia" {
  source          = "./modules/alb"
  region          = "us-east-1"
  az              = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

module "oregon" {
  source          = "./modules/alb"
  region          = "us-west-2"
  az              = ["us-west-2a", "us-west-2b", "us-west-2c"]
}

modules/alb/aws_instance.tf

resource "aws_instance" "web" {
  ami               = "${data.aws_ami.amzn2.id}"
  instance_type     = "${var.instance_type}"
  count             = 3
  availability_zone = "${element(var.az, count.index)}"
  tags {
    Name = "${count.index}"
  }
}

modules/alb/cloudwatch.tf

resource "aws_cloudwatch_metric_alarm" "cpu_utilization" {
  count               = "${length(local.instance_id_var)}"
  alarm_name          = "${element(split(",", join(",", aws_instance.web.*.id)), count.index)}"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = "120"
  statistic           = "Average"
  threshold           = "60"
  alarm_description   = "This metric monitors ec2 cpu utilization"

  dimensions {
    InstanceId = "${element(aws_instance.web.*.id, count.index)}"
  }
}

resource "aws_cloudwatch_metric_alarm" "status_check" {
  count               = 3
  alarm_name          = "${element(split(",", join(",", aws_instance.web.*.id)), count.index)}"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "StatusCheckFailed"
  namespace           = "AWS/EC2"
  period              = "120"
  statistic           = "Average"
  threshold           = "1"
  alarm_description   = "This metric monitors ec2 status check."

  dimensions {
    InstanceId = "${element(aws_instance.web.*.id, count.index)}"
  }
}

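Expected behaviour: each instance in each region should have two alarms attached, one for CPU utilization and one for the status check.

Actual behaviour: it creates and attaches only 3 alarms across the instances in each region.

  • In North Virginia: one for CPU utilization and two for StatusCheck.
  • In Oregon: two for StatusCheck and one for CPU utilization.

Which alarms get created alternates from one apply to the next.

I also get the error message below. It goes away if I wait 2 minutes before updating the alarms, or if I run terraform apply -parallelism=1.

Error: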

4 error(s) occurred:

* module.north-virginia.aws_cloudwatch_metric_alarm.status_check[0]: 1 error(s) occurred:

* aws_cloudwatch_metric_alarm.status_check.0: Creating metric alarm failed: ValidationError: A separate request to update this alarm is in progress.
status code: 400, request id: ea6c4502-dede-11e8-9262-c55251d6673a
* module.north-virginia.aws_cloudwatch_metric_alarm.cpu_utilization[1]: 1 error(s) occurred:

* aws_cloudwatch_metric_alarm.cpu_utilization.1: Creating metric alarm failed: ValidationError: A separate request to update this alarm is in progress.
status code: 400, request id: ea6c6c09-dede-11e8-a13f-bbb86ff53045
* module.oregon.aws_cloudwatch_metric_alarm.status_check[1]: 1 error(s) occurred:

* aws_cloudwatch_metric_alarm.status_check.1: Creating metric alarm failed: ValidationError: A separate request to update this alarm is in progress.
status code: 400, request id: ed198a56-dede-11e8-b95a-9d366b9f2e85
* module.oregon.aws_cloudwatch_metric_alarm.cpu_utilization[3]: 1 error(s) occurred:

* aws_cloudwatch_metric_alarm.cpu_utilization.3: Creating metric alarm failed: ValidationError: A separate request to update this alarm is in progress.
status code: 400, request id: ed193c4d-dede-11e8-9c63-21cde1551122

Any ideas about what I'm missing here, or any conventions I should be following, would be much appreciated.

1 answer:

Answer 0 (score: 1)

First, I would simplify your test by removing or commenting out module "oregon". Once north-virginia works correctly, add it back.
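As a minimal sketch of that first step, the root main.tf from the question could look like this while debugging. Only the comment markers are new; everything else is copied from the question. Note that if the Oregon resources are already in the state, Terraform will plan to destroy them once the module is commented out.

```hcl
# main.tf (root module), with the second region temporarily disabled
module "north-virginia" {
  source = "./modules/alb"
  region = "us-east-1"
  az     = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

# Commented out while debugging; restore once us-east-1 behaves as expected.
# module "oregon" {
#   source = "./modules/alb"
#   region = "us-west-2"
#   az     = ["us-west-2a", "us-west-2b", "us-west-2c"]
# }
```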

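Second, I would change the code in the module so that count is calculated as the length of var.az. This should be applied to all 3 resources you have: the aws_instance and the 2 CloudWatch alarms. For example:

count = "${length(var.az)}"

This way, you can change the number of availability zones in the code that calls the module, and the number of instances created changes with it.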
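A rough sketch of what this might look like for the instance resource, assuming var.az is declared as a list in the module's variables.tf (which the question does not show) and keeping the question's Terraform 0.11-style syntax. The same count expression would then replace count = 3 and the length(local.instance_id_var) expression in the two alarm resources.

```hcl
# modules/alb/aws_instance.tf
resource "aws_instance" "web" {
  # One instance per availability zone passed in by the calling module,
  # instead of a hard-coded count of 3.
  count             = "${length(var.az)}"
  ami               = "${data.aws_ami.amzn2.id}"
  instance_type     = "${var.instance_type}"
  availability_zone = "${element(var.az, count.index)}"

  tags {
    Name = "${count.index}"
  }
}
```

Because var.az is a literal list supplied by the caller, its length is known at plan time, so all three resources end up with the same count.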

Third, the names (alarm_name) you are giving the two CloudWatch alarms look the same: both are just the instance ID. Try to differentiate them.
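One possible way to make the names distinct is sketched below. CloudWatch alarm names are unique per account and region, and creating an alarm under an existing name updates that alarm, which would be consistent with seeing three alarms per region instead of six and with the "A separate request to update this alarm is in progress" error. The cpu-utilization- and status-check- prefixes are hypothetical, chosen only for illustration; the remaining settings are copied from the question, and the count assumes the var.az change suggested in the second point.

```hcl
# modules/alb/cloudwatch.tf (Terraform 0.11-style syntax, as in the question)
resource "aws_cloudwatch_metric_alarm" "cpu_utilization" {
  count = "${length(var.az)}"

  # Hypothetical prefix so this name cannot collide with the status check alarm.
  alarm_name          = "cpu-utilization-${element(aws_instance.web.*.id, count.index)}"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = "120"
  statistic           = "Average"
  threshold           = "60"
  alarm_description   = "This metric monitors ec2 cpu utilization"

  dimensions {
    InstanceId = "${element(aws_instance.web.*.id, count.index)}"
  }
}

resource "aws_cloudwatch_metric_alarm" "status_check" {
  count = "${length(var.az)}"

  # Hypothetical prefix, different from the CPU alarm's.
  alarm_name          = "status-check-${element(aws_instance.web.*.id, count.index)}"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "StatusCheckFailed"
  namespace           = "AWS/EC2"
  period              = "120"
  statistic           = "Average"
  threshold           = "1"
  alarm_description   = "This metric monitors ec2 status check."

  dimensions {
    InstanceId = "${element(aws_instance.web.*.id, count.index)}"
  }
}
```

On Terraform 0.12 and later, dimensions and tags are map attributes and would be written with an equals sign (dimensions = { ... }).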

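P.S. Between tests, make sure you have cleaned up any resources that may already have been created, so that each test starts from a clean state.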