In Prometheus, how can I get the number of alerts that are firing right now?

Time: 2017-07-06 17:08:02

Tags: prometheus

Looking at the list of Prometheus-related metrics, I see:

prometheus_build_info
prometheus_config_last_reload_success_timestamp_seconds
prometheus_config_last_reload_successful
prometheus_engine_queries
prometheus_engine_queries_concurrent_max
prometheus_engine_query_duration_seconds
prometheus_engine_query_duration_seconds_count
prometheus_engine_query_duration_seconds_sum
prometheus_evaluator_duration_seconds
prometheus_evaluator_duration_seconds_count
prometheus_evaluator_duration_seconds_sum
prometheus_evaluator_iterations_missed_total
prometheus_evaluator_iterations_skipped_total
prometheus_evaluator_iterations_total
prometheus_local_storage_checkpoint_duration_seconds_count
prometheus_local_storage_checkpoint_duration_seconds_sum
prometheus_local_storage_checkpoint_last_duration_seconds
prometheus_local_storage_checkpoint_last_size_bytes
prometheus_local_storage_checkpoint_series_chunks_written_count
prometheus_local_storage_checkpoint_series_chunks_written_sum
prometheus_local_storage_checkpointing
prometheus_local_storage_chunk_ops_total
prometheus_local_storage_chunks_to_persist
prometheus_local_storage_fingerprint_mappings_total
prometheus_local_storage_inconsistencies_total
prometheus_local_storage_indexing_batch_duration_seconds
prometheus_local_storage_indexing_batch_duration_seconds_count
prometheus_local_storage_indexing_batch_duration_seconds_sum
prometheus_local_storage_indexing_batch_sizes
prometheus_local_storage_indexing_batch_sizes_count
prometheus_local_storage_indexing_batch_sizes_sum
prometheus_local_storage_indexing_queue_capacity
prometheus_local_storage_indexing_queue_length
prometheus_local_storage_ingested_samples_total
prometheus_local_storage_maintain_series_duration_seconds
prometheus_local_storage_maintain_series_duration_seconds_count
prometheus_local_storage_maintain_series_duration_seconds_sum
prometheus_local_storage_memory_chunkdescs
prometheus_local_storage_memory_chunks
prometheus_local_storage_memory_dirty_series
prometheus_local_storage_memory_series
prometheus_local_storage_non_existent_series_matches_total
prometheus_local_storage_open_head_chunks
prometheus_local_storage_out_of_order_samples_total
prometheus_local_storage_persist_errors_total
prometheus_local_storage_persistence_urgency_score
prometheus_local_storage_queued_chunks_to_persist_total
prometheus_local_storage_rushed_mode
prometheus_local_storage_series_chunks_persisted_bucket
prometheus_local_storage_series_chunks_persisted_count
prometheus_local_storage_series_chunks_persisted_sum
prometheus_local_storage_series_ops_total
prometheus_local_storage_started_dirty
prometheus_local_storage_target_heap_size_bytes
prometheus_notifications_dropped_total
prometheus_notifications_errors_total
prometheus_notifications_latency_seconds
prometheus_notifications_latency_seconds_count
prometheus_notifications_latency_seconds_sum
prometheus_notifications_queue_capacity
prometheus_notifications_queue_length
prometheus_notifications_sent_total
prometheus_rule_evaluation_duration_seconds
prometheus_rule_evaluation_duration_seconds_count
prometheus_rule_evaluation_duration_seconds_sum
prometheus_rule_evaluation_failures_total
prometheus_sd_azure_refresh_duration_seconds
prometheus_sd_azure_refresh_duration_seconds_count
prometheus_sd_azure_refresh_duration_seconds_sum
prometheus_sd_azure_refresh_failures_total
prometheus_sd_consul_rpc_duration_seconds
prometheus_sd_consul_rpc_duration_seconds_count
prometheus_sd_consul_rpc_duration_seconds_sum
prometheus_sd_consul_rpc_failures_total
prometheus_sd_dns_lookup_failures_total
prometheus_sd_dns_lookups_total
prometheus_sd_ec2_refresh_duration_seconds
prometheus_sd_ec2_refresh_duration_seconds_count
prometheus_sd_ec2_refresh_duration_seconds_sum
prometheus_sd_ec2_refresh_failures_total
prometheus_sd_file_read_errors_total
prometheus_sd_file_scan_duration_seconds
prometheus_sd_file_scan_duration_seconds_count
prometheus_sd_file_scan_duration_seconds_sum
prometheus_sd_gce_refresh_duration
prometheus_sd_gce_refresh_duration_count
prometheus_sd_gce_refresh_duration_sum
prometheus_sd_gce_refresh_failures_total
prometheus_sd_kubernetes_events_total
prometheus_sd_marathon_refresh_duration_seconds
prometheus_sd_marathon_refresh_duration_seconds_count
prometheus_sd_marathon_refresh_duration_seconds_sum
prometheus_sd_marathon_refresh_failures_total
prometheus_sd_triton_refresh_duration_seconds
prometheus_sd_triton_refresh_duration_seconds_count
prometheus_sd_triton_refresh_duration_seconds_sum
prometheus_sd_triton_refresh_failures_total
prometheus_target_interval_length_seconds
prometheus_target_interval_length_seconds_count
prometheus_target_interval_length_seconds_sum
prometheus_target_scrape_pool_sync_total
prometheus_target_scrapes_exceeded_sample_limit_total
prometheus_target_skipped_scrapes_total
prometheus_target_sync_length_seconds
prometheus_target_sync_length_seconds_count
prometheus_target_sync_length_seconds_sum
prometheus_treecache_watcher_goroutines
prometheus_treecache_zookeeper_failures_total

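None of them directly gives me the number I want.

The closest I have come is rate(prometheus_notifications_sent_total[1m])

It seems to give the per-second rate of notifications sent, averaged over a 1-minute window. That is not what I want, because some notifications fire at different intervals, and the data is noisier than I would like.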

[Image: prometheus_notifications_sent_total graph]

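I would like to display the number of Prometheus notifications that are currently firing on a Grafana dashboard. Is there a Prometheus expression I can use for this? If so, what should the expression look like?

Edit:

By "firing" I mean the number of active alerts listed in the Prometheus alerts dashboard.

E.g.: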

[Image: Prometheus alerts dashboard]

If you expand the dropdown, you get an entry for each active alert, and its state is shown as "FIRING". I think that is where I got the term "firing".

[Image: alert state]

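2 answers:

Answer 0 (score: 1)

Alerts are exposed as a special metric named ALERTS. I am not familiar with Grafana, so personally I would use the HTTP API to count the alerts that are currently firing, like this: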

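# -G sends the --data-urlencode payload as a URL-encoded GET query string,
# so the braces and quotes in the selector reach Prometheus intact
curl -s -G 'http://prometheus-002:9090/api/v1/query' \
  --data-urlencode 'query=ALERTS{alertstate="firing"}' | grep -o '"__name__":' | wc -l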

Perhaps you could write a recording rule that produces a derived metric, and have Grafana graph that metric.
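Counting `"__name__":` matches with grep works, but the same number can be read from the JSON response directly. The following is a minimal sketch, not a definitive implementation: the function names `firing_alert_count` and `query_firing_alerts` are made up for illustration, and the host is the one used in the curl command above. It assumes the standard `/api/v1/query` response shape for an instant vector, and it treats an empty result as zero, because `count()` over an empty vector returns no series rather than 0.

```python
import json
import urllib.parse
import urllib.request


def firing_alert_count(payload: dict) -> int:
    """Read the single sample of a count(...) instant query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    result = payload["data"]["result"]
    if not result:
        # count() over an empty vector yields no series, so nothing is firing
        return 0
    # Each sample looks like {"metric": {...}, "value": [timestamp, "3"]};
    # the sample value is encoded as a string.
    return int(float(result[0]["value"][1]))


def query_firing_alerts(base_url: str) -> int:
    """Run the count query against a Prometheus server (hypothetical helper)."""
    query = 'count(ALERTS{alertstate="firing"})'
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return firing_alert_count(json.load(resp))
```

Calling `query_firing_alerts("http://prometheus-002:9090")` would then return the count as an integer. Keeping the parsing in its own function makes it possible to check it against a saved response without a running server.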

Answer 1 (score: 1)

To see the number of all alerts that are firing right now:

count(ALERTS{alertstate="firing"})

To see the count for a specific alert named THE_NAME_OF_THE_ALERT:

count(ALERTS{alertname="THE_NAME_OF_THE_ALERT",alertstate="firing"})

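Another option, if you want to see the cause of failures before the alert fires (the alert may only fire about 10 seconds after the failure):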

count(probe_success == 0)