I installed Prometheus on an Amazon Linux 2 instance. This is the configuration I use in the user data:
cat << EOF > /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Restart=on-failure
#Change this line if you download the
#Prometheus on different path user
ExecStart=/home/prometheus/prometheus/prometheus --config.file=/home/prometheus/prometheus/prometheus.yml --storage.tsdb.path=/app/prometheus/data
[Install]
WantedBy=multi-user.target
EOF
cat << EOF > /home/prometheus/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global evaluation_interval.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label job=<job_name> to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'node_prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9100']
  - job_name: 'grafana'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    # put the Grafana ALB here
    - targets: ['${grafana_dns}']
  - job_name: 'sqs_exporter'
    scrape_interval: 30s
    scrape_timeout: 30s
    static_configs:
    - targets: ['localhost:9434']
  - job_name: 'cloudwatch_exporter'
    scrape_interval: 5m
    scrape_timeout: 60s
    static_configs:
    - targets: ['localhost:9106']
  - job_name: '_metrics'
    metric_relabel_configs:
    relabel_configs:
    - source_labels:
      - __meta_ec2_platform
      action: keep
      regex: .*windows.*
    - action: labelmap
      regex: __meta_ec2_tag_(.*)
      replacement: \$1
    ec2_sd_configs:
    - region: eu-west-1
      port: 9543
  - job_name: 'cadvisor'
    static_configs:
    - targets: ['localhost:8080']
  - job_name: 'elasticbeanstalk_exporter'
    static_configs:
    - targets: ['localhost:9552']
EOF
systemctl daemon-reload
systemctl enable prometheus
systemctl start prometheus
When I check whether Prometheus is running, I get the following:
[ec2-user@ip-10-193-192-49 ~]$ sudo systemctl status prometheus
● prometheus.service - Prometheus Server
Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Mon 2019-12-02 11:12:33 UTC; 4s ago
Docs: https://prometheus.io/docs/introduction/overview/
Process: 22507 ExecStart=/home/prometheus/prometheus/prometheus --config.file=/home/prometheus/prometheus/prometheus.yml --storage.tsdb.path=/app/prometheus/data (code=exited, status=2)
Main PID: 22507 (code=exited, status=2)
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: Unit prometheus.service entered failed state.
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: prometheus.service failed.
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: prometheus.service holdoff time over, scheduling restart.
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: start request repeated too quickly for prometheus.service
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: Failed to start Prometheus Server.
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: Unit prometheus.service entered failed state.
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: prometheus.service failed.
[ec2-user@ip-10-193-192-49 ~]$
I installed Prometheus version 2.14.0. Any help?
I commented out the Restart=on-failure line in /etc/systemd/system/prometheus.service, then ran:
systemctl daemon-reload
systemctl status prometheus
I got:
Dec 02 12:57:52 ip-10-193-192-58.service.app systemd[1]: start request repeated too quickly for prometheus.service
Dec 02 12:57:52 ip-10-193-192-58.service.app systemd[1]: Failed to start Prometheus Server.
Dec 02 12:57:52 ip-10-193-192-58.service.app systemd[1]: Unit prometheus.service entered failed state.
Dec 02 12:57:52 ip-10-193-192-58.service.app systemd[1]: prometheus.service failed.
Dec 02 12:58:03 ip-10-193-192-58.service.app systemd[1]: Started Prometheus Server.
Dec 02 12:58:03 ip-10-193-192-58.service.app systemd[1]: Starting Prometheus Server...
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.686Z caller=main.go:296 msg="no time or size retention was set so
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.687Z caller=main.go:332 msg="Starting Prometheus" version="(versio
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.687Z caller=main.go:333 build_context="(go=go1.13.4, user=root@df2
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.687Z caller=main.go:334 host_details="(Linux 4.14.77-81.59.amzn2.x
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.687Z caller=main.go:335 fd_limits="(soft=1024, hard=4096)"
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.687Z caller=main.go:336 vm_limits="(soft=unlimited, hard=unlimited
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=error ts=2019-12-02T12:58:03.692Z caller=query_logger.go:85 component=activeQueryTracker msg="
Dec 02 12:58:03 ip-10-193-192-58.service.app systemd[1]: prometheus.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Dec 02 12:58:03 ip-10-193-192-58.service.app systemd[1]: Unit prometheus.service entered failed state.
Dec 02 12:58:03 ip-10-193-192-58.service.app systemd[1]: prometheus.service failed.
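Answer 0 (score: 1)
I had the same problem. The cause was the permissions on /data/prometheus: the directory should be owned by the prometheus user and group.
So the solution is: sudo chown -R prometheus:prometheus /data/prometheus/
In your case, the path is /app/prometheus/data.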
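As a rough sketch of that check, the snippet below compares the owner of the storage directory with the user the service runs as, and only prints the suggested fix. The path and the prometheus user come from the unit file in the question; the helper name owned_by is made up for illustration, and stat -c assumes GNU coreutils (which Amazon Linux 2 ships).

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if path $1 is owned by user $2 and group $3.
# Relies on GNU stat's -c format option.
owned_by() {
  [ "$(stat -c '%U:%G' "$1" 2>/dev/null)" = "$2:$3" ]
}

# Values taken from the question's unit file (--storage.tsdb.path and User=).
DATA_DIR="/app/prometheus/data"
SERVICE_USER="prometheus"

if [ -d "$DATA_DIR" ] && ! owned_by "$DATA_DIR" "$SERVICE_USER" "$SERVICE_USER"; then
  echo "$DATA_DIR is not owned by $SERVICE_USER; suggested fix:"
  echo "  sudo chown -R $SERVICE_USER:$SERVICE_USER $DATA_DIR"
fi
```

If the directory does not exist at all, the snippet prints nothing; in that case it would have to be created and chowned before starting the service.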
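Answer 1 (score: 0)
I had the same error; mine was caused by wrong indentation, so check the indentation of your prometheus.yml.
Also, for remote machines, an http:// prefix in front of the IP address is not supported in the targets field.
Always start from a vanilla/basic configuration.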
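One possible way to check both points before restarting the service is sketched below. The helper name targets_with_scheme is invented for this example and only catches targets written inline on the same line as the targets: key; promtool is the checker binary shipped in the Prometheus release archive, and the snippet assumes it is on the PATH. The config path is the one from the question.

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) any inline targets entry
# that carries an http:// or https:// scheme. Exit status is 0 if one is found.
targets_with_scheme() {
  grep -nE 'targets:.*https?://' "$1"
}

# Config path taken from the question's unit file.
CONFIG="/home/prometheus/prometheus/prometheus.yml"

if [ -f "$CONFIG" ]; then
  if targets_with_scheme "$CONFIG"; then
    echo "Targets should be host:port only; drop the scheme from the lines above."
  fi
  # promtool reports YAML and indentation errors with a line number.
  if command -v promtool >/dev/null 2>&1; then
    promtool check config "$CONFIG"
  fi
fi
```

Running the validation by hand tends to give a clearer message than the truncated lines that systemctl status shows.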