Unable to start Prometheus Server

Time: 2019-12-02 11:17:35

Tags: prometheus

I installed Prometheus on an Amazon Linux 2 instance. This is the configuration I use in the user data:

cat << EOF > /etc/systemd/system/prometheus.service 
[Unit] 
Description=Prometheus Server 
Documentation=https://prometheus.io/docs/introduction/overview/ 
Wants=network-online.target
After=network-online.target

[Service] 
User=prometheus 
Restart=on-failure 

# Change the ExecStart line below if Prometheus is installed
# under a different path or user.
ExecStart=/home/prometheus/prometheus/prometheus --config.file=/home/prometheus/prometheus/prometheus.yml --storage.tsdb.path=/app/prometheus/data

[Install] 
WantedBy=multi-user.target 
EOF

cat << EOF > /home/prometheus/prometheus/prometheus.yml 
# my global config 
global: 
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute. 
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute. 
  # scrape_timeout is set to the global default (10s). 

# Alertmanager configuration 
alerting: 
  alertmanagers: 
  - static_configs: 
    - targets: 
      # - alertmanager:9093 

# Load rules once and periodically evaluate them according to the global evaluation_interval. 
rule_files: 
  # - "first_rules.yml" 
  # - "second_rules.yml" 

# A scrape configuration containing exactly one endpoint to scrape: 
# Here it's Prometheus itself. 
scrape_configs: 
  # The job name is added as a label job=<job_name> to any timeseries scraped from this config. 
  - job_name: 'prometheus' 

    # metrics_path defaults to '/metrics' 
    # scheme defaults to 'http'. 

    static_configs: 
    - targets: ['localhost:9090'] 
  - job_name: 'node_prometheus' 

    # metrics_path defaults to '/metrics' 
    # scheme defaults to 'http'. 

    static_configs: 
    - targets: ['localhost:9100'] 
  - job_name: 'grafana' 

    # metrics_path defaults to '/metrics' 
    # scheme defaults to 'http'. 

    static_configs: 
# put the Grafana ALB here
    - targets: ['${grafana_dns}'] 

  - job_name: 'sqs_exporter' 
    scrape_interval: 30s 
    scrape_timeout: 30s 
    static_configs: 
    - targets: ['localhost:9434'] 

  - job_name: 'cloudwatch_exporter' 
    scrape_interval: 5m 
    scrape_timeout: 60s 
    static_configs: 
    - targets: ['localhost:9106'] 

  - job_name: '_metrics' 
    metric_relabel_configs: 
    relabel_configs: 
     - source_labels: 
       - __meta_ec2_platform 
       action: keep 
       regex: .*windows.* 
     - action: labelmap 
       regex: __meta_ec2_tag_(.*) 
       replacement: \$1 
    ec2_sd_configs: 
      - region: eu-west-1 
        port: 9543 

  - job_name: 'cadvisor' 
    static_configs: 
    - targets: ['localhost:8080'] 

  - job_name: 'elasticbeanstalk_exporter' 
    static_configs: 
    - targets: ['localhost:9552'] 

EOF



systemctl daemon-reload 
systemctl enable prometheus
systemctl start prometheus

When I check whether Prometheus is running, I get the following:

[ec2-user@ip-10-193-192-49 ~]$  sudo systemctl status prometheus
● prometheus.service - Prometheus Server
   Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Mon 2019-12-02 11:12:33 UTC; 4s ago
     Docs: https://prometheus.io/docs/introduction/overview/
  Process: 22507 ExecStart=/home/prometheus/prometheus/prometheus --config.file=/home/prometheus/prometheus/prometheus.yml --storage.tsdb.path=/app/prometheus/data (code=exited, status=2)
 Main PID: 22507 (code=exited, status=2)

Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: Unit prometheus.service entered failed state.
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: prometheus.service failed.
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: prometheus.service holdoff time over, scheduling restart.
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: start request repeated too quickly for prometheus.service
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: Failed to start Prometheus Server.
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: Unit prometheus.service entered failed state.
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: prometheus.service failed.
[ec2-user@ip-10-193-192-49 ~]$

I installed Prometheus version 2.14.0. Any help?

I commented out the Restart=on-failure line in /etc/systemd/system/prometheus.service, then ran:

systemctl daemon-reload 
systemctl status prometheus

I get:

Dec 02 12:57:52 ip-10-193-192-58.service.app systemd[1]: start request repeated too quickly for prometheus.service
Dec 02 12:57:52 ip-10-193-192-58.service.app systemd[1]: Failed to start Prometheus Server.
Dec 02 12:57:52 ip-10-193-192-58.service.app systemd[1]: Unit prometheus.service entered failed state.
Dec 02 12:57:52 ip-10-193-192-58.service.app systemd[1]: prometheus.service failed.
Dec 02 12:58:03 ip-10-193-192-58.service.app systemd[1]: Started Prometheus Server.
Dec 02 12:58:03 ip-10-193-192-58.service.app systemd[1]: Starting Prometheus Server...
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.686Z caller=main.go:296 msg="no time or size retention was set so
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.687Z caller=main.go:332 msg="Starting Prometheus" version="(versio
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.687Z caller=main.go:333 build_context="(go=go1.13.4, user=root@df2
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.687Z caller=main.go:334 host_details="(Linux 4.14.77-81.59.amzn2.x
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.687Z caller=main.go:335 fd_limits="(soft=1024, hard=4096)"
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.687Z caller=main.go:336 vm_limits="(soft=unlimited, hard=unlimited
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=error ts=2019-12-02T12:58:03.692Z caller=query_logger.go:85 component=activeQueryTracker msg="
Dec 02 12:58:03 ip-10-193-192-58.service.app systemd[1]: prometheus.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Dec 02 12:58:03 ip-10-193-192-58.service.app systemd[1]: Unit prometheus.service entered failed state.
Dec 02 12:58:03 ip-10-193-192-58.service.app systemd[1]: prometheus.service failed.
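
The status output above is cut off at the terminal width, so the text of the level=error line is not visible. A minimal sketch for pulling the complete error lines out of the journal, assuming the service logs go to journald; the helper name show_errors is hypothetical:

```shell
# show_errors: keep only error-level lines from Prometheus log output on stdin.
show_errors() {
  grep 'level=error'
}

# Usage on the host (--no-pager prints whole lines instead of truncating them):
#   sudo journalctl -u prometheus --no-pager | show_errors
```

Reading the full msg="..." text of that line is usually the quickest way to see why the process exited with status 2.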

2 answers:

Answer 0 (score: 1)

I had the same problem. The cause was the ownership of /data/prometheus, which must belong to the prometheus user and group.

So the solution is: sudo chown -R prometheus:prometheus /data/prometheus/

In your case, the path is /app/prometheus/data
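
Before changing anything, it may help to confirm that ownership really is the problem. A minimal sketch, assuming GNU stat (as on Amazon Linux 2); the function name check_owner is hypothetical, and the path and user in the usage comment are taken from the unit file in the question:

```shell
# check_owner DIR USER: succeed only if DIR exists and is owned by USER.
check_owner() {
  dir=$1
  user=$2
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  owner=$(stat -c '%U' "$dir")   # owner name of the directory (GNU stat)
  if [ "$owner" != "$user" ]; then
    echo "$dir is owned by $owner, expected $user"
    return 1
  fi
  echo "ok: $dir is owned by $user"
}

# Usage on the host:
#   check_owner /app/prometheus/data prometheus \
#     || { sudo mkdir -p /app/prometheus/data && sudo chown -R prometheus:prometheus /app/prometheus/data; }
```

Note that this only checks the top-level directory; files created earlier by root inside it would still need the recursive chown.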

Answer 1 (score: 0)

I had the same error; mine was caused by an indentation mistake. Check the indentation of your prometheus.yml.
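
Indentation and other syntax problems can be caught before starting the service with promtool, which ships in the Prometheus release tarball. A minimal sketch: the wrapper name validate_config is hypothetical, and the default promtool location is an assumption based on the install path in the question:

```shell
# validate_config CONFIG [PROMTOOL]: run "promtool check config" on CONFIG.
validate_config() {
  config=$1
  # Assumed location: next to the prometheus binary from the question.
  promtool=${2:-/home/prometheus/prometheus/promtool}
  if [ ! -f "$config" ]; then
    echo "config not found: $config" >&2
    return 1
  fi
  if [ ! -x "$promtool" ]; then
    echo "promtool not found or not executable: $promtool" >&2
    return 1
  fi
  "$promtool" check config "$config"
}

# Usage on the host:
#   validate_config /home/prometheus/prometheus/prometheus.yml
```

If the check passes but the service still fails, the cause is more likely outside the YAML file, for example the data directory permissions.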

Also, for remote machines, an http:// prefix before the IP address is not supported in the targets field.
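
A quick way to spot such entries is to search the config for targets that carry a scheme prefix. This is a rough sketch, not a full YAML parser: it only catches targets written inline on the same line as the targets: key, and the function name find_scheme_targets is hypothetical:

```shell
# find_scheme_targets FILE: list "targets:" lines containing http:// or https://.
# Exits non-zero (grep's status) when no such line is found.
find_scheme_targets() {
  grep -nE 'targets:.*https?://' "$1"
}

# Usage on the host:
#   find_scheme_targets /home/prometheus/prometheus/prometheus.yml
```

Any line it prints should be reduced to host:port; the protocol is chosen per job with the scheme setting, which defaults to http as the comments in the question's config note.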

Always start from a vanilla/basic configuration.