Marathon application deployment stuck in "Waiting" state

Time: 2017-09-04 09:36:20

Tags: apache-zookeeper mesos marathon mesosphere

I have a 3-node setup running Marathon, mesos-master, mesos-slave and ZooKeeper with the HA configuration enabled. I tested deploying a simple hello app with mesos-execute, and it worked as expected.

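Everything looked fine up to that point, so I connected to Marathon and deployed a simple application to test it (echo "hello" >> /tmp/output.txt), but the application got stuck in the "Waiting" state.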

What could be preventing Marathon from using the Mesos resources for the deployment?

Logs from mesos-master:

I0904 11:23:27.064332 19769 master.cpp:2813] Received SUBSCRIBE call for framework 'marathon' at scheduler-0340362b-0bb6-4fb8-8501-118d976e2cbd@192.168.40.156:36324
I0904 11:23:27.064623 19769 master.cpp:2890] Subscribing framework marathon with checkpointing enabled and capabilities [ PARTITION_AWARE ]
I0904 11:23:27.064669 19769 master.cpp:6272] Updating info for framework cb16118a-2257-4020-a907-63aa6294e11b-0000
I0904 11:23:27.064697 19769 master.cpp:2994] Framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at scheduler-0340362b-0bb6-4fb8-8501-118d976e2cbd@192.168.40.156:36324 failed over
I0904 11:23:27.065032 19770 hierarchical.cpp:342] Activated framework cb16118a-2257-4020-a907-63aa6294e11b-0000
I0904 11:23:27.065465 19770 master.cpp:7305] Sending 3 offers to framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at scheduler-0340362b-0bb6-4fb8-8501-118d976e2cbd@192.168.40.156:36324
I0904 11:23:27.907865 19769 http.cpp:1115] HTTP GET for /files/read?_=1504517007920&jsonp=jQuery17109098185077823333_1504516979864&length=50000&offset=352538&path=%2Fmaster%2Flog from 192.168.40.1:53525 with User-Agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
I0904 11:23:28.916651 19768 http.cpp:1115] HTTP GET for /files/read?_=1504517008930&jsonp=jQuery17109098185077823333_1504516979865&length=50000&offset=353797&path=%2Fmaster%2Flog from 192.168.40.1:53525 with User-Agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
E0904 11:23:30.071293 19775 process.cpp:2450] Failed to shutdown socket with fd 39, address 192.168.40.159:58072: Transport endpoint is not connected
I0904 11:23:30.073277 19768 master.cpp:1430] Framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at scheduler-0340362b-0bb6-4fb8-8501-118d976e2cbd@192.168.40.156:36324 disconnected
I0904 11:23:30.073307 19768 master.cpp:3160] Deactivating framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at scheduler-0340362b-0bb6-4fb8-8501-118d976e2cbd@192.168.40.156:36324
I0904 11:23:30.073485 19768 master.cpp:3137] Disconnecting framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at scheduler-0340362b-0bb6-4fb8-8501-118d976e2cbd@192.168.40.156:36324
I0904 11:23:30.073496 19768 master.cpp:1445] Giving framework cb16118a-2257-4020-a907-63aa6294e11b-0000 (marathon) at scheduler-0340362b-0bb6-4fb8-8501-118d976e2cbd@192.168.40.156:36324 1weeks to failover
I0904 11:23:30.073519 19768 hierarchical.cpp:374] Deactivated framework cb16118a-2257-4020-a907-63aa6294e11b-0000

curl -XGET 'http://mesosphere2:8098/v2/queue?pretty' | jq

{
  "queue": [
    {
      "count": 1,
      "delay": {
        "timeLeftSeconds": 0,
        "overdue": true
      },
      "since": "2017-09-04T13:12:42.024Z",
      "processedOffersSummary": {
        "processedOffersCount": 12,
        "unusedOffersCount": 12,
        "lastUnusedOfferAt": "2017-09-04T13:14:52.554Z",
        "rejectSummaryLastOffers": [
          {
            "reason": "UnfulfilledRole",
            "declined": 3,
            "processed": 3
          },
          {
            "reason": "UnfulfilledConstraint",
            "declined": 0,
            "processed": 0
          },
          {
            "reason": "NoCorrespondingReservationFound",
            "declined": 0,
            "processed": 0
          },
          {
            "reason": "InsufficientCpus",
            "declined": 0,
            "processed": 0
          },
          {
            "reason": "InsufficientMemory",
            "declined": 0,
            "processed": 0
          },
          {
            "reason": "InsufficientDisk",
            "declined": 0,
            "processed": 0
          },
          {
            "reason": "InsufficientGpus",
            "declined": 0,
            "processed": 0
          },
          {
            "reason": "InsufficientPorts",
            "declined": 0,
            "processed": 0
          }
        ],
        "rejectSummaryLaunchAttempt": [
          {
            "reason": "UnfulfilledRole",
            "declined": 12,
            "processed": 12
          },
          {
            "reason": "UnfulfilledConstraint",
            "declined": 0,
            "processed": 0
          },
          {
            "reason": "NoCorrespondingReservationFound",
            "declined": 0,
            "processed": 0
          },
          {
            "reason": "InsufficientCpus",
            "declined": 0,
            "processed": 0
          },
          {
            "reason": "InsufficientMemory",
            "declined": 0,
            "processed": 0
          },
          {
            "reason": "InsufficientDisk",
            "declined": 0,
            "processed": 0
          },
          {
            "reason": "InsufficientGpus",
            "declined": 0,
            "processed": 0
          },
          {
            "reason": "InsufficientPorts",
            "declined": 0,
            "processed": 0
          }
        ]
      },
      "app": {
        "id": "/test03",
        "acceptedResourceRoles": [
          "slave_public"
        ],
        "backoffFactor": 1.15,
        "backoffSeconds": 1,
        "container": {
          "type": "DOCKER",
          "docker": {
            "forcePullImage": false,
            "image": "laghao/hello-marathon",
            "network": "BRIDGE",
            "parameters": [],
            "portMappings": [
              {
                "containerPort": 80,
                "hostPort": 80,
                "labels": {},
                "protocol": "tcp",
                "servicePort": 10003
              }
            ],
            "privileged": false
          },
          "volumes": []
        },
        "cpus": 0.1,
        "disk": 0,
        "executor": "",
        "instances": 1,
        "labels": {},
        "maxLaunchDelaySeconds": 3600,
        "mem": 64,
        "gpus": 0,
        "portDefinitions": [
          {
            "port": 10003,
            "name": "default",
            "protocol": "tcp"
          }
        ],
        "requirePorts": false,
        "upgradeStrategy": {
          "maximumOverCapacity": 1,
          "minimumHealthCapacity": 1
        },
        "version": "2017-09-04T13:12:41.993Z",
        "versionInfo": {
          "lastScalingAt": "2017-09-04T13:12:41.993Z",
          "lastConfigChangeAt": "2017-09-04T13:12:41.993Z"
        },
        "killSelection": "YOUNGEST_FIRST",
        "unreachableStrategy": {
          "inactiveAfterSeconds": 300,
          "expungeAfterSeconds": 600
        }
      }
    }
  ]
}

1 Answer:

Answer 0: (score: 0)

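From the documentation:

An app stays in "Waiting" forever

This means that Marathon does not receive "resource offers" from Mesos that allow it to start tasks of this application. The simplest failure is that there are not enough resources available in the cluster, or that another framework is hoarding all of them. You can check the Mesos UI for available resources. Note that the required resources (such as CPU, memory, disk) must all be available on a single host.

If you do not find the solution yourself and you create a GitHub issue, please attach the output of the Mesos /state endpoint to the bug report so that we can inspect the available cluster resources.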

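In your case, the problem is a mismatch between the role the application requires and the roles of the agents. You can infer this from the UnfulfilledRole reason in the queue output: the app sets acceptedResourceRoles to slave_public, and every processed offer was declined for that reason.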
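To illustrate the mismatch, here is a minimal Python sketch of the role check. It is not Marathon's actual matching code; the function name roles_match and the offered role list are hypothetical, and it assumes the agents offer only unreserved resources (role "*"), which the question does not confirm.

```python
def roles_match(accepted_roles, offered_roles):
    """Return the offered roles that the app is willing to accept (sorted)."""
    return sorted(set(accepted_roles) & set(offered_roles))


# Taken from the queue output in the question.
app = {"id": "/test03", "acceptedResourceRoles": ["slave_public"]}

# Assumption: the agents offer only unreserved resources, i.e. role "*".
offered = ["*"]

usable = roles_match(app["acceptedResourceRoles"], offered)
if usable:
    print("usable roles:", usable)
else:
    print("no matching role: the offer would be declined as UnfulfilledRole")
```

Under that assumption, either the app needs to accept a role the agents actually offer, or the agents need to offer resources under the slave_public role.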

Marathon 1.4 introduced information about stuck deployments. You can query /v2/queue to get statistics on declined offers.
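As a rough sketch, the response from /v2/queue can be reduced to just the reasons with a non-zero decline count. The helper name summarize_rejections is hypothetical, and the sample below is a trimmed copy of the output from the question rather than a live response.

```python
import json


def summarize_rejections(queue_response):
    """Map each queued app id to its non-zero decline reasons."""
    summary = {}
    for entry in queue_response.get("queue", []):
        offers = entry.get("processedOffersSummary", {})
        summary[entry["app"]["id"]] = {
            r["reason"]: r["declined"]
            for r in offers.get("rejectSummaryLaunchAttempt", [])
            if r["declined"] > 0
        }
    return summary


# Trimmed copy of the /v2/queue output shown in the question.
sample = json.loads("""
{
  "queue": [
    {
      "processedOffersSummary": {
        "processedOffersCount": 12,
        "unusedOffersCount": 12,
        "rejectSummaryLaunchAttempt": [
          {"reason": "UnfulfilledRole", "declined": 12, "processed": 12},
          {"reason": "InsufficientCpus", "declined": 0, "processed": 0}
        ]
      },
      "app": {"id": "/test03", "acceptedResourceRoles": ["slave_public"]}
    }
  ]
}
""")

result = summarize_rejections(sample)
print(result)
```

For the data in the question this leaves only UnfulfilledRole, with 12 declined offers for /test03.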