Spark: monitoring a cluster-mode application

Date: 2016-08-10 06:52:22

Tags: apache-spark

I'm currently using spark-submit to launch an application in cluster mode. The master's JSON response provides a submissionId, which I use to identify the application and kill it if necessary. However, I haven't found an easy way to retrieve the worker REST URL from either the master's response or the driver-id (short of scraping the master web UI, which is ugly). Instead, I have to wait until the application finishes and then look up the application statistics from the history server.

Is there any way to use the driver-id to identify the worker URL (typically worker-node:4040) of an application deployed in cluster mode?

16/08/12 11:39:47 INFO RestSubmissionClient: Submitting a request to launch an application in spark://192.yyy:6066.
16/08/12 11:39:47 INFO RestSubmissionClient: Submission successfully created as driver-20160812114003-0001. Polling submission state...
16/08/12 11:39:47 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20160812114003-0001 in spark://192.yyy:6066.
16/08/12 11:39:47 INFO RestSubmissionClient: State of driver driver-20160812114003-0001 is now RUNNING.
16/08/12 11:39:47 INFO RestSubmissionClient: Driver is running on worker worker-20160812113715-192.xxx-46215 at 192.xxx:46215.
16/08/12 11:39:47 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
    "action" : "CreateSubmissionResponse",
    "message" : "Driver successfully submitted as driver-20160812114003-0001",
    "serverSparkVersion" : "1.6.1",
    "submissionId" : "driver-20160812114003-0001",
    "success" : true
}
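The status endpoint being polled in the log above, and its kill counterpart, can also be called directly from a script. A minimal sketch using only the standard library; the master REST host is the placeholder from the log, and `driver_state`/`kill_driver` are names invented here for illustration:

```python
import json
from urllib.request import Request, urlopen

REST_URL = "http://192.yyy:6066"  # placeholder master REST endpoint from the log above

def status_url(submission_id):
    # GET endpoint that RestSubmissionClient polls, per the log output
    return "%s/v1/submissions/status/%s" % (REST_URL, submission_id)

def kill_url(submission_id):
    # POST here to terminate the driver identified by the submissionId
    return "%s/v1/submissions/kill/%s" % (REST_URL, submission_id)

def driver_state(submission_id):
    # Returns e.g. "RUNNING" from the SubmissionStatusResponse JSON
    with urlopen(status_url(submission_id)) as resp:
        return json.load(resp).get("driverState")

def kill_driver(submission_id):
    with urlopen(Request(kill_url(submission_id), method="POST")) as resp:
        return json.load(resp).get("success")
```

This covers the status/kill half the question already has working; it does not solve the worker-URL lookup.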

Edit: this is typical log4j console output with logging set to DEBUG.

Spark-submit command:

./apps/spark-2.0.0-bin-hadoop2.7/bin/spark-submit --master mesos://masterurl:7077 
    --verbose --class MainClass --deploy-mode cluster
    ~/path/myjar.jar args

Spark-submit output:

Using properties file: null
Parsed arguments:
  master                  mesos://masterurl:7077
  deployMode              cluster
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               MyApp
  primaryResource         file:/path/myjar.jar
  name                    MyApp
  childArgs               [args]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file null:



Main class:
org.apache.spark.deploy.rest.RestSubmissionClient
Arguments:
file:/path/myjar.jar
MyApp
args
System properties:
SPARK_SUBMIT -> true
spark.driver.supervise -> false
spark.app.name -> MyApp
spark.jars -> file:/path/myjar.jar
spark.submit.deployMode -> cluster
spark.master -> mesos://masterurl:7077
Classpath elements:



16/08/17 13:26:49 INFO RestSubmissionClient: Submitting a request to launch an application in mesos://masterurl:7077.
16/08/17 13:26:49 DEBUG RestSubmissionClient: Sending POST request to server at http://masterurl:7077/v1/submissions/create:
{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ args ],
  "appResource" : "file:/path/myjar.jar",
  "clientSparkVersion" : "2.0.0",
  "environmentVariables" : {
    "SPARK_SCALA_VERSION" : "2.10"
  },
  "mainClass" : "SimpleSort",
  "sparkProperties" : {
    "spark.jars" : "file:/path/myjar.jar",
    "spark.driver.supervise" : "false",
    "spark.app.name" : "MyApp",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "mesos://masterurl:7077"
  }
}
16/08/17 13:26:49 DEBUG RestSubmissionClient: Response from the server:
{
  "action" : "CreateSubmissionResponse",
  "serverSparkVersion" : "2.0.0",
  "submissionId" : "driver-20160817132658-0004",
  "success" : true
}
16/08/17 13:26:49 INFO RestSubmissionClient: Submission successfully created as driver-20160817132658-0004. Polling submission state...
16/08/17 13:26:49 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20160817132658-0004 in mesos://masterurl:7077.
16/08/17 13:26:49 DEBUG RestSubmissionClient: Sending GET request to server at http://masterurl:7077/v1/submissions/status/driver-20160817132658-0004.
16/08/17 13:26:49 DEBUG RestSubmissionClient: Response from the server:
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "RUNNING",
  "serverSparkVersion" : "2.0.0",
  "submissionId" : "driver-20160817132658-0004",
  "success" : true
}
16/08/17 13:26:49 INFO RestSubmissionClient: State of driver driver-20160817132658-0004 is now RUNNING.
16/08/17 13:26:49 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "serverSparkVersion" : "2.0.0",
  "submissionId" : "driver-20160817132658-0004",
  "success" : true
}

2 answers:

Answer 0 (score: 3)

Doesn't the response from the master provide the application-id?

I believe all you need here is the master-URL and the application-id of your application. Once you have the application-id, use port 4040 on the master-URL and append the desired endpoint to it.

For example, if your application-id is application_1468141556944_1055:

Get a list of all jobs:

http://<master>:4040/api/v1/applications/application_1468141556944_1055/jobs

Get a list of stored RDDs:

http://<master>:4040/api/v1/applications/application_1468141556944_1055/storage/rdd
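A small sketch along those lines, using only the standard library. The host is the same `<master>` placeholder as above, and the helper names (`api_url`, `list_applications`, `list_jobs`) are invented here for illustration:

```python
import json
from urllib.request import urlopen

BASE = "http://<master>:4040/api/v1"  # placeholder; substitute your UI host

def api_url(*parts):
    # Join path segments onto the monitoring REST API base URL
    return "/".join((BASE,) + parts)

def list_applications():
    # Each entry in the returned JSON array carries at least "id" and "name"
    with urlopen(api_url("applications")) as resp:
        return json.load(resp)

def list_jobs(app_id):
    with urlopen(api_url("applications", app_id, "jobs")) as resp:
        return json.load(resp)
```

Listing `/applications` first is one way to discover the application-id when only the UI host is known.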

However, if you don't have the application-id, I would probably start with the following:

Set verbose mode (--verbose) when launching the spark job to get the application-id on the console. The application-id can then be parsed from the log output, which typically looks like this:

16/08/12 08:50:53 INFO Client: Application report for application_1468141556944_3791 (state: RUNNING)
    client token: N/A
    diagnostics: N/A
    ApplicationMaster host: 10.50.0.33
    ApplicationMaster RPC port: 0
    queue: ns_debug
    start time: 1470992969127
    final status: UNDEFINED
    tracking URL: http://<master>:8088/proxy/application_1468141556944_3799/

So the application-id here is application_1468141556944_3791.
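Parsing the application-id out of such log output can be sketched with a regular expression; the pattern matches ids of the form application_<timestamp>_<counter> seen throughout this thread:

```python
import re

APP_ID_RE = re.compile(r"application_\d+_\d+")

def extract_app_ids(log_text):
    # Return application ids in order of first appearance, without duplicates
    seen = []
    for m in APP_ID_RE.finditer(log_text):
        if m.group(0) not in seen:
            seen.append(m.group(0))
    return seen
```

Feeding it the "Application report" line above yields ["application_1468141556944_3791"].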

You can also find the master-url and the application-id through the tracking URL in the log output, as in the example above.

These messages are at the INFO log level, so make sure you set

log4j.rootCategory=INFO, console

in your log4j.properties file so that you can see them.

Answer 1 (score: 0)

Ended up scraping the spark master web UI for an application-id close to the submission-id (same minute and same suffix, e.g. 20161010025XXX-0005, where X is a wildcard), then looking up the worker URL in the link tag. Not pretty, reliable, or secure, but it works for now. Leaving this open for a while in case someone has a better approach.
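The fuzzy match described above (timestamp agreeing to the minute, identical counter suffix) can be sketched as follows; the id formats follow the examples in this thread, the candidate app-ids would come from scraping the master UI, and the matching granularity is a judgment call:

```python
import re

def matches_submission(submission_id, app_id):
    # Submission ids look like driver-20160812114003-0001:
    # a yyyyMMddHHmmss timestamp plus a counter suffix.
    m = re.match(r"driver-(\d{14})-(\d+)$", submission_id)
    if not m:
        return False
    minute, suffix = m.group(1)[:12], m.group(2)
    # Accept app ids whose timestamp agrees to the minute and whose
    # counter suffix is identical, e.g. ...-20161010025745-0005.
    return re.search(minute + r"\d{2}.*-" + suffix + "$", app_id) is not None
```

Filtering the scraped ids through such a predicate at least narrows the candidates to the ones launched in the same minute with the same counter.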