Bluemix Spark: spark-submit failing when downloading stderr and stdout?

Date: 2016-04-19 07:11:48

Tags: apache-spark ibm-cloud

I am using the Spark service in IBM Bluemix. I am trying to launch a piece of Java code that runs a Spark job, using the spark-submit.sh script.

My command line is:

./spark-submit.sh --vcap ./VCAP.json --deploy-mode cluster --class org.apache.spark.examples.JavaSparkPi \
--master https://169.54.219.20 ~/Documents/Spark/JavaSparkPi.jar 

I am using the latest spark-submit.sh version (as of yesterday):

./spark-submit.sh --version
spark-submit.sh  VERSION : '1.0.0.0.20160330.1'

This worked fine a few weeks ago (with an older spark-submit.sh), but now I get the following errors:

Downloading stdout_1461024849908170118
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0    89    0    89    0     0     56      0 --:--:--  0:00:01 --:--:--   108
Failed to download from workdir/driver-20160418191414-0020-5e7fb175-6856-4980-97bc-8e8aa0d1f137/stdout to     stdout_1461024849908170118

Downloading stderr_1461024849908170118
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0    89    0    89    0     0     50      0 --:--:--  0:00:01 --:--:--   108
Failed to download from workdir/driver-20160418191414-0020-5e7fb175-6856-4980-97bc-8e8aa0d1f137/stderr to     stderr_1461024849908170118

Any idea what I am doing wrong? Thanks in advance.

EDIT:

Looking at the log file, I found that the problem is not in downloading stdout and stderr, but in submitting the job itself:

{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FAILED",
  "message" : "Exception from the cluster:
org.apache.spark.SparkException: Failed to change container CWD
org.apache.spark.deploy.master.EgoApplicationManager.egoDriverExitCallback(EgoApplicationManager.scala:168)
org.apache.spark.deploy.master.MasterScheduleDelegatorDriver.onContainerExit(MasterScheduleDelegatorDriver.scala:144)
org.apache.spark.deploy.master.resourcemanager.ResourceManagerEGOSlot.handleActivityFinish(ResourceManagerEGOSlot.scala:555)
org.apache.spark.deploy.master.resourcemanager.ResourceManagerEGOSlot.callbackContainerStateChg(ResourceManagerEGOSlot.scala:525)
org.apache.spark.deploy.master.resourcemanager.ResourceCallbackManager$$anonfun$callbackContainerStateChg$1.apply(ResourceManager.scala:158)
org.apache.spark.deploy.master.resourcemanager.ResourceCallbackManager$$anonfun$callbackContainerStateChg$1.apply(ResourceManager.scala:157)
scala.Option.foreach(Option.scala:236)
org.apache.spark.deploy.master.resourcemanager.ResourceCallbackManager$.callbackContainerStateChg(ResourceManager.scala:157)",
  "serverSparkVersion" : "1.6.0",
  "submissionId" : "driver-20160420043532-0027-6e579720-2c9d-428f-b2c7-6613f4845146",
  "success" : true
}
driverStatus is FAILED

EDIT 2:

In the end, the job-submission problem was solved by creating a brand-new instance of the Spark service. My job now runs and finishes after a few seconds.

However, I still get errors when trying to download the stdout and stderr files:

Downloading stdout_1461156506108609180
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed
  0    90    0    90    0     0     61      0 --:--:--  0:00:01 --:--:--   125
Failed to download from workdir2/driver-20160420074922-0008-1400fc20-95c1-442d-9c37-32de3a7d1f0a/stdout to stdout_1461156506108609180

Downloading stderr_1461156506108609180
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed
  0    90    0    90    0     0     56      0 --:--:--  0:00:01 --:--:--   109
Failed to download from workdir2/driver-20160420074922-0008-1400fc20-95c1-442d-9c37-32de3a7d1f0a/stderr to stderr_1461156506108609180

Any ideas?

1 answer:

Answer 0 (score: 0)

I found that the old spark-submit tried to retrieve stdout and stderr from the workdir folder...

Failed to download from workdir/driver-20160418191414-0020-5e7fb175-6856-4980-97bc-8e8aa0d1f137/stdout to     stdout_1461024849908170118

...while the new one (downloaded yesterday) tries to download them from the workdir2 folder:

Failed to download from workdir2/driver-20160420074922-0008-1400fc20-95c1-442d-9c37-32de3a7d1f0a/stdout to stdout_1461156506108609180

The folder that is used is determined by the variable SS_SPARK_WORK_DIR, which is initialized in spark-submit.sh:
if [ -z ${SS_SPARK_WORK_DIR} ];  then SS_SPARK_WORK_DIR="workdir2"; fi # Work directory on spark cluster

I changed the value to workdir, and everything worked again. I then downloaded a new spark-submit (today) from the Bluemix site, and the problem has been fixed there: the variable now points to workdir.
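Because the script only assigns SS_SPARK_WORK_DIR when it is unset or empty, the directory can also be overridden from the environment instead of editing the script. This is a minimal sketch of that default-assignment pattern, assuming spark-submit.sh reads the variable from the environment as the line above suggests:

```shell
# Sketch: export the variable before running spark-submit.sh so that the
# script's own default assignment ("workdir2") never kicks in.
export SS_SPARK_WORK_DIR="workdir"

# This is the same default-assignment line found in spark-submit.sh;
# because the variable is already non-empty, the default is skipped.
if [ -z "${SS_SPARK_WORK_DIR}" ]; then SS_SPARK_WORK_DIR="workdir2"; fi

echo "Work directory on the Spark cluster: ${SS_SPARK_WORK_DIR}"
```

With the override in place, the same spark-submit command from the question can be rerun unchanged.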

So, if anything fails, make sure you have the latest spark-submit script from Bluemix.