I'm using the Spark service in IBM Bluemix. I'm trying to launch a piece of Java code with the spark-submit.sh script to run some Spark processing.
My command line is:
./spark-submit.sh --vcap ./VCAP.json --deploy-mode cluster --class org.apache.spark.examples.JavaSparkPi \
--master https://169.54.219.20 ~/Documents/Spark/JavaSparkPi.jar
I'm using the latest version of spark-submit.sh (as of yesterday).
./spark-submit.sh --version
spark-submit.sh VERSION : '1.0.0.0.20160330.1'
This worked fine a few weeks ago (with the old spark-submit.sh), but now I get the following error:
Downloading stdout_1461024849908170118
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0    89    0    89    0     0     56      0 --:--:--  0:00:01 --:--:--   108
Failed to download from workdir/driver-20160418191414-0020-5e7fb175-6856-4980-97bc-8e8aa0d1f137/stdout to stdout_1461024849908170118
Downloading stderr_1461024849908170118
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0    89    0    89    0     0     50      0 --:--:--  0:00:01 --:--:--   108
Failed to download from workdir/driver-20160418191414-0020-5e7fb175-6856-4980-97bc-8e8aa0d1f137/stderr to stderr_1461024849908170118
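Note that curl did receive a small body (89 bytes) before each failure was reported. If the partial file is left on disk, it often holds the server's error message rather than the job log, which can hint at the real cause (a quick debugging check, assuming the file was created in the current directory):
cat stdout_1461024849908170118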
Any idea what I'm doing wrong? Thanks in advance.
EDIT:
Looking at the log files, I found that the problem is not in downloading stdout and stderr, but in submitting the job itself.
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FAILED",
  "message" : "Exception from the cluster:
    org.apache.spark.SparkException: Failed to change container CWD
    org.apache.spark.deploy.master.EgoApplicationManager.egoDriverExitCallback(EgoApplicationManager.scala:168)
    org.apache.spark.deploy.master.MasterScheduleDelegatorDriver.onContainerExit(MasterScheduleDelegatorDriver.scala:144)
    org.apache.spark.deploy.master.resourcemanager.ResourceManagerEGOSlot.handleActivityFinish(ResourceManagerEGOSlot.scala:555)
    org.apache.spark.deploy.master.resourcemanager.ResourceManagerEGOSlot.callbackContainerStateChg(ResourceManagerEGOSlot.scala:525)
    org.apache.spark.deploy.master.resourcemanager.ResourceCallbackManager$$anonfun$callbackContainerStateChg$1.apply(ResourceManager.scala:158)
    org.apache.spark.deploy.master.resourcemanager.ResourceCallbackManager$$anonfun$callbackContainerStateChg$1.apply(ResourceManager.scala:157)
    scala.Option.foreach(Option.scala:236)
    org.apache.spark.deploy.master.resourcemanager.ResourceCallbackManager$.callbackContainerStateChg(ResourceManager.scala:157)",
  "serverSparkVersion" : "1.6.0",
  "submissionId" : "driver-20160420043532-0027-6e579720-2c9d-428f-b2c7-6613f4845146",
  "success" : true
}
driverStatus is FAILED
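For reference, this response has the shape of Spark's standalone-mode SubmissionStatusResponse, so the driver status can also be polled directly with the submissionId. A sketch against the stock Spark 1.6 REST submission endpoint (the host and port 6066 here are assumptions; the Bluemix spark-submit.sh performs an equivalent call for you):
curl http://169.54.219.20:6066/v1/submissions/status/driver-20160420043532-0027-6e579720-2c9d-428f-b2c7-6613f4845146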
EDIT2:
In the end, the problem I had submitting the job was solved by creating a brand-new instance of the Spark service. My job now runs and completes after a few seconds.
But I still get an error when trying to download the stdout and stderr files.
Downloading stdout_1461156506108609180
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0    90    0    90    0     0     61      0 --:--:--  0:00:01 --:--:--   125
Failed to download from workdir2/driver-20160420074922-0008-1400fc20-95c1-442d-9c37-32de3a7d1f0a/stdout to stdout_1461156506108609180
Downloading stderr_1461156506108609180
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0    90    0    90    0     0     56      0 --:--:--  0:00:01 --:--:--   109
Failed to download from workdir2/driver-20160420074922-0008-1400fc20-95c1-442d-9c37-32de3a7d1f0a/stderr to stderr_1461156506108609180
Any ideas?
Answer 0 (score: 0)
I found that the old spark-submit tried to retrieve stdout and stderr from the workdir folder...
Failed to download from workdir/driver-20160418191414-0020-5e7fb175-6856-4980-97bc-8e8aa0d1f137/stdout to stdout_1461024849908170118
...while the new spark-submit (downloaded yesterday) tries to download them from the workdir2 folder...
Failed to download from workdir2/driver-20160420074922-0008-1400fc20-95c1-442d-9c37-32de3a7d1f0a/stdout to stdout_1461156506108609180
The folder that gets used is set by the variable SS_SPARK_WORK_DIR, which is initialized in spark-submit as:
if [ -z ${SS_SPARK_WORK_DIR} ]; then SS_SPARK_WORK_DIR="workdir2"; fi # Work directory on spark cluster
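To find this line in your own copy of the script, a simple grep does the job:
grep -n SS_SPARK_WORK_DIR ./spark-submit.sh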
I changed the value to workdir and everything worked again. I have since downloaded a new (today's) spark-submit from the Bluemix site, and the problem is fixed there: the variable now points to workdir.
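Note that the -z guard above only assigns the default when the variable is empty, so the work directory can presumably also be overridden from the environment instead of editing the script (an untested sketch, assuming spark-submit.sh does not reset the variable elsewhere):
SS_SPARK_WORK_DIR=workdir ./spark-submit.sh --vcap ./VCAP.json --deploy-mode cluster \
  --class org.apache.spark.examples.JavaSparkPi \
  --master https://169.54.219.20 ~/Documents/Spark/JavaSparkPi.jar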
So, if anything fails, make sure you have the latest spark-submit script from Bluemix.