Cannot submit a local jar to the Spark cluster: java.nio.file.NoSuchFileException

Date: 2017-06-20 20:49:13

Tags: scala apache-spark kubernetes spark-submit

~/spark/spark-2.1.1-bin-hadoop2.7/bin$ ./spark-submit --master spark://192.168.42.80:32141 --deploy-mode cluster file:///home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar

Running Spark using the REST application submission protocol.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/06/20 16:41:30 INFO RestSubmissionClient: Submitting a request to launch an application in spark://192.168.42.80:32141.
17/06/20 16:41:31 INFO RestSubmissionClient: Submission successfully created as driver-20170620204130-0005. Polling submission state...
17/06/20 16:41:31 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20170620204130-0005 in spark://192.168.42.80:32141.
17/06/20 16:41:31 INFO RestSubmissionClient: State of driver driver-20170620204130-0005 is now ERROR.
17/06/20 16:41:31 INFO RestSubmissionClient: Driver is running on worker worker-20170620203037-172.17.0.5-45429 at 172.17.0.5:45429.
17/06/20 16:41:31 ERROR RestSubmissionClient: Exception from the cluster:
java.nio.file.NoSuchFileException: /home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar
    sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526)
    sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
    java.nio.file.Files.copy(Files.java:1274)
    org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$copyRecursive(Utils.scala:608)
    org.apache.spark.util.Utils$.copyFile(Utils.scala:579)
    org.apache.spark.util.Utils$.doFetchFile(Utils.scala:664)
    org.apache.spark.util.Utils$.fetchFile(Utils.scala:463)
    org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:154)
    org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:172)
    org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:91)
17/06/20 16:41:31 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20170620204130-0005",
  "serverSparkVersion" : "2.1.1",
  "submissionId" : "driver-20170620204130-0005",
  "success" : true
}

Log from the spark-worker:

2017-06-20T20:41:30.807403232Z 17/06/20 20:41:30 INFO Worker: Asked to launch driver driver-20170620204130-0005
2017-06-20T20:41:30.817248508Z 17/06/20 20:41:30 INFO DriverRunner: Copying user jar file:///home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar to /opt/spark/work/driver-20170620204130-0005/myproj-assembly-0.1.0.jar
2017-06-20T20:41:30.883645747Z 17/06/20 20:41:30 INFO Utils: Copying /home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar to /opt/spark/work/driver-20170620204130-0005/myproj-assembly-0.1.0.jar
2017-06-20T20:41:30.885217508Z 17/06/20 20:41:30 INFO DriverRunner: Killing driver process!
2017-06-20T20:41:30.885694618Z 17/06/20 20:41:30 WARN Worker: Driver driver-20170620204130-0005 failed with unrecoverable exception: java.nio.file.NoSuchFileException: home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar 

Any idea why? Thanks.

UPDATE

Is the following command correct?

./spark-submit --master spark://192.168.42.80:32141 --deploy-mode cluster file:///home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar

UPDATE

I think I understand Spark a bit better now, and why I ran into this problem along with the spark-submit error: ClassNotFoundException. The key point is that although the word REST is used here (REST URL: spark://127.0.1.1:6066 (cluster mode)), the application jar is not uploaded to the cluster after submission, contrary to what I had assumed. So the Spark cluster cannot find the application jar and therefore cannot load the main class.

I will try to find out how to set up a Spark cluster and submit an application in cluster mode. I don't know whether client mode would use more resources for streaming jobs.
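As a point of comparison: in client mode the driver runs on the submitting machine itself, so a local file:// path to the jar is valid there. A minimal sketch, reusing the host, port, and jar path from the question (whether client mode is appropriate for a streaming job is a separate resource question):

```shell
# Client mode: the driver process starts on this machine,
# so it can read the jar directly from the local filesystem.
./spark-submit \
  --master spark://192.168.42.80:32141 \
  --deploy-mode client \
  file:///home/me/workspace/myproj/target/scala-2.11/myproj-assembly-0.1.0.jar
```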

2 Answers:

Answer 0 (score: 0):

> UPDATE
>
> I think I understand Spark a bit better now, and why I ran into this problem along with the spark-submit error: ClassNotFoundException. The key point is that although the word REST is used here (REST URL: spark://127.0.1.1:6066 (cluster mode)), the application jar is not uploaded to the cluster after submission, contrary to my understanding. So the Spark cluster cannot find the application jar and cannot load the main class.

That is why you have to either place the jar file on the master node, or put it into HDFS before running spark-submit.

Here is how to do it:

1.) Transfer the file to the master node with scp:

$ scp <file> <username>@<IP address or hostname>:<Destination>

For example:

$ scp mytext.txt tom@128.140.133.124:~/

2.) Transfer the file to HDFS:

$ hdfs dfs -put mytext.txt
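Once the jar is in HDFS, the submission can reference it by an hdfs:// URL instead of a local path, so every worker can fetch it. A sketch applied to the jar from the question; the namenode host, port, and HDFS directory are placeholders for your own cluster:

```shell
# Upload the assembly jar to HDFS (directory is an example)
hdfs dfs -mkdir -p /user/me
hdfs dfs -put myproj-assembly-0.1.0.jar /user/me/

# Submit with the HDFS location, reachable from any worker node
./spark-submit \
  --master spark://192.168.42.80:32141 \
  --deploy-mode cluster \
  hdfs://<namenode-host>:8020/user/me/myproj-assembly-0.1.0.jar
```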

Hope this helps.

Answer 1 (score: -1):

A standalone-mode cluster expects the jar file to be passed via HDFS, because the driver may run on any node in the cluster.

hdfs dfs -put xxx.jar /user/
spark-submit --master spark://xxx:7077 \
--deploy-mode cluster \
--supervise \
--driver-memory 512m \
--total-executor-cores 1 \
--executor-memory 512m \
--executor-cores 1 \
--class com.xiyou.bi.streaming.game.common.DmMoGameviewOnlineLogic \
hdfs://xxx:8020/user/hutao/xxx.jar