I upload a file to my worker nodes with spark-submit and want to access it there. The file is a binary that I want to execute. I already know how to execute a file from Scala, but I keep getting a "File Not Found" exception and cannot find a way to access the file.
I submit my job with the following command:
spark-submit --class Main --master yarn --deploy-mode cluster --files las2las myjar.jar
When the job runs, I can see that the file is uploaded to the staging directory of the currently running application, but the following does not work:
val command = "hdfs://url/user/username/.sparkStaging/" + sparkContext.applicationId + "/las2las" !!
This is the exception that is thrown:
17/10/22 18:15:57 ERROR yarn.ApplicationMaster: User class threw exception: java.io.IOException: Cannot run program "hdfs://url/user/username/.sparkStaging/application_1486393309284_26788/las2las": error=2, No such file or directory
So my question is: how can I access the las2las file?
Answer 0 (score: 1)
Use SparkFiles:
val path = org.apache.spark.SparkFiles.get("las2las")
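Some context on why the original attempt failed: `!!` launches a local process, and an `hdfs://` URI is not a path the operating system can execute. `SparkFiles.get` instead returns the local filesystem path of a file shipped with `--files` on whichever node the code runs. A minimal sketch of the full pattern, assuming a running `SparkContext` (the staged file is also not executable by default, as the other answer shows):

```scala
import java.io.File
import scala.sys.process._
import org.apache.spark.SparkFiles

// Local path of the file distributed with `--files las2las`.
val path = SparkFiles.get("las2las")

// Staged files land without the executable bit set, so flip it first.
new File(path).setExecutable(true)

// Run the binary locally and capture its stdout.
val output = path.!!
```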
Answer 1 (score: 1)
How do I access the las2las file?
When you go to the YARN UI at http://localhost:8088/cluster and click the application ID of your Spark application, you are redirected to a page with the container logs. Click Logs. In stderr you should find lines similar to the following:
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/*<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
SPARK_YARN_STAGING_DIR -> file:/Users/jacek/.sparkStaging/application_1508700955259_0002
SPARK_USER -> jacek
SPARK_YARN_MODE -> true
command:
{{JAVA_HOME}}/bin/java \
-server \
-Xmx1024m \
-Djava.io.tmpdir={{PWD}}/tmp \
'-Dspark.worker.ui.port=44444' \
'-Dspark.driver.port=55365' \
-Dspark.yarn.app.container.log.dir=<LOG_DIR> \
-XX:OnOutOfMemoryError='kill %p' \
org.apache.spark.executor.CoarseGrainedExecutorBackend \
--driver-url \
spark://CoarseGrainedScheduler@192.168.1.6:55365 \
--executor-id \
<executorId> \
--hostname \
<hostname> \
--cores \
1 \
--app-id \
application_1508700955259_0002 \
--user-class-path \
file:$PWD/__app__.jar \
1><LOG_DIR>/stdout \
2><LOG_DIR>/stderr
resources:
__spark_libs__ -> resource { scheme: "file" port: -1 file: "/Users/jacek/.sparkStaging/application_1508700955259_0002/__spark_libs__618005180363157241.zip" } size: 218111116 timestamp: 1508701349000 type: ARCHIVE visibility: PRIVATE
__spark_conf__ -> resource { scheme: "file" port: -1 file: "/Users/jacek/.sparkStaging/application_1508700955259_0002/__spark_conf__.zip" } size: 105328 timestamp: 1508701349000 type: ARCHIVE visibility: PRIVATE
hello.sh -> resource { scheme: "file" port: -1 file: "/Users/jacek/.sparkStaging/application_1508700955259_0002/hello.sh" } size: 33 timestamp: 1508701349000 type: FILE visibility: PRIVATE
===============================================================================
I executed my Spark application as follows:
YARN_CONF_DIR=/tmp \
./bin/spark-shell --master yarn --deploy-mode client --files hello.sh
So the line of interest is:
hello.sh -> resource { scheme: "file" port: -1 file: "/Users/jacek/.sparkStaging/application_1508700955259_0002/hello.sh" } size: 33 timestamp: 1508701349000 type: FILE visibility: PRIVATE
You should find a similar line with the path to your file (mine was /Users/jacek/.sparkStaging/application_1508700955259_0002/hello.sh).
This file is a binary that I want to execute.
Given that line, you can try to execute it:
import scala.sys.process._
scala> s"/Users/jacek/.sparkStaging/${sc.applicationId}/hello.sh" !!
warning: there was one feature warning; re-run with -feature for details
java.io.IOException: Cannot run program "/Users/jacek/.sparkStaging/application_1508700955259_0003/hello.sh": error=13, Permission denied
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at scala.sys.process.ProcessBuilderImpl$Simple.run(ProcessBuilderImpl.scala:69)
at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang(ProcessBuilderImpl.scala:113)
at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.slurp(ProcessBuilderImpl.scala:129)
at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang$bang(ProcessBuilderImpl.scala:102)
... 50 elided
Caused by: java.io.IOException: error=13, Permission denied
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 54 more
It does not work out of the box because the file is not marked as executable.
$ ls -l /Users/jacek/.sparkStaging/application_1508700955259_0003/hello.sh
-rw-r--r-- 1 jacek staff 33 22 paź 21:51 /Users/jacek/.sparkStaging/application_1508700955259_0003/hello.sh
(I don't know whether you can tell Spark or YARN to mark the file as executable.)
Make the file executable:
scala> s"chmod +x /Users/jacek/.sparkStaging/${sc.applicationId}/hello.sh".!!
res2: String = ""
It is indeed an executable shell script now:
$ ls -l /Users/jacek/.sparkStaging/application_1508700955259_0003/hello.sh
-rwxr-xr-x 1 jacek staff 33 22 paź 21:51 /Users/jacek/.sparkStaging/application_1508700955259_0003/hello.sh
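As a side note, instead of shelling out to chmod you can flip the executable bit in-process with java.io.File.setExecutable, which also works where no chmod binary is available. A small self-contained sketch; the temp file here is a stand-in for the staged hello.sh:

```scala
import java.io.File
import java.nio.file.Files
import scala.sys.process._

// Stand-in for the staged script: a throwaway shell script on disk.
val script: File = Files.createTempFile("hello", ".sh").toFile
Files.write(script.toPath, "#!/bin/sh\necho \"Hello world\"\n".getBytes)

// Equivalent of `chmod +x`, without spawning an external process.
script.setExecutable(true)

// `!!` can now run it and capture its stdout.
val out = script.getAbsolutePath.!!
println(out.trim)
```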
Let's execute it:
scala> s"/Users/jacek/.sparkStaging/${sc.applicationId}/hello.sh".!!
+ echo 'Hello world'
res3: String =
"Hello world
"
Given the following hello.sh:
#!/bin/sh -x
echo "Hello world"
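One last detail: `!!` captures only stdout, which is why the `+ echo 'Hello world'` trace produced by `sh -x` is printed to the console (stderr) while `res3` holds only "Hello world". If you want both streams, `ProcessLogger` can route them into separate buffers. A sketch that recreates the hello.sh from above as a temp file:

```scala
import java.nio.file.Files
import scala.sys.process._

// Recreate hello.sh from the answer as a temporary file.
val script = Files.createTempFile("hello", ".sh").toFile
Files.write(script.toPath, "#!/bin/sh -x\necho \"Hello world\"\n".getBytes)
script.setExecutable(true)

val out = new StringBuilder
val err = new StringBuilder
// ProcessLogger sends each stdout line to the first function
// and each stderr line (here, the `sh -x` trace) to the second.
val exitCode = Process(script.getAbsolutePath) ! ProcessLogger(
  line => out.append(line).append('\n'),
  line => err.append(line).append('\n'))
```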