I am running (or trying to run) a compiled Spark/Scala program (a fat JAR) from the master node of an EMR cluster on AWS. I compiled the JAR in my dev environment with the same dependencies as my prod environment, and I am deploying it with the spark-submit script:
SPARK_JAR=./spark/lib/spark-assembly-1.2.1-hadoop2.4.0.jar \
./spark-submit \
--deploy-mode cluster \
--verbose \
--master yarn-cluster \
--class sparkSQLProcessor \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 1 \
--num-executors 1 \
/home/hadoop/Spark-SQL-Job.jar args1 args2
The problem is that I am getting the following error, which I assume is a configuration issue:
Exception in thread "main" java.io.FileNotFoundException: File file:/home/hadoop/.versions/spark-1.2.1.a/bin/spark/lib/spark-assembly-1.2.1-hadoop2.4.0.jar does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:516)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:729)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:506)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:407)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
at org.apache.spark.deploy.yarn.ClientBase$class.copyFileToRemote(ClientBase.scala:102)
at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:35)
at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$3.apply(ClientBase.scala:182)
at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$3.apply(ClientBase.scala:176)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.deploy.yarn.ClientBase$class.prepareLocalResources(ClientBase.scala:176)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:35)
at org.apache.spark.deploy.yarn.ClientBase$class.createContainerLaunchContext(ClientBase.scala:308)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:35)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:80)
at org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:501)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:35)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:139)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Answer 0 (score: 0):
I have been running Spark jobs on EMR for a while and have never hit this error. Are you installing Spark with an EMR bootstrap action, or are you using the newer EMR 4.0 release?
Either way, you should try not setting the SPARK_JAR environment variable.
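For example, a minimal sketch of your submit command with the SPARK_JAR override removed (same arguments as in your question; spark-submit should then locate the Spark assembly from its own installation):

./spark-submit \
--deploy-mode cluster \
--verbose \
--master yarn-cluster \
--class sparkSQLProcessor \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 1 \
--num-executors 1 \
/home/hadoop/Spark-SQL-Job.jar args1 args2

If you do need to point at a specific assembly JAR, use an absolute path rather than the relative ./spark/lib/... path: judging from the stack trace, the relative path is being resolved against spark-submit's working directory (/home/hadoop/.versions/spark-1.2.1.a/bin/), which is why the file is not found.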