How to submit a Spark job using a zip that contains main.py

Time: 2019-06-14 13:39:07

Tags: python amazon-web-services apache-spark amazon-emr spark-submit

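I want to submit an EMR job that includes a zip file, and the zip file contains the main file, e.g. main.py.

The zip file is located in an AWS S3 folder.

How do I use the main.py inside the zip when submitting the job? This is the command I tried: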

spark-submit --py-files s3://test/spark_test/Test.zip --files s3://test/spark_test/Test.zip/spark_main.py

I get:

Exception in thread "main" java.io.FileNotFoundException: File s3://test/spark_test/Test.zip/spark_main.py does not exist.
        at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.listStatus(S3NativeFileSystem.java:990)
        at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.listStatus(S3NativeFileSystem.java:917)
        at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.listStatus(EmrFileSystem.java:373)
        at org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:755)
        at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:723)
        at org.apache.spark.deploy.DependencyUtils$.downloadFile(DependencyUtils.scala:136)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:367)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:367)
        at scala.Option.map(Option.scala:146)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:366)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
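
The trace shows spark-submit looking up s3://test/spark_test/Test.zip/spark_main.py as an S3 object in its own right; S3 does not look inside the archive, so the lookup fails. spark-submit also expects the entry script as its positional application argument, while --files only ships extra files alongside the job. A minimal sketch of one possible workaround, assuming the entry script can be copied out of the zip and uploaded as a separate object next to it (the s3://test/spark_test/spark_main.py location and both helper names are hypothetical, not from the question):

```python
import zipfile
from pathlib import Path


def extract_entry_script(zip_path, member, dest_dir):
    # Pull a single member (for example spark_main.py) out of a local copy
    # of the archive so it can be uploaded to S3 as its own object.
    with zipfile.ZipFile(zip_path) as archive:
        return Path(archive.extract(member, path=dest_dir))


def build_submit_command(py_files_uri, entry_script_uri):
    # The zip goes to --py-files so its modules are importable;
    # the entry script is the positional application argument, not --files.
    return ["spark-submit", "--py-files", py_files_uri, entry_script_uri]


if __name__ == "__main__":
    # Assumed location: spark_main.py uploaded next to Test.zip.
    print(" ".join(build_submit_command(
        "s3://test/spark_test/Test.zip",
        "s3://test/spark_test/spark_main.py",
    )))
```

The upload step itself is left out; any S3 client would do. Whether the rest of Test.zip imports cleanly from --py-files depends on how the archive is laid out, which the question does not show.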

0 answers:

No answers yet