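I want to submit an EMR job that ships a zip file, and the zip file contains the main file, for example main.py.
The zip file is stored in an AWS S3 folder.
How do I point spark-submit at the main.py inside the zip when submitting the job? This is the command I tried: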
spark-submit --py-files s3://test/spark_test/Test.zip --files s3://test/spark_test/Test.zip/spark_main.py
I get:
Exception in thread "main" java.io.FileNotFoundException: File s3://test/spark_test/Test.zip/spark_main.py does not exist.
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.listStatus(S3NativeFileSystem.java:990)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.listStatus(S3NativeFileSystem.java:917)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.listStatus(EmrFileSystem.java:373)
at org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:755)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:723)
at org.apache.spark.deploy.DependencyUtils$.downloadFile(DependencyUtils.scala:136)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:367)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:367)
at scala.Option.map(Option.scala:146)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:366)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
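
The trace suggests Spark is looking up s3://test/spark_test/Test.zip/spark_main.py as a path of its own. Test.zip is a single S3 object, so nothing exists under that path. One commonly used pattern, given here only as a sketch and not verified against this cluster, is to keep a small launcher script outside the zip, upload it to S3 separately, and pass it as the application file, e.g. `spark-submit --py-files s3://test/spark_test/Test.zip s3://test/spark_test/launcher.py`. The name launcher.py, and the assumption that spark_main exposes a main() function, are hypothetical. The launcher then imports the real entry point from the zip, which works because Python can import modules from a zip archive on its path. The snippet below imitates that mechanism locally, with no Spark or S3 involved:

```python
import os
import sys
import tempfile
import zipfile


def build_demo_zip(directory):
    """Create a local stand-in for Test.zip (hypothetical contents)."""
    zip_path = os.path.join(directory, "Test.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        # Assumption: the real spark_main.py exposes a main() function.
        archive.writestr(
            "spark_main.py",
            "def main():\n    return 'spark_main.main() ran'\n",
        )
    return zip_path


def run_entry_point(zip_path):
    """Import spark_main from the archive, as a launcher script would."""
    # Here we add the zip to the Python path by hand; with spark-submit,
    # --py-files is the option meant to make the archive importable.
    sys.path.insert(0, zip_path)
    import spark_main  # resolved from inside the zip by zipimport
    return spark_main.main()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(run_entry_point(build_demo_zip(workdir)))
```

In an actual job, the launcher would contain only the import and the call to main(); the sys.path line is there just to stand in for what --py-files is expected to do.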