SparkContext addFile causes FileNotFoundException

Date: 2016-08-15 13:34:40

Tags: apache-spark amazon-s3 emr amazon-emr

I am trying to pass a large file to every executor using the sparkContext.addFile method.

The source of this large file is Amazon S3 (note: if the source is HDFS, everything works fine).

// Driver side: ship the file from S3 to every executor
val context = stream.context.sparkContext
context.addFile("s3n://bucket-name/file-path")
...
// Later: resolve the node-local copy of the file
SparkFiles.get(file-name)
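
For context, SparkFiles.get is meant to be called on the executors (for example inside a task closure) to resolve the node-local copy of the distributed file. A minimal sketch of that usage, assuming a placeholder RDD named records and the placeholder file name "file-name" (neither is from the original code):

    import org.apache.spark.SparkFiles

    // Runs on the executors: each task resolves the local path of the
    // file that was shipped via context.addFile on the driver.
    val firstLines = records.map { _ =>
      val localPath = SparkFiles.get("file-name") // placeholder name
      scala.io.Source.fromFile(localPath).getLines().next()
    }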

This is the error I get:

java.io.FileNotFoundException: File s3n://bucket-name/file-path does not exist.
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.listStatus(S3NativeFileSystem.java:945)
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.listStatus(S3NativeFileSystem.java:887)
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.listStatus(EmrFileSystem.java:343)
    at org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:596)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:566)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:356)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$3.apply(Executor.scala:393)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$3.apply(Executor.scala:390)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:390)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:193)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

However, the file is accessible when I run an "fs -ls":

    hadoop fs -ls s3n://bucket-name/file-path 

What could be the cause?

PS: Spark version: 1.5.2

1 Answer:

Answer 0 (score: 0)

It turned out to be a credentials problem. When I changed the S3 URL to

  s3n://accessKey:secretKey@bucket-name/path

the problem was solved.
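
If you would rather not embed the keys in the URL, another option (a sketch only, assuming the standard Hadoop s3n credential properties and placeholder key values) is to set them on the SparkContext's Hadoop configuration before calling addFile:

    // Placeholder credentials; in practice load them from the environment,
    // an instance profile, or a credentials provider rather than hard-coding.
    context.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
    context.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")

    context.addFile("s3n://bucket-name/file-path")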