I have a list of numpy arrays in memory as part of an RDD in a Spark application. What I want to do is save the contents of each RDD record (an array) as a file on S3, so that S3 ends up with one .npy file per value in the RDD. I don't want to create any intermediate files, since that would slow the application down.
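For reference, one way to sidestep both the intermediate files and the failing config-directory write is to serialize each array to an in-memory buffer with np.save and upload the raw bytes with boto3 instead of cottoncandy. This is only a minimal sketch: the bucket name, the key layout, and the assumption that the RDD holds (name, array) pairs are all hypothetical.

```python
import io
import numpy as np


def npy_bytes(arr):
    """Serialize an array to .npy bytes entirely in memory (no temp file)."""
    buf = io.BytesIO()
    np.save(buf, arr)
    buf.seek(0)
    return buf.read()


def save_partition(records):
    # boto3 is imported inside the function so each Spark executor builds
    # its own S3 client (clients are not picklable across the driver/worker
    # boundary). boto3 ships with EMR; it does not write to /home/.config.
    import boto3
    s3 = boto3.client("s3")
    for key, arr in records:
        s3.put_object(
            Bucket="my-bucket",            # hypothetical bucket name
            Key="arrays/%s.npy" % key,     # hypothetical key layout
            Body=npy_bytes(arr),
        )
    return iter([])
```

Assuming the RDD holds (name, array) pairs, it would be driven with something like `rdd.mapPartitions(save_partition).count()`, where the `count()` merely forces evaluation of the otherwise lazy write.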
I've already looked at the post "how to write .npy file to s3 directly?". However, when I try to run that approach from a Spark application on AWS EMR, I get the following error:
OSError: [Errno [Errno 13] Permission denied: '/home/.config'] <function subimport at 0x7f87e0167320>: ('cottoncandy',)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
How can I fix this?