Spark pipe does not work on YARN (java.io.IOException: Cannot run program "XXX.py": error=13, Permission denied)

Time: 2016-05-18 07:29:12

Tags: apache-spark pipe yarn

I am new to Spark programming. I am trying to use the pipe operator to call a set of external programs (compiled C programs, bash and Python scripts). The code looks like this:

sc.addFile("hdfs://afolder",true)
val infile = sc.textFile("afile.txt").pipe("afolder/abash.sh").take(3)

abash.sh calls the other scripts and programs to do the actual processing of afile.txt.
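For reference, pipe() feeds each record of a partition to the child process on stdin and turns the process's stdout lines into the resulting RDD, so the script must behave as a stdin-to-stdout filter. A hypothetical minimal stand-in for abash.sh (the real script is not shown in the question) could look like:

#!/bin/bash
# Hypothetical stand-in for abash.sh: read the records Spark writes to
# stdin (one per line) and emit the processed results on stdout.
while IFS= read -r line; do
  echo "processed: $line"
done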

This code runs fine in Spark local mode, but when I deploy it in YARN mode (client or cluster) it fails with the message below.

WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 4, database): java.io.IOException: Cannot run program "afolder/abash.sh": error=13, Permission denied

All subdirectories and files of the folder are downloaded successfully into the local Spark tmp directory (in my case /usr/local/hadoop/spark/). After the first failure I recursively set 777 permissions on the folder in HDFS, but I still get the same error.
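One way to narrow this down is to inspect the fetched copy on a worker node and check its permission bits directly (a sketch; the exact subdirectory under the Spark local dir varies per application and executor):

# Illustrative check on a worker node: list the downloaded files and
# their permission bits under the configured Spark local/tmp directory.
ls -lR /usr/local/hadoop/spark/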

Any idea how to fix this? Thanks.

Error output:

> 16/05/18 16:04:09 INFO storage.MemoryStore: Block broadcast_2 stored
   > as values in memory (estimated size 212.1 KB, free 212.1 KB) 16/05/18
   > 16:04:09 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as
   > bytes in memory (estimated size 19.5 KB, free 231.6 KB) 16/05/18
   > 16:04:09 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in
   > memory on 210.107.197.201:42777 (size: 19.5 KB, free: 511.1 MB)
   > 16/05/18 16:04:09 INFO spark.SparkContext: Created broadcast 2 from
   > textFile at <console>:27 16/05/18 16:04:09 INFO
   > mapred.FileInputFormat: Total input paths to process : 1 16/05/18
   > 16:04:09 INFO spark.SparkContext: Starting job: take at <console>:27
   > 16/05/18 16:04:09 INFO scheduler.DAGScheduler: Got job 1 (take at
   > <console>:27) with 1 output partitions 16/05/18 16:04:09 INFO
   > scheduler.DAGScheduler: Final stage: ResultStage 1 (take at
   > <console>:27) 16/05/18 16:04:09 INFO scheduler.DAGScheduler: Parents
   > of final stage: List() 16/05/18 16:04:09 INFO scheduler.DAGScheduler:
   > Missing parents: List() 16/05/18 16:04:09 INFO scheduler.DAGScheduler:
   > Submitting ResultStage 1 (PipedRDD[5] at pipe at <console>:27), which
   > has no missing parents 16/05/18 16:04:09 INFO storage.MemoryStore:
   > Block broadcast_3 stored as values in memory (estimated size 3.7 KB,
   > free 235.3 KB) 16/05/18 16:04:09 INFO storage.MemoryStore: Block
   > broadcast_3_piece0 stored as bytes in memory (estimated size 2.2 KB,
   > free 237.5 KB) 16/05/18 16:04:09 INFO storage.BlockManagerInfo: Added
   > broadcast_3_piece0 in memory on 210.107.197.201:42777 (size: 2.2 KB,
   > free: 511.1 MB) 16/05/18 16:04:09 INFO spark.SparkContext: Created
   > broadcast 3 from broadcast at DAGScheduler.scala:1006 16/05/18
   > 16:04:09 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from
   > ResultStage 1 (PipedRDD[5] at pipe at <console>:27) 16/05/18 16:04:09
   > INFO cluster.YarnScheduler: Adding task set 1.0 with 1 tasks 16/05/18
   > 16:04:09 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0
   > (TID 4, database, partition 0,NODE_LOCAL, 2603 bytes) 16/05/18
   > 16:04:11 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in
   > memory on database:51757 (size: 2.2 KB, free: 511.1 MB) 16/05/18
   > 16:04:11 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0
   > (TID 4, database): java.io.IOException: Cannot run program
   > "afolder/abash.sh": error=13, Permission denied
   >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
   >         at org.apache.spark.rdd.PipedRDD.compute(PipedRDD.scala:119)
   >         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
   >         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
   >         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
   >         at org.apache.spark.scheduler.Task.run(Task.scala:89)
   >         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
   >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   >         at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: error=13, Permission denied
   >         at java.lang.UNIXProcess.forkAndExec(Native Method)
   >         at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
   >         at java.lang.ProcessImpl.start(ProcessImpl.java:134)
   >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
   >         ... 9 more
   > 
   > 16/05/18 16:04:11 INFO scheduler.TaskSetManager: Starting task 0.1 in
   > stage 1.0 (TID 5, database, partition 0,NODE_LOCAL, 2603 bytes)
   > 16/05/18 16:04:12 INFO storage.BlockManagerInfo: Added
   > broadcast_3_piece0 in memory on database:52395 (size: 2.2 KB, free:
   > 511.1 MB) 16/05/18 16:04:12 INFO scheduler.TaskSetManager: Lost task 0.1 in stage 1.0 (TID 5) on executor database: java.io.IOException (Cannot run program "afolder/abash.sh": error=13, Permission denied)
   > [duplicate 1] 16/05/18 16:04:12 INFO scheduler.TaskSetManager:
   > Starting task 0.2 in stage 1.0 (TID 6, database, partition
   > 0,NODE_LOCAL, 2603 bytes) 16/05/18 16:04:12 INFO
   > scheduler.TaskSetManager: Lost task 0.2 in stage 1.0 (TID 6) on
   > executor database: java.io.IOException (Cannot run program
   > "afolder/abash.sh": error=13, Permission denied) [duplicate 2]
   > 16/05/18 16:04:12 INFO scheduler.TaskSetManager: Starting task 0.3 in
   > stage 1.0 (TID 7, database, partition 0,NODE_LOCAL, 2603 bytes)
   > 16/05/18 16:04:12 INFO scheduler.TaskSetManager: Lost task 0.3 in
   > stage 1.0 (TID 7) on executor database: java.io.IOException (Cannot
   > run program "afolder/abash.sh": error=13, Permission denied)
   > [duplicate 3] 16/05/18 16:04:12 ERROR scheduler.TaskSetManager: Task 0
   > in stage 1.0 failed 4 times; aborting job 16/05/18 16:04:12 INFO
   > cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all
   > completed, from pool 16/05/18 16:04:12 INFO cluster.YarnScheduler:
   > Cancelling stage 1 16/05/18 16:04:12 INFO scheduler.DAGScheduler:
   > ResultStage 1 (take at <console>:27) failed in 2.955 s 16/05/18
   > 16:04:12 INFO scheduler.DAGScheduler: Job 1 failed: take at
   > <console>:27, took 2.963885 s org.apache.spark.SparkException: Job
   > aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most
   > recent failure: Lost task 0.3 in stage 1.0 (TID 7, database):
   > java.io.IOException: Cannot run program "afolder/abash.sh": error=13,
   > Permission denied
   >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
   >         at org.apache.spark.rdd.PipedRDD.compute(PipedRDD.scala:119)
   >         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
   >         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
   >         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
   >         at org.apache.spark.scheduler.Task.run(Task.scala:89)
   >         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
   >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   >         at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: error=13, Permission denied
   >         at java.lang.UNIXProcess.forkAndExec(Native Method)
   >         at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
   >         at java.lang.ProcessImpl.start(ProcessImpl.java:134)
   >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
   >         ... 9 more
   > 
   > Driver stacktrace:
   >         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
   >         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
   >         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
   >         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   >         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   >         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
   >         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
   >         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
   >         at scala.Option.foreach(Option.scala:236)
   >         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
   >         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
   >         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
   >         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
   >         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
   >         at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
   >         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
   >         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
   >         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
   >         at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1328)
   >         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
   >         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
   >         at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
   >         at org.apache.spark.rdd.RDD.take(RDD.scala:1302)
   >         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
   >         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
   >         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
   >         at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
   >         at $iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
   >         at $iwC$$iwC$$iwC.<init>(<console>:40)
   >         at $iwC$$iwC.<init>(<console>:42)
   >         at $iwC.<init>(<console>:44)
   >         at <init>(<console>:46)
   >         at .<init>(<console>:50)
   >         at .<clinit>(<console>)
   >         at .<init>(<console>:7)
   >         at .<clinit>(<console>)
   >         at $print(<console>)
   >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   >         at java.lang.reflect.Method.invoke(Method.java:498)
   >         at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
   >         at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
   >         at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
   >         at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
   >         at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
   >         at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
   >         at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
   >         at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
   >         at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
   >         at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
   >         at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
   >         at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
   >         at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
   >         at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
   >         at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
   >         at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
   >         at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
   >         at org.apache.spark.repl.Main$.main(Main.scala:31)
   >         at org.apache.spark.repl.Main.main(Main.scala)
   >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   >         at java.lang.reflect.Method.invoke(Method.java:498)
   >         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
   >         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
   >         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
   >         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
   >         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.io.IOException: Cannot run program "afolder/abash.sh":
   > error=13, Permission denied
   >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
   >         at org.apache.spark.rdd.PipedRDD.compute(PipedRDD.scala:119)
   >         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
   >         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
   >         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
   >         at org.apache.spark.scheduler.Task.run(Task.scala:89)
   >         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
   >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   >         at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: error=13, Permission denied
   >         at java.lang.UNIXProcess.forkAndExec(Native Method)
   >         at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
   >         at java.lang.ProcessImpl.start(ProcessImpl.java:134)
   >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
   >         ... 9 more

2 Answers:

Answer 0 (Score: 2)

Try chmod +x afolder/abash.sh

Answer 1 (Score: 1)

Change the code like this: pipe("./afolder/abash.sh"), and make sure abash.sh is executable.
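Putting the two answers together, a minimal sketch of the corrected driver code, using the question's placeholder names:

// Prerequisite (answer 0): give the script the execute bit on the copy
// that gets distributed, e.g. chmod +x afolder/abash.sh
sc.addFile("hdfs://afolder", true)

// Answer 1: reference the script with a "./" prefix so each executor runs
// the local copy that sc.addFile downloaded into its working directory.
val infile = sc.textFile("afile.txt").pipe("./afolder/abash.sh").take(3)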