Getting rdd.pipe to work with PySpark on YARN

Time: 2017-05-25 07:49:34

Tags: pyspark yarn cloudera-cdh

I'm trying to run PySpark on YARN with CDH. The PySpark driver program contains a statement like rdd.pipe("XXX.sh"), and every time it runs, a "permission denied" error pops up. What can I do to fix this error? Thanks.
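For context, here is a minimal sketch of the kind of driver code involved; the script path and RDD contents are placeholders, not the asker's actual job:

    from pyspark import SparkContext

    sc = SparkContext(appName="pipe-demo")

    # rdd.pipe() launches the given command as an external process on each
    # executor, streams the partition's elements to its stdin, and returns the
    # process's stdout lines as a new RDD. The script therefore has to be
    # present and runnable on every worker node.
    rdd = sc.parallelize(["a", "b", "c"])
    piped = rdd.pipe("/bashPath/XXX.sh")  # hypothetical path to the shell script
    print(piped.collect())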

The error log looks like this:

  File "/data/yarn/nm/usercache/work/appcache/application_1495632173402_0079/container_1495632173402_0079_01_000001/pyspark.zip/pyspark/rdd.py", line 2346, in pipeline_func
  File "/data/yarn/nm/usercache/work/appcache/application_1495632173402_0079/container_1495632173402_0079_01_000001/pyspark.zip/pyspark/rdd.py", line 2346, in pipeline_func
  File "/data/yarn/nm/usercache/work/appcache/application_1495632173402_0079/container_1495632173402_0079_01_000001/pyspark.zip/pyspark/rdd.py", line 2346, in pipeline_func
  File "/data/yarn/nm/usercache/work/appcache/application_1495632173402_0079/container_1495632173402_0079_01_000001/pyspark.zip/pyspark/rdd.py", line 317, in func
  File "/data/yarn/nm/usercache/work/appcache/application_1495632173402_0079/container_1495632173402_0079_01_000001/pyspark.zip/pyspark/rdd.py", line 715, in func
  File "/usr/lib64/python2.6/subprocess.py", line 642, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.6/subprocess.py", line 1234, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied

at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

1 Answer:

Answer 0 (score: 0)

In the end I solved this problem with: chmod 777 -R /bashPath

The problem is "permission denied", so at first I thought the master or the slaves might not have permission to execute the bash script. But after I ran "chmod +x XXX.sh" and submitted the task again, the same error was still there. Then I figured the script might also need read permission, so I tried that, and it did work.
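In other words, pipe() forks a subprocess that both reads and executes the script, so the YARN container user needs read and execute permission on the file (and execute permission on every directory in its path) on every NodeManager host. Below is a small sketch of a less permissive fix than 777, assuming the script sits at the hypothetical path /bashPath/XXX.sh on each node:

    import os
    import stat

    # Hypothetical location of the script that rdd.pipe() runs; it has to exist
    # on every NodeManager host (or be shipped to executors, e.g. via sc.addFile).
    script = "/bashPath/XXX.sh"

    # Grant owner rwx and group/other r-x (0755): subprocess first opens the
    # file for reading and then exec()s it, so both bits matter for the YARN user.
    os.chmod(script, stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP
                     | stat.S_IROTH | stat.S_IXOTH)

    # Quick sanity check; this only covers the node the check runs on.
    print(os.access(script, os.R_OK | os.X_OK))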