My script runs fine in local mode, but when I run it in YARN mode I get the following error:
File "/hdfs15/yarn/nm/usercache/jvy234/filecache/11/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar/pyspark/worker.py", line 79, in main
serializer.dump_stream(func(split_index, iterator), outfile)
File "/hdfs15/yarn/nm/usercache/jvy234/filecache/11/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar/pyspark/serializers.py", line 196, in dump_stream
self.serializer.dump_stream(self._batched(iterator), stream)
File "/hdfs15/yarn/nm/usercache/jvy234/filecache/11/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar/pyspark/serializers.py", line 127, in dump_stream
for obj in iterator:
File "/hdfs15/yarn/nm/usercache/jvy234/filecache/11/spark-assembly-1.1.0-cdh5.2.1-hadoop2.5.0-cdh5.2.1.jar/pyspark/serializers.py", line 185, in _batched
for item in iterator:
File "/home/jvy234/globalHawk.py", line 84, in <lambda>
TypeError: 'bool' object is not callable
org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1319)
org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
Line 84 of my script is:
dataSplit = dataFile.map(lambda line: line.split(deli))
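For reference, here is a minimal, hypothetical illustration of what the interpreter means by this message (it is not necessarily the asker's exact root cause): the TypeError appears whenever a name that holds a bool is called as if it were a function.

```python
deli = "|"
data = ["a|b|c", "d|e|f"]

# Normal case: str.split is a method, so calling it works as in the lambda above.
rows = [line.split(deli) for line in data]
print(rows)  # [['a', 'b', 'c'], ['d', 'e', 'f']]

# Error case: a bool bound where a callable is expected reproduces
# the exact TypeError seen in the traceback.
split = True
try:
    split("a|b|c")
except TypeError as e:
    print(e)  # 'bool' object is not callable
```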
Running locally:
spark-submit --master local globalHawk.py -i 20140817_011500_offer_init.dat -s kh_offers_schema4.txt4 -o txt.txt -d "|"
Running with yarn-client:
spark-submit --master yarn-client globalHawk.py -i 20140817_011500_offer_init.dat -s kh_offers_schema4.txt4 -o txt.txt -d "|"
Answer (score: 4):
This problem is most likely caused by the driver and the YARN workers using different versions of Python. It can be fixed by making the default Python on the YARN workers the same version as the one used by the driver.
You can also specify which Python version to use in YARN explicitly:
PYSPARK_PYTHON=python2.6 bin/spark-submit xxx
(I have no YARN cluster available, so this is untested.)