'cell' object has no attribute 'iteritems'

Time: 2014-12-19 15:18:37

Tags: apache-spark

I am running a simple example through Spark's Python API:

x = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
def f(x): return x                               # identity helper (not used below)
def add(a, b): return a + str(b)                 # append a value (or another combiner) to the string combiner
sorted(x.combineByKey(str, add, add).collect())  # createCombiner=str, mergeValue=add, mergeCombiners=add
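
For reference, a fully self-contained version of the same example (just a sketch, assuming a plain local SparkContext; in practice I run this from the pyspark shell, where sc already exists):

from pyspark import SparkContext

sc = SparkContext("local", "combineByKey-test")    # hypothetical app name, local master
x = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])

def add(a, b):
    return a + str(b)

# combineByKey(createCombiner=str, mergeValue=add, mergeCombiners=add)
print(sorted(x.combineByKey(str, add, add).collect()))
# In local mode this prints [('a', '11'), ('b', '1')], which is what I expect.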

In local mode (with both Spark 1.0 and 1.1) this runs without any problem, but in cluster mode the error occurs. Part of the traceback is shown below. Testing the RDD function cogroup() shows a similar problem (a minimal sketch of that call follows). This is my first time using Spark through its Python API.
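
The cogroup() test follows the same pattern (a rough sketch; the exact second RDD I used is not important here):

y = sc.parallelize([("a", 2), ("b", 3)])
# Groups the values of x and y by key. This also works in local mode,
# but on the cluster it fails with the same AttributeError.
x.cogroup(y).collect()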

Do you have any ideas?

[duplicate 561]
14/12/19 23:04:53 INFO TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.1.4-1.cdh5.1.4.p0.15/lib/spark/python/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/opt/cloudera/parcels/CDH-5.1.4-1.cdh5.1.4.p0.15/lib/spark/python/pyspark/rdd.py", line 1404, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/opt/cloudera/parcels/CDH-5.1.4-1.cdh5.1.4.p0.15/lib/spark/python/pyspark/rdd.py", line 283, in func
    def func(s, iterator): return f(iterator)
  File "/opt/cloudera/parcels/CDH-5.1.4-1.cdh5.1.4.p0.15/lib/spark/python/pyspark/rdd.py", line 1118, in combineLocally
    combiners = {}
AttributeError: 'cell' object has no attribute 'iteritems'

0 Answers:

There are no answers.