I am iterating over cogrouped elements with the following code:
def find_animal(x):
    animal_list = x[0]
    tag_list = x[1]
    return animal_list[0] + "_" + tag_list[0]
spark_tag_rdd = spark_tag_df.rdd.map(lambda r:(r[1], r))
cogroup_rdd = spark_animal_df.rdd.map(lambda r:(r[1], r)).cogroup(spark_tag_rdd)
cogroup_rdd = cogroup_rdd.map(lambda r: find_animal(r[1])).flatMap(lambda r: r)
cogroup_rdd.take(5)
Then I get the following error:
File "/usr/local/spark-latest/python/pyspark/rdd.py", line 1306, in takeUpToNumLeft
File "<ipython-input-23-b9a8d1714ae6>", line 4, in <lambda>
File "<ipython-input-22-1d284bec23cf>", line 4, in find_animal
TypeError: 'ResultIterable' object does not support indexing
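Basically, I need to get the first element of an iterable; let's call it current. Then, some time later, I want to get the element that comes after current. How can I do this? Thanks.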
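
For illustration, here is a minimal sketch of one way this might be done with Python's built-in iter() and next(), which need the object to be iterable but not indexable. The class IterableOnly and the helper first_and_rest are hypothetical names made up for this example; IterableOnly only imitates the behaviour reported in the traceback (iteration works, indexing raises TypeError) so that the snippet runs without Spark.

```python
class IterableOnly:
    """Hypothetical stand-in for an object that can be iterated but not indexed."""

    def __init__(self, data):
        self._data = list(data)

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def first_and_rest(iterable):
    """Return the first element (or None if empty) and an iterator over the rest."""
    it = iter(iterable)
    current = next(it, None)  # the default avoids StopIteration on an empty iterable
    return current, it


animals = IterableOnly(["cat", "dog", "bird"])

# Grab the first element now and keep the iterator around for later.
current, rest = first_and_rest(animals)

# Some time later: advance the same iterator to get the element after `current`.
following = next(rest, None)
```

If the groups are small, converting with list(x[0]) and indexing into the resulting list should also work, at the cost of materialising the whole group in memory.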