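I have been trying to extract the keys from the elements of a Spark RDD, where each element is a dictionary of key-value pairs, but I get an error saying that a PipelinedRDD is not iterable.
In short, I have loaded a "dataset" and want to do what the docstring below describes. Any help is appreciated. Thanks.
def get_all_attributes(dataset):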
    """
    Each element is a dictionary of attributes and their values for a post.
    Can you find the set of all attributes used throughout the RDD?
    The function dictionary.keys() gives you the list of attributes of a dictionary.
    :param dataset: dataset loaded in Spark context
    :type dataset: a Spark RDD
    :return: all unique attributes collected in a list
    """
    attributes = dataset.map(lambda x: x.keys())
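A likely cause of the error, assuming it was raised when looping over `attributes` or passing it to something like `set()`: `map` is a transformation and returns another RDD (a PipelinedRDD), not a Python list, so it cannot be iterated on the driver until an action such as `collect()` is called. A minimal sketch of one way to satisfy the docstring uses `flatMap` to emit each key as its own element, then `distinct()` and `collect()`. This is one possible approach, not necessarily the solution the exercise intended.

```python
def get_all_attributes(dataset):
    """Return the unique attribute names used across all elements of the RDD."""
    # flatMap emits every key of every dictionary as a separate RDD element,
    # whereas map would produce one collection of keys per dictionary.
    keys = dataset.flatMap(lambda post: list(post.keys()))
    # distinct() drops duplicate keys; collect() is the action that brings
    # the result back to the driver as a plain Python list.
    return keys.distinct().collect()
```

For example, with `sc` assumed to be an existing SparkContext, `get_all_attributes(sc.parallelize([{"a": 1, "b": 2}, {"b": 3, "c": 4}]))` should return a list containing `a`, `b` and `c`. The order is not guaranteed, so sort the list if a stable order matters.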