Spark RDD in Python: RDD is not iterable

Time: 2019-08-20 08:42:09

Tags: rdd pipeline iterable

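I have been trying to extract the keys from the elements of a Spark RDD, where each element is a dictionary of key-value pairs, but I get an error saying that the PipelinedRDD is not iterable.

In short, I have loaded a "dataset" and want to do what the docstring below describes. Any help is appreciated. Thanks.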

def get_all_attributes(dataset):
    """
    Each element is a dictionary of attributes and their values for a post.
    Can you find the set of all attributes used throughout the RDD?
    The function dictionary.keys() gives you the list of attributes of a dictionary.
    :param dataset: dataset loaded in Spark context
    :type dataset: a Spark RDD
    :return: all unique attributes collected in a list
    """
    attributes = dataset.map(lambda x: x.keys())
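
A note on the likely cause, with a possible fix: `map` is a lazy transformation, so `attributes` is still an RDD (a PipelinedRDD), not a Python list. Looping over it or passing it to `set()` or `list()` raises the "not iterable" error. The results only reach the driver after an action such as `collect()`. The sketch below is one way to meet the docstring, assuming every element of the RDD is a plain Python dict; it uses the RDD methods `flatMap`, `distinct` and `collect`, and the order of the returned list is not guaranteed.

```python
def get_all_attributes(dataset):
    """Return the unique attribute names used across all elements of the RDD.

    Assumption: every element of `dataset` is a Python dict.
    """
    return (
        dataset
        # flatMap flattens the per-post key lists into one RDD of attribute names
        .flatMap(lambda post: list(post.keys()))
        # distinct() removes duplicate names on the cluster
        .distinct()
        # collect() is an action: it returns the results to the driver as a list
        .collect()
    )
```

With a SparkContext available as `sc`, something like `get_all_attributes(sc.parallelize([{"id": 1, "title": "a"}, {"id": 2, "score": 3}]))` should return the three names `id`, `title` and `score` in some order. Collecting only the distinct keys keeps the data sent to the driver small, unlike collecting every dictionary first.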

0 answers:

No answers