I have a dictionary in Python:
{'609232972': 4, '975151075': 4, '14247572': 4, '2987788788': 4, '3064695250': 2}
How can I load it directly into an RDD without losing the key-value pairs?
When I load it like this:
usr_group = sc.parallelize(partition)
print(usr_group.take(5))
it just breaks up the key-value pairs and gives only the keys:
['609232972', '975151075', '14247572', '2987788788', '3064695250']
I was expecting the RDD to contain
{'609232972': 4, '975151075': 4, '14247572': 4, '2987788788': 4, '3064695250': 2}
so that I can process the keys and values together.
Answer 0 (score: 1)
I'm not sure what you want each row of the RDD to be, but here are three options:
my_dict = {'609232972': 4, '975151075': 4, '14247572': 4, '2987788788': 4, '3064695250': 2}
rdd1 = sc.parallelize([my_dict])
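rdd2 = sc.parallelize(list(my_dict.items()))  # one (key, value) tuple per row; items() works on Python 2 and 3
rdd3 = rdd2.map(lambda x: dict([x]))  # one single-entry dict per row
# Row order follows the dict's iteration order, so it can differ between Python versions.
print(rdd1.collect())
print(rdd2.take(4))
print(rdd3.take(4))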
[{'2987788788': 4, '975151075': 4, '3064695250': 2, '14247572': 4, '609232972': 4}]
[('2987788788', 4), ('975151075', 4), ('3064695250', 2), ('14247572', 4)]
[{'2987788788': 4}, {'975151075': 4}, {'3064695250': 2}, {'14247572': 4}]
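As for why the original attempt lost the values: as far as I can tell, sc.parallelize treats its argument as a plain iterable of rows, and iterating a Python dict yields only its keys. Below is a minimal sketch of that behaviour in plain Python, with no Spark required. The names keys_only, pairs and rebuilt are only illustrative.

```python
# Plain Python, no Spark needed: iterating a dict yields only its keys.
my_dict = {'609232972': 4, '975151075': 4, '14247572': 4,
           '2987788788': 4, '3064695250': 2}

# What you get when the dict itself is iterated (the values are dropped).
keys_only = list(my_dict)

# (key, value) tuples: each tuple keeps the key and its value together.
pairs = list(my_dict.items())

# The tuples carry enough information to rebuild the original dict.
rebuilt = dict(pairs)

print(sorted(keys_only))
print(sorted(pairs))
print(rebuilt == my_dict)
```

Passing a list like pairs to sc.parallelize is what produces rdd2 above. From there, pair-RDD methods such as collectAsMap() should be able to turn the rows back into a dict on the driver.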