Subset a JSON object and form a (key, value) RDD from it

Time: 2019-06-20 07:01:39

Tags: python json apache-spark rdd

Suppose I have a JSON object:

obj= [{"name":"Era", "age":45, "sex":"female", "id":2545}  
     {"name":"Patrick", "age":35, "sex":"male", "id":2546}  
     {"name":"Elina", "age":40, "sex":"female", "id":2547}  
     {"name":"Reg", "age":47, "sex":"male", "id":2548}]   

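I want to create a (key, value) RDD from this data using only 'id' and 'name', with 'id' as the key in the RDD. I tried the solution given in this link, but I get the following error:

AttributeError: 'str' object has no attribute 'get'

To explain further, here is my code: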

for key in obj:
    my_dict={}
    my_dict['id']=key.get('id')
    my_dict['name']=key.get('name')
    result.append(my_dict)  

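I would appreciate help with this part so that I can move on to the second part, which is getting an RDD from it.

3 answers:

Answer 0: (score: 1)

You can write it in one line; provided your obj is correct, this line should work: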

result = [{'id': item.get('id'), 'name': item.get('name')} for item in obj]

I tested your code in my own environment, and the only problem I found is that obj needs commas between the dictionaries.
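One possible cause of the AttributeError, which the question does not confirm, is that obj is still a JSON string rather than a parsed list. Iterating over a str yields one-character strings, and those have no .get() method. The sketch below assumes a hypothetical variable raw holding the serialized records and parses it with json.loads before applying the same one-liner:

```python
import json

# Hypothetical input: the same kind of records, but still serialized as JSON text.
raw = ('[{"name": "Era", "age": 45, "sex": "female", "id": 2545}, '
       '{"name": "Patrick", "age": 35, "sex": "male", "id": 2546}]')

# Iterating over a str yields one-character strings, which have no .get().
first_item = next(iter(raw))

# json.loads turns the text into a list of dicts, so .get() works on each item.
obj = json.loads(raw)
result = [{'id': item.get('id'), 'name': item.get('name')} for item in obj]
print(result)
```

If obj is already a list of dicts, the json.loads step is unnecessary.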

Answer 1: (score: 1)

It works for me once the JSON in obj is fixed:

In [4]: obj= [{"name":"Era", "age":45, "sex":"female", "id":2545},
   ...:      {"name":"Patrick", "age":35, "sex":"male", "id":2546},
   ...:      {"name":"Elina", "age":40, "sex":"female", "id":2547},
   ...:      {"name":"Reg", "age":47, "sex":"male", "id":2548}]

In [6]: result = []

In [7]: for key in obj:
   ...:     my_dict={}
   ...:     my_dict['id']=key.get('id')
   ...:     my_dict['name']=key.get('name')
   ...:     result.append(my_dict)
   ...:

In [8]: result
Out[8]:
[{'id': 2545, 'name': 'Era'},
 {'id': 2546, 'name': 'Patrick'},
 {'id': 2547, 'name': 'Elina'},
 {'id': 2548, 'name': 'Reg'}]

Answer 2: (score: 0)

If you want a simple dictionary of key-value pairs, try this:

obj= [{"name":"Era", "age":45, "sex":"female", "id":2545},  
     {"name":"Patrick", "age":35, "sex":"male", "id":2546},  
     {"name":"Elina", "age":40, "sex":"female", "id":2547},  
     {"name":"Reg", "age":47, "sex":"male", "id":2548}] 

my_dict = {}
for i in obj:
    my_dict[i.get('id')] = i.get('name')

print(my_dict)

# Output: {2545: 'Era', 2546: 'Patrick', 2547: 'Elina', 2548: 'Reg'}
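For the second part of the question, getting a (key, value) RDD, here is a minimal sketch. It assumes an existing SparkContext is passed in as sc; to_pairs and to_pair_rdd are hypothetical helper names, not part of any library. Spark treats an RDD of 2-tuples as a pair RDD, so building (id, name) tuples first and passing them to sc.parallelize should be enough:

```python
obj = [{"name": "Era", "age": 45, "sex": "female", "id": 2545},
       {"name": "Patrick", "age": 35, "sex": "male", "id": 2546},
       {"name": "Elina", "age": 40, "sex": "female", "id": 2547},
       {"name": "Reg", "age": 47, "sex": "male", "id": 2548}]


def to_pairs(records):
    """Keep only 'id' and 'name', as (id, name) tuples with 'id' as the key."""
    return [(r["id"], r["name"]) for r in records]


def to_pair_rdd(sc, records):
    """Distribute the (id, name) tuples; sc is assumed to be a live SparkContext."""
    return sc.parallelize(to_pairs(records))


pairs = to_pairs(obj)
print(pairs)
```

With a running context, to_pair_rdd(sc, obj).collect() would be expected to return the same list as pairs. The my_dict from this answer could be distributed the same way with sc.parallelize(list(my_dict.items())).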