How can I build a PySpark DataFrame like this?
Starting from an input JSON like this:
{ "obj": [
    {
        "a": "val1",
        "b": "val1"
    },
    {
        "a": "val2",
        "b": "val2"
    }
] }
I want to get a DataFrame like this:
+----------+----------+
|         a|         b|
+----------+----------+
|val1, val2|val1, val2|
+----------+----------+
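Before any DataFrame work, the input has to be parsed into a Python dict. A minimal sketch using the standard `json` module, assuming the document is available as a string (the variable name `raw` is hypothetical; for a file on disk, `json.load(f)` on an open file object does the same):

```python
import json

# The input document from the question, held as a raw string (hypothetical name)
raw = '''{ "obj": [
    {"a": "val1", "b": "val1"},
    {"a": "val2", "b": "val2"}
] }'''

# Parse the JSON text into nested Python dicts and lists
data = json.loads(raw)
print(data["obj"])  # [{'a': 'val1', 'b': 'val1'}, {'a': 'val2', 'b': 'val2'}]
```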
Answer 0 (score: -1)
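Assuming the contents of your JSON file have already been parsed into a Python dictionary, and that there is only one "obj" key, you can easily convert the data into a plain list of (column name, values) pairs, which you can then turn into whatever DataFrame format you like: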
# Named "data" rather than "json" so it does not shadow the json module
data = {"obj": [{"a": "val1", "b": "val1"}, {"a": "val2", "b": "val2"}]}

# Collect every value under its key, preserving row order
dic = {}
for row in data["obj"]:
    for key, val in row.items():
        dic.setdefault(key, []).append(val)

table = list(dic.items())
# result:
# [('a', ['val1', 'val2']), ('b', ['val1', 'val2'])]
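To match the single row shown in the question, each column's list can be joined into one comma-separated string. A minimal sketch, assuming `table` has the shape produced by the loop above:

```python
# Output of the loop above: (column name, values) pairs
table = [('a', ['val1', 'val2']), ('b', ['val1', 'val2'])]

# Column names, in first-seen order
columns = [key for key, _ in table]
# A single row: each column's values joined with ", "
row = tuple(", ".join(values) for _, values in table)

print(columns)  # ['a', 'b']
print(row)      # ('val1, val2', 'val1, val2')
```

With an existing SparkSession named `spark`, passing these to `spark.createDataFrame([row], columns)` should then give the one-row DataFrame from the question; that Spark step is not exercised in the sketch itself.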