Spark: converting a JSON array of objects into concatenated strings

Time: 2019-12-17 12:45:49

Tags: python json string dataframe pyspark

How can I do this with a PySpark DataFrame?

Starting from an input JSON like this:

  {  "obj":[ 
          { 
             "a":"val1",
             "b":"val1"
          },
          { 
             "a":"val2",
             "b":"val2"
          }
          ]
 }

I want to get a DataFrame like this:

+----------+----------+
|     a    |     b    |
+----------+----------+
|val1, val2|val1, val2|
+----------+----------+

1 answer:

Answer 0: (score: -1)

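Assuming the contents of your JSON file have already been parsed into a Python dictionary, and assuming there is only one "obj" key, you can easily convert the data structure into a standard 2D list, which you can then turn into whatever DataFrame format you like: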

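# Named "data" rather than "json" so the standard-library json module is not shadowed.
data = {"obj": [{"a": "val1", "b": "val1"}, {"a": "val2", "b": "val2"}]}

# Collect the values of each key across all objects, keeping first-seen key order.
dic = {}
for row in data["obj"]:
    for key, val in row.items():
        dic.setdefault(key, []).append(val)

table = list(dic.items())

# result:
# [('a', ['val1', 'val2']), ('b', ['val1', 'val2'])]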
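
The list of lists above still holds separate values per key, while the question asks for one comma-separated string per column. A minimal sketch of that last step in plain Python follows. The helper name `concat_columns` and the `", "` separator are my own assumptions, not part of the original answer.

```python
# Sample input mirroring the JSON in the question.
data = {"obj": [{"a": "val1", "b": "val1"}, {"a": "val2", "b": "val2"}]}


def concat_columns(records, sep=", "):
    """Hypothetical helper: gather values per key, then join them with `sep`."""
    grouped = {}
    for record in records:
        for key, val in record.items():
            # str() so that non-string values can be joined as well
            grouped.setdefault(key, []).append(str(val))
    return {key: sep.join(vals) for key, vals in grouped.items()}


joined = concat_columns(data["obj"])

# One header list and one data row, matching the single-row table in the question.
columns = list(joined)
row = tuple(joined.values())

print(columns)
print(row)
```

If a SparkSession named `spark` is available (an assumption here), `spark.createDataFrame([row], columns)` should turn this single row into a DataFrame with columns `a` and `b`. For data that is already loaded in Spark, it is probably better to do the concatenation with Spark's own functions instead of going through Python dictionaries.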